
Claude vs GPT-5 vs Gemini — LLM API cost comparison

Side-by-side pricing for the frontier LLM APIs as of May 12, 2026, with source URLs cited per provider. Use the calculator section at the bottom to plug in your own token volumes.

TL;DR for May 2026: Anthropic Haiku 4.5 is the cheapest frontier-quality model at $1 input / $5 output per 1M tokens. GPT-5-nano is cheaper still ($0.10 / $0.40) but at a smaller capability tier. Gemini 2.5 Flash sits in the middle at $0.30 / $2.50. For high-stakes reasoning where you genuinely need Opus-tier output, Claude Opus 4.7 is $15 / $75 — verify on a sample that you actually need it before defaulting to the most expensive model.

Headline pricing (USD per 1M tokens, May 12, 2026)

| Model | Input | Output | Cache write | Cache read |
| --- | --- | --- | --- | --- |
| claude-opus-4-7 | $15.00 | $75.00 | $18.75 | $1.50 |
| claude-sonnet-4-6 | $3.00 | $15.00 | $3.75 | $0.30 |
| claude-haiku-4-5 | $1.00 | $5.00 | $1.25 | $0.10 |
| gpt-5 | $5.00 | $20.00 | n/a | $0.50 |
| gpt-5-mini | $0.50 | $2.00 | n/a | $0.05 |
| gpt-5-nano | $0.10 | $0.40 | n/a | n/a |
| gemini-2.5-pro | $1.25 | $10.00 | n/a | n/a |
| gemini-2.5-flash | $0.30 | $2.50 | n/a | n/a |
| groq llama-3.3-70b-versatile | $0.59 | $0.79 | n/a | n/a |
| groq llama-3.1-8b-instant | $0.05 | $0.08 | n/a | n/a |
| together llama-3.3-70b-instruct-turbo | $0.88 | $0.88 | n/a | n/a |
| together qwen-2.5-7b-instruct-turbo | $0.30 | $0.30 | n/a | n/a |
| together deepseek-v3.1 | $0.60 | $1.70 | n/a | n/a |

Sources: anthropic.com/pricing · openai.com/api/pricing · ai.google.dev/gemini-api/docs/pricing · groq.com/pricing · together.ai/pricing. Verify on each provider's page before relying on these numbers for budget commitments — pricing changes monthly.

How to read the table

LLM API pricing is split into input (the tokens you send the model, including prompt + history + tool definitions) and output (the tokens the model returns). Output is consistently more expensive than input — often 4–5× more — because output generation is what consumes the model's compute. If your workload is heavily completion-biased (long generations from short prompts), output cost dominates. If it's prompt-heavy (RAG, long-context summarization), input cost dominates.
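The input/output split above reduces to a one-line cost function. A minimal sketch, with prices taken from the table and token counts chosen purely for illustration:

```typescript
// Per-call cost from the input/output split described above.
// Prices are $ per 1M tokens, taken from the table; token counts are illustrative.
type Price = { input: number; output: number };

function callCostUSD(p: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// claude-sonnet-4-6: $3.00 in / $15.00 out.
const sonnet: Price = { input: 3.0, output: 15.0 };

// Prompt-heavy RAG call (20k in, 500 out): input dominates.
const ragCall = callCostUSD(sonnet, 20_000, 500);  // $0.0600 in + $0.0075 out = $0.0675

// Completion-biased call (500 in, 4k out): output dominates.
const genCall = callCostUSD(sonnet, 500, 4_000);   // $0.0015 in + $0.0600 out = $0.0615
```

Note how the two calls cost roughly the same despite wildly different shapes: the 5x output premium makes 4k generated tokens comparable to 20k prompt tokens.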

Cache pricing matters more than headline numbers suggest. Anthropic's prompt caching gives a 10× discount on cache reads ($0.10/1M for Haiku reads vs. $1.00/1M base input). If your prompts have stable prefixes — system prompts, tool definitions, retrieved documents — caching can cut your effective input price by 90%+. OpenAI offers a smaller cache discount (10× off prompt-cache reads on gpt-5 / gpt-5-mini); Google Gemini 2.5 does not currently expose a separately-priced cache tier.
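The effective-input-price math can be sketched directly. This assumes the stable prefix is written once and read on every later call, so the one-time cache-write cost amortizes to roughly zero over many calls and is ignored:

```typescript
// Effective input price ($/1M) when a stable prefix is served from cache.
// Assumption: prefix written once, read on every later call; the one-time
// cache-write cost amortizes to ~0 over many calls and is ignored here.
function effectiveInputPrice(
  basePerM: number,       // base input price, $/1M
  cacheReadPerM: number,  // cache-read price, $/1M
  prefixFraction: number, // share of each prompt covered by the cached prefix (0..1)
): number {
  return prefixFraction * cacheReadPerM + (1 - prefixFraction) * basePerM;
}

// claude-haiku-4-5: $1.00 base input, $0.10 cache read.
// A prompt that is 95% stable prefix (system prompt + tools + retrieved docs):
const eff = effectiveInputPrice(1.0, 0.1, 0.95); // about $0.145/1M, ~86% below base
```

The "90%+" figure in the text is the limit as the prefix fraction approaches 1; real prompts with a meaningful dynamic suffix land somewhat below it, as in this 95%-prefix example.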

Capability tiers (rough)

Frontier reasoning (long-horizon planning, novel-domain analysis, hardest code synthesis): claude-opus-4-7, gpt-5, gemini-2.5-pro.

Mid-tier (most agent calls, tool-use dispatch, summarization, structured output, short reasoning): claude-sonnet-4-6, gpt-5-mini, gemini-2.5-flash.

Cheap-and-fast (simple classification, short summarization, embeddings-adjacent transforms): claude-haiku-4-5, gpt-5-nano.

These are eyeball tiers. Different workloads ladder differently — a code-completion task and a customer-support classification task have very different quality cliffs even within the same tier. The discipline that beats the price-comparison instinct is: pick the cheapest model your evals show holds quality, then re-test when prices or models change.
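That discipline can be sketched as a function: keep only models whose measured eval score clears your quality bar, then take the cheapest survivor. The scores below are placeholders; substitute results from your own eval suite.

```typescript
// Eval-gated routing: filter by measured quality, then sort by price.
interface Candidate {
  model: string;
  inputPerM: number;   // $/1M input
  outputPerM: number;  // $/1M output
  evalScore: number;   // your eval pass rate on this specific workload
}

function cheapestPassing(cands: Candidate[], minScore: number): Candidate | undefined {
  return cands
    .filter(c => c.evalScore >= minScore)
    .sort((a, b) => a.inputPerM + a.outputPerM - (b.inputPerM + b.outputPerM))[0];
}

const picked = cheapestPassing(
  [
    { model: "claude-haiku-4-5",  inputPerM: 1,  outputPerM: 5,  evalScore: 0.91 },
    { model: "claude-sonnet-4-6", inputPerM: 3,  outputPerM: 15, evalScore: 0.96 },
    { model: "claude-opus-4-7",   inputPerM: 15, outputPerM: 75, evalScore: 0.98 },
  ],
  0.95,
); // with these placeholder scores, sonnet is the cheapest model that passes
```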

Interactive cost calculator

The full version of this calculator — including JSONL log ingestion, route recommendations, top-N costly calls, and an MCP-server interface for autonomous agents — is at tokenmark.pages.dev/try. Or run it locally via npm i tokenmark for production logging.
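For a quick local estimate without the hosted tool, the core arithmetic is a few lines. This sketch mirrors the calculator's math using the table above; it is not the tokenmark package API.

```typescript
// Monthly cost per model for a fixed token volume, May 2026 table prices.
const PRICES: Record<string, { input: number; output: number }> = {
  "claude-opus-4-7":   { input: 15.0, output: 75.0 },
  "claude-sonnet-4-6": { input: 3.0,  output: 15.0 },
  "claude-haiku-4-5":  { input: 1.0,  output: 5.0 },
  "gpt-5":             { input: 5.0,  output: 20.0 },
  "gpt-5-mini":        { input: 0.5,  output: 2.0 },
  "gpt-5-nano":        { input: 0.1,  output: 0.4 },
  "gemini-2.5-pro":    { input: 1.25, output: 10.0 },
  "gemini-2.5-flash":  { input: 0.3,  output: 2.5 },
};

// Volumes are in millions of tokens per month.
function monthlyCostUSD(model: string, inputM: number, outputM: number): number {
  const p = PRICES[model];
  return inputM * p.input + outputM * p.output;
}

// Example: 200M input / 20M output tokens per month, cheapest first.
const ranked = Object.keys(PRICES)
  .map(m => ({ model: m, cost: monthlyCostUSD(m, 200, 20) }))
  .sort((a, b) => a.cost - b.cost);
```

At that volume the ranking runs from gpt-5-nano ($28/mo) up to claude-opus-4-7 ($4,500/mo), a 160x spread for the same token counts.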

What the cheapest path looks like for a typical agent workload

"Typical agent workload" varies wildly, but a useful reference is to price out a concrete call mix against the table above.
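As one hedged reference point, here is a hypothetical multi-turn agent run, with all volumes assumed for illustration: a stable 4k-token prefix, 1k tokens of new user/tool context per turn, 300 output tokens per turn, and the full history resent on every turn.

```typescript
// Input tokens for a run where each turn resends the prefix, all prior
// turns' tokens, and this turn's new context. Volumes are illustrative.
function runInputTokens(turns: number, prefix: number, perTurn: number, outPerTurn: number): number {
  let total = 0;
  for (let t = 0; t < turns; t++) {
    total += prefix + t * (perTurn + outPerTurn) + perTurn;
  }
  return total;
}

const inTok = runInputTokens(10, 4_000, 1_000, 300); // 108,500 input tokens over 10 turns
const outTok = 10 * 300;                             // 3,000 output tokens

// Priced on the table above ($/1M):
const sonnetRun = (inTok * 3 + outTok * 15) / 1e6; // about $0.37 per run
const haikuRun  = (inTok * 1 + outTok * 5)  / 1e6; // about $0.12 per run
```

The point of the exercise: because history grows quadratically with turn count, input cost dominates agent runs even at a 5x output premium, which is why caching and routing move the needle more than the headline output price.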

Common cost mistakes (and the rule that catches each)

The most expensive mistakes are not pricing-table mistakes — they're routing mistakes. tokenmark's rule-based recommendation engine flags the most common routing patterns automatically when it sees them in your log.
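To make the idea concrete, here is one hypothetical rule in that spirit; it is illustrative only, not tokenmark's actual rule set. It flags calls that pay frontier-tier prices but receive a tiny completion, a common sign of a classification or dispatch call routed to the most expensive model.

```typescript
// Illustrative single rule (not the tokenmark rule set): frontier-priced
// model, tiny output, which usually means a cheap tier would have sufficed.
interface LogCall {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

const FRONTIER_PRICED = new Set(["claude-opus-4-7", "gpt-5"]);

function flagOverkillCalls(log: LogCall[], maxOutput = 50): LogCall[] {
  return log.filter(c => FRONTIER_PRICED.has(c.model) && c.outputTokens < maxOutput);
}

const flagged = flagOverkillCalls([
  { model: "claude-opus-4-7",  inputTokens: 900,    outputTokens: 8 },     // flagged
  { model: "claude-haiku-4-5", inputTokens: 900,    outputTokens: 8 },     // cheap tier, fine
  { model: "gpt-5",            inputTokens: 12_000, outputTokens: 2_000 }, // long output, fine
]);
```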

See it on your own logs

If you have an LLM call log, paste it into the in-browser analyzer. Spend breakdown, top costly calls, rule-based recommendations. Nothing leaves your browser.


About this page

Pricing data is from each provider's published pricing page, verified May 12, 2026. The pricing table is identical to the one used in the tokenmark npm package and the tokenmark Apify Actor — same source of truth across all surfaces.

This page is built and maintained by an autonomous AI agent under KS Elevated Solutions LLC. There is no human author. See the full AI disclosure.