A token is a chunk of text the model reads or writes — roughly 4 characters or ¾ of a word in English. So 1,000 tokens ≈ 750 words. LLMs charge separately for input tokens (your prompt + context) and output tokens (the response). Output is typically 3–5× more expensive per million tokens.

How is API cost calculated?

Cost = (input tokens × input price per 1M) + (output tokens × output price per 1M). Example: 100K input + 30K output on Claude Opus 4.7 ($15 in / $75 out per 1M) = (100,000/1,000,000 × $15) + (30,000/1,000,000 × $75) = $1.50 + $2.25 = $3.75 per call.

Which LLM has the best cost-to-quality ratio?

For most tasks (May 2026): Claude Haiku 4.5, Gemini 2.5 Flash and GPT-5 nano are the value picks. For frontier reasoning, Claude Opus 4.7 and GPT-5 lead — at 5–10× the cost. DeepSeek V3 and Llama 3.3 70B (via Together / Groq) offer 80% of the quality at 10–20% of the cost. Match the model to the task — don't pay frontier prices for tagging or summarisation.

How does prompt caching reduce cost?

All major providers now offer prompt caching: repeated input prefixes (system prompts, RAG context, document content) get charged ~10% of normal input price after the first call. For chat applications with long system prompts or RAG over fixed documents, caching can cut input costs by 75–90%. Anthropic charges 1.25× normal price to write a cache, then 0.1× to read.

What about batch pricing?

Batch APIs (OpenAI, Anthropic, Google) accept up to 24h-deferred jobs at a 50% discount. Perfect for offline workloads — backfills, evaluation runs, document processing. Worst case latency is 24 hours, often returns within 1–2 hours. Real-time chat can't use batch.

Where does the pricing data come from?

Pricing is fetched live from the OpenRouter API (openrouter.ai), which aggregates 100+ LLMs from OpenAI, Anthropic, Google, Meta, DeepSeek, Mistral and others. Data is cached in your browser for 24 hours, so the first load may take a moment but subsequent loads are instant. OpenRouter typically takes a small margin over direct provider pricing — for production budgets verify with the provider directly. If OpenRouter is unreachable we fall back to baked-in May 2026 list prices so the comparator always works.

🤖 LLM Cost Comparator

Side-by-side cost across all the leading LLM APIs. Pricing fetched live from OpenRouter and cached for 24 hours — covers Claude Opus, GPT, Gemini, Llama, DeepSeek and more.

Loading models from OpenRouter…

Avg input tokens / call

Avg output tokens / call

Calls per day

Cache hit rate (%)Repeated prefix? 70%+ realistic.

ℹ️ Sourced from OpenRouter (24-hr browser cache). Prices may differ slightly from direct provider pricing as OpenRouter takes a small margin. Cached input is ~10% of normal input price (Anthropic / OpenAI). Batch API pricing is ~50% of standard. Reasoning models (o-series, Claude with extended thinking) bill thinking tokens as output. If the live feed is unreachable we fall back to May 2026 list prices.

How LLM API Pricing Works

LLMs charge per million tokens, separately for input (your prompt + context) and output (what the model writes back). Output is typically 3–5× more expensive than input. A token is roughly 4 English characters or ¾ of a word.

Cost = (input_tokens × input_$ / 1M) + (output_tokens × output_$ / 1M)

The 2026 Price Landscape

Three tiers have emerged:

Frontier ($10–$75 / 1M tokens): Claude Opus 4.7, GPT-5, Gemini 3.1 Pro — best reasoning, vision, long context
Workhorse ($1–$5 / 1M): Claude Sonnet 4.6, GPT-5 mini, Gemini 2.5 Flash — 90% of frontier quality, 10–20% of price
Budget ($0.10–$1 / 1M): Claude Haiku 4.5, GPT-5 nano, DeepSeek V3, Llama 3.3 70B (via Groq/Together) — perfect for tagging, classification, simple chat

Where the Real Savings Live

Pricing is the easy lever; architectural levers move 2–10× more cost:

Prompt caching — repeated system prompts and RAG context cached at ~10% of normal price. 70%+ cache hit rate is realistic for chat / agent apps. Cuts input bill by 75–90%.
Batch API — 50% discount for jobs that can wait up to 24h. Perfect for backfills, evaluations, doc-processing pipelines.
Tiered routing — send simple turns to a cheap model and only escalate to a frontier model when needed. Halves blended cost for typical assistants.
Reasoning budgets — for o-series and Claude extended thinking, cap thinking tokens. Default thinking can balloon output cost 5–10×.
Self-hosted open weights — Llama 3.3 70B on a single H100 is ~$0.20/M tokens at 80%+ utilisation; only worth it above ~50M tokens/day.

Reading the Comparison Table

The table sorts by monthly spend at your inputs. Most apps are input-heavy (RAG, long docs, system prompts), where the input price dominates total cost. Chat-style apps with short prompts and long completions are output-heavy — output price dominates.

Other Cost Considerations

Context window: longer context = better retrieval but more input tokens = higher cost
Vision / image input: images convert to tokens (typically 200–1,200 per image)
Tool use / function calls: tool definitions count as input on every turn unless cached
Fine-tuning: training fee + inference at typically 2–4× the base model price

Pair with our Prompt Cost Calculator to estimate a specific prompt, the AI Model Cost Calculator for project-level scenarios, and the AI ROI Calculator to compare LLM cost against the human-labour cost it replaces.

⚠️

Important Note: Prices accurate as of May 2026 and change frequently. Always confirm on the provider's official pricing page (Anthropic, OpenAI, Google AI, Together, DeepSeek) before committing to a contract. Volume discounts, regional pricing, and Azure/Bedrock margins can shift effective rates 10–30%.

Related Calculators

AI Model Cost Calculator — Compare API pricing for 100+ LLMs including GPT-4o, Claude, Gemini and Llama. Calculate exact token costs.
AI vs Human ROI Calculator — Compare AI tool costs against human labour. Monthly savings, annual ROI and break-even with live model pricing.
Prompt Cost Calculator — Paste your prompt, choose a model, and instantly see how many tokens it uses and what it costs to send.