How LLM API Pricing Works
LLM providers charge per million tokens, with separate rates for input (your prompt plus context) and output (what the model writes back). Output is typically 3–5× more expensive than input. A token is roughly 4 English characters, or about ¾ of a word.
Cost = (input_tokens × input_$ / 1M) + (output_tokens × output_$ / 1M)
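As a minimal sketch, the formula above can be written as a small Python function. The function name and the example prices ($3 input, $15 output per 1M tokens) are illustrative assumptions, not any provider's actual rates.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request; prices are in $ per 1M tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
cost = request_cost(input_tokens=2_000, output_tokens=500,
                    input_price_per_m=3.00, output_price_per_m=15.00)
print(f"${cost:.4f} per request")

# Scale to a month, assuming 100,000 such requests.
print(f"${cost * 100_000:,.2f} per month")
```

Multiplying the per-request figure by expected monthly volume gives a rough monthly spend for one model at one usage profile.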
The 2026 Price Landscape
Three tiers have emerged:
- Frontier ($10–$75 / 1M tokens): Claude Opus 4.7, GPT-5, Gemini 3.1 Pro — best reasoning, vision, long context
- Workhorse ($1–$5 / 1M): Claude Sonnet 4.6, GPT-5 mini, Gemini 2.5 Flash — 90% of frontier quality, 10–20% of price
- Budget ($0.10–$1 / 1M): Claude Haiku 4.5, GPT-5 nano, DeepSeek V3, Llama 3.3 70B (via Groq/Together) — perfect for tagging, classification, simple chat
Where the Real Savings Live
Pricing is the easy lever; architectural levers move 2–10× more cost:
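- Prompt caching: repeated system prompts and RAG context are billed at ~10% of the normal input price when served from cache. A 70%+ cache hit rate is realistic for chat / agent apps; at 70% that cuts the input bill by about 63%, and savings approach 90% only as the hit rate nears 100%.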
- Batch API — 50% discount for jobs that can wait up to 24h. Perfect for backfills, evaluations, doc-processing pipelines.
- Tiered routing — send simple turns to a cheap model and only escalate to a frontier model when needed. Halves blended cost for typical assistants.
- Reasoning budgets — for o-series and Claude extended thinking, cap thinking tokens. Default thinking can balloon output cost 5–10×.
- Self-hosted open weights — Llama 3.3 70B on a single H100 is ~$0.20/M tokens at 80%+ utilisation; only worth it above ~50M tokens/day.
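The arithmetic behind the first three levers can be sketched in a few lines of Python. The function names are hypothetical, the defaults (cached tokens at 10% of normal price, a 50% batch discount) come from the list above, and the routing model is a simplification: it assumes an escalated request is billed only by the frontier model and ignores any cheap-model attempt that preceded it.

```python
def cached_input_multiplier(hit_rate, cached_price_ratio=0.10):
    """Fraction of the uncached input bill still paid at a given cache hit rate."""
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


def batch_cost(cost, discount=0.50):
    """Cost of a job after a batch discount is applied."""
    return cost * (1.0 - discount)


def blended_cost(cheap_cost, frontier_cost, escalation_rate):
    """Average cost per request when a share of traffic escalates to a frontier model."""
    return (1.0 - escalation_rate) * cheap_cost + escalation_rate * frontier_cost


# 70% cache hit rate: about 0.37 of the input bill remains.
print(round(cached_input_multiplier(0.70), 2))

# A $100 backfill run through a batch API at a 50% discount.
print(batch_cost(100.0))

# Hypothetical per-request costs: $0.002 cheap, $0.020 frontier, 20% escalated.
print(round(blended_cost(0.002, 0.020, 0.20), 4))
```

Under these assumptions, how much routing saves depends almost entirely on the escalation rate, so that is the number worth measuring on real traffic before committing to a router.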
Reading the Comparison Table
The table sorts by monthly spend at your inputs. Most apps are input-heavy (RAG, long docs, system prompts), where the input price dominates total cost. Chat-style apps with short prompts and long completions are output-heavy — output price dominates.
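To check which side of the bill dominates for a given workload, one rough approach is to compute the input share of total cost. The sketch below uses hypothetical token counts and the same illustrative prices for both workloads ($3 input, $15 output per 1M tokens).

```python
def input_share(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Fraction of a request's cost that comes from input tokens (0.0 to 1.0)."""
    input_cost = input_tokens * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    total = input_cost + output_cost
    if total == 0:
        return 0.0  # nothing billed, so no meaningful share
    return input_cost / total


# Hypothetical RAG request: long retrieved context, short answer.
rag = input_share(8_000, 300, 3.00, 15.00)

# Hypothetical chat turn: short prompt, long completion.
chat = input_share(200, 800, 3.00, 15.00)

print(f"RAG input share:  {rag:.0%}")
print(f"Chat input share: {chat:.0%}")
```

A share well above 50% suggests comparing models on input price (and caching support) first; a share well below 50% suggests output price matters more.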
Other Cost Considerations
- Context window: longer context can improve retrieval, but it means more input tokens and therefore higher cost
- Vision / image input: images convert to tokens (typically 200–1,200 per image)
- Tool use / function calls: tool definitions count as input on every turn unless cached
- Fine-tuning: training fee + inference at typically 2–4× the base model price
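The tool-definition overhead is easy to underestimate because it repeats on every turn. The sketch below estimates it for one conversation; the function name, token count, and price are hypothetical, and the cached case assumes the first turn is billed at the normal input price while later turns pay a cached rate (any cache-write surcharge is ignored).

```python
def tool_definition_overhead(tool_tokens, turns, input_price_per_m, cached_price_ratio=None):
    """Dollar cost of resending tool definitions on every turn of one conversation."""
    if turns <= 0:
        return 0.0
    per_turn = tool_tokens * input_price_per_m / 1_000_000
    if cached_price_ratio is None:
        # No caching: full input price on every turn.
        return per_turn * turns
    # With caching: first turn at full price, remaining turns at the cached rate.
    return per_turn + per_turn * cached_price_ratio * (turns - 1)


# Hypothetical: 1,500 tokens of tool definitions, 20 turns, $3 per 1M input tokens.
uncached = tool_definition_overhead(1_500, 20, 3.00)
cached = tool_definition_overhead(1_500, 20, 3.00, cached_price_ratio=0.10)

print(f"Uncached: ${uncached:.4f} per conversation")
print(f"Cached:   ${cached:.4f} per conversation")
```

The same shape of calculation applies to images: multiply the per-image token count by the number of images and add it to the input tokens for the request.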
Pair with our Prompt Cost Calculator to estimate a specific prompt, the AI Model Cost Calculator for project-level scenarios, and the AI ROI Calculator to compare LLM cost against the human-labour cost it replaces.
Related Calculators
- AI Model Cost Calculator — Compare API pricing for 100+ LLMs including GPT-4o, Claude, Gemini and Llama. Calculate exact token costs.
- AI vs Human ROI Calculator — Compare AI tool costs against human labour. Monthly savings, annual ROI and break-even with live model pricing.
- Prompt Cost Calculator — Paste your prompt, choose a model, and instantly see how many tokens it uses and what it costs to send.