A token is a chunk of text the model reads or writes: roughly 4 characters or ¾ of a word in English, so 1,000 tokens ≈ 750 words. LLM providers charge separately for input tokens (your prompt + context) and output tokens (the response). Output is typically 3–5× more expensive per million tokens.

How is API cost calculated?

Cost = (input tokens × input price per 1M) + (output tokens × output price per 1M). Example: 100K input + 30K output on Claude Opus 4.7 ($15 in / $75 out per 1M) = (100,000/1,000,000 × $15) + (30,000/1,000,000 × $75) = $1.50 + $2.25 = $3.75 per call.
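The formula above can be sketched in a few lines of Python. The function name `call_cost` and its parameters are illustrative and not part of any provider SDK; the prices are the Claude Opus 4.7 figures quoted in the example.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one call, given prices per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Worked example from the text: 100K input + 30K output at $15 / $75 per 1M.
example = call_cost(100_000, 30_000, 15.0, 75.0)
print(f"${example:.2f} per call")  # prints "$3.75 per call"
```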

Which LLM has the best cost-to-quality ratio?

For most tasks (as of May 2026), Claude Haiku 4.5, Gemini 2.5 Flash and GPT-5 nano are the value picks. For frontier reasoning, Claude Opus 4.7 and GPT-5 lead, at 5–10× the cost. DeepSeek V3 and Llama 3.3 70B (via Together / Groq) offer roughly 80% of the quality at 10–20% of the cost. Match the model to the task: don't pay frontier prices for tagging or summarisation.

How does prompt caching reduce cost?

All major providers now offer prompt caching: repeated input prefixes (system prompts, RAG context, document content) are charged at ~10% of the normal input price after the first call. For chat applications with long system prompts or RAG over fixed documents, caching can cut input costs by 75–90% when most input tokens hit the cache. Anthropic charges 1.25× the normal price to write a cache, then 0.1× to read.
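To see how the write and read multipliers play out, here is a minimal sketch that compares cached and uncached input cost for a shared prefix. It assumes the first call writes the cache and every later call reads it (no expiry between calls). The function names, the $3 per 1M price, and the token counts are hypothetical.

```python
def uncached_input_cost(prefix_tokens: int, fresh_tokens: int, calls: int,
                        input_price_per_m: float) -> float:
    """USD input cost when every call pays full price for the whole prompt."""
    return (prefix_tokens + fresh_tokens) * calls * input_price_per_m / 1_000_000


def cached_input_cost(prefix_tokens: int, fresh_tokens: int, calls: int,
                      input_price_per_m: float,
                      write_multiplier: float = 1.25,
                      read_multiplier: float = 0.10) -> float:
    """USD input cost when the shared prefix is written once, then read."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_token = input_price_per_m / 1_000_000
    write = prefix_tokens * per_token * write_multiplier           # first call
    reads = prefix_tokens * per_token * read_multiplier * (calls - 1)
    fresh = fresh_tokens * per_token * calls                       # never cached
    return write + reads + fresh


# Hypothetical workload: 10K-token system prompt, 500 new tokens per call.
full = uncached_input_cost(10_000, 500, 100, 3.0)
cached = cached_input_cost(10_000, 500, 100, 3.0)
print(f"uncached ${full:.4f}, cached ${cached:.4f}, saving {1 - cached / full:.1%}")
```

With these assumed numbers the saving is about 85%, inside the 75–90% range quoted above. The saving shrinks as the uncached portion of each prompt grows.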

What about batch pricing?

Batch APIs (OpenAI, Anthropic, Google) accept jobs that can be deferred for up to 24 hours at a 50% discount. They suit offline workloads: backfills, evaluation runs, document processing. Worst-case latency is 24 hours, though results often return within 1–2 hours. Real-time chat can't use batch.

Where does the pricing data come from?

Pricing is fetched live from the OpenRouter API (openrouter.ai), which aggregates 100+ LLMs from OpenAI, Anthropic, Google, Meta, DeepSeek, Mistral and others. Data is cached in your browser for 24 hours, so the first load may take a moment but subsequent loads are instant. OpenRouter typically takes a small margin over direct provider pricing — for production budgets verify with the provider directly. If OpenRouter is unreachable we fall back to baked-in May 2026 list prices so the comparator always works.
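The fetch, cache, and fallback behaviour described above can be sketched as follows. The page itself runs in the browser, so this Python version only mirrors the logic. `PriceCache`, `FALLBACK_PRICES`, and the injected `fetch_live` callable are hypothetical names, and the sketch makes no assumptions about the shape of the OpenRouter response.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Prices = Dict[str, Tuple[float, float]]  # model -> (input $/1M, output $/1M)

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour cache, as described above

# Hypothetical baked-in list prices, used only when the live feed fails.
FALLBACK_PRICES: Prices = {"claude-opus-4.7": (15.0, 75.0)}


class PriceCache:
    """Serve cached prices for 24 hours; fall back if the live fetch fails."""

    def __init__(self, fetch_live: Callable[[], Prices],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch_live = fetch_live
        self._clock = clock
        self._prices: Optional[Prices] = None
        self._fetched_at = 0.0

    def get(self) -> Prices:
        now = self._clock()
        if self._prices is not None and now - self._fetched_at < CACHE_TTL_SECONDS:
            return self._prices  # cache hit: no network call
        try:
            self._prices = self._fetch_live()
            self._fetched_at = now
            return self._prices
        except Exception:
            # Live feed unreachable: use the baked-in list prices.
            return FALLBACK_PRICES
```

Injecting `fetch_live` and `clock` keeps the sketch testable without a network connection or a real 24-hour wait.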

🤖 LLM Cost Comparator

Side-by-side pricing across all the leading LLM APIs. Pricing is fetched live from OpenRouter and cached for 24 hours, covering Claude Opus, GPT, Gemini, Llama, DeepSeek and more.

Avg input tokens / call

Avg output tokens / call

Calls per day

Cache hit rate (%): Repeated prefix? 70%+ is realistic.

ℹ️ Fetched from OpenRouter (24-hour browser cache). Prices may differ slightly from direct provider prices because OpenRouter takes a small margin. Cached input is ~10% of the normal input price (Anthropic / OpenAI). Batch API prices are ~50% of standard. Reasoning models (o-series, Claude with extended thinking) bill thinking tokens as output. If the live feed is unavailable, we fall back to May 2026 list prices.

How LLM API Pricing Works

LLM providers charge per million tokens, separately for input (your prompt + context) and output (what the model writes back). Output is typically 3–5× more expensive than input. A token is about 4 English characters or ¾ of a word.

Cost = (input_tokens × input_$ / 1M) + (output_tokens × output_$ / 1M)

The 2026 Price Landscape

Three tiers have emerged:

Frontier ($10–$75 / 1M tokens): Claude Opus 4.7, GPT-5, Gemini 3.1 Pro. Best reasoning, vision, and long context.
Workhorse ($1–$5 / 1M): Claude Sonnet 4.6, GPT-5 mini, Gemini 2.5 Flash. About 90% of frontier quality at 10–20% of the price.
Budget ($0.10–$1 / 1M): Claude Haiku 4.5, GPT-5 nano, DeepSeek V3, Llama 3.3 70B (via Groq/Together). Well suited to tagging, classification, and simple chat.

Where the Real Savings Live

Per-token price is the easy lever; architectural levers can move cost 2–10× more:

Prompt caching: repeated system instructions and RAG context are cached at about 10% of the normal price. A cache hit rate of 70%+ is realistic for chat/agent apps. At high hit rates this can reduce the input bill by 75–90%.
Batch API: 50% discount for jobs that can wait up to 24h. Well suited to backfills, evaluations, and doc-processing pipelines.
Tiered routing: send simple turns to a cheap model and only escalate to a frontier model when needed. This can halve blended cost for typical assistants.
Reasoning budgets: for the o-series and Claude extended thinking, cap thinking tokens. Default thinking can balloon output costs 5–10×.
Self-hosted open weights: Llama 3.3 70B on a single H100 is ~$0.20/M tokens at 80%+ utilisation; only worth it above ~50M tokens/day.
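As a rough illustration of the tiered-routing lever, the sketch below computes the average cost per call when only a fraction of calls escalate to a frontier model. The per-call costs and the name `blended_cost_per_call` are hypothetical. As a simplification, escalated calls are billed only at the frontier cost, whereas a real router may also pay for the cheap model's first attempt.

```python
def blended_cost_per_call(cheap_cost: float, frontier_cost: float,
                          escalation_rate: float) -> float:
    """Average USD cost per call when some calls escalate to the frontier model."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return (1 - escalation_rate) * cheap_cost + escalation_rate * frontier_cost


# Hypothetical per-call costs: $0.002 on a budget model, $0.05 on a frontier model.
for rate in (1.0, 0.5, 0.2):
    cost = blended_cost_per_call(0.002, 0.05, rate)
    print(f"escalate {rate:.0%} of calls: ${cost:.4f} per call")
```

With these assumed costs, escalating half of the calls roughly halves the blended cost compared with sending everything to the frontier model.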

Reading the Comparison Table

The table sorts by monthly spend for your inputs. Most apps are input-heavy (RAG, long documents, system prompts), where the input price dominates total cost. Chat apps with short prompts and long completions are output-heavy, so the output price dominates.
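A minimal sketch of that ranking, assuming a 30-day month, shows how the cheapest model can change between input-heavy and output-heavy workloads. The model names and prices below are hypothetical placeholders, not real list prices, and `monthly_cost` and `rank_models` are illustrative names.

```python
from typing import Dict, List, Tuple


def monthly_cost(input_tokens: int, output_tokens: int, calls_per_day: int,
                 input_price_per_m: float, output_price_per_m: float,
                 days: int = 30) -> float:
    """USD spend per month for a fixed per-call token profile."""
    per_call = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
    return per_call * calls_per_day * days


def rank_models(prices: Dict[str, Tuple[float, float]], input_tokens: int,
                output_tokens: int, calls_per_day: int) -> List[Tuple[str, float]]:
    """Return (model, monthly cost) pairs sorted from cheapest to most expensive."""
    rows = [(name, monthly_cost(input_tokens, output_tokens, calls_per_day, p_in, p_out))
            for name, (p_in, p_out) in prices.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical prices per 1M tokens: (input, output).
PRICES = {"model-x": (1.0, 10.0), "model-y": (2.0, 4.0)}

print(rank_models(PRICES, 10_000, 200, 1_000))  # input-heavy: model-x is cheapest
print(rank_models(PRICES, 200, 2_000, 1_000))   # output-heavy: model-y is cheapest
```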

Other Cost Considerations

Context window: a longer context can improve retrieval, but it means more input tokens and therefore higher cost
Vision / image input: images convert to tokens (typically 200–1,200 per image)
Tool use / function calls: tool definitions count as input on every turn unless cached
Fine-tuning: training fee + inference at typically 2–4× the base model price

Pair with our Prompt Cost Calculator to estimate a specific prompt, the AI Model Cost Calculator for project-level scenarios, and the AI ROI Calculator to compare LLM cost against the human-labour cost it replaces.

⚠️ Important note: Prices are accurate as of May 2026 and change frequently. Always confirm on the provider's official pricing page (Anthropic, OpenAI, Google AI, Together, DeepSeek) before committing to a contract. Volume discounts, regional pricing, and Azure/Bedrock margins can shift effective rates by 10–30%.

Related calculators

AI Model Cost Calculator: Compare API prices for 100+ LLMs, including GPT-4o, Claude, Gemini and Llama. Calculate exact token costs.
AI vs Human ROI Calculator: Compare AI tool costs with human labour. Monthly savings, annual ROI and break-even with live model prices.
Prompt Cost Calculator: Paste your prompt, choose a model, and instantly see how many tokens it uses and what it costs to send.