A token is a chunk of text the model reads or writes: roughly 4 characters or ¾ of a word in English, so 1,000 tokens ≈ 750 words. LLM providers bill input tokens (your prompt plus context) and output tokens (the response) separately. Output is typically 3–5× more expensive per million tokens.

How is API cost calculated?

Cost = (input tokens × input price per 1M) + (output tokens × output price per 1M). Example: 100K input + 30K output on Claude Opus 4.7 ($15 in / $75 out per 1M) = (100,000/1,000,000 × $15) + (30,000/1,000,000 × $75) = $1.50 + $2.25 = $3.75 per call.
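The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the comparator's actual code; the function name `call_cost` and its parameter names are made up for this example, and the prices are the ones quoted in the worked example.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one API call, given prices per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Figures from the worked example: 100K input, 30K output, $15 in / $75 out per 1M.
cost = call_cost(100_000, 30_000, 15.0, 75.0)
print(f"${cost:.2f} per call")  # prints $3.75 per call
```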

Which LLM has the best cost-to-quality ratio?

For most tasks (as of May 2026), Claude Haiku 4.5, Gemini 2.5 Flash and GPT-5 nano are the value picks. For frontier reasoning, Claude Opus 4.7 and GPT-5 lead, at 5–10× the cost. DeepSeek V3 and Llama 3.3 70B (via Together / Groq) offer roughly 80% of the quality at 10–20% of the cost. Match the model to the task: don't pay frontier prices for tagging or summarisation.

How does prompt caching reduce cost?

All major providers now offer prompt caching: repeated input prefixes (system prompts, RAG context, document content) get charged ~10% of normal input price after the first call. For chat applications with long system prompts or RAG over fixed documents, caching can cut input costs by 75–90%. Anthropic charges 1.25× normal price to write a cache, then 0.1× to read.
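To see how the 1.25× write and 0.1× read multipliers play out, here is a rough sketch that compares input cost with and without caching. It assumes the first call writes the cache and every later call hits it (cache expiry is ignored). The function names, the 20,000-token prefix and the $3 per 1M input price are hypothetical values chosen for illustration.

```python
def input_cost_with_cache(prefix_tokens: int, fresh_tokens: int, calls: int,
                          price_per_m: float,
                          write_mult: float = 1.25,
                          read_mult: float = 0.10) -> float:
    """Total input cost (USD) over `calls` requests that share one cached prefix.

    Assumption: call 1 writes the cache, calls 2..n read it; expiry is ignored.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_token = price_per_m / 1_000_000
    write = prefix_tokens * per_token * write_mult
    reads = prefix_tokens * per_token * read_mult * (calls - 1)
    fresh = fresh_tokens * per_token * calls  # uncached part, full price every call
    return write + reads + fresh


def input_cost_without_cache(prefix_tokens: int, fresh_tokens: int, calls: int,
                             price_per_m: float) -> float:
    """Total input cost (USD) when every call pays full price for all tokens."""
    return (prefix_tokens + fresh_tokens) * calls * price_per_m / 1_000_000


# Hypothetical workload: 20K-token system prompt, 500 fresh tokens, 100 calls, $3 per 1M.
cached = input_cost_with_cache(20_000, 500, 100, 3.0)
uncached = input_cost_without_cache(20_000, 500, 100, 3.0)
print(f"cached ${cached:.3f} vs uncached ${uncached:.2f}, "
      f"saving {1 - cached / uncached:.1%}")
```

Under these assumptions the saving comes out at about 87%, inside the 75–90% range quoted above. A single call is slightly more expensive with caching than without, because of the write surcharge.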

What about batch pricing?

Batch APIs (OpenAI, Anthropic, Google) run jobs asynchronously, with results due within 24 hours, at a 50% discount. They suit offline workloads such as backfills, evaluation runs and document processing. Worst-case latency is 24 hours, though jobs often return within 1–2 hours. Real-time chat can't use batch.
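As a quick illustration of the 50% discount, the sketch below estimates total spend when only part of a workload can wait for batch processing. The function name `spend_with_batch` and the $1,000 / 40% figures are hypothetical.

```python
def spend_with_batch(standard_spend: float, batchable_share: float,
                     discount: float = 0.50) -> float:
    """Return spend (USD) after moving `batchable_share` of the work to a batch API.

    `standard_spend` is what the whole workload would cost at standard prices.
    """
    if not 0.0 <= batchable_share <= 1.0:
        raise ValueError("batchable_share must be between 0 and 1")
    realtime_part = standard_spend * (1 - batchable_share)
    batch_part = standard_spend * batchable_share * (1 - discount)
    return realtime_part + batch_part


# Hypothetical: $1,000 per month at standard prices, 40% of it is offline work.
print(spend_with_batch(1_000.0, 0.40))
```

With these numbers the bill drops from $1,000 to about $800. The real-time share still pays full price, so the overall saving is capped by how much of the workload can tolerate the delay.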

Where does the pricing data come from?

Pricing is fetched live from the OpenRouter API (openrouter.ai), which aggregates 100+ LLMs from OpenAI, Anthropic, Google, Meta, DeepSeek, Mistral and others. Data is cached in your browser for 24 hours, so the first load may take a moment but subsequent loads are instant. OpenRouter typically takes a small margin over direct provider pricing — for production budgets verify with the provider directly. If OpenRouter is unreachable we fall back to baked-in May 2026 list prices so the comparator always works.

🤖 LLM Cost Comparator

Side-by-side cost comparison of all major LLM APIs. Pricing is fetched live from OpenRouter and cached for 24 hours, covering Claude Opus, GPT, Gemini, Llama, DeepSeek and more.

Average input tokens per call

Average output tokens per call

Calls per day

Cache hit rate (%). Repeated prefixes? 70%+ is realistic.

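ℹ️ Source: OpenRouter (24-hour browser cache). Prices may differ slightly from direct provider pricing because OpenRouter takes a small margin. Cached input costs roughly 10% of the normal input price (Anthropic / OpenAI). Batch API pricing is roughly 50% of standard pricing. Reasoning models (o-series, Claude with extended thinking) bill thinking tokens as output. If the live data source is unavailable, we fall back to May 2026 list prices.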

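How LLM API pricing works

LLMs are billed per million tokens, with input (your prompt plus context) and output (what the model writes back) priced separately. Output typically costs 3–5× more than input. A token is roughly 4 English characters or ¾ of a word.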

Cost = (input_tokens × input_$ / 1M) + (output_tokens × output_$ / 1M)
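The comparator's four inputs (average input tokens, average output tokens, calls per day, cache hit rate) can be combined with this formula to get a monthly figure. How the tool applies the cache hit rate internally is not stated, so this sketch assumes the hit rate is the fraction of input tokens billed at the cached rate (about 10% of normal) and that a month has 30 days. The function name and the $3 / $15 prices are hypothetical.

```python
def monthly_spend(avg_input_tokens: float, avg_output_tokens: float,
                  calls_per_day: float, cache_hit_rate: float,
                  input_price_per_m: float, output_price_per_m: float,
                  cached_input_mult: float = 0.10, days: int = 30) -> float:
    """Estimate monthly spend in USD.

    Assumption: `cache_hit_rate` (0..1) is the share of input tokens billed
    at `cached_input_mult` times the normal input price.
    """
    billed_fraction = cache_hit_rate * cached_input_mult + (1 - cache_hit_rate)
    effective_input = avg_input_tokens * billed_fraction
    per_call = (effective_input * input_price_per_m
                + avg_output_tokens * output_price_per_m) / 1_000_000
    return per_call * calls_per_day * days


# Hypothetical: 2,000 in / 500 out per call, 1,000 calls a day, $3 in / $15 out per 1M.
with_cache = monthly_spend(2_000, 500, 1_000, 0.70, 3.0, 15.0)
no_cache = monthly_spend(2_000, 500, 1_000, 0.0, 3.0, 15.0)
print(f"${with_cache:.2f} with a 70% hit rate vs ${no_cache:.2f} without caching")
```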

The 2026 pricing landscape

Three tiers have emerged:

Frontier ($10–$75 / 1M tokens): Claude Opus 4.7, GPT-5, Gemini 3.1 Pro. Strongest reasoning, vision and long context.
Workhorse ($1–$5 / 1M): Claude Sonnet 4.6, GPT-5 mini, Gemini 2.5 Flash. About 90% of top-model quality at 10–20% of the cost.
Budget ($0.10–$1 / 1M): Claude Haiku 4.5, GPT-5 nano, DeepSeek V3, Llama 3.3 70B (via Groq / Together). Well suited to tagging, classification and simple chat.

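Where the real savings are

Pricing is the easiest lever to pull, but architectural levers move 2–10× more cost:

Prompt caching: repeated system prompts and RAG context are cached at roughly 10% of the normal price. A 70%+ cache hit rate is realistic for chat and agent applications. Cuts input costs by 75–90%.
Batch API: jobs that can wait up to 24 hours get a 50% discount. Well suited to backfills, evaluations and document-processing pipelines.
Tiered routing: send simple requests to a cheap model and escalate to a frontier model only when needed. Can halve the blended cost of a typical assistant.
Reasoning budgets: for o-series models and Claude extended thinking, cap the number of thinking tokens. Default thinking modes can increase output cost by 5–10×.
Self-hosted open weights: running Llama 3.3 70B on a single H100 at 80%+ utilisation costs roughly $0.20 per million tokens. Only worth considering above roughly 50 million tokens per day.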
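The tiered routing lever in the list above can be put into numbers with a small sketch. It assumes a cascade design, in which every request is first tried on the cheap model and a fraction is then retried on the frontier model, so escalated requests pay for both. Other routing designs (for example, classifying requests up front) would use a different formula. The function name and all dollar figures are hypothetical.

```python
def blended_cost_per_call(cheap_cost: float, frontier_cost: float,
                          escalation_rate: float) -> float:
    """Average USD cost per request under a cheap-first cascade.

    Every request pays `cheap_cost`; a fraction `escalation_rate` (0..1)
    additionally pays `frontier_cost`.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return cheap_cost + escalation_rate * frontier_cost


# Hypothetical: $0.002 per call on the cheap model, $0.04 on the frontier model,
# with 45% of requests escalated.
blended = blended_cost_per_call(0.002, 0.04, 0.45)
print(f"${blended:.3f} per call vs $0.040 frontier-only")
```

With these made-up numbers the blended cost is about half of the frontier-only cost. The result is sensitive to the escalation rate, which in practice has to be measured on real traffic.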

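Reading the comparison table

The table is sorted by monthly spend under your inputs. Most applications are input-heavy (RAG, long documents, system prompts), so input price dominates total cost. Chat-style applications with short prompts and long completions are output-heavy, so output price dominates.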
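One way to check whether a workload is input-heavy or output-heavy is to compute the share of per-call cost that comes from input tokens. The sketch below does that; the function name, the token counts and the $3 / $15 prices are hypothetical.

```python
def input_cost_share(input_tokens: float, output_tokens: float,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the fraction (0..1) of per-call cost attributable to input tokens."""
    input_cost = input_tokens * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    total = input_cost + output_cost
    if total == 0:
        raise ValueError("total cost is zero; share is undefined")
    return input_cost / total


# Hypothetical RAG call: 8,000 tokens in, 400 out, at $3 in / $15 out per 1M.
print(f"RAG: {input_cost_share(8_000, 400, 3.0, 15.0):.0%} of cost is input")
# Hypothetical chat call: 300 tokens in, 1,500 out, same prices.
print(f"Chat: {input_cost_share(300, 1_500, 3.0, 15.0):.0%} of cost is input")
```

A share well above 50% suggests comparing models on input price first; a share well below 50% suggests looking at output price.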

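Other cost considerations

Context window: longer context = better retrieval, but more input tokens = higher cost
Vision / image input: images are converted to tokens (typically 200–1,200 per image)
Tool use / function calls: tool definitions count as input in every conversation unless cached
Fine-tuning: training cost, plus inference typically at 2–4× the base model price

Pair this with our Prompt Cost Calculator to estimate specific prompts, the AI Model Cost Calculator for project-level scenarios, and the AI ROI Calculator to compare LLM costs against the human labour they replace.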

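⚠️ Important: Prices are accurate as of May 2026 and change frequently. Before signing a contract, always confirm current prices on the providers' official pricing pages (Anthropic, OpenAI, Google AI, Together, DeepSeek). Volume discounts, regional pricing and Azure / Bedrock markups can shift effective rates by 10–30%.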