A token is a chunk of text the model reads or writes — roughly 4 characters or ¾ of a word in English. So 1,000 tokens ≈ 750 words. LLMs charge separately for input tokens (your prompt + context) and output tokens (the response). Output is typically 3–5× more expensive per million tokens.
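A minimal Python sketch of the rules of thumb above. It encodes only the stated heuristics (about 4 characters, or ¾ of a word, per English token). The function names are illustrative, and real counts depend on each model's tokenizer, so use it for ballpark budgeting only.

```python
def tokens_from_chars(text: str) -> int:
    """Rough English estimate: about 4 characters per token."""
    return round(len(text) / 4)

def tokens_from_words(word_count: int) -> int:
    """Rough English estimate: a token is about 3/4 of a word."""
    return round(word_count / 0.75)

def words_from_tokens(token_count: int) -> int:
    """Inverse rule of thumb: 1,000 tokens is roughly 750 words."""
    return round(token_count * 0.75)

print(tokens_from_words(750))          # 1000
print(words_from_tokens(1_000))        # 750
print(tokens_from_chars("a" * 4_000))  # 1000
```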

How is API cost calculated?

Cost = (input tokens × input price per 1M) + (output tokens × output price per 1M). Example: 100K input + 30K output on Claude Opus 4.7 ($15 in / $75 out per 1M) = (100,000/1,000,000 × $15) + (30,000/1,000,000 × $75) = $1.50 + $2.25 = $3.75 per call.
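Here is the same formula as a small Python function, checked against the worked example. The function name is ours, and it assumes prices are quoted in dollars per 1M tokens. Multiplying before dividing avoids some floating-point noise for round numbers like these.

```python
PER_MILLION = 1_000_000

def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call, given list prices per 1M tokens."""
    input_cost = input_tokens * input_price_per_m / PER_MILLION
    output_cost = output_tokens * output_price_per_m / PER_MILLION
    return input_cost + output_cost

# The worked example: 100K in + 30K out at $15 / $75 per 1M tokens.
cost = call_cost(100_000, 30_000, 15.0, 75.0)
print(f"${cost:.2f}")  # $3.75
```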

Which LLM has the best cost-to-quality ratio?

For most tasks (May 2026): Claude Haiku 4.5, Gemini 2.5 Flash and GPT-5 nano are the value picks. For frontier reasoning, Claude Opus 4.7 and GPT-5 lead — at 5–10× the cost. DeepSeek V3 and Llama 3.3 70B (via Together / Groq) offer 80% of the quality at 10–20% of the cost. Match the model to the task — don't pay frontier prices for tagging or summarisation.

How does prompt caching reduce cost?

All major providers now offer prompt caching: repeated input prefixes (system prompts, RAG context, document content) get charged ~10% of normal input price after the first call. For chat applications with long system prompts or RAG over fixed documents, caching can cut input costs by 75–90%. Anthropic charges 1.25× normal price to write a cache, then 0.1× to read.
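The following rough sketch shows how the write premium and the read discount combine into a blended input price. It assumes Anthropic-style multipliers (1.25× to write, 0.1× to read) and treats every input token as either a cache read or a cache write. Real bills also include uncacheable input and cache expirations, which this sketch ignores.

```python
def effective_input_price(base_price_per_m: float, cache_hit_rate: float,
                          write_multiplier: float = 1.25,
                          read_multiplier: float = 0.10) -> float:
    """Blended $/1M input tokens when a share of input is served from cache.

    Assumption: each input token is either a cache read (read_multiplier)
    or a cache write (write_multiplier); expiry and uncacheable input ignored.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    reads = cache_hit_rate * read_multiplier
    writes = (1.0 - cache_hit_rate) * write_multiplier
    return base_price_per_m * (reads + writes)

base = 3.00  # hypothetical $3 per 1M input tokens
for hit_rate in (0.0, 0.7, 0.9):
    price = effective_input_price(base, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.3f}/1M ({1 - price / base:.0%} saved)")
```

Under these assumptions, a $3/1M base price drops to about $0.645/1M at a 90% hit rate (roughly 78% off). A 0% hit rate costs 25% more than not caching at all, because every token pays the write premium.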

What about batch pricing?

Batch APIs (OpenAI, Anthropic, Google) accept jobs that can wait up to 24 hours and charge 50% less for them. They suit offline workloads such as backfills, evaluation runs and document processing. Worst-case latency is 24 hours, though results often return within 1–2 hours. Real-time chat can't use batch.
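This hypothetical helper decides whether an offline job should go through a batch API. It encodes only two facts: the 50% discount and the worst-case latency of up to 24 hours. The function name, the deadline parameter and the example workload are illustrative assumptions.

```python
BATCH_DISCOUNT = 0.50         # batch price = 50% of standard
BATCH_WORST_CASE_HOURS = 24   # results may take up to 24 hours

def workload_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float,
                  deadline_hours: float) -> tuple[str, float]:
    """Choose batch vs real-time for a workload and return (mode, dollars).

    Batch is chosen only when the deadline tolerates the worst-case latency.
    """
    standard = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
    if deadline_hours >= BATCH_WORST_CASE_HOURS:
        return "batch", standard * (1 - BATCH_DISCOUNT)
    return "real-time", standard

# Hypothetical backfill: 10,000 docs x (5,000 in + 500 out) at $3 / $15 per 1M.
mode, cost = workload_cost(50_000_000, 5_000_000, 3.0, 15.0, deadline_hours=48)
print(mode, f"${cost:,.2f}")  # batch $112.50
```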

Where does the pricing data come from?

Pricing is fetched live from the OpenRouter API (openrouter.ai), which aggregates 100+ LLMs from OpenAI, Anthropic, Google, Meta, DeepSeek, Mistral and others. Data is cached in your browser for 24 hours, so the first load may take a moment but subsequent loads are instant. OpenRouter typically takes a small margin over direct provider pricing — for production budgets verify with the provider directly. If OpenRouter is unreachable we fall back to baked-in May 2026 list prices so the comparator always works.
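The page performs this flow in the browser. The Python sketch below mirrors the same flow, with a local file standing in for the browser cache. It assumes OpenRouter's public `GET /api/v1/models` endpoint returns a `data` list whose `pricing.prompt` and `pricing.completion` fields are per-token USD strings. The cache path and the fallback model id are hypothetical.

```python
import json
import time
import urllib.request
from pathlib import Path

MODELS_URL = "https://openrouter.ai/api/v1/models"
CACHE_FILE = Path("openrouter_models_cache.json")  # hypothetical local cache path
CACHE_TTL_SECONDS = 24 * 60 * 60                    # 24-hour cache, as above

# Hypothetical baked-in fallback: model id -> ($ input, $ output) per 1M tokens.
FALLBACK_PRICES = {"anthropic/claude-opus-4.7": (15.0, 75.0)}

def parse_prices(payload: dict) -> dict[str, tuple[float, float]]:
    """Convert per-token USD price strings into $ per 1M tokens."""
    prices = {}
    for model in payload.get("data", []):
        pricing = model.get("pricing", {})
        try:
            prompt = float(pricing["prompt"]) * 1_000_000
            completion = float(pricing["completion"]) * 1_000_000
        except (KeyError, TypeError, ValueError):
            continue  # skip models with missing or non-numeric pricing
        prices[model["id"]] = (prompt, completion)
    return prices

def load_prices() -> dict[str, tuple[float, float]]:
    # 1. Serve from the local cache while it is younger than 24 hours.
    if CACHE_FILE.exists() and time.time() - CACHE_FILE.stat().st_mtime < CACHE_TTL_SECONDS:
        return parse_prices(json.loads(CACHE_FILE.read_text()))
    # 2. Otherwise fetch live and refresh the cache on success.
    try:
        with urllib.request.urlopen(MODELS_URL, timeout=10) as resp:
            payload = json.load(resp)
        CACHE_FILE.write_text(json.dumps(payload))
        return parse_prices(payload)
    except (OSError, ValueError):
        # 3. Network or parse failure: fall back to the baked-in list prices.
        return dict(FALLBACK_PRICES)
```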

🤖 LLM Cost Comparison Tool

Compare the costs of major LLM APIs side by side. Prices are fetched live from OpenRouter and cached for 24 hours, covering Claude Opus, GPT, Gemini, Llama, DeepSeek and more.


Average input tokens per call

Average output tokens per call

API calls per day

Cache hit rate (%): repeated prompt prefixes? 70%+ is realistic.
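The four inputs above combine into a monthly estimate roughly as follows. This is a sketch, not the tool's actual code. It assumes a 30-day month and bills cache hits at about 10% of the input price (read price only, no write premium), matching the ~10% figure quoted on this page.

```python
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month

def monthly_cost(avg_input_tokens: int, avg_output_tokens: int,
                 calls_per_day: int, cache_hit_rate_pct: float,
                 input_price_per_m: float, output_price_per_m: float,
                 cached_input_multiplier: float = 0.10) -> float:
    """Estimated monthly spend from the per-call inputs and $/1M list prices."""
    hit = cache_hit_rate_pct / 100
    # Cache hits bill at ~10% of the input price; misses at the full price.
    effective_input_price = input_price_per_m * (
        hit * cached_input_multiplier + (1 - hit))
    per_call = (avg_input_tokens * effective_input_price
                + avg_output_tokens * output_price_per_m) / 1_000_000
    return per_call * calls_per_day * DAYS_PER_MONTH

# Hypothetical app: 2,000 in / 500 out per call, 1,000 calls/day,
# 70% cache hits, at $3 / $15 per 1M tokens.
print(f"${monthly_cost(2_000, 500, 1_000, 70, 3.0, 15.0):,.2f}")  # $291.60
```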

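ℹ️ Prices come from OpenRouter (cached in your browser for 24 hours). Because OpenRouter takes a small margin, they may differ slightly from direct provider pricing. Cached input is typically about 10% of the normal input price (Anthropic / OpenAI). Batch API pricing is about 50% of standard. Reasoning models (o-series, Claude with extended thinking) bill thinking tokens as output. If the live feed is unreachable, we fall back to May 2026 list prices.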

How LLM API pricing works

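LLMs bill per million tokens, charging separately for input (prompt + context) and output (what the model returns). Output is typically 3–5× more expensive than input. A token is roughly 4 English characters or ¾ of a word.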

Cost = (input_tokens × input_$ / 1M) + (output_tokens × output_$ / 1M)

The 2026 pricing landscape

Three tiers have emerged:

Frontier ($10–$75 / 1M tokens): Claude Opus 4.7, GPT-5, Gemini 3.1 Pro. Best-in-class reasoning, vision and long context
Workhorse ($1–$5 / 1M): Claude Sonnet 4.6, GPT-5 mini, Gemini 2.5 Flash. About 90% of frontier quality at 10–20% of the price
Budget ($0.10–$1 / 1M): Claude Haiku 4.5, GPT-5 nano, DeepSeek V3, Llama 3.3 70B (via Groq/Together). Best for tagging, classification and simple chat

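Where the real savings come from

Per-token pricing is the easy lever; architectural levers move 2–10× more cost:

Prompt caching: repeated system prompts and RAG context are cached at about 10% of the normal input price. Cache hit rates of 70%+ are realistic for chat and agent apps, and at high hit rates input costs fall by 75–90%.
Batch API: 50% off for jobs that can wait up to 24 hours. Ideal for backfills, evals and document-processing pipelines.
Tiered routing: send simple exchanges to a cheap model and escalate to a frontier model only when needed. This can roughly halve blended cost for a typical assistant.
Reasoning budgets: cap thinking tokens on o-series models and Claude extended thinking. Default thinking can inflate output cost 5–10×.
Self-hosted open weights: Llama 3.3 70B on a single H100 costs ~$0.20/M tokens at 80%+ utilization; it only pays off above ~50M tokens per day.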
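Three of these levers reduce to simple arithmetic, sketched below in Python. Every input in the example is a hypothetical placeholder, not a benchmark or a quoted price. This covers the per-request costs, the 60/40 traffic split, the thinking-token counts, and the GPU hourly rate and throughput.

```python
def blended_cost(cheap_share: float, cheap_cost: float, frontier_cost: float) -> float:
    """Tiered routing: average cost per request when `cheap_share` of traffic
    is handled by the cheap model and the rest escalates to the frontier model."""
    return cheap_share * cheap_cost + (1 - cheap_share) * frontier_cost

def output_cost_with_thinking(answer_tokens: int, thinking_tokens: int,
                              output_price_per_m: float) -> float:
    """Reasoning budgets: thinking tokens bill as output on top of the answer."""
    return (answer_tokens + thinking_tokens) * output_price_per_m / 1_000_000

def self_hosted_price_per_m(gpu_dollars_per_hour: float, tokens_per_second: float,
                            utilization: float) -> float:
    """Self-hosting: $ per 1M tokens for one GPU at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000

# Hypothetical inputs only.
routed = blended_cost(0.6, 0.002, 0.02)  # 60% of requests go to the cheap model
print(f"routing saves {1 - routed / 0.02:.0%}")
with_thinking = output_cost_with_thinking(500, 4_000, 15.0)
answer_only = output_cost_with_thinking(500, 0, 15.0)
print(f"thinking multiplies output cost {with_thinking / answer_only:.0f}x")
print(f"self-hosted: ${self_hosted_price_per_m(2.50, 4_000, 0.80):.2f}/M tokens")
```

With these placeholder numbers, routing 60% of traffic to a model 10× cheaper cuts blended cost by about 54%. Adding 4,000 thinking tokens makes a 500-token answer about 9× more expensive, and the GPU example lands at about $0.22/M tokens.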

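How to read the comparison table

The table is sorted by monthly spend at your inputs. Most apps are input-heavy (RAG, long documents, system prompts), so input price dominates total cost. Chat-style apps with short prompts and long completions are output-heavy, so output price dominates.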
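This sketch shows the sort and the input-heavy/output-heavy distinction. The model names and prices are placeholders. Defining "heavy" by which side contributes more dollars, rather than more tokens, is our assumption. With output priced 5× higher, a workload needs more than 5× as many input tokens as output tokens before input cost dominates.

```python
from dataclasses import dataclass

@dataclass
class ModelPrice:
    name: str
    input_per_m: float   # $ per 1M input tokens
    output_per_m: float  # $ per 1M output tokens

    def monthly_spend(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000

    def workload_shape(self, input_tokens: int, output_tokens: int) -> str:
        # "Heavy" = the side that contributes more dollars, not more tokens.
        input_cost = input_tokens * self.input_per_m
        output_cost = output_tokens * self.output_per_m
        return "input-heavy" if input_cost >= output_cost else "output-heavy"

def rank_by_spend(models: list[ModelPrice], input_tokens: int,
                  output_tokens: int) -> list[ModelPrice]:
    """Cheapest first, like the comparison table."""
    return sorted(models, key=lambda m: m.monthly_spend(input_tokens, output_tokens))

# Placeholder models and prices, not live data.
models = [
    ModelPrice("frontier-x", 15.0, 75.0),
    ModelPrice("budget-z", 0.25, 1.25),
    ModelPrice("workhorse-y", 3.0, 15.0),
]

# RAG-style month: 600M input tokens, 30M output tokens.
for m in rank_by_spend(models, 600_000_000, 30_000_000):
    print(m.name, f"${m.monthly_spend(600_000_000, 30_000_000):,.2f}",
          m.workload_shape(600_000_000, 30_000_000))
```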

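Other cost considerations

Context window: a longer context enables better retrieval, but more input tokens mean higher cost
Vision / image input: images are converted to tokens (typically 200–1,200 tokens per image)
Tool use / function calls: tool definitions count as input on every turn unless cached
Fine-tuning: training cost, plus inference typically priced at 2–4× the base model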
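To see how quickly uncached tool definitions add up over a conversation, consider this hypothetical helper. It reuses the Anthropic-style cache multipliers quoted in the caching FAQ (1.25× write, 0.1× read) and assumes the cache never expires mid-conversation. The example schema size and turn count are placeholders.

```python
def tool_definition_cost(tool_schema_tokens: int, turns: int, input_price_per_m: float,
                         cached: bool = False, cache_write_multiplier: float = 1.25,
                         cache_read_multiplier: float = 0.10) -> float:
    """Input cost of re-sending tool definitions on every turn of a conversation.

    Uncached: the schemas bill at full input price on each turn.
    Cached (assumed Anthropic-style): turn 1 writes the cache, later turns read it.
    """
    if turns <= 0:
        return 0.0
    if cached:
        billed = tool_schema_tokens * (
            cache_write_multiplier + cache_read_multiplier * (turns - 1))
    else:
        billed = tool_schema_tokens * turns
    return billed * input_price_per_m / 1_000_000

# Placeholder: 3,000 tokens of tool definitions over a 20-turn chat at $3 / 1M.
print(f"uncached: ${tool_definition_cost(3_000, 20, 3.0):.4f}")
print(f"cached:   ${tool_definition_cost(3_000, 20, 3.0, cached=True):.4f}")
```

Under these assumptions, the tool definitions cost $0.18 uncached versus about $0.028 cached.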

Pair this tool with our Prompt Cost Calculator to estimate a specific prompt, the AI Model Cost Calculator for project-level scenarios, and the AI ROI Calculator to compare LLM costs against the labor costs they replace.

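⚠️ Important: prices are as of May 2026 and change frequently. Always check the providers' official pricing pages (Anthropic, OpenAI, Google AI, Together, DeepSeek) before committing to a contract. Volume discounts, regional pricing and Azure/Bedrock margins can shift effective rates by 10–30%.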