A token is a chunk of text the model reads or writes: roughly 4 characters or ¾ of a word in English, so 1,000 tokens ≈ 750 words. LLM providers charge separately for input tokens (your prompt plus context) and output tokens (the response). Output is typically 3–5× more expensive per million tokens than input.

How is API cost calculated?

Cost = (input tokens × input price per 1M) + (output tokens × output price per 1M). Example: 100K input + 30K output on Claude Opus 4.7 ($15 in / $75 out per 1M) = (100,000/1,000,000 × $15) + (30,000/1,000,000 × $75) = $1.50 + $2.25 = $3.75 per call.
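
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's billing code; the function name `call_cost` and its parameter names are made up for this example, and the prices are the ones quoted in the worked example.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one API call (prices are per 1M tokens)."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Worked example from the text: 100K input + 30K output at $15 / $75 per 1M.
example = call_cost(100_000, 30_000, 15.0, 75.0)
print(f"${example:.2f} per call")  # prints "$3.75 per call"
```

Keeping input and output as separate terms makes it easy to see which side of the call drives the bill.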

Which LLM has the best cost-to-quality ratio?

For most tasks (May 2026): Claude Haiku 4.5, Gemini 2.5 Flash and GPT-5 nano are the value picks. For frontier reasoning, Claude Opus 4.7 and GPT-5 lead — at 5–10× the cost. DeepSeek V3 and Llama 3.3 70B (via Together / Groq) offer 80% of the quality at 10–20% of the cost. Match the model to the task — don't pay frontier prices for tagging or summarisation.

How does prompt caching reduce cost?

All major providers now offer prompt caching: repeated input prefixes (system prompts, RAG context, document content) get charged ~10% of normal input price after the first call. For chat applications with long system prompts or RAG over fixed documents, caching can cut input costs by 75–90%. Anthropic charges 1.25× normal price to write a cache, then 0.1× to read.
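
As a rough sketch of how the hit rate feeds into the bill, the snippet below averages cache reads and cache writes using the 0.1× read and 1.25× write multipliers quoted above. It assumes the entire input is a cacheable prefix and that every miss pays the write price, which is a simplification; `cached_input_cost` and its parameters are hypothetical names.

```python
def cached_input_cost(input_tokens: int, input_price_per_m: float,
                      hit_rate: float, read_mult: float = 0.10,
                      write_mult: float = 1.25) -> float:
    """Average USD input cost per call with prompt caching.

    Assumption: the whole input is a cacheable prefix. A cache hit pays
    read_mult of the normal price; a miss pays write_mult to write the cache.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    base = input_tokens / 1_000_000 * input_price_per_m
    return base * (hit_rate * read_mult + (1.0 - hit_rate) * write_mult)


# 100K input tokens at $15 per 1M, with a 90% cache hit rate.
uncached = 100_000 / 1_000_000 * 15.0
cached = cached_input_cost(100_000, 15.0, hit_rate=0.9)
savings = 1.0 - cached / uncached
print(f"input cost: ${uncached:.2f} -> ${cached:.4f} ({savings:.1%} saved)")
```

Under these assumptions a 90% hit rate saves a little under 80% of the input cost. Lower hit rates save less because each miss pays the write premium.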

What about batch pricing?

Batch APIs (OpenAI, Anthropic, Google) run jobs asynchronously, with results returned within 24 hours, at a 50% discount. They suit offline workloads such as backfills, evaluation runs and document processing. Worst-case latency is 24 hours, though jobs often return within 1–2 hours. Real-time chat can't use batch.
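
To see what the discount means for a mixed workload, here is a small sketch that applies the 50% batch discount to the share of spend that can wait. The function name and the $1,000 / 60% figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def blended_batch_cost(standard_cost: float, batch_share: float,
                       batch_discount: float = 0.50) -> float:
    """USD cost when batch_share of the workload goes through a batch API."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return standard_cost * (1.0 - batch_share * batch_discount)


# Hypothetical workload: $1,000/month at standard prices, 60% of it deferrable.
monthly = blended_batch_cost(1000.0, batch_share=0.6)
print(f"${monthly:.2f} per month")
```

Only the deferrable share earns the discount, so the total saving is the batch share times the discount (30% in this example).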

Where does the pricing data come from?

Pricing is fetched live from the OpenRouter API (openrouter.ai), which aggregates 100+ LLMs from OpenAI, Anthropic, Google, Meta, DeepSeek, Mistral and others. Data is cached in your browser for 24 hours, so the first load may take a moment but subsequent loads are instant. OpenRouter typically takes a small margin over direct provider pricing — for production budgets verify with the provider directly. If OpenRouter is unreachable we fall back to baked-in May 2026 list prices so the comparator always works.

🤖 LLM Cost Comparator

Side-by-side costs across all major LLM APIs. Pricing is fetched live from OpenRouter and cached for 24 hours, covering Claude Opus, GPT, Gemini, Llama, DeepSeek and more.

Average input tokens / call

Average output tokens / call

Calls per day

Cache hit rate (%): for repeated prefixes, 70%+ is realistic.

ℹ️ Prices are fetched from OpenRouter (24-hour browser cache). OpenRouter takes a small margin, so prices may differ slightly from direct provider pricing. Cached input costs ~10% of the normal input price (Anthropic / OpenAI). Batch API pricing is ~50% of standard. Reasoning models (o-series, Claude with extended thinking) bill thinking tokens as output. If the live feed is unavailable, we fall back to May 2026 list prices.
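
The four inputs above can be combined into a monthly estimate roughly as follows. The comparator's actual formula is not given in the text, so this is only one plausible sketch: it treats the cache hit rate as the share of input tokens billed at ~10%, ignores cache-write premiums, and assumes a 30-day month. The `Workload` class, `monthly_cost` function and the $3 / $15 prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    input_tokens_per_call: int
    output_tokens_per_call: int
    calls_per_day: int
    cache_hit_rate: float  # 0.0 to 1.0, share of input tokens read from cache


def monthly_cost(w: Workload, input_price_per_m: float,
                 output_price_per_m: float,
                 cached_input_mult: float = 0.10, days: int = 30) -> float:
    """Estimated USD spend per month (assumption: cache writes cost nothing extra)."""
    cached_tokens = w.input_tokens_per_call * w.cache_hit_rate
    fresh_tokens = w.input_tokens_per_call - cached_tokens
    billed_input = fresh_tokens + cached_tokens * cached_input_mult
    input_cost = billed_input / 1_000_000 * input_price_per_m
    output_cost = w.output_tokens_per_call / 1_000_000 * output_price_per_m
    return (input_cost + output_cost) * w.calls_per_day * days


# Hypothetical app: 2,000 in / 500 out per call, 1,000 calls a day, 70% cache hits.
app = Workload(2_000, 500, 1_000, 0.70)
estimate = monthly_cost(app, input_price_per_m=3.0, output_price_per_m=15.0)
print(f"${estimate:.2f} per month")
```

For a batch workload, the ~50% batch rate mentioned above would apply on top of this figure.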

How LLM API pricing works

LLM providers charge per million tokens, with separate rates for input (your prompt plus context) and output (what the model writes back). Output is typically 3–5× more expensive than input. One token is roughly 4 English characters or ¾ of a word.

Cost = (input_tokens × input_$ / 1M) + (output_tokens × output_$ / 1M)

The 2026 pricing landscape

Three tiers have emerged:

Frontier ($10–$75 / 1M tokens): Claude Opus 4.7, GPT-5, Gemini 3.1 Pro. Best reasoning, vision and long context.
Workhorse ($1–$5 / 1M): Claude Sonnet 4.6, GPT-5 mini, Gemini 2.5 Flash. About 90% of frontier quality at 10–20% of the price.
Budget ($0.10–$1 / 1M): Claude Haiku 4.5, GPT-5 nano, DeepSeek V3, Llama 3.3 70B (via Groq/Together). Well suited to tagging, classification and simple chat.

Where the real savings are

Pricing is the easy lever; architectural levers move 2–10× more cost:

Prompt caching: repeated system prompts and RAG context are billed at ~10% of the normal price once cached. A 70%+ cache hit rate is realistic for chat and agent apps. Cuts the input bill by 75–90%.
Batch API: a 50% discount for jobs that can wait up to 24 hours. Well suited to backfills, evaluations and document-processing pipelines.
Tiered routing: send simple tasks to a cheap model and escalate to a frontier model only when needed. This roughly halves the blended cost for typical assistants.
Reasoning budgets: for the o-series and Claude extended thinking, cap the thinking tokens. Default thinking can increase output cost 5–10×.
Self-hosted open weights: Llama 3.3 70B on a single H100 costs ~$0.20/M tokens at 80%+ utilisation; only worthwhile above ~50M tokens/day.
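
The tiered-routing lever above can be sketched as a weighted average. This assumes a router that sends each call to exactly one model (no cheap-first retry that pays for both); the function name and the per-call costs are hypothetical, picked only to show how a blended cost near half the frontier cost can arise.

```python
def blended_cost_per_call(cheap_cost: float, frontier_cost: float,
                          escalation_rate: float) -> float:
    """Expected USD cost per call when escalation_rate of calls go to the frontier model."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return (1.0 - escalation_rate) * cheap_cost + escalation_rate * frontier_cost


# Hypothetical per-call costs: $0.01 on a budget model, $0.10 on a frontier model.
all_frontier = 0.10
routed = blended_cost_per_call(cheap_cost=0.01, frontier_cost=0.10,
                               escalation_rate=0.45)
print(f"${routed:.4f} per call vs ${all_frontier:.4f} all-frontier")
```

With these made-up numbers, escalating 45% of calls lands at about half the all-frontier cost. The saving grows as the escalation rate falls.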

Reading the comparison table

The table is sorted by monthly spend for your inputs. Most apps are input-heavy (RAG, long documents, system prompts), where the input price dominates total cost. Chat-style apps with short prompts and long completions are output-heavy, so the output price dominates.
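
One way to check which side dominates is to compute the input share of a call's cost. The sketch below is illustrative only: `cost_profile`, the 50% threshold and the $3 / $15 prices are assumptions made for this example, not part of the comparator.

```python
def cost_profile(input_tokens: int, output_tokens: int,
                 input_price_per_m: float,
                 output_price_per_m: float) -> tuple[str, float]:
    """Return a label and the input share of the total cost of one call."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost
    input_share = input_cost / total if total > 0 else 0.0
    label = "input-heavy" if input_share >= 0.5 else "output-heavy"
    return label, input_share


# Hypothetical prices of $3 in / $15 out per 1M tokens.
rag_label, rag_share = cost_profile(20_000, 500, 3.0, 15.0)   # long context, short answer
chat_label, chat_share = cost_profile(300, 1_500, 3.0, 15.0)  # short prompt, long answer
print(rag_label, f"{rag_share:.0%}")
print(chat_label, f"{chat_share:.0%}")
```

Because output tokens cost several times more, a call can be output-heavy in cost even when it sends more input tokens than it receives.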

Other cost considerations

Context window: longer context = better retrieval, but more input tokens = higher cost
Vision / image input: images are converted to tokens (typically 200–1,200 per image)
Tool use / function calls: tool definitions count as input on every turn unless they are cached
Fine-tuning: a training fee, plus inference at typically 2–4× the base model price

Pair this with our Prompt Cost Calculator to estimate a specific prompt, with the AI Model Cost Calculator for project-level scenarios, and with the AI ROI Calculator to compare LLM cost against the human-labour cost it replaces.

⚠️ Important note: prices are accurate as of May 2026 and change often. Always confirm on the provider's official pricing page (Anthropic, OpenAI, Google AI, Together, DeepSeek) before committing to a contract. Volume discounts, regional pricing and Azure/Bedrock margins can shift effective rates by 10–30%.