A token is a chunk of text the model reads or writes: roughly 4 characters or ¾ of a word in English, so 1,000 tokens ≈ 750 words. LLM APIs bill input tokens (your prompt + context) and output tokens (the response) separately, and output is typically 3–5× more expensive per million tokens.

How is API cost calculated?

Cost = (input tokens × input price per 1M) + (output tokens × output price per 1M). Example: 100K input + 30K output on Claude Opus 4.7 ($15 in / $75 out per 1M) = (100,000/1,000,000 × $15) + (30,000/1,000,000 × $75) = $1.50 + $2.25 = $3.75 per call.
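As a minimal sketch, the formula can be written as a small Python function. The function name `call_cost` and its parameter names are illustrative, not part of any provider SDK; the figures are the ones from the example.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one call, given prices per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# The example from the text: 100K input + 30K output at $15 in / $75 out per 1M.
example = call_cost(100_000, 30_000, 15.0, 75.0)
print(f"${example:.2f} per call")
```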

Which LLM has the best cost-to-quality ratio?

For most tasks (May 2026): Claude Haiku 4.5, Gemini 2.5 Flash and GPT-5 nano are the value picks. For frontier reasoning, Claude Opus 4.7 and GPT-5 lead — at 5–10× the cost. DeepSeek V3 and Llama 3.3 70B (via Together / Groq) offer 80% of the quality at 10–20% of the cost. Match the model to the task — don't pay frontier prices for tagging or summarisation.

How does prompt caching reduce cost?

All major providers now offer prompt caching: repeated input prefixes (system prompts, RAG context, document content) get charged ~10% of normal input price after the first call. For chat applications with long system prompts or RAG over fixed documents, caching can cut input costs by 75–90%. Anthropic charges 1.25× normal price to write a cache, then 0.1× to read.
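To see how the write premium and read discount interact, here is a rough sketch that prices one shared prefix across several calls. It assumes the 1.25× write and 0.1× read multipliers quoted above, assumes the cache never expires between calls, and uses a hypothetical $3 per 1M input price; `cached_prefix_cost` is an illustrative name.

```python
def cached_prefix_cost(prefix_tokens: int, calls: int, input_price_per_m: float,
                       write_multiplier: float = 1.25,
                       read_multiplier: float = 0.10) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) in USD for a prefix sent on every call.

    Assumes the first call writes the cache and every later call reads it,
    i.e. the cache does not expire in between (a simplification).
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    base = prefix_tokens / 1_000_000 * input_price_per_m  # one uncached send
    uncached = base * calls
    cached = base * (write_multiplier + read_multiplier * (calls - 1))
    return uncached, cached


# Hypothetical workload: a 20K-token system prompt reused across 50 calls.
uncached, cached = cached_prefix_cost(20_000, 50, 3.0)
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}, "
      f"saving {1 - cached / uncached:.0%}")
```

With a single call the cached path is more expensive (1.25× versus 1×), so under these assumptions caching pays off from the second call that reuses the prefix.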

What about batch pricing?

Batch APIs (OpenAI, Anthropic, Google) accept jobs that can be deferred for up to 24 hours, at a 50% discount. They suit offline workloads such as backfills, evaluation runs and document processing. Worst-case latency is 24 hours, though jobs often return within 1–2 hours. Real-time chat can't use batch.
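A quick way to estimate the effect is to ask what share of your spend could tolerate batch latency. The sketch below assumes a flat 50% discount on that share; `with_batch_discount` and the dollar figures are illustrative.

```python
def with_batch_discount(realtime_cost: float, batch_fraction: float,
                        discount: float = 0.50) -> float:
    """Return total cost when `batch_fraction` of the spend moves to a batch API."""
    if not 0.0 <= batch_fraction <= 1.0:
        raise ValueError("batch_fraction must be between 0 and 1")
    return realtime_cost * (1.0 - batch_fraction * discount)


# Hypothetical: $1,000/month at real-time prices, 40% of it batchable.
print(f"${with_batch_discount(1_000.0, 0.40):.2f}")
```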

Where does the pricing data come from?

Pricing is fetched live from the OpenRouter API (openrouter.ai), which aggregates 100+ LLMs from OpenAI, Anthropic, Google, Meta, DeepSeek, Mistral and others. Data is cached in your browser for 24 hours, so the first load may take a moment but subsequent loads are instant. OpenRouter typically takes a small margin over direct provider pricing — for production budgets verify with the provider directly. If OpenRouter is unreachable we fall back to baked-in May 2026 list prices so the comparator always works.
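The live-then-fallback behaviour can be sketched roughly as follows. This is not the comparator's actual code. It assumes the feed returns a `data` list whose entries carry an `id` and a `pricing` object with `prompt` and `completion` given as USD-per-token strings; the fetcher is injected so the sketch runs without network access, and the fallback table holds a hypothetical placeholder entry.

```python
from typing import Callable

# Hypothetical baked-in prices (USD per 1M tokens), used only if the live fetch fails.
FALLBACK_PRICES = {"example/model": {"input": 15.0, "output": 75.0}}


def parse_models(payload: dict) -> dict:
    """Convert an assumed OpenRouter-style payload to USD-per-1M-token prices."""
    prices = {}
    for model in payload.get("data", []):
        pricing = model.get("pricing") or {}
        try:
            prices[model["id"]] = {
                "input": float(pricing["prompt"]) * 1_000_000,
                "output": float(pricing["completion"]) * 1_000_000,
            }
        except (KeyError, TypeError, ValueError):
            continue  # skip entries with missing or malformed pricing
    return prices


def load_prices(fetch: Callable[[], dict],
                fallback: dict = FALLBACK_PRICES) -> tuple[dict, str]:
    """Return (prices, source), where source is 'live' or 'fallback'."""
    try:
        prices = parse_models(fetch())
    except Exception:
        prices = {}  # network error, bad JSON, unexpected shape, and so on
    if prices:
        return prices, "live"
    return dict(fallback), "fallback"
```

Returning the source alongside the prices lets the page tell the user whether they are looking at live or baked-in figures.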

🤖 LLM Cost Comparator

Side-by-side cost comparison of every major LLM API. Prices are fetched live from OpenRouter and cached for 24 hours, covering Claude Opus, GPT, Gemini, Llama, DeepSeek and more.

Average input tokens per call

Average output tokens per call

Calls per day

Cache hit rate (%): for repeated prefixes, 70%+ is realistic.

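ℹ️ Prices are sourced from OpenRouter (24-hour browser cache). They may differ slightly from direct provider pricing because OpenRouter takes a small margin. Cached input is billed at ~10% of the normal input price (Anthropic / OpenAI). Batch API pricing is ~50% of standard. Reasoning models (o-series, Claude with extended thinking) bill thinking tokens as output. If the live feed is unreachable, the tool falls back to May 2026 list prices.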

How LLM API pricing works

LLM APIs charge per million tokens and bill input (prompt + context) and output (what the model generates) separately. Output is typically 3–5× more expensive than input. A token is roughly 4 characters of English, or ¾ of a word.

Cost = (input_tokens × input_$ / 1M) + (output_tokens × output_$ / 1M)

The 2026 pricing landscape

Three tiers have emerged:

Frontier ($10–$75 / 1M tokens): Claude Opus 4.7, GPT-5, Gemini 3.1 Pro. Best reasoning, vision and long context.
Workhorse ($1–$5 / 1M): Claude Sonnet 4.6, GPT-5 mini, Gemini 2.5 Flash. About 90% of frontier quality at 10–20% of the price.
Budget ($0.10–$1 / 1M): Claude Haiku 4.5, GPT-5 nano, DeepSeek V3, Llama 3.3 70B (via Groq/Together). Suited to tagging, classification and simple chat.

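Where the real savings are

Choosing a cheaper model is the easy lever; architectural levers move 2–10× more cost:

Prompt caching: repeated system prompts and RAG context are billed at about 10% of the normal input price once cached. Cache hit rates of 70% or more are realistic for chat and agent apps; at the high end this cuts input costs by 75–90%.
Batch API: 50% discount for jobs that can wait up to 24 hours. Suited to backfills, evaluations and document-processing pipelines.
Tiered routing: send simple requests to a cheap model and escalate to a frontier model only when needed. This can halve the blended cost of a typical assistant.
Reasoning budgets: for o-series models and Claude extended thinking, cap the thinking tokens. Default thinking can increase output cost 5–10×.
Self-hosted open weights: Llama 3.3 70B on a single H100 costs ~$0.20 per 1M tokens at 80%+ utilization. Only worthwhile above ~50M tokens per day.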
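To make the tiered-routing lever concrete, the sketch below estimates the blended cost per request under one simple assumption: every request is first tried on the cheap model, and only a fraction is escalated to the frontier model, paying for both attempts. The function name and the per-request costs are hypothetical.

```python
def blended_cost_per_request(cheap_cost: float, frontier_cost: float,
                             escalation_rate: float) -> float:
    """Average USD cost per request when the cheap model always runs first.

    Escalated requests pay for the cheap attempt and the frontier call.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return cheap_cost + escalation_rate * frontier_cost


# Hypothetical per-request costs: $0.002 on a budget model, $0.05 on a frontier model.
blended = blended_cost_per_request(0.002, 0.05, escalation_rate=0.45)
print(f"blended ${blended:.4f} vs frontier-only $0.0500 "
      f"({blended / 0.05:.0%} of frontier cost)")
```

Under these made-up numbers, escalating 45% of traffic lands at roughly half the frontier-only cost; the saving shrinks quickly as the escalation rate rises.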

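How to read the comparison table

The table is sorted by monthly spend for the values you enter. Most apps are input-heavy (RAG, long documents, system prompts), so input price dominates total cost. Chat-style apps with short prompts and long completions are output-heavy, so output price dominates.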
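The ranking can be approximated with a few lines of Python. This is a sketch of the idea rather than the table's actual code: it assumes a 30-day month, bills cached input at 10% of the normal input price, and uses hypothetical model names and prices.

```python
def monthly_spend(input_price_per_m: float, output_price_per_m: float,
                  input_tokens: int, output_tokens: int, calls_per_day: int,
                  cache_hit_rate: float = 0.0, cached_multiplier: float = 0.10,
                  days: int = 30) -> dict:
    """Return monthly input, output and total spend in USD."""
    # Cached input tokens are billed at a fraction of the normal input price.
    billable_input = input_tokens * (
        (1.0 - cache_hit_rate) + cache_hit_rate * cached_multiplier)
    calls = calls_per_day * days
    input_cost = billable_input / 1_000_000 * input_price_per_m * calls
    output_cost = output_tokens / 1_000_000 * output_price_per_m * calls
    return {"input": input_cost, "output": output_cost,
            "total": input_cost + output_cost}


# Hypothetical models: (input $/1M, output $/1M).
MODELS = {"frontier-x": (15.0, 75.0), "workhorse-y": (3.0, 15.0),
          "budget-z": (0.25, 1.25)}

# An input-heavy workload: 8,000 input and 500 output tokens, 1,000 calls/day.
rows = {name: monthly_spend(p_in, p_out, 8_000, 500, 1_000)
        for name, (p_in, p_out) in MODELS.items()}
for name, row in sorted(rows.items(), key=lambda item: item[1]["total"]):
    share = row["input"] / row["total"]
    print(f"{name:12s} ${row['total']:>9.2f}/month  input share {share:.0%}")
```

The input share printed for each row is one way to check whether a workload is input-heavy or output-heavy before deciding which price column to focus on.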

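Other cost considerations

Context window: a longer context can improve retrieval, but more input tokens mean higher cost
Vision / image input: images are converted to tokens (typically 200–1,200 per image)
Tool use / function calls: tool definitions count as input on every turn unless they are cached
Fine-tuning: training cost, plus inference typically priced at 2–4× the base model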
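Two of these items are easy to overlook because they never appear in the prompt text itself. As a rough sketch, the extra input from uncached tool definitions and from images in one conversation can be estimated like this; the function name, token counts and the $3 per 1M price are hypothetical.

```python
def overhead_input_cost(turns: int, tool_definition_tokens: int,
                        images: int, tokens_per_image: int,
                        input_price_per_m: float) -> tuple[int, float]:
    """Return (extra_tokens, extra_cost_usd) for one conversation.

    Assumes tool definitions are resent uncached on every turn and each
    image is sent once.
    """
    extra_tokens = turns * tool_definition_tokens + images * tokens_per_image
    return extra_tokens, extra_tokens / 1_000_000 * input_price_per_m


# Hypothetical: 10 turns, 1,500 tokens of tool definitions, 3 images at 800 tokens each.
tokens, cost = overhead_input_cost(10, 1_500, 3, 800, 3.0)
print(f"{tokens} extra input tokens, ${cost:.4f} per conversation")
```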

To estimate a specific prompt, pair this with the Prompt Cost Calculator; use the AI Model Cost Calculator for project-level scenarios, and the AI ROI Calculator to compare LLM costs against the labor costs they replace.

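⚠️ Important note: Prices are as of May 2026 and change frequently. Always confirm on the provider's official pricing page (Anthropic, OpenAI, Google AI, Together, DeepSeek) before signing a contract. Volume discounts, regional pricing and Azure/Bedrock margins can shift actual rates by 10–30%.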