
AI Model Pricing Comparison

Input, output, and cached token rates for every major LLM. Sourced from official provider documentation, May 2026.

OpenAI

OpenAI API

GPT-5.4

Current mainstream frontier. Best OpenAI model for production workloads — reasoning, vision, and multimodal tasks.

Input $2.50 / 1M tokens
Output $15.00 / 1M tokens
Cached input $1.25 / 1M tokens
Batch discount 50%

GPT-5.4 mini

Frontier-level intelligence at mid-tier cost. Replaces GPT-4o as the go-to balanced option for most production use cases.

Input $0.75 / 1M tokens
Output $4.50 / 1M tokens
Cached input $0.375 / 1M tokens
Batch discount 50%

GPT-4o mini

Still the cheapest OpenAI option per token. Ideal for high-volume classification, summarisation, or simple routing tasks.

Input $0.15 / 1M tokens
Output $0.60 / 1M tokens
Cached input $0.075 / 1M tokens
Batch discount 50%

GPT-4.1

Best choice for document-heavy workflows. 1M token context window for long codebases, contracts, or multi-document retrieval.

Input $2.00 / 1M tokens
Output $8.00 / 1M tokens
Cached input $0.50 / 1M tokens
Batch discount 50%

GPT-4.1 mini

1M context at budget price. Great for batch processing large documents where you don't need frontier quality.

Input $0.40 / 1M tokens
Output $1.60 / 1M tokens
Cached input $0.10 / 1M tokens
Batch discount 50%

Anthropic

Claude API

Claude Opus 4.7

Most capable Claude model. Step-change improvement in agentic coding over Opus 4.6. For complex reasoning and autonomous agents.

Input $5.00 / 1M tokens
Output $25.00 / 1M tokens
Cached input $0.50 / 1M tokens
Batch discount 50%

Claude Sonnet 4.6

Best balance of speed and intelligence. Production-ready for coding, analysis, and complex reasoning. Extended thinking available.

Input $3.00 / 1M tokens
Output $15.00 / 1M tokens
Cached input $0.30 / 1M tokens
Batch discount 50%

Claude Haiku 4.5

Fastest Claude with near-frontier intelligence. Built for customer-facing apps where latency and cost per call matter most.

Input $1.00 / 1M tokens
Output $5.00 / 1M tokens
Cached input $0.10 / 1M tokens
Batch discount 50%

Side-by-side comparison

All models ranked by input token price. May 2026 — verify at provider pages before committing to a budget.

| Model | Provider | Input / 1M | Cached / 1M | Output / 1M | Context |
|---|---|---|---|---|---|
| GPT-4o mini | OpenAI | $0.15 | $0.075 | $0.60 | 128K |
| GPT-4.1 mini | OpenAI | $0.40 | $0.10 | $1.60 | 1M |
| GPT-5.4 mini | OpenAI | $0.75 | $0.375 | $4.50 | 128K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $0.10 | $5.00 | 200K |
| GPT-4.1 | OpenAI | $2.00 | $0.50 | $8.00 | 1M |
| GPT-5.4 | OpenAI | $2.50 | $1.25 | $15.00 | 128K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $0.30 | $15.00 | 1M |
| Claude Opus 4.7 | Anthropic | $5.00 | $0.50 | $25.00 | 1M |

Frequently asked questions

What is a token?

Tokens are the units LLMs use to process text. One token is roughly 4 characters or ¾ of an English word. "Hello world" is about 3 tokens. A 1,000-word document is roughly 1,300 tokens.
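As a rough sketch, the two rules of thumb above can be coded directly. The constants (4 characters per token, ¾ of a word per token) are approximations, not real tokenizer output:

```python
def estimate_tokens_from_chars(text: str) -> int:
    # Heuristic: ~4 characters per token for English text.
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    # Heuristic: one token is roughly 3/4 of an English word.
    return round(word_count / 0.75)
```

For example, `estimate_tokens_from_chars("Hello world")` gives 3 and `estimate_tokens_from_words(1000)` gives 1333, matching the figures above. For billing-accurate counts, use the provider's own tokenizer.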

What's the difference between input and output tokens?

Input tokens are everything you send to the model: your prompt, system instructions, retrieved documents. Output tokens are what the model generates back. Output is typically 4–6× more expensive per token (see the table above) because generation is computationally heavier.

What is prompt caching?

Prompt caching lets you reuse a fixed prefix (e.g. a long system prompt or knowledge base) across calls. Cached tokens cost 50–90% less than uncached. Both OpenAI and Anthropic support it. High cache hit rates dramatically reduce costs for document-heavy workflows.
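To see how the hit rate changes your effective rate, here is a minimal sketch. The rates are Claude Sonnet 4.6's from the cards above; the 80% hit rate is a hypothetical, and real caching pricing may include cache-write surcharges not modeled here:

```python
def blended_input_rate(base_rate: float, cached_rate: float, hit_rate: float) -> float:
    # Effective $/1M input tokens when hit_rate (0..1) of tokens hit the cache.
    return cached_rate * hit_rate + base_rate * (1.0 - hit_rate)

# Sonnet 4.6: $3.00 base, $0.30 cached. At an 80% hit rate the
# effective input rate drops from $3.00 to $0.84 per 1M tokens.
```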

What is the batch API?

Both OpenAI and Anthropic offer a batch API where jobs are processed asynchronously (within 24 hours) at a 50% discount. Ideal for background tasks, bulk evaluation, or overnight processing pipelines that don't need real-time responses.
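The batch discount is a straight multiplier on whatever rate applies. A quick sketch, using GPT-4o mini's input rate from the table as the example:

```python
BATCH_DISCOUNT = 0.50  # 50% off for asynchronous batch jobs

def batch_cost(million_tokens: float, rate_per_million: float) -> float:
    # Cost of a batch job: tokens (in millions) x rate x (1 - discount).
    return million_tokens * rate_per_million * (1.0 - BATCH_DISCOUNT)

# 200M input tokens through GPT-4o mini ($0.15/1M):
# $30 in real time, $15 batched.
```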

How do I estimate my monthly token usage?

Multiply: (monthly active users) × (workflows per user) × (model calls per workflow) × (avg tokens per call). Don't forget retries, which typically add 5–15% overhead. Use the full calculator to model this with low/expected/high scenarios.
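The multiplication above, with retry overhead folded in. All the example inputs are hypothetical:

```python
def monthly_token_estimate(mau: int, workflows_per_user: float,
                           calls_per_workflow: float, tokens_per_call: float,
                           retry_overhead: float = 0.10) -> float:
    # (users) x (workflows/user) x (calls/workflow) x (tokens/call),
    # inflated by the retry overhead (typically 5-15%).
    return (mau * workflows_per_user * calls_per_workflow
            * tokens_per_call * (1.0 + retry_overhead))

# 10,000 users x 20 workflows x 3 calls x 2,000 tokens, +10% retries
# -> roughly 1.32B tokens/month.
```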

Why is the real cost higher than just tokens?

Token costs are rarely the biggest line item in a production AI system. Fixed infrastructure, embeddings, web search, and OCR often dominate. The full-stack calculator models all of these and shows you what's actually driving cost.

GPT-5.4 vs Claude Sonnet 4.6: which is cheaper?

GPT-5.4 input is marginally cheaper ($2.50 vs $3.00/1M). But Claude Sonnet 4.6 has a significantly lower cached rate ($0.30 vs $1.25/1M), so if your workflow has high cache hit rates (long system prompts, RAG pipelines), Sonnet 4.6 can be cheaper overall.
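The break-even point follows from the blended input rates. A sketch with the rates from the table above, for a hypothetical cache-heavy workflow:

```python
def effective_input_rate(base: float, cached: float, hit_rate: float) -> float:
    # $/1M input tokens at a given prompt-cache hit rate (0..1).
    return cached * hit_rate + base * (1.0 - hit_rate)

def gpt_54(hit_rate: float) -> float:
    return effective_input_rate(2.50, 1.25, hit_rate)   # GPT-5.4

def sonnet_46(hit_rate: float) -> float:
    return effective_input_rate(3.00, 0.30, hit_rate)   # Claude Sonnet 4.6

# GPT-5.4 is cheaper with no caching; Sonnet 4.6 wins once the hit rate
# passes roughly 34% (solve 2.50 - 1.25h = 3.00 - 2.70h for h).
```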

When should I use a mini / Haiku model?

For classification, routing, short-form generation, or any task where a smaller model gets the job done. Per the table above, GPT-4o mini and Claude Haiku 4.5 are roughly 5–20× cheaper than their flagship counterparts. Run a quality eval on your specific task before assuming you need the bigger model.

Token price is just the start.

Model all your AI costs — search, embeddings, infrastructure — and find your real margin.

Open the full calculator