Input, output, and cached token rates for every major LLM. Sourced from official provider documentation, May 2026.

GPT-5.4
Current mainstream frontier. Best OpenAI model for production workloads — reasoning, vision, and multimodal tasks.
GPT-5.4 mini
Frontier-level intelligence at mid-tier cost. Replaces GPT-4o as the go-to balanced option for most production use cases.
GPT-4o mini
Still the cheapest OpenAI option per token. Ideal for high-volume classification, summarisation, or simple routing tasks.
GPT-4.1
Best choice for document-heavy workflows. 1M token context window for long codebases, contracts, or multi-document retrieval.
GPT-4.1 mini
1M context at budget price. Great for batch processing large documents where you don't need frontier quality.
Claude Opus 4.7
Most capable Claude model. Step-change improvement in agentic coding over Opus 4.6. For complex reasoning and autonomous agents.
Claude Sonnet 4.6
Best balance of speed and intelligence. Production-ready for coding, analysis, and complex reasoning. Extended thinking available.
Claude Haiku 4.5
Fastest Claude with near-frontier intelligence. Built for customer-facing apps where latency and cost per call matter most.
All models ranked by input token price. May 2026 — verify at provider pages before committing to a budget.
| Model | Provider | Input / 1M | Cached / 1M | Output / 1M | Context |
|---|---|---|---|---|---|
| GPT-4o mini | OpenAI | $0.15 | $0.075 | $0.60 | 128K |
| GPT-4.1 mini | OpenAI | $0.40 | $0.10 | $1.60 | 1M |
| GPT-5.4 mini | OpenAI | $0.75 | $0.375 | $4.50 | 128K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $0.10 | $5.00 | 200K |
| GPT-4.1 | OpenAI | $2.00 | $0.50 | $8.00 | 1M |
| GPT-5.4 | OpenAI | $2.50 | $1.25 | $15.00 | 128K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $0.30 | $15.00 | 1M |
| Claude Opus 4.7 | Anthropic | $5.00 | $0.50 | $25.00 | 1M |
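For cost scripts, the table above can be encoded as a small lookup. A sketch — the string keys are shorthand labels for this snippet, not the providers' official API model IDs:

```python
# Rates in $ per 1M tokens, copied from the pricing table (May 2026).
# Keys are informal labels, not real API model identifiers.
PRICES = {  # model: (input, cached_input, output)
    "gpt-4o-mini":       (0.15, 0.075, 0.60),
    "gpt-4.1-mini":      (0.40, 0.10,  1.60),
    "gpt-5.4-mini":      (0.75, 0.375, 4.50),
    "claude-haiku-4.5":  (1.00, 0.10,  5.00),
    "gpt-4.1":           (2.00, 0.50,  8.00),
    "gpt-5.4":           (2.50, 1.25,  15.00),
    "claude-sonnet-4.6": (3.00, 0.30,  15.00),
    "claude-opus-4.7":   (5.00, 0.50,  25.00),
}

# PRICES["gpt-5.4"][0] → 2.50 (input rate per 1M tokens)
```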
Tokens are the units LLMs use to process text. One token is roughly 4 characters or ¾ of an English word. "Hello world" is two to three tokens, depending on the tokenizer. A 1,000-word document is roughly 1,300 tokens.
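Those rules of thumb can be sketched as a quick estimator. This is only a back-of-envelope heuristic — for exact counts, use the provider's tokenizer (e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters or ~3/4 of a word per token."""
    by_chars = len(text) / 4              # ~4 characters per token
    by_words = len(text.split()) / 0.75   # ~3/4 of a word per token
    # Average the two heuristics; a real tokenizer will differ somewhat.
    return round((by_chars + by_words) / 2)

# A 1,000-word document lands near the ~1,300-token figure quoted above:
# 1000 words / 0.75 ≈ 1333 by the word heuristic.
```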
Input tokens are everything you send to the model — your prompt, system instructions, retrieved documents. Output tokens are what the model generates back. Output is typically 4–6× more expensive per token (see the table above) because generation is computationally heavier.
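A single call's cost is just input tokens times the input rate plus output tokens times the output rate. A minimal sketch, using GPT-5.4's rates from the table; the token counts are made-up example numbers:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one model call; rates are per 1M tokens."""
    return (input_tokens * input_per_m
            + output_tokens * output_per_m) / 1_000_000

# GPT-5.4: $2.50 input, $15.00 output per 1M tokens.
# A 2,000-token prompt with a 500-token response:
cost = call_cost(2_000, 500, input_per_m=2.50, output_per_m=15.00)
# 2000 * 2.50 / 1e6 = $0.005 in, 500 * 15.00 / 1e6 = $0.0075 out → $0.0125
```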
Prompt caching lets you reuse a fixed prefix (e.g. a long system prompt or knowledge base) across calls. Cached tokens cost 50–90% less than uncached. Both OpenAI and Anthropic support it. High cache hit rates dramatically reduce costs for document-heavy workflows.
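The effect of caching on your blended input price is straightforward to model. A sketch using Claude Sonnet 4.6's rates from the table; the 80% hit rate is an assumption you would measure in your own pipeline:

```python
def blended_input_rate(base_per_m: float, cached_per_m: float,
                       cache_hit_rate: float) -> float:
    """Effective input price per 1M tokens given a cache hit rate in [0, 1]."""
    return cache_hit_rate * cached_per_m + (1 - cache_hit_rate) * base_per_m

# Claude Sonnet 4.6: $3.00 uncached, $0.30 cached. With 80% of input
# tokens served from cache:
rate = blended_input_rate(3.00, 0.30, cache_hit_rate=0.8)
# 0.8 * 0.30 + 0.2 * 3.00 → about $0.84 per 1M input tokens
```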
Both OpenAI and Anthropic offer a batch API where jobs are processed asynchronously (within 24 hours) at a 50% discount. Ideal for background tasks, bulk evaluation, or overnight processing pipelines that don't need real-time responses.
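The batch discount is a flat multiplier on whatever the synchronous job would have cost:

```python
BATCH_DISCOUNT = 0.50  # both providers advertise 50% off batch jobs

def batch_cost(sync_cost: float) -> float:
    """Asynchronous batch price: half the synchronous cost."""
    return sync_cost * (1 - BATCH_DISCOUNT)

# An overnight evaluation run that would cost $40 synchronously:
# batch_cost(40.0) → $20.00
```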
Multiply: (monthly active users) × (workflows per user) × (model calls per workflow) × (avg tokens per call). Don't forget retries, which typically add 5–15% overhead. Use the full calculator to model this with low/expected/high scenarios.
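The multiplication above can be sketched directly, with retry overhead folded in. The user, workflow, and token figures below are illustrative assumptions, not benchmarks:

```python
def monthly_tokens(users: int, workflows_per_user: float,
                   calls_per_workflow: float, tokens_per_call: float,
                   retry_overhead: float = 0.10) -> float:
    """Monthly token volume; retry_overhead=0.10 adds 10% for retries."""
    base = users * workflows_per_user * calls_per_workflow * tokens_per_call
    return base * (1 + retry_overhead)

# Illustrative low/expected/high scenarios: 20 workflows per user per
# month, 3 calls per workflow, 1,500 tokens per call.
for label, users in [("low", 2_000), ("expected", 5_000), ("high", 12_000)]:
    print(f"{label}: {monthly_tokens(users, 20, 3, 1_500):,.0f} tokens/month")
```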
Token costs are rarely the biggest line item in a production AI system. Fixed infrastructure, embeddings, web search, and OCR often dominate. The full-stack calculator models all of these and shows you what's actually driving cost.
GPT-5.4 input is marginally cheaper ($2.50 vs $3.00/M). But Claude Sonnet 4.6 has a significantly lower cached rate ($0.30 vs $1.25/M) — so if your workflow has high cache hit rates (long system prompts, RAG pipelines), Sonnet 4.6 can be cheaper overall.
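Under the table's rates, the crossover is easy to compute. A sketch comparing effective input cost as a function of cache hit rate (GPT-5.4: $2.50 base / $1.25 cached; Claude Sonnet 4.6: $3.00 base / $0.30 cached):

```python
def effective_rate(base: float, cached: float, hit_rate: float) -> float:
    """Blended input price per 1M tokens at a given cache hit rate."""
    return hit_rate * cached + (1 - hit_rate) * base

def cheaper_on_input(hit_rate: float) -> str:
    gpt = effective_rate(2.50, 1.25, hit_rate)     # GPT-5.4
    sonnet = effective_rate(3.00, 0.30, hit_rate)  # Claude Sonnet 4.6
    return "GPT-5.4" if gpt < sonnet else "Claude Sonnet 4.6"

# With no cache hits GPT-5.4 wins ($2.50 vs $3.00); at high hit rates
# Sonnet's $0.30 cached rate pulls ahead. Solving 2.50 - 1.25h =
# 3.00 - 2.70h gives a breakeven near a 35% hit rate (input cost only;
# output rates are identical at $15.00/M for these two).
```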
For classification, routing, short-form generation, or any task where a smaller model gets the job done. GPT-4o mini and Claude Haiku 4.5 are 5–25× cheaper per token than their flagship counterparts. Run a quality eval on your specific task before assuming you need the bigger model.
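A side-by-side eval doesn't need to be elaborate. A sketch of the shape such a check takes — `call_model` is a placeholder for whatever API client you use (the stub below just pattern-matches so the script runs end-to-end):

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder: swap in a real API client call here."""
    return "positive" if "love" in prompt else "negative"

# Tiny hand-labeled set for an example sentiment-routing task.
LABELED = [
    ("I love this product", "positive"),
    ("Terrible experience", "negative"),
]

def accuracy(model: str) -> float:
    hits = sum(call_model(model, text) == gold for text, gold in LABELED)
    return hits / len(LABELED)

# Compare the small model against the flagship on YOUR data before
# paying flagship rates; the stub makes them identical here.
for model in ["small-model", "flagship-model"]:
    print(model, accuracy(model))
```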