OpenAI

OpenAI API

GPT-5.5

OpenAI’s current flagship. Strongest reasoning in the lineup — priced for high-value tasks rather than volume workloads.

Input $5.00/ 1M tokens
Output $30.00/ 1M tokens
Cached input $0.50/ 1M tokens
Batch discount 50%

GPT-5.4

Current mainstream frontier. Best OpenAI model for production workloads — reasoning, vision, and multimodal tasks.

Input $2.50/ 1M tokens
Output $15.00/ 1M tokens
Cached input $0.25/ 1M tokens
Batch discount 50%

GPT-5.4 mini

Frontier-level intelligence at mid-tier cost. Replaces GPT-4o as the go-to balanced option for most production use cases.

Input $0.75/ 1M tokens
Output $4.50/ 1M tokens
Cached input $0.075/ 1M tokens
Batch discount 50%

GPT-4o mini

Still the cheapest OpenAI option per token. Ideal for high-volume classification, summarisation, or simple routing tasks.

Input $0.15/ 1M tokens
Output $0.60/ 1M tokens
Cached input $0.075/ 1M tokens
Batch discount 50%

GPT-4.1

Best choice for document-heavy workflows. 1M token context window for long codebases, contracts, or multi-document retrieval.

Input $2.00/ 1M tokens
Output $8.00/ 1M tokens
Cached input $0.50/ 1M tokens
Batch discount 50%

GPT-4.1 mini

1M context at budget price. Great for batch processing large documents where you don’t need frontier quality.

Input $0.40/ 1M tokens
Output $1.60/ 1M tokens
Cached input $0.10/ 1M tokens
Batch discount 50%

Anthropic

Claude API

Claude Opus 4.8

Anthropic’s current flagship (released May 2026). Same price as Opus 4.7 with stronger reasoning and agentic performance.

Input $5.00/ 1M tokens
Output $25.00/ 1M tokens
Cached input $0.50/ 1M tokens
Batch discount 50%

Claude Opus 4.7

Most capable Claude model. Step-change improvement in agentic coding over Opus 4.6. For complex reasoning and autonomous agents.

Input $5.00/ 1M tokens
Output $25.00/ 1M tokens
Cached input $0.50/ 1M tokens
Batch discount 50%

Claude Sonnet 4.6

Best balance of speed and intelligence. Production-ready for coding, analysis, and complex reasoning. Extended thinking available.

Input $3.00/ 1M tokens
Output $15.00/ 1M tokens
Cached input $0.30/ 1M tokens
Batch discount 50%

Claude Haiku 4.5

Fastest Claude with near-frontier intelligence. Built for customer-facing apps where latency and cost per call matter most.

Input $1.00/ 1M tokens
Output $5.00/ 1M tokens
Cached input $0.10/ 1M tokens
Batch discount 50%

Side-by-side comparison

All models ranked by input token price. June 2026 — verify at provider pages before committing to a budget.

Model Input / 1M Cached / 1M Output / 1M Context
GPT-4o mini OpenAI $0.15 $0.075 $0.60 128K Calculate →
GPT-4.1 mini OpenAI $0.40 $0.10 $1.60 1M Calculate →
GPT-5.4 mini OpenAI $0.75 $0.075 $4.50 128K Calculate →
Claude Haiku 4.5 Anthropic $1.00 $0.10 $5.00 200K Calculate →
GPT-4.1 OpenAI $2.00 $0.50 $8.00 1M Calculate →
GPT-5.4 OpenAI $2.50 $0.25 $15.00 128K Calculate →
Claude Sonnet 4.6 Anthropic $3.00 $0.30 $15.00 1M Calculate →
Claude Opus 4.8 Anthropic $5.00 $0.50 $25.00 1M Calculate →
GPT-5.5 OpenAI $5.00 $0.50 $30.00 256K Calculate →
Claude Opus 4.7 Anthropic $5.00 $0.50 $25.00 1M Calculate →

Frequently asked questions

What is a token?

Tokens are the units LLMs use to process text. One token is roughly 4 characters or ¾ of an English word. “Hello world” is about 3 tokens. A 1,000-word document is roughly 1,300 tokens.

What’s the difference between input and output tokens?

Input tokens are everything you send to the model — your prompt, system instructions, retrieved documents. Output tokens are what the model generates back. Output is typically 3–5× more expensive per token because it’s computationally heavier.

What is prompt caching?

Prompt caching lets you reuse a fixed prefix (e.g. a long system prompt or knowledge base) across calls. Cached tokens cost 50–90% less than uncached. Both OpenAI and Anthropic support it. High cache hit rates dramatically reduce costs for document-heavy workflows.

What is the batch API?

Both OpenAI and Anthropic offer a batch API where jobs are processed asynchronously (within 24 hours) at a 50% discount. Ideal for background tasks, bulk evaluation, or overnight processing pipelines that don’t need real-time responses.

How do I estimate my monthly token usage?

Multiply: (monthly active users) × (workflows per user) × (model calls per workflow) × (avg tokens per call). Don’t forget retries, which typically add 5–15% overhead. Use the full calculator to model this with low/expected/high scenarios.

Why is the real cost higher than just tokens?

Token costs are rarely the biggest line item in a production AI system. Fixed infrastructure, embeddings, web search, and OCR often dominate. The full-stack calculator models all of these and shows you what’s actually driving cost.

GPT-5.4 vs Claude Sonnet 4.6 — which is cheaper?

GPT-5.4 input is marginally cheaper ($2.50 vs $3.00/M) and both charge $15/M for output. Cached rates are also close ($0.25 vs $0.30/M) — so for most workloads the price difference is small, and quality on your specific task should decide. Note Anthropic adds a one-off 1.25× surcharge when writing the cache; OpenAI doesn’t.

When should I use a mini / Haiku model?

For classification, routing, short-form generation, or any task where a smaller model gets the job done. GPT-4o mini and Claude Haiku 4.5 are 5–30× cheaper than their flagship counterparts. Run a quality eval on your specific task before assuming you need the bigger model.

Token price is just the start.

Model all your AI costs — search, embeddings, infrastructure — and find your real margin.

Open the full calculator