Input, output, and cached token rates for every major LLM. Sourced from official provider documentation, May 2026.

GPT-5.4
Current mainstream frontier. Best OpenAI model for production workloads — reasoning, vision, and multimodal tasks.
GPT-5.4 mini
Frontier-level intelligence at mid-tier cost. Replaces GPT-4o as the go-to balanced option for most production use cases.
GPT-4o mini
Still the cheapest OpenAI option per token. Ideal for high-volume classification, summarisation, or simple routing tasks.
GPT-4.1
Best choice for document-heavy workflows. 1M token context window for long codebases, contracts, or multi-document retrieval.
GPT-4.1 mini
1M context at budget price. Great for batch processing large documents where you don't need frontier quality.
Claude Opus 4.7
Most capable Claude model. Step-change improvement in agentic coding over Opus 4.6. For complex reasoning and autonomous agents.
Claude Sonnet 4.6
Best balance of speed and intelligence. Production-ready for coding, analysis, and complex reasoning. Extended thinking available.
Claude Haiku 4.5
Fastest Claude with near-frontier intelligence. Built for customer-facing apps where latency and cost per call matter most.
All models ranked by input token price. May 2026 — verify at provider pages before committing to a budget.
| Model | Provider | Input / 1M | Cached / 1M | Output / 1M | Context |
|---|---|---|---|---|---|
| GPT-4o mini | OpenAI | $0.15 | $0.075 | $0.60 | 128K |
| GPT-4.1 mini | OpenAI | $0.40 | $0.10 | $1.60 | 1M |
| GPT-5.4 mini | OpenAI | $0.75 | $0.375 | $4.50 | 128K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $0.10 | $5.00 | 200K |
| GPT-4.1 | OpenAI | $2.00 | $0.50 | $8.00 | 1M |
| GPT-5.4 | OpenAI | $2.50 | $1.25 | $15.00 | 128K |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $0.30 | $15.00 | 1M |
| Claude Opus 4.7 | Anthropic | $5.00 | $0.50 | $25.00 | 1M |
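For cost scripts, the table above can be encoded as a small lookup. A sketch — the string keys are shorthand labels for this snippet, not the providers' official API model IDs:

```python
# Rates in $ per 1M tokens, copied from the pricing table (May 2026).
# Keys are informal labels, not real API model identifiers.
PRICES = {  # model: (input, cached_input, output)
    "gpt-4o-mini":       (0.15, 0.075, 0.60),
    "gpt-4.1-mini":      (0.40, 0.10,  1.60),
    "gpt-5.4-mini":      (0.75, 0.375, 4.50),
    "claude-haiku-4.5":  (1.00, 0.10,  5.00),
    "gpt-4.1":           (2.00, 0.50,  8.00),
    "gpt-5.4":           (2.50, 1.25,  15.00),
    "claude-sonnet-4.6": (3.00, 0.30,  15.00),
    "claude-opus-4.7":   (5.00, 0.50,  25.00),
}

# PRICES["gpt-5.4"][0] → 2.50 (input rate per 1M tokens)
```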
Tokens are the units LLMs use to process text. One token is roughly 4 characters or ¾ of an English word. "Hello world" is two to three tokens, depending on the tokenizer. A 1,000-word document is roughly 1,300 tokens.
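Those rules of thumb can be sketched as a quick estimator. This is only a back-of-envelope heuristic — for exact counts, use the provider's tokenizer (e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters or ~3/4 of a word per token."""
    by_chars = len(text) / 4              # ~4 characters per token
    by_words = len(text.split()) / 0.75   # ~3/4 of a word per token
    # Average the two heuristics; a real tokenizer will differ somewhat.
    return round((by_chars + by_words) / 2)

# A 1,000-word document lands near the ~1,300-token figure quoted above:
# 1000 words / 0.75 ≈ 1333 by the word heuristic.
```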
Input tokens are everything you send to the model — your prompt, system instructions, retrieved documents. Output tokens are what the model generates back. Output is typically 4–6× more expensive per token (see the table above) because generation is computationally heavier.
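A single call's cost is just input tokens times the input rate plus output tokens times the output rate. A minimal sketch, using GPT-5.4's rates from the table; the token counts are made-up example numbers:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one model call; rates are per 1M tokens."""
    return (input_tokens * input_per_m
            + output_tokens * output_per_m) / 1_000_000

# GPT-5.4: $2.50 input, $15.00 output per 1M tokens.
# A 2,000-token prompt with a 500-token response:
cost = call_cost(2_000, 500, input_per_m=2.50, output_per_m=15.00)
# 2000 * 2.50 / 1e6 = $0.005 in, 500 * 15.00 / 1e6 = $0.0075 out → $0.0125
```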
Prompt caching lets you reuse a fixed prefix (e.g. a long system prompt or knowledge base) across calls. Cached tokens cost 50–90% less than uncached. Both OpenAI and Anthropic support it. High cache hit rates dramatically reduce costs for document-heavy workflows.
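The effect of caching on your blended input price is straightforward to model. A sketch using Claude Sonnet 4.6's rates from the table; the 80% hit rate is an assumption you would measure in your own pipeline:

```python
def blended_input_rate(base_per_m: float, cached_per_m: float,
                       cache_hit_rate: float) -> float:
    """Effective input price per 1M tokens given a cache hit rate in [0, 1]."""
    return cache_hit_rate * cached_per_m + (1 - cache_hit_rate) * base_per_m

# Claude Sonnet 4.6: $3.00 uncached, $0.30 cached. With 80% of input
# tokens served from cache:
rate = blended_input_rate(3.00, 0.30, cache_hit_rate=0.8)
# 0.8 * 0.30 + 0.2 * 3.00 → about $0.84 per 1M input tokens
```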
Both OpenAI and Anthropic offer a batch API where jobs are processed asynchronously (within 24 hours) at a 50% discount. Ideal for background tasks, bulk evaluation, or overnight processing pipelines that don't need real-time responses.
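The batch discount is a flat multiplier on whatever the synchronous job would have cost:

```python
BATCH_DISCOUNT = 0.50  # both providers advertise 50% off batch jobs

def batch_cost(sync_cost: float) -> float:
    """Asynchronous batch price: half the synchronous cost."""
    return sync_cost * (1 - BATCH_DISCOUNT)

# An overnight evaluation run that would cost $40 synchronously:
# batch_cost(40.0) → $20.00
```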
Multiply: (monthly active users) × (workflows per user) × (model calls per workflow) × (avg tokens per call). Don't forget retries, which typically add 5–15% overhead. Use the full calculator to model this with low/expected/high scenarios.
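The multiplication above can be sketched directly, with retry overhead folded in. The user, workflow, and token figures below are illustrative assumptions, not benchmarks:

```python
def monthly_tokens(users: int, workflows_per_user: float,
                   calls_per_workflow: float, tokens_per_call: float,
                   retry_overhead: float = 0.10) -> float:
    """Monthly token volume; retry_overhead=0.10 adds 10% for retries."""
    base = users * workflows_per_user * calls_per_workflow * tokens_per_call
    return base * (1 + retry_overhead)

# Illustrative low/expected/high scenarios: 20 workflows per user per
# month, 3 calls per workflow, 1,500 tokens per call.
for label, users in [("low", 2_000), ("expected", 5_000), ("high", 12_000)]:
    print(f"{label}: {monthly_tokens(users, 20, 3, 1_500):,.0f} tokens/month")
```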
Token costs are rarely the biggest line item in a production AI system. Fixed infrastructure, embeddings, web search, and OCR often dominate. The full-stack calculator models all of these and shows you what's actually driving cost.
GPT-5.4 input is marginally cheaper ($2.50 vs $3.00/M). But Claude Sonnet 4.6 has a significantly lower cached rate ($0.30 vs $1.25/M) — so if your workflow has high cache hit rates (long system prompts, RAG pipelines), Sonnet 4.6 can be cheaper overall.
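Under the table's rates, the crossover is easy to compute. A sketch comparing effective input cost as a function of cache hit rate (GPT-5.4: $2.50 base / $1.25 cached; Claude Sonnet 4.6: $3.00 base / $0.30 cached):

```python
def effective_rate(base: float, cached: float, hit_rate: float) -> float:
    """Blended input price per 1M tokens at a given cache hit rate."""
    return hit_rate * cached + (1 - hit_rate) * base

def cheaper_on_input(hit_rate: float) -> str:
    gpt = effective_rate(2.50, 1.25, hit_rate)     # GPT-5.4
    sonnet = effective_rate(3.00, 0.30, hit_rate)  # Claude Sonnet 4.6
    return "GPT-5.4" if gpt < sonnet else "Claude Sonnet 4.6"

# With no cache hits GPT-5.4 wins ($2.50 vs $3.00); at high hit rates
# Sonnet's $0.30 cached rate pulls ahead. Solving 2.50 - 1.25h =
# 3.00 - 2.70h gives a breakeven near a 35% hit rate (input cost only;
# output rates are identical at $15.00/M for these two).
```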
For classification, routing, short-form generation, or any task where a smaller model gets the job done. GPT-4o mini and Claude Haiku 4.5 are 5–25× cheaper per token than their flagship counterparts. Run a quality eval on your specific task before assuming you need the bigger model.
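A side-by-side eval doesn't need to be elaborate. A sketch of the shape such a check takes — `call_model` is a placeholder for whatever API client you use (the stub below just pattern-matches so the script runs end-to-end):

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder: swap in a real API client call here."""
    return "positive" if "love" in prompt else "negative"

# Tiny hand-labeled set for an example sentiment-routing task.
LABELED = [
    ("I love this product", "positive"),
    ("Terrible experience", "negative"),
]

def accuracy(model: str) -> float:
    hits = sum(call_model(model, text) == gold for text, gold in LABELED)
    return hits / len(LABELED)

# Compare the small model against the flagship on YOUR data before
# paying flagship rates; the stub makes them identical here.
for model in ["small-model", "flagship-model"]:
    print(model, accuracy(model))
```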