Inputs

Project assumptions

Quick setup

Five inputs for a working estimate. Presets fill the rest.

AI system type

Primary model

Monthly active users

Workflows per user / month

Revenue per user / month ($)

Tokens, OCR, review rates, pricing overrides

Demand detail

Fine-tune how many calls and retries each workflow generates.

Model calls per workflow

Retry overhead %
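As a rough illustration of how these demand inputs might combine into monthly volume, here is a minimal sketch. The function name and the formula (retries treated as a percentage uplift on model calls) are assumptions for illustration, not the calculator's confirmed logic.

```python
def monthly_volume(active_users, workflows_per_user, calls_per_workflow, retry_overhead_pct):
    """Estimate monthly workflows and model calls.

    Assumption: retry overhead is a simple percentage uplift on model calls.
    """
    workflows = active_users * workflows_per_user
    model_calls = round(workflows * calls_per_workflow * (1 + retry_overhead_pct / 100))
    return {"workflows": workflows, "model_calls": model_calls}


# Example: 1,000 users, 20 workflows each, 3 calls per workflow, 10% retries
volume = monthly_volume(1_000, 20, 3, 10)
print(volume)
```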

Core LLM workload

Per-call token assumptions and optimization levers.

Input tokens per call

Output tokens per call

Cache-hit share %

Batch-processed share %

Tools and extras (optional)

Add the workflow pieces beyond token inference. Set any field to 0 to exclude it.

Web searches per workflow: Number of times the AI calls a web search tool (e.g. Bing, Brave, Tavily) per run. Set to 0 if no search tool is used.

Search content tokens / search: Tokens from search results injected into the prompt per search call. Typically 500–3,000.

Embedding tokens / month: Total tokens sent to an embedding model per month (for RAG / semantic search). Set to 0 if no embeddings are used.

OCR pages per workflow: Pages processed by an OCR service (e.g. AWS Textract, Google Document AI) per workflow run. Set to 0 if there is no document scanning.
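The tool inputs above could feed a cost estimate along these lines. This is a sketch under stated assumptions: the function name, the per-unit formulas, and every rate in the example are hypothetical placeholders, not published prices or the calculator's actual implementation.

```python
def extras_cost(workflows, searches_per_workflow, content_tokens_per_search,
                embedding_tokens, ocr_pages_per_workflow,
                search_rate_per_1k, content_rate_per_1m,
                embed_rate_per_1m, ocr_rate_per_page):
    """Monthly cost of the non-inference pieces; a field set to 0 drops out."""
    searches = workflows * searches_per_workflow
    return {
        "search_calls": searches * search_rate_per_1k / 1_000,
        "search_content": searches * content_tokens_per_search * content_rate_per_1m / 1_000_000,
        "embeddings": embedding_tokens * embed_rate_per_1m / 1_000_000,
        "ocr": workflows * ocr_pages_per_workflow * ocr_rate_per_page,
    }


# Illustrative rates only (assumed, not real prices)
costs = extras_cost(
    workflows=20_000, searches_per_workflow=2, content_tokens_per_search=1_000,
    embedding_tokens=50_000_000, ocr_pages_per_workflow=1,
    search_rate_per_1k=10.0, content_rate_per_1m=2.0,
    embed_rate_per_1m=0.10, ocr_rate_per_page=0.01,
)
extras_total = sum(costs.values())
print(costs, round(extras_total, 2))
```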

Team and operating costs (optional)

Infrastructure and overhead costs. Set any field to 0 to exclude it — token-only pipelines can skip this section entirely.

What is fixed infra?

Infrastructure costs that stay roughly the same regardless of how many requests you handle — hosting, monitoring (e.g. Datadog, Sentry), auth services, CI/CD pipelines. This is your actual monthly bill, not a calculated estimate. Enter your own number or set to 0 if unknown.

Fixed infra / month ($): Monthly cost of hosting, monitoring, auth, and CI/CD, i.e. things that don't scale with usage. Set to 0 if unknown.

Vector / storage / month ($): Pinecone, Weaviate, pgvector hosting, or similar. Set to 0 if not using a vector database.

Contingency buffer %: Safety margin for unexpected usage spikes. 10–20% is typical.

Target gross margin %: Used to calculate the minimum price you'd need to charge per user. Set to 0 to skip pricing analysis.

Pricing overrides

Model preset fills these. Override for custom plans or different architectures.

Model input $ / 1M tokens

Cached input $ / 1M tokens

Model output $ / 1M tokens

Web search $ / 1K calls

Search content $ / 1M tokens

Embeddings $ / 1M tokens

OCR $ / page

Batch discount on core LLM %

Caching applies to cache-hit share of input tokens only. Batch discount applies to core model inference only.
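One way to read the two rules above is sketched below. The cached rate is applied only to the cache-hit share of input tokens, and the batch discount only to the batch-processed share of core inference cost. How caching and batching interact when they overlap is an assumption here, as are the function name and the example rates.

```python
def core_llm_cost(input_tokens, output_tokens, cache_hit_pct, batch_pct,
                  input_rate, cached_input_rate, output_rate, batch_discount_pct):
    """Monthly core inference cost in dollars; rates are $ per 1M tokens."""
    cached = input_tokens * cache_hit_pct / 100   # tokens billed at the cached rate
    uncached = input_tokens - cached              # tokens billed at the full input rate
    base = (uncached * input_rate
            + cached * cached_input_rate
            + output_tokens * output_rate) / 1_000_000
    # Assumption: the discount applies uniformly to the batch-processed share.
    return base * (1 - (batch_pct / 100) * (batch_discount_pct / 100))


# Illustrative rates only (assumed, not real prices)
llm_cost = core_llm_cost(
    input_tokens=100_000_000, output_tokens=20_000_000,
    cache_hit_pct=50, batch_pct=20,
    input_rate=2.0, cached_input_rate=0.5, output_rate=8.0,
    batch_discount_pct=50,
)
print(round(llm_cost, 2))
```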

Output

Cost estimate

Model preset

API / usage costs calculated from published rates × your volume estimates

API cost / month

Tokens + OCR + embeddings + search

API cost per workflow

Cost per request / run at expected volume

Full-stack estimate: adds the operating costs you entered. These are your own figures, not calculated values.

+ Operating overhead / month

Infra + storage — your estimates

Total estimate / month

API cost + operating overhead

Cost per active user

API + infra ÷ users (excl. contingency)

Price for target gross margin

Based on variable API costs only
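A common way to back out a price from a target gross margin is price = cost / (1 - margin). The sketch below applies that to the variable API cost per user; whether the calculator uses exactly this formula is an assumption, and the function name is hypothetical.

```python
def price_for_target_margin(api_cost_per_user, target_margin_pct):
    """Minimum monthly price per user for a target gross margin.

    Returns None when the margin is 0, mirroring "set to 0 to skip".
    """
    if target_margin_pct <= 0:
        return None
    if target_margin_pct >= 100:
        raise ValueError("target margin must be below 100%")
    return api_cost_per_user / (1 - target_margin_pct / 100)


# Example: $3.00 of API cost per user at a 75% target margin
price = price_for_target_margin(3.00, 75)
print(price)
```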

Yearly run-rate

Monthly × 12 — flat rate, no price decay

Scenario range

P10 / P50 / P90 scenarios vary volume only; rates stay fixed. The distribution is right-skewed: P90 sits farther from the median than P10.
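A minimal sketch of a volume-only scenario range follows. The multipliers (0.6, 1.0, 1.8) are hypothetical values chosen to show right skew, and holding fixed overhead constant across scenarios is also an assumption.

```python
# Hypothetical volume multipliers; the calculator's actual values are not stated.
MULTIPLIERS = {"P10": 0.6, "P50": 1.0, "P90": 1.8}


def scenario_range(api_cost_p50, fixed_overhead=0.0, multipliers=MULTIPLIERS):
    """Scale variable API cost by volume; rates and fixed overhead stay unchanged."""
    return {name: api_cost_p50 * m + fixed_overhead for name, m in multipliers.items()}


scenarios = scenario_range(api_cost_p50=1_000.0, fixed_overhead=500.0)
print(scenarios)
```

With these multipliers, the gap from P50 to P90 is twice the gap from P10 to P50, which is what right-skewed means in practice.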

P10 $0 lean adoption

P50 $0 median case

P90 $0 heavy usage

Profitability

Based on your revenue per user assumption.

Monthly revenue

Revenue per user × active users

Monthly profit

Revenue minus total monthly cost

Gross margin

—

Monthly profit as % of revenue

Break-even users

—

Users needed to cover costs
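Break-even users can be sketched as fixed monthly costs divided by the contribution each user makes (revenue minus variable cost). This split between fixed and variable costs is an assumption about how the figure is derived, and the function name is hypothetical.

```python
import math


def break_even_users(fixed_costs, revenue_per_user, variable_cost_per_user):
    """Smallest whole number of users whose contribution covers fixed costs.

    Returns None when each user costs at least as much as they pay.
    """
    contribution = revenue_per_user - variable_cost_per_user
    if contribution <= 0:
        return None
    return math.ceil(fixed_costs / contribution)


# Example: $2,000 fixed costs, $20 revenue and $4 variable cost per user
users_needed = break_even_users(2_000, 20, 4)
print(users_needed)
```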

Workload summary

Monthly volume your architecture needs to absorb.

Workflows / month 0

Model calls / month 0

Input tokens / month 0

Output tokens / month 0

Search calls / month 0

OCR pages / month 0

Cost breakdown

Which components dominate the estimate.

What's driving cost

Where to optimise first.

Pricing references

Sources for the editable defaults in this calculator.