Cost modeling for AI features
The long-form walkthrough this template collapses into four lines of arithmetic.
We run this template before every product engagement. Plug in your own token counts, model mix, and traffic. You will learn more about your feature in 30 minutes than a week of speculation will teach you.
Four tables below. Table 1 is the price sheet you copy verbatim. Table 2 is the formula (four lines of arithmetic). Table 3 walks through three real-shape scenarios so you can sanity-check your own numbers against ours. Table 4 shows caching and batch ROI.
Every number is copy-friendly - hover, select, paste. Dollar values assume US dollars. Token prices are per million (1M) tokens.
| Model | Input / 1M | Cached read / 1M | Cache write / 1M | Output / 1M | Batch (50% off) |
|---|---|---|---|---|---|
| Claude Opus 4.7 (highest quality) | $15.00 | $1.50 | $18.75 | $75.00 | yes |
| Claude Sonnet 4.6 (workhorse) | $3.00 | $0.30 | $3.75 | $15.00 | yes |
| Claude Haiku 4.5 (high volume) | $0.80 | $0.08 | $1.00 | $4.00 | yes |
Cache reads are 0.1x input price. Cache writes are 1.25x input. Break-even is roughly 2 calls against the same cache. Re-check prices at the Anthropic pricing page before quoting a client - we update this sheet quarterly.
Four lines. Memorize them.
cost_per_action = (input_tokens / 1,000,000) * input_price
                + (cached_tokens / 1,000,000) * cache_read_price
                + (cache_write_tokens / 1,000,000) * cache_write_price
                + (output_tokens / 1,000,000) * output_price
cost_per_month = cost_per_action * actions_per_user_per_month * monthly_active_users
cost_per_user = cost_per_month / monthly_active_users
margin_per_user = plan_price_per_user - cost_per_user - (fixed_costs / monthly_active_users)
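For anyone who would rather run the arithmetic than type it into a spreadsheet, here is a minimal Python sketch of the same four lines. The `Prices` class and the function signatures are our own illustrative naming, not part of any SDK. The example plugs in the docs-bot numbers from this sheet (22,000 cached tokens, 600 uncached input, 350 output, 8 turns, 1,200 MAU on Sonnet 4.6).

```python
from dataclasses import dataclass

MILLION = 1_000_000


@dataclass(frozen=True)
class Prices:
    """USD per 1M tokens, copied from the price sheet (hypothetical helper type)."""
    input: float
    cache_read: float
    cache_write: float
    output: float


def cost_per_action(p, input_tokens=0, cached_tokens=0,
                    cache_write_tokens=0, output_tokens=0):
    # Line 1: each token bucket is billed at its own per-million price.
    return (
        input_tokens / MILLION * p.input
        + cached_tokens / MILLION * p.cache_read
        + cache_write_tokens / MILLION * p.cache_write
        + output_tokens / MILLION * p.output
    )


def cost_per_month(action_cost, actions_per_user_per_month, monthly_active_users):
    # Line 2
    return action_cost * actions_per_user_per_month * monthly_active_users


def cost_per_user(month_cost, monthly_active_users):
    # Line 3
    return month_cost / monthly_active_users


def margin_per_user(plan_price_per_user, user_cost, fixed_costs, monthly_active_users):
    # Line 4
    return plan_price_per_user - user_cost - fixed_costs / monthly_active_users


SONNET = Prices(input=3.00, cache_read=0.30, cache_write=3.75, output=15.00)

turn = cost_per_action(SONNET, input_tokens=600, cached_tokens=22_000, output_tokens=350)
month = cost_per_month(turn, actions_per_user_per_month=8, monthly_active_users=1_200)
per_user = cost_per_user(month, monthly_active_users=1_200)
print(f"per turn ${turn:.5f}, per month ${month:.2f}, per user ${per_user:.4f}")
```

The unrounded monthly figure comes out near $131.04. The sheet shows $131.52 because it rounds the per-turn cost to $0.0137 before multiplying. Either is fine for a go/no-go decision.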
Use P90 token counts, not the mean. Averages hide the user who sends a 40-page PDF. The P90 is the level that 90% of your users stay at or below. The heaviest 10% sit above it, and those users, not the median one, are who blow up your margin.
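A rough sketch of pulling a P90 out of logged token counts, using the nearest-rank method. The sample list is invented for illustration and the function name is ours. In practice you would feed in counts from your own instrumentation.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical input-token counts per request; two heavy users skew the tail.
observed_input_tokens = [300, 350, 400, 450, 500, 550, 600, 700, 9_000, 9_500]

mean_tokens = sum(observed_input_tokens) / len(observed_input_tokens)
p50 = percentile_nearest_rank(observed_input_tokens, 50)
p90 = percentile_nearest_rank(observed_input_tokens, 90)
print(f"median {p50}, mean {mean_tokens:.0f}, P90 {p90}")
```

On this made-up sample the median is 500 tokens, the mean is 2,235, and the P90 is 9,000. Budgeting on the mean would understate the heavy user by roughly 4x.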
Each scenario is a real-shape product feature on a $49/month SaaS plan. Token counts are observed from instrumented production traffic, not guesses.
Scenario A. Answers product questions from a documentation knowledge base. Hot path. Cached system prompt with the entire docs corpus.
| Input | Value | Math |
|---|---|---|
| Model | Sonnet 4.6 | |
| Cached system prompt (docs) | 22,000 tokens | 22000 / 1M * $0.30 = $0.0066 |
| Uncached user message + history | 600 tokens | 600 / 1M * $3.00 = $0.0018 |
| Output | 350 tokens | 350 / 1M * $15.00 = $0.00525 |
| Cost per conversation turn | $0.0137 | $0.0066 + $0.0018 + $0.00525 |
| Turns per user per month (P90) | 8 | |
| MAU | 1,200 | |
| Cost per month | $131.52 | $0.0137 * 8 * 1200 |
| Cost per user | $0.11 | 0.22% of the $49 plan |
Ships. Margin impact is negligible even if volume doubles. No escape hatches needed.
Scenario B. User uploads a PDF (contract, meeting notes, spec). Model returns a structured summary. No caching because every document is unique.
| Input | Value | Math |
|---|---|---|
| Model | Sonnet 4.6 | |
| Document tokens (P90) | 9,500 tokens | 9500 / 1M * $3.00 = $0.0285 |
| System prompt (uncached, short) | 400 tokens | 400 / 1M * $3.00 = $0.0012 |
| Output | 320 tokens | 320 / 1M * $15.00 = $0.0048 |
| Cost per summary | $0.0345 | $0.0285 + $0.0012 + $0.0048 |
| Summaries per user per month (P90) | 25 | |
| MAU | 1,200 | |
| Cost per month | $1,035.00 | $0.0345 * 25 * 1200 |
| Cost per user | $0.86 | 1.76% of the $49 plan |
Still ships - under 2% COGS. Watch P99, though: some users upload 40-page docs. Add a 20-page hard limit or a pro tier with higher limits.
Scenario C. User pastes a company URL. Agent fetches site, pulls LinkedIn, drafts outreach angles. Multi-step tool use, Opus for the final synthesis.
| Input | Value | Math |
|---|---|---|
| Model (research calls) | Haiku 4.5 | |
| Model (synthesis) | Opus 4.7 | |
| Haiku calls: 4 tool rounds | ~18,000 in, 3,500 out | $0.0144 + $0.014 = $0.0284 |
| Opus synthesis: 1 call | ~12,000 in, 1,800 out | $0.18 + $0.135 = $0.315 |
| Cost per research | $0.343 | $0.0284 + $0.315 |
| Researches per user per month (P90) | 60 | |
| MAU | 1,200 | |
| Cost per month | $24,696 | $0.343 * 60 * 1200 |
| Cost per user | $20.58 | 42% of the $49 plan |
Does not ship as-is. Options: (a) move the synthesis to Sonnet and eat a quality delta, (b) charge $149 or meter usage above 30/month, (c) cache the company dossier for downstream calls. We usually pick (a) + (c) after a proper eval harness.
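To see how much option (a) moves the number before investing in an eval, the research-agent arithmetic above can be rerun with the synthesis model swapped. This is a sketch under the sheet's prices and token counts only. The `PRICES` dict and function names are our own, and the quality delta is not modeled at all.

```python
MILLION = 1_000_000

# (input, output) USD per 1M tokens, taken from the price sheet.
PRICES = {
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

PLAN_PRICE = 49.00
RESEARCHES_PER_USER = 60  # P90 from the scenario table


def call_cost(model, tokens_in, tokens_out):
    price_in, price_out = PRICES[model]
    return tokens_in / MILLION * price_in + tokens_out / MILLION * price_out


def research_cost(synthesis_model):
    # Haiku token counts are totals across the 4 tool rounds.
    research = call_cost("haiku", 18_000, 3_500)
    synthesis = call_cost(synthesis_model, 12_000, 1_800)
    return research + synthesis


for model in ("opus", "sonnet"):
    per_user = research_cost(model) * RESEARCHES_PER_USER
    print(f"{model} synthesis: ${per_user:.2f} per user, {per_user / PLAN_PRICE:.0%} of plan")
```

With Sonnet doing the synthesis, the same traffic lands at about $5.48 per user, roughly 11% of the plan, versus about 42% with Opus. Whether that trade is acceptable is a question for the eval, not the spreadsheet.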
Cache writes cost 1.25x input. Cache reads cost 0.1x input. So caching pays off after roughly 2 calls against the same cache. For support bots, RAG systems, agents with large system prompts - always cache.
| Calls against cache | No cache | With cache | Savings |
|---|---|---|---|
| 1 (write only, below break-even) | $0.0660 | $0.0825 | -25% |
| 2 | $0.1320 | $0.0891 | +32% |
| 5 | $0.3300 | $0.1089 | +67% |
| 10 | $0.6600 | $0.1419 | +78% |
| 50 | $3.3000 | $0.4059 | +88% |
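Figures cover only the 22,000-token cached block (our Scenario A system prompt) on Sonnet 4.6. The 600-token user input and 350-token output cost the same with or without caching, so they are left out of both columns. The first call pays the cache write and every later call pays a cache read. The cache TTL is 5 minutes by default - extend to 1 hour with the 2x multiplier if your hit rate benefits.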
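The caching table above can be regenerated for any block size with a few lines of Python. This sketch assumes the first call writes the cache and every later call is a hit inside the TTL. Real hit rates will be lower, so treat the output as a best case. Function names are illustrative.

```python
MILLION = 1_000_000


def block_cost_without_cache(block_tokens, calls, input_price):
    # Every call pays full input price for the block.
    return calls * block_tokens / MILLION * input_price


def block_cost_with_cache(block_tokens, calls, input_price,
                          write_mult=1.25, read_mult=0.10):
    # Assumption: call 1 writes the cache, calls 2..n all hit it.
    if calls < 1:
        return 0.0
    base = block_tokens / MILLION * input_price
    return base * write_mult + (calls - 1) * base * read_mult


def break_even_calls(write_mult=1.25, read_mult=0.10):
    # Smallest call count at which the cached path is strictly cheaper.
    n = 1
    while write_mult + (n - 1) * read_mult >= n:
        n += 1
    return n


BLOCK_TOKENS = 22_000
SONNET_INPUT = 3.00  # USD per 1M tokens, from the price sheet

for calls in (1, 2, 5, 10, 50):
    plain = block_cost_without_cache(BLOCK_TOKENS, calls, SONNET_INPUT)
    cached = block_cost_with_cache(BLOCK_TOKENS, calls, SONNET_INPUT)
    print(f"{calls:>3} calls: ${plain:.4f} vs ${cached:.4f} ({(plain - cached) / plain:+.1%})")

print("break-even at", break_even_calls(), "calls")
```

Percentages may differ from the table by a point because of rounding.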
The batch API is 50% off list price and completes within 24 hours. Use it for anything that does not need to stream to a user in real time: nightly enrichment, content generation pipelines, backfills, eval runs, weekly reports.
| Use case | Real-time | Batch | Acceptable latency |
|---|---|---|---|
| User chat turn | required | no | < 2s |
| Doc summary (user waiting) | required | no | < 10s |
| Nightly CRM enrichment | wasteful | half price | overnight |
| Weekly newsletter drafts | wasteful | half price | weekly |
| Eval harness regression run | wasteful | half price | nightly |
| Backfill of old content | wasteful | half price | over days |
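To put a dollar figure on a "wasteful" row, a sketch along these lines is usually enough. The job size and token counts below are hypothetical placeholders, not measurements, and only the 50% discount and the Haiku 4.5 prices come from this sheet.

```python
MILLION = 1_000_000
BATCH_DISCOUNT = 0.50  # batch API is 50% off list price, per the price sheet


def job_cost(items, input_tokens, output_tokens, input_price, output_price, batch=False):
    """Total USD for a job of `items` identical-shaped requests."""
    per_item = (input_tokens / MILLION * input_price
                + output_tokens / MILLION * output_price)
    total = items * per_item
    return total * (1 - BATCH_DISCOUNT) if batch else total


# Hypothetical nightly CRM enrichment: 5,000 records,
# 2,000 input and 300 output tokens each, on Haiku 4.5 list prices.
realtime = job_cost(5_000, 2_000, 300, input_price=0.80, output_price=4.00)
batched = job_cost(5_000, 2_000, 300, input_price=0.80, output_price=4.00, batch=True)
print(f"real-time ${realtime:.2f} per night, batch ${batched:.2f} per night")
```

On those placeholder numbers the job costs about $14 a night in real time and about $7 through batch. The saving scales linearly with volume, so the case gets stronger for backfills.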
Before you optimize cost, build the eval. Never trade quality you cannot measure for pennies you cannot recover.
We run this template with you in week 1 of every product engagement. No fluff.
We will do the cost model with you in a paid 90-minute working session. You leave with the spreadsheet, the eval plan, and a go/no-go.