S9 Sample · Cost model template


Cost model you can paste.

We run this template before every product engagement. Plug in your own token counts, model mix, and traffic. You will learn more about your feature in 30 minutes than a week of speculation will teach you.

Use: copy into a spreadsheet · Prices: 2026-04 snapshot · Last updated: 2026-04-19

How to use this template

Four tables below. Table 1 is the price sheet you copy verbatim. Table 2 is the formula (four lines of arithmetic). Table 3 walks through three real-shape scenarios so you can sanity-check your own numbers against ours. Table 4 shows caching and batch ROI.

Every number is copy-friendly - hover, select, paste. Dollar values assume US dollars. Token prices are per million (1M) tokens.

1. Price sheet (2026-04 snapshot)

Model                               | Input / 1M | Cached read / 1M | Cache write / 1M | Output / 1M | Batch (50% off)
Claude Opus 4.7 (highest quality)   | $15.00     | $1.50            | $18.75           | $75.00      | yes
Claude Sonnet 4.6 (workhorse)       | $3.00      | $0.30            | $3.75            | $15.00      | yes
Claude Haiku 4.5 (high volume)      | $0.80      | $0.08            | $1.00            | $4.00       | yes

Cache reads are 0.1x input price. Cache writes are 1.25x input. Break-even is roughly 2 calls against the same cache. Re-check prices at the Anthropic pricing page before quoting a client - we update this sheet quarterly.
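If you keep the price sheet in code rather than a spreadsheet, a minimal sketch might look like the following. The dictionary keys and the `check_ratios` helper are our own illustrative names, not part of any SDK. The helper only confirms that the copied numbers follow the 0.1x read and 1.25x write rule stated above, which catches paste errors.

```python
# USD per 1M tokens, copied from the 2026-04 price sheet above.
# Keys are illustrative labels, not official model identifiers.
PRICES = {
    "opus-4.7":   {"input": 15.00, "cache_read": 1.50, "cache_write": 18.75, "output": 75.00},
    "sonnet-4.6": {"input": 3.00,  "cache_read": 0.30, "cache_write": 3.75,  "output": 15.00},
    "haiku-4.5":  {"input": 0.80,  "cache_read": 0.08, "cache_write": 1.00,  "output": 4.00},
}


def check_ratios(prices, tol=1e-9):
    """Return the models whose cache prices break the 0.1x / 1.25x rule."""
    bad = []
    for name, p in prices.items():
        read_ok = abs(p["cache_read"] - 0.10 * p["input"]) < tol
        write_ok = abs(p["cache_write"] - 1.25 * p["input"]) < tol
        if not (read_ok and write_ok):
            bad.append(name)
    return bad


print(check_ratios(PRICES))  # an empty list means every row is consistent
```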

2. The formula

Four lines. Memorize them.

cost_per_action = (input_tokens / 1,000,000) * input_price
                + (cached_tokens / 1,000,000) * cache_read_price
                + (cache_write_tokens / 1,000,000) * cache_write_price
                + (output_tokens / 1,000,000) * output_price

cost_per_month  = cost_per_action * actions_per_user_per_month * monthly_active_users

cost_per_user   = cost_per_month / monthly_active_users

margin_per_user = plan_price_per_user - cost_per_user - (fixed_costs / monthly_active_users)
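The same four formulas as a Python sketch, for readers who prefer a script to a spreadsheet. Function and parameter names mirror the formula above. The example values are the Sonnet 4.6 prices and the Scenario A token counts from section 3, and `fixed_costs=0` is a placeholder assumption you should replace with your own figure.

```python
TOKENS_PER_MILLION = 1_000_000


def cost_per_action(input_tokens, cached_tokens, cache_write_tokens, output_tokens, price):
    """USD cost of one action; `price` holds USD-per-1M-token rates."""
    return (
        input_tokens * price["input"]
        + cached_tokens * price["cache_read"]
        + cache_write_tokens * price["cache_write"]
        + output_tokens * price["output"]
    ) / TOKENS_PER_MILLION


def cost_per_month(action_cost, actions_per_user_per_month, monthly_active_users):
    return action_cost * actions_per_user_per_month * monthly_active_users


def cost_per_user(month_cost, monthly_active_users):
    return month_cost / monthly_active_users


def margin_per_user(plan_price_per_user, user_cost, fixed_costs, monthly_active_users):
    return plan_price_per_user - user_cost - fixed_costs / monthly_active_users


# Sonnet 4.6 rates from the price sheet; token counts from Scenario A.
sonnet = {"input": 3.00, "cache_read": 0.30, "cache_write": 3.75, "output": 15.00}
action = cost_per_action(600, 22_000, 0, 350, sonnet)
month = cost_per_month(action, 8, 1_200)
user = cost_per_user(month, 1_200)
# fixed_costs=0 is a placeholder, not a recommendation.
margin = margin_per_user(49.00, user, fixed_costs=0, monthly_active_users=1_200)
print(f"per action ${action:.5f}, per month ${month:.2f}, margin per user ${margin:.2f}")
```

The script carries the unrounded per-action cost through, so its monthly total can differ by a few cents from a table that rounds the per-action cost first.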

Use P90 token counts, not the mean. Averages hide the user who sends a 40-page PDF. The P90 is the level that only the heaviest 10% of requests exceed, and that heavy tail, not the median user, is what blows up your margin.
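A small sketch of the difference, using only the Python standard library. The token counts are invented for illustration (sixteen small requests and four large uploads), not measured data.

```python
import statistics

# Hypothetical input-token counts per request: mostly small, a few large uploads.
token_counts = [500] * 16 + [9_000, 9_500, 10_000, 12_000]

mean_tokens = statistics.mean(token_counts)
median_tokens = statistics.median(token_counts)
# quantiles(n=10) returns the nine decile cut points; the last one is the P90.
p90_tokens = statistics.quantiles(token_counts, n=10, method="inclusive")[-1]

print(f"median={median_tokens:.0f} mean={mean_tokens:.0f} p90={p90_tokens:.0f}")
```

With this sample the P90 lands several times above the mean, which is the gap a mean-based model would miss. In practice you would feed in token counts logged from your own traffic.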

3. Three scenarios

Each scenario is a real-shape product feature on a $49/month SaaS plan. Token counts are observed from instrumented production traffic, not guesses.

Scenario A: Customer support bot

Answers product questions from a documentation knowledge base. Hot path. Cached system prompt with the entire docs corpus.

Input                               | Value          | Math
Model                               | Sonnet 4.6     |
Cached system prompt (docs)         | 22,000 tokens  | 22000 / 1M * $0.30 = $0.0066
Uncached user message + history     | 600 tokens     | 600 / 1M * $3.00 = $0.0018
Output                              | 350 tokens     | 350 / 1M * $15.00 = $0.00525
Cost per conversation turn          | $0.0137        | $0.0066 + $0.0018 + $0.00525
Turns per user per month (P90)      | 8              |
MAU                                 | 1,200          |
Cost per month                      | $131.52        | $0.0137 * 8 * 1200
Cost per user                       | $0.11          | 0.22% of the $49 plan

Ships. Margin impact is negligible even if volume doubles. No escape hatches needed.
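To sanity-check the "even if volume doubles" claim, here is a short sketch that recomputes Scenario A at 8 and 16 turns per user. It assumes, as the table does, that every turn is a cache read and no turn pays for a cache write. The function name is our own.

```python
# Sonnet 4.6 rates in USD per 1M tokens, from the price sheet.
SONNET = {"input": 3.00, "cache_read": 0.30, "output": 15.00}
PLAN_PRICE = 49.00


def support_cost_per_user(turns_per_month):
    """Monthly USD cost per user for the Scenario A support bot."""
    per_turn = (
        22_000 * SONNET["cache_read"]   # cached docs prompt
        + 600 * SONNET["input"]         # uncached user message + history
        + 350 * SONNET["output"]        # reply
    ) / 1_000_000
    return per_turn * turns_per_month


for turns in (8, 16):
    cost = support_cost_per_user(turns)
    print(f"{turns} turns: ${cost:.2f} per user, {cost / PLAN_PRICE:.2%} of plan")
```

At double the volume the feature still costs well under 1% of the plan price.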

Scenario B: Document summarizer

User uploads a PDF (contract, meeting notes, spec). Model returns a structured summary. No caching because every document is unique.

Input                                 | Value          | Math
Model                                 | Sonnet 4.6     |
Document tokens (P90)                 | 9,500 tokens   | 9500 / 1M * $3.00 = $0.0285
System prompt (uncached, short)       | 400 tokens     | 400 / 1M * $3.00 = $0.0012
Output                                | 320 tokens     | 320 / 1M * $15.00 = $0.0048
Cost per summary                      | $0.0345        | $0.0285 + $0.0012 + $0.0048
Summaries per user per month (P90)    | 25             |
MAU                                   | 1,200          |
Cost per month                        | $1,035.00      | $0.0345 * 25 * 1200
Cost per user                         | $0.86          | 1.76% of the $49 plan

Still ships - under 2% COGS. But worth watching P99 (some users upload 40-page docs). Add a 20-page hard limit or a pro tier with higher limits.
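One way to turn the hard limit into a number is to work backwards from a per-summary budget. The sketch below does that in tokens rather than pages, because the tokens-per-page ratio depends on your documents. The $0.10 budget is a hypothetical figure chosen for illustration, and both function names are our own.

```python
SONNET_INPUT = 3.00    # USD per 1M input tokens
SONNET_OUTPUT = 15.00  # USD per 1M output tokens


def summary_cost(doc_tokens, system_tokens=400, output_tokens=320):
    """USD cost of one uncached summary on Sonnet 4.6."""
    return (
        (doc_tokens + system_tokens) * SONNET_INPUT
        + output_tokens * SONNET_OUTPUT
    ) / 1_000_000


def max_doc_tokens(budget_per_summary, system_tokens=400, output_tokens=320):
    """Largest document, in tokens, that stays within the per-summary budget."""
    fixed = (system_tokens * SONNET_INPUT + output_tokens * SONNET_OUTPUT) / 1_000_000
    return int((budget_per_summary - fixed) * 1_000_000 / SONNET_INPUT)


limit = max_doc_tokens(0.10)  # hypothetical $0.10 ceiling per summary
print(f"P90 doc: ${summary_cost(9_500):.4f}; cap at {limit:,} tokens")
```

Convert the token cap to a page limit using the tokens-per-page ratio you observe in your own uploads.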

Scenario C: Sales research agent

User pastes a company URL. Agent fetches site, pulls LinkedIn, drafts outreach angles. Multi-step tool use, Opus for the final synthesis.

Input                                  | Value                    | Math
Model (research calls)                 | Haiku 4.5                |
Model (synthesis)                      | Opus 4.7                 |
Haiku calls: 4 tool rounds             | ~18,000 in, 3,500 out    | $0.0144 + $0.014 = $0.0284
Opus synthesis: 1 call                 | ~12,000 in, 1,800 out    | $0.18 + $0.135 = $0.315
Cost per research                      | $0.343                   | $0.0284 + $0.315
Researches per user per month (P90)    | 60                       |
MAU                                    | 1,200                    |
Cost per month                         | $24,696                  | $0.343 * 60 * 1200
Cost per user                          | $20.58                   | 42% of the $49 plan

Does not ship as-is. Options: (a) move the synthesis to Sonnet and eat a quality delta, (b) charge $149 or meter usage above 30/month, (c) cache the company dossier for downstream calls. We usually pick (a) + (c) after a proper eval harness.
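The options can be compared with a few lines of arithmetic. The sketch below reprices the synthesis call on Sonnet for option (a), assuming the same token counts carry over, which a real eval would need to confirm. Option (c) is left out because the size of the cached dossier is not stated here. Function names are our own.

```python
# (input, output) USD per 1M tokens, from the price sheet.
PRICES = {"haiku": (0.80, 4.00), "sonnet": (3.00, 15.00), "opus": (15.00, 75.00)}


def call_cost(model, tokens_in, tokens_out):
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def research_cost(synthesis_model):
    """One research: 4 Haiku tool rounds plus one synthesis call."""
    return call_cost("haiku", 18_000, 3_500) + call_cost(synthesis_model, 12_000, 1_800)


def cogs_share(synthesis_model, researches=60, plan_price=49.00):
    """Per-user model cost as a fraction of the plan price."""
    return research_cost(synthesis_model) * researches / plan_price


print(f"as-is (Opus):        {cogs_share('opus'):.1%}")
print(f"(a) Sonnet:          {cogs_share('sonnet'):.1%}")
print(f"(b) $149 plan:       {cogs_share('opus', plan_price=149.00):.1%}")
print(f"(b) metered at 30:   {cogs_share('opus', researches=30):.1%}")
```

Under these assumptions, moving synthesis to Sonnet alone brings the share from about 42% to roughly 11% of the $49 plan, before any caching gains.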

4. Caching and batch ROI

4a. Caching break-even

Cache writes cost 1.25x input and cache reads cost 0.1x input, so the first call pays a 0.25x premium and every later call saves 0.9x. Caching therefore pays off from the second call against the same cache. For support bots, RAG systems, and agents with large system prompts: always cache.

Calls against cache              | No cache | With cache | Savings
1 (write only, below break-even) | $0.0660  | $0.0825    | -25%
2                                | $0.1320  | $0.0891    | +32%
5                                | $0.3300  | $0.1089    | +67%
10                               | $0.6600  | $0.1419    | +78%
50                               | $3.3000  | $0.4059    | +88%

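Assumes a 22,000-token cached block (our Scenario A system prompt) on Sonnet 4.6. The figures cover the cached block only: the 600-token user input and 350-token output cost the same with or without caching, so they are left out of both columns. The cache TTL is 5 minutes by default. You can extend it to 1 hour, with cache writes billed at 2x input instead of 1.25x, if your hit rate benefits.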
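The table can be regenerated for your own block size with a sketch like the one below. It assumes every call after the first lands inside the cache TTL, so there is exactly one write followed by reads, and it prices only the cached block. The function names are illustrative.

```python
INPUT_PRICE = 3.00        # Sonnet 4.6, USD per 1M input tokens
CACHE_WRITE_MULT = 1.25   # cache write = 1.25x input
CACHE_READ_MULT = 0.10    # cache read = 0.1x input
BLOCK_TOKENS = 22_000     # Scenario A system prompt


def no_cache_cost(calls):
    return calls * BLOCK_TOKENS * INPUT_PRICE / 1_000_000


def cached_cost(calls):
    """One cache write on the first call, then cache reads (assumes no TTL expiry)."""
    write = BLOCK_TOKENS * INPUT_PRICE * CACHE_WRITE_MULT / 1_000_000
    read = BLOCK_TOKENS * INPUT_PRICE * CACHE_READ_MULT / 1_000_000
    return write + (calls - 1) * read


def savings(calls):
    return 1 - cached_cost(calls) / no_cache_cost(calls)


for calls in (1, 2, 5, 10, 50):
    print(f"{calls:>3}  ${no_cache_cost(calls):.4f}  ${cached_cost(calls):.4f}  {savings(calls):+.1%}")
```

If your traffic is bursty enough that the cache often expires between calls, count each expiry as another write and the break-even moves out accordingly.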

4b. Batch API ROI

The batch API is 50% off list price and completes within 24 hours. Use it for anything that does not need to stream to a user in real time: nightly enrichment, content generation pipelines, backfills, eval runs, weekly reports.

Use case                       | Real-time | Batch      | Acceptable latency
User chat turn                 | required  | no         | < 2s
Doc summary (user waiting)     | required  | no         | < 10s
Nightly CRM enrichment         | wasteful  | half price | overnight
Weekly newsletter drafts       | wasteful  | half price | weekly
Eval harness regression run    | wasteful  | half price | nightly
Backfill of old content        | wasteful  | half price | over days
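The routing rule in the table reduces to one question: can the job wait for the batch window? A minimal sketch, with invented workload numbers and our own function names:

```python
BATCH_DISCOUNT = 0.50     # batch is 50% off list price
BATCH_WINDOW_HOURS = 24   # batch completes within 24 hours


def can_batch(max_wait_hours):
    """A job qualifies only if it can wait for the full batch window."""
    return max_wait_hours >= BATCH_WINDOW_HOURS


def monthly_cost(cost_per_job, jobs_per_month, max_wait_hours):
    discount = BATCH_DISCOUNT if can_batch(max_wait_hours) else 0.0
    return cost_per_job * jobs_per_month * (1 - discount)


# Hypothetical workloads: 10,000 enrichment jobs that can wait a day,
# and 9,600 chat turns that need an answer within 2 seconds.
enrichment = monthly_cost(0.0345, 10_000, max_wait_hours=24)
chat = monthly_cost(0.0137, 9_600, max_wait_hours=2 / 3600)
print(f"enrichment ${enrichment:.2f}, chat ${chat:.2f}")
```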

Common mistakes we catch

  • Averaging tokens. Use the P90. A handful of heavy users set your cost floor.
  • Forgetting retries. A 1% retry rate on Opus is not free. Multiply expected cost by (1 + retry rate).
  • Pricing the demo, not the product. Hello-world prompts are 200 tokens. Real production prompts with few-shot examples, tool schemas, and retrieved context are 5,000 to 50,000.
  • Ignoring output. Output is 5x input price on every model in the price sheet above. A chatty system prompt that tells the model to "explain your reasoning in detail" is a tax on every call.
  • Ignoring concurrency limits. Rate limits are a real ceiling. Factor in parallelism before quoting a latency SLO.
  • One model for everything. Haiku for classification, Sonnet for reasoning, Opus for the hardest 5%. Mix the tiers.
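The retry adjustment from the list above, as a short sketch. It is a first-order approximation that assumes each retry is billed in full and that a retried call is not retried again. The example reuses the Scenario C Opus synthesis cost and volume.

```python
def with_retries(cost, retry_rate):
    """Expected cost once retries are included (first-order approximation)."""
    return cost * (1 + retry_rate)


opus_synthesis = 0.315          # USD per call, from Scenario C
calls_per_month = 60 * 1_200    # researches per user * MAU, from Scenario C

adjusted = with_retries(opus_synthesis, retry_rate=0.01)
extra_per_month = (adjusted - opus_synthesis) * calls_per_month
print(f"${adjusted:.5f} per call, ${extra_per_month:.2f} extra per month")
```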

