S9 Sample · Cost model template

Cost model you can paste.

We run this template before every product engagement. Plug in your own token counts, model mix, and traffic. You will learn more about your feature in 30 minutes than a week of speculation will teach you.

Use: copy into a spreadsheet · Prices: 2026-04 snapshot · Last updated: 2026-04-19

How to use this template

Four tables below. Table 1 is the price sheet you copy verbatim. Table 2 is the formula (four lines of arithmetic). Table 3 walks through three real-shape scenarios so you can sanity-check your own numbers against ours. Table 4 shows caching and batch ROI.

Every number is copy-friendly - hover, select, paste. Dollar values assume US dollars. Token prices are per million (1M) tokens.

1. Price sheet (2026-04 snapshot)

Model	Input / 1M	Cached read / 1M	Cache write / 1M	Output / 1M	Batch (50% off)
Claude Opus 4.7 (highest quality)	$15.00	$1.50	$18.75	$75.00	yes
Claude Sonnet 4.6 (workhorse)	$3.00	$0.30	$3.75	$15.00	yes
Claude Haiku 4.5 (high volume)	$0.80	$0.08	$1.00	$4.00	yes

Cache reads are 0.1x input price. Cache writes are 1.25x input. Break-even is roughly 2 calls against the same cache. Re-check prices at the Anthropic pricing page before quoting a client - we update this sheet quarterly.

2. The formula

Four lines. Memorize them.

cost_per_action = (input_tokens / 1,000,000) * input_price
                + (cached_tokens / 1,000,000) * cache_read_price
                + (cache_write_tokens / 1,000,000) * cache_write_price
                + (output_tokens / 1,000,000) * output_price

cost_per_month  = cost_per_action * actions_per_user_per_month * monthly_active_users

cost_per_user   = cost_per_month / monthly_active_users

margin_per_user = plan_price_per_user - cost_per_user - (fixed_costs / monthly_active_users)
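If you would rather run the numbers in code than in a spreadsheet, the four formulas above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation: the `Prices` container and the function names are illustrative choices of ours, and the Sonnet 4.6 rates are copied from the price sheet in section 1.

```python
from dataclasses import dataclass

MILLION = 1_000_000


@dataclass(frozen=True)
class Prices:
    """Per-1M-token prices in USD (hypothetical container, not an SDK type)."""
    input: float
    cache_read: float
    cache_write: float
    output: float


# Sonnet 4.6 row from the 2026-04 price sheet.
SONNET = Prices(input=3.00, cache_read=0.30, cache_write=3.75, output=15.00)


def cost_per_action(p, input_tokens, cached_tokens, cache_write_tokens, output_tokens):
    """Formula 1: sum of the four token buckets, each priced per 1M tokens."""
    return (
        input_tokens / MILLION * p.input
        + cached_tokens / MILLION * p.cache_read
        + cache_write_tokens / MILLION * p.cache_write
        + output_tokens / MILLION * p.output
    )


def cost_per_month(action_cost, actions_per_user_per_month, mau):
    """Formula 2: scale one action up to the whole user base."""
    return action_cost * actions_per_user_per_month * mau


def cost_per_user(month_cost, mau):
    """Formula 3: monthly cost spread across monthly active users."""
    return month_cost / mau


def margin_per_user(plan_price, user_cost, fixed_costs, mau):
    """Formula 4: plan price minus variable cost minus a share of fixed costs."""
    return plan_price - user_cost - fixed_costs / mau


# Scenario A inputs: 600 uncached, 22,000 cache-read, no cache write, 350 output.
turn = cost_per_action(SONNET, 600, 22_000, 0, 350)
month = cost_per_month(turn, 8, 1_200)
user = cost_per_user(month, 1_200)
print(f"per turn ${turn:.5f}, per month ${month:.2f}, per user ${user:.2f}")
```

The sketch does not round the per-turn cost before multiplying, so its monthly total lands slightly under the figure in the Scenario A table, which multiplies the rounded $0.0137.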

Use P90 token counts, not the mean. Averages hide the user who sends a 40-page PDF. The P90 is the token count that 90% of your users stay at or below, so it prices in the heavy tail. It is the 90th-percentile user, not the median one, who blows up your margin.
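To see how far apart the median, the mean, and the P90 can sit, here is a small sketch using Python's standard `statistics` module. The token counts are made up for illustration (nine short messages and one very long upload); they are not production data.

```python
import statistics

# Hypothetical per-request input token counts: mostly short messages,
# plus one user who uploads a very long document.
token_counts = [400, 450, 500, 520, 550, 600, 620, 700, 900, 38_000]

median = statistics.median(token_counts)
mean = statistics.mean(token_counts)
# quantiles(n=10) returns the 9 decile cut points; the last one is the P90.
p90 = statistics.quantiles(token_counts, n=10)[-1]

print(f"median={median:.0f}  mean={mean:.0f}  p90={p90:.0f}")
```

On a sample this small the exact P90 depends on the interpolation method, so treat the output as directional. With real traffic, compute it over thousands of instrumented requests.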

3. Three scenarios

Each scenario is a real-shape product feature on a $49/month SaaS plan. Token counts are observed from instrumented production traffic, not guesses.

Scenario A: Customer support bot

Answers product questions from a documentation knowledge base. Hot path. Cached system prompt with the entire docs corpus.

Input	Value	Math
Model	Sonnet 4.6
Cached system prompt (docs)	22,000 tokens	22000 / 1M * $0.30 = $0.0066
Uncached user message + history	600 tokens	600 / 1M * $3.00 = $0.0018
Output	350 tokens	350 / 1M * $15.00 = $0.00525
Cost per conversation turn	$0.0137	$0.0066 + $0.0018 + $0.00525
Turns per user per month (P90)	8
MAU	1,200
Cost per month	$131.52	$0.0137 * 8 * 1200
Cost per user	$0.11	0.22% of the $49 plan

Ships. Margin impact is negligible even if volume doubles. No escape hatches needed.

Scenario B: Document summarizer

User uploads a PDF (contract, meeting notes, spec). Model returns a structured summary. No caching because every document is unique.

Input	Value	Math
Model	Sonnet 4.6
Document tokens (P90)	9,500 tokens	9500 / 1M * $3.00 = $0.0285
System prompt (uncached, short)	400 tokens	400 / 1M * $3.00 = $0.0012
Output	320 tokens	320 / 1M * $15.00 = $0.0048
Cost per summary	$0.0345	$0.0285 + $0.0012 + $0.0048
Summaries per user per month (P90)	25
MAU	1,200
Cost per month	$1,035.00	$0.0345 * 25 * 1200
Cost per user	$0.86	1.76% of the $49 plan

Still ships - model cost is under 2% of the plan price. But worth watching P99 (some users upload 40-page docs). Add a 20-page hard limit or a pro tier with higher limits.

Scenario C: Sales research agent

User pastes a company URL. Agent fetches site, pulls LinkedIn, drafts outreach angles. Multi-step tool use, Opus for the final synthesis.

Input	Value	Math
Model (research calls)	Haiku 4.5
Model (synthesis)	Opus 4.7
Haiku calls: 4 tool rounds	~18,000 in, 3,500 out	$0.0144 + $0.014 = $0.0284
Opus synthesis: 1 call	~12,000 in, 1,800 out	$0.18 + $0.135 = $0.315
Cost per research	$0.343	$0.0284 + $0.315
Researches per user per month (P90)	60
MAU	1,200
Cost per month	$24,696	$0.343 * 60 * 1200
Cost per user	$20.58	42% of the $49 plan

Does not ship as-is. Options: (a) move the synthesis to Sonnet and eat a quality delta, (b) charge $149 or meter usage above 30/month, (c) cache the company dossier for downstream calls. We usually pick (a) + (c) after a proper eval harness.
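To get a feel for what option (a) buys before running any eval, the Scenario C arithmetic can be rerun with the synthesis call priced at Sonnet rates. The sketch below is our own illustration: the helper names are hypothetical, the prices come from the price sheet in section 1, and it assumes token counts stay the same when the model changes, which a real eval would need to confirm. It says nothing about the quality delta.

```python
MILLION = 1_000_000

# (input, output) prices in USD per 1M tokens, copied from the price sheet.
PRICES = {
    "haiku": (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}


def call_cost(model, tokens_in, tokens_out):
    """Cost in USD of uncached traffic on one model."""
    price_in, price_out = PRICES[model]
    return tokens_in / MILLION * price_in + tokens_out / MILLION * price_out


def monthly_cost_per_user(synthesis_model, researches_per_month=60):
    """Haiku tool rounds plus one synthesis call, times monthly volume."""
    research = call_cost("haiku", 18_000, 3_500)           # 4 tool rounds combined
    synthesis = call_cost(synthesis_model, 12_000, 1_800)  # 1 call
    return (research + synthesis) * researches_per_month


PLAN_PRICE = 49.00
for model in ("opus", "sonnet"):
    per_user = monthly_cost_per_user(model)
    print(f"{model}: ${per_user:.2f} per user, {per_user / PLAN_PRICE:.0%} of plan")
```

With Sonnet synthesis the per-user cost works out to about $5.48, roughly 11% of the $49 plan. The Opus line comes out a couple of cents above the $20.58 in the table because the sketch does not round the per-research cost to $0.343 first.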

4. Caching and batch ROI

4a. Caching break-even

Cache writes cost 1.25x input. Cache reads cost 0.1x input. So a single call costs 25% more with caching, and the second call against the same cache already puts you ahead. For support bots, RAG systems, agents with large system prompts - always cache.

Calls against cache	No cache	With cache	Savings
1 (cache write only, no reads)	$0.0660	$0.0825	-25%
2	$0.1320	$0.0891	+32%
5	$0.3300	$0.1089	+67%
10	$0.6600	$0.1419	+78%
50	$3.3000	$0.4059	+88%

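Figures cover only the 22,000-token cached block (our Scenario A system prompt) on Sonnet 4.6: one cache write on the first call, then cache reads, assuming every later call arrives before the cache expires. The 600-token user input and 350-token output cost the same with or without caching, so they are left out of both columns. The cache TTL is 5 minutes by default - extend to 1 hour with the 2x write multiplier if your hit rate benefits.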
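The break-even table can be regenerated for your own block size with a few lines of Python. This is a sketch under the same assumptions as the table (Sonnet 4.6 prices, a 22,000-token block, every call after the first is a cache hit); the function names are ours.

```python
MILLION = 1_000_000
BLOCK_TOKENS = 22_000      # cached block from Scenario A
INPUT_PRICE = 3.00         # Sonnet 4.6, USD per 1M input tokens
CACHE_WRITE_PRICE = 3.75   # 1.25x input
CACHE_READ_PRICE = 0.30    # 0.1x input


def cost_no_cache(calls):
    """Every call pays full input price for the block."""
    return calls * BLOCK_TOKENS / MILLION * INPUT_PRICE


def cost_with_cache(calls):
    """The first call writes the cache; every later call reads it."""
    write = BLOCK_TOKENS / MILLION * CACHE_WRITE_PRICE
    read = BLOCK_TOKENS / MILLION * CACHE_READ_PRICE
    return write + (calls - 1) * read


for calls in (1, 2, 5, 10, 50):
    plain, cached = cost_no_cache(calls), cost_with_cache(calls)
    savings = (plain - cached) / plain
    print(f"{calls:>3}  ${plain:.4f}  ${cached:.4f}  {savings:+.1%}")

# Smallest call count at which caching is strictly cheaper than not caching.
break_even = next(n for n in range(1, 100) if cost_with_cache(n) < cost_no_cache(n))
print(f"caching is cheaper from call {break_even} onward")
```

Swap in your own `BLOCK_TOKENS` and model prices. Because both columns scale linearly with block size, the break-even call count does not change; only the dollar amounts do.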

4b. Batch API ROI

The batch API is 50% off list price and completes within 24 hours. Use it for anything that does not need to stream to a user in real time: nightly enrichment, content generation pipelines, backfills, eval runs, weekly reports.

Use case	Real-time	Batch	Acceptable latency
User chat turn	required	no	< 2s
Doc summary (user waiting)	required	no	< 10s
Nightly CRM enrichment	wasteful	half price	overnight
Weekly newsletter drafts	wasteful	half price	weekly
Eval harness regression run	wasteful	half price	nightly
Backfill of old content	wasteful	half price	over days
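The batch saving itself is one multiplication. As a rough sketch, here it is applied to a made-up backfill job: the 10,000-document volume is hypothetical, the per-item cost is the Scenario B cost per summary, and the function name is ours.

```python
BATCH_DISCOUNT = 0.50  # batch API is 50% off list price


def batch_savings(cost_per_item, items):
    """Return (real-time cost, batch cost, savings) in USD for one job."""
    real_time = cost_per_item * items
    batch = real_time * (1 - BATCH_DISCOUNT)
    return real_time, batch, real_time - batch


# Hypothetical backfill: 10,000 documents at the Scenario B cost per summary.
real_time, batch, saved = batch_savings(0.0345, 10_000)
print(f"real-time ${real_time:.2f}, batch ${batch:.2f}, saved ${saved:.2f}")
```

The discount only helps if nobody is waiting on the result, so check the job against the latency column above before moving it.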

Common mistakes we catch

Averaging tokens. Use the P90. A handful of heavy users set your cost floor.
Forgetting retries. A 1% retry rate on Opus is not free. Multiply expected cost by (1 + retry rate), which is 1.01 at a 1% retry rate.
Pricing the demo, not the product. Hello-world prompts are 200 tokens. Real production prompts with few-shot examples, tool schemas, and retrieved context are 5,000 to 50,000.
Ignoring output. Output is 5x input price on every model in the price sheet above. A chatty system prompt that tells the model to "explain your reasoning in detail" is a tax on every call.
Ignoring concurrency limits. Rate limits are a real ceiling. Factor in parallelism before quoting a latency SLO.
One model for everything. Haiku for classification, Sonnet for reasoning, Opus for the hardest 5%. Mix the tiers.
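The retry adjustment from the list above is easy to fold into any of the scenario numbers. A minimal sketch, assuming each retried call is sent exactly once more and is billed in full (the helper name is ours):

```python
def with_retries(cost, retry_rate):
    """Expected cost when a fraction `retry_rate` of calls is sent a second time."""
    return cost * (1 + retry_rate)


opus_synthesis = 0.315  # Scenario C synthesis call, USD
adjusted = with_retries(opus_synthesis, 0.01)
print(f"${adjusted:.5f} per call at a 1% retry rate")
```

If retries can themselves fail and be retried, the true multiplier is slightly higher, but at low retry rates the difference is negligible.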


Want us to run this for your feature?

We will do the cost model with you in a paid 90-minute working session. You leave with the spreadsheet, the eval plan, and a go/no-go.
