Working with the Claude API
The engineer's companion. API, caching, tool use, batch, observability.
This is not a research paper on chain-of-thought. This is how operators write prompts that turn Claude into a reliable colleague. Short, structured, rebuildable, and boring in the best way.
Every prompt we run in production has the same five parts, in the same order. Once you see the pattern, it is hard to unsee.
Miss any one of these and you have a prompt that will work sometimes and fail in ways you will not catch.
Here is a lightly anonymized prompt we run before writing cold outbound. It produces the 5-minute research pass described in the outbound guide.
<role>
You are a senior B2B sales researcher. You specialize in
finding the one or two specific, verifiable things about a
company and person that suggest they have the problem our
product solves, based only on public information.
</role>
<task>
Produce a research brief on this prospect for a cold outbound
message. The brief must help the sender write a message that
proves they did actual research, not a template blast.
</task>
<inputs>
- Company URL: {{company_url}}
- Company recent news (last 90 days): {{news_excerpts}}
- Person name + role: {{person_name}}, {{person_role}}
- Person public profile (LinkedIn summary + recent posts):
{{profile_excerpts}}
- Our ICP definition: {{icp_definition}}
- Our product positioning: {{positioning}}
</inputs>
<constraints>
- Only use information present in the inputs. Do not invent.
- If the inputs do not support a claim, say "insufficient
information" for that line.
- Do not speculate about unannounced products or personal life.
- Flag if the prospect is a poor fit (wrong segment, wrong role,
wrong stage) and recommend dropping.
</constraints>
<output_format>
Return JSON with these keys:
- company_signal: one sentence about the company, with source.
- person_signal: one sentence about the person, with source.
- timely_hook: a specific recent event (news, post, hire)
that makes now the time to reach out, with source. "none" if
none found.
- fit_assessment: "strong" | "medium" | "weak" | "wrong_fit"
- recommended_drop: boolean. True if fit is wrong_fit.
- reasoning: two sentences explaining the fit_assessment.
</output_format>
Notice what is here and what is not. There is a role. There is a task. The inputs are all passed in as variables, not baked in. The constraints are explicit (no invention, no speculation). The output format is a JSON schema Claude can be evaluated against.
What is missing on purpose: no fluffy “you are a world-class expert” preamble, no “think step by step” (the task structure forces it), no list of 14 things to do (five is enough).
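Because the output format is a JSON schema, every response can be validated before it enters the pipeline. A minimal sketch (the key names come from the prompt above; the validator itself is illustrative, not a library API):

```python
import json

REQUIRED_KEYS = {
    "company_signal", "person_signal", "timely_hook",
    "fit_assessment", "recommended_drop", "reasoning",
}
VALID_FITS = {"strong", "medium", "weak", "wrong_fit"}

def validate_brief(raw: str) -> dict:
    """Parse the model's JSON output and enforce the schema from output_format."""
    brief = json.loads(raw)
    missing = REQUIRED_KEYS - brief.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if brief["fit_assessment"] not in VALID_FITS:
        raise ValueError(f"bad fit_assessment: {brief['fit_assessment']!r}")
    if not isinstance(brief["recommended_drop"], bool):
        raise ValueError("recommended_drop must be a boolean")
    # The prompt requires recommended_drop=true whenever fit is wrong_fit.
    if brief["fit_assessment"] == "wrong_fit" and not brief["recommended_drop"]:
        raise ValueError("wrong_fit must set recommended_drop to true")
    return brief
```

A response that fails validation gets retried or flagged; it never reaches the sender.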
Claude is trained to respect XML-tagged structure. It helps the model distinguish instructions from data, and it helps you extract pieces programmatically. We use XML tags for role, task, inputs, constraints, output_format, examples, context.
Alternatives work (markdown headings, triple-backtick blocks, plain labels), but XML tags are the most robust across model versions.
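In practice we assemble the five sections programmatically rather than hand-editing one long string. A sketch (helper names are ours, not an SDK API; the tag names match the sections above):

```python
def xml_section(tag: str, body: str) -> str:
    """Wrap one prompt section in an XML tag so the model can
    distinguish instructions from data."""
    return f"<{tag}>\n{body.strip()}\n</{tag}>"

def build_prompt(role: str, task: str, inputs: str,
                 constraints: str, output_format: str) -> str:
    """Emit the five sections in the same order, every time."""
    sections = [
        ("role", role),
        ("task", task),
        ("inputs", inputs),
        ("constraints", constraints),
        ("output_format", output_format),
    ]
    return "\n\n".join(xml_section(tag, body) for tag, body in sections)

prompt = build_prompt(
    role="You are a senior B2B sales researcher.",
    task="Produce a research brief on this prospect.",
    inputs="- Company URL: {{company_url}}",
    constraints="- Only use information present in the inputs.",
    output_format="Return JSON with the keys listed above.",
)
```

Building prompts this way also makes the structure testable: a unit test can assert every section is present and in order.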
A prompt with 40 bullet points of “remember to” is harder to execute than a prompt with 5. Long instructions cause the model to drop individual items. If a rule matters, say it once, clearly, in the right section. Do not repeat.
A prompt that says “here is our ICP: (300 words of copy)” is hard to maintain and kills caching. Pull static data into the system prompt where it can be cached. Pull dynamic data into explicit input tags. Never mix them.
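Concretely, the split looks like this in the Messages API request shape: static copy goes in a system block marked with `cache_control`, dynamic data goes in the user turn. A sketch (the model id is a placeholder; pin whatever version you actually run):

```python
STATIC_SYSTEM = (
    "You are a senior B2B sales researcher. "
    "ICP definition, positioning, and constraints live here, unchanged per call."
)

def build_request(company_url: str, news_excerpts: str) -> dict:
    """Static content in the cacheable system block; dynamic data in the user turn."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder: pin your production version
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM,
                # Cache breakpoint: identical prefixes on later calls are
                # served from the prompt cache instead of reprocessed.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {
                "role": "user",
                "content": (
                    "<inputs>\n"
                    f"- Company URL: {company_url}\n"
                    f"- Recent news: {news_excerpts}\n"
                    "</inputs>"
                ),
            }
        ],
    }
```

If any dynamic value leaks into the system block, the prefix changes on every call and the cache never hits.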
If your prompt says “summarize this document” and you expect “a 200-word executive summary in bullet points with three headers,” you need to say that. Claude is not a mind reader. Say the format you want.
For anything non-trivial, one or two examples of the output format you want beats a paragraph of description. Show a short input and the corresponding ideal output. Then ask for the real task.
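One way to wire that in is an `<examples>` section appended after the main prompt, each pair wrapped in its own tags. A sketch (tag and helper names are ours):

```python
EXAMPLE = """\
<example>
<example_input>
Company: Acme Corp. News: hired a VP of Sales last month.
</example_input>
<example_output>
{"timely_hook": "Hired a VP of Sales last month (source: news)"}
</example_output>
</example>"""

def with_examples(base_prompt: str, examples: list[str]) -> str:
    """Append worked input/output pairs before the real task is run."""
    block = "<examples>\n" + "\n".join(examples) + "\n</examples>"
    return f"{base_prompt}\n\n{block}"
```

One or two pairs is usually enough; past that you are paying tokens for diminishing returns.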
Prompts are code. They need version control. A prompt that lives in a Google Doc and is copy-pasted into scripts is a ticking outage. Check prompts into git. Tag versions. Associate eval results with specific versions.
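The cheapest way to tie eval results to a specific prompt is a content hash logged alongside every run. A sketch (the helper is illustrative):

```python
import hashlib

def prompt_version(prompt_text: str) -> str:
    """Short content hash to tag a prompt version and log next to eval results."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]
```

When an eval regresses, the hash in the run log tells you exactly which prompt text produced the old numbers.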
<role>
You are an editorial research assistant for an operator-focused
publication. You help writers prepare to write by finding the
specific claims, numbers, and examples the piece will need.
</role>
<task>
Given a content brief, produce a research kit the writer can
use to draft a piece without needing to hunt down facts mid-draft.
</task>
<inputs>
- Brief: {{brief}}
- Existing related content from this publication: {{our_content}}
- Topical map slot (pillar + cluster + sibling pages):
{{topical_slot}}
</inputs>
<constraints>
- Do not draft the piece. Do not produce a full outline.
- Only produce the research kit.
- Flag any claim in the brief that needs a citation we do not
have.
- Do not use statistics without a specific source and date.
- Prefer first-party data over secondary analysis when both are
available.
</constraints>
<output_format>
- key_claims: list of 5-8 claims the piece should make, each
with the source that supports it.
- gaps: list of claims that need primary evidence we do not yet
have, with suggested source types.
- quotable_numbers: list of 3-5 specific numbers that could
anchor the piece, with sources.
- internal_links: list of sibling pages in the topical map that
should be linked from this piece.
- risks: anything the writer should verify before publishing.
</output_format>
Same structure. Different task. Same five parts.
You will iterate. The first version of any prompt is a draft. Make it work on 3 examples. Then run it on 20. Then run it on 100 with an eval harness (see the evals primer).
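The harness itself does not need to be fancy at this stage. A minimal sketch (assumes each case carries its input and a pass/fail check; `prompt_fn` stands in for your model call):

```python
def run_eval(prompt_fn, cases: list[dict]) -> float:
    """Run a prompt over eval cases and return the pass rate.
    Each case is {"input": ..., "check": callable(output) -> bool}."""
    passed = sum(1 for case in cases if case["check"](prompt_fn(case["input"])))
    return passed / len(cases)
```

Start with 3 hand-picked cases, widen to 20, then 100; the function does not change, only the case list does.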
When iterating, change one thing at a time, so you know which change moved the numbers.
The teams that succeed with Claude maintain a small library of good prompts, versioned and shared, not a list of heroes who happen to know the magic incantations. A prompt in a shared repo with a README, evals, and a version tag is infrastructure. A prompt in one person's notebook is a liability.
Minimum viable prompt library: a git repo with one file per prompt, a README explaining what each is for, and eval results tied to version tags.
We set up prompt libraries as part of product-development engagements. If you are trying to move from ad-hoc prompt copy-paste to a real production pipeline, start a conversation.