G8.1 Guide · Operations


Prompt engineering for ops.

This is not a research paper on chain-of-thought. This is how operators write prompts that turn Claude into a reliable colleague. Short, structured, rebuildable, and boring in the best way.

Length: 26 min · Audience: Operator / non-technical · Last updated: 2026-04-19

The five-part prompt

Every prompt we run in production has the same five parts, in the same order. Once you see the pattern, it is hard to unsee.

  1. Role. Who Claude is in this context.
  2. Task. What Claude is being asked to do. One sentence.
  3. Inputs. The data Claude has to work with.
  4. Constraints. What Claude must not do, what format the output must take, what edge cases to handle.
  5. Output format. Precisely how the response should be structured.

Miss any one of these and you have a prompt that will work sometimes and fail in ways you will not catch.

A real example: outbound research prompt

Here is a lightly anonymized prompt we run before writing cold outbound. It produces the 5-minute research pass described in the outbound guide.

<role>
You are a senior B2B sales researcher. You specialize in
finding the one or two specific, verifiable things about a
company and person that suggest they have the problem our
product solves, based only on public information.
</role>

<task>
Produce a research brief on this prospect for a cold outbound
message. The brief must help the sender write a message that
proves they did actual research, not a template blast.
</task>

<inputs>
- Company URL: {{company_url}}
- Company recent news (last 90 days): {{news_excerpts}}
- Person name + role: {{person_name}}, {{person_role}}
- Person public profile (LinkedIn summary + recent posts):
  {{profile_excerpts}}
- Our ICP definition: {{icp_definition}}
- Our product positioning: {{positioning}}
</inputs>

<constraints>
- Only use information present in the inputs. Do not invent.
- If the inputs do not support a claim, say "insufficient
  information" for that line.
- Do not speculate about unannounced products or personal life.
- Flag if the prospect is a poor fit (wrong segment, wrong role,
  wrong stage) and recommend dropping.
</constraints>

<output_format>
Return JSON with these keys:
- company_signal: one sentence about the company, with source.
- person_signal: one sentence about the person, with source.
- timely_hook: a specific recent event (news, post, hire)
  that makes now the time to reach out, with source. "none" if
  none found.
- fit_assessment: "strong" | "medium" | "weak" | "wrong_fit"
- recommended_drop: boolean. True if fit is wrong_fit.
- reasoning: two sentences explaining the fit_assessment.
</output_format>

Notice what is here and what is not. There is a role. There is a task. The inputs are all passed in as variables, not baked in. The constraints are explicit (no invention, no speculation). The output format is a JSON schema Claude can be evaluated against.

What is missing on purpose: no fluffy “you are a world-class expert” preamble, no “think step by step” (the task structure forces it), no list of 14 things to do (five is enough).
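
Because the output format is a fixed JSON schema, checking a response takes a few lines of code. Here is a minimal sketch of that check in Python; the helper name and the strictness choices are ours, not part of the prompt.

import json

# Keys promised by the <output_format> section above, and the accepted
# fit_assessment values. The validator itself is illustrative.
REQUIRED_KEYS = {
    "company_signal", "person_signal", "timely_hook",
    "fit_assessment", "recommended_drop", "reasoning",
}
FIT_VALUES = {"strong", "medium", "weak", "wrong_fit"}

def validate_brief(raw: str) -> dict:
    """Parse the model's reply and fail loudly if it drifts from the schema."""
    brief = json.loads(raw)  # raises ValueError on non-JSON output
    missing = REQUIRED_KEYS - brief.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if brief["fit_assessment"] not in FIT_VALUES:
        raise ValueError(f"unexpected fit_assessment: {brief['fit_assessment']!r}")
    if not isinstance(brief["recommended_drop"], bool):
        raise ValueError("recommended_drop must be a boolean")
    return brief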

Why XML tags

Claude is trained to respect XML-tagged structure. It helps the model distinguish instructions from data, and it helps you extract pieces programmatically. We use XML tags for role, task, inputs, constraints, output_format, examples, context.

Alternatives work (markdown headings, triple-backtick blocks, plain labels), but XML tags are the most robust across model versions.
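
The programmatic side is simple. Here is a minimal sketch, assuming prompts live as tagged text files with {{variable}} slots like the one above; the function names are ours.

import re
from pathlib import Path

def render_prompt(path: str, variables: dict[str, str]) -> str:
    """Fill the {{variable}} slots in a tagged prompt file."""
    template = Path(path).read_text()
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

def extract_tag(prompt: str, tag: str) -> str:
    """Pull one tagged section back out, e.g. to log or diff just the constraints."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", prompt, re.DOTALL)
    return match.group(1).strip() if match else ""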

The anti-patterns

Over-instructed prompts

A prompt with 40 bullet points of “remember to” is harder to execute than a prompt with 5. Long instructions cause the model to drop individual items. If a rule matters, say it once, clearly, in the right section. Do not repeat.

Baking data into the prompt

A prompt that says “here is our ICP: (300 words of copy)” is hard to maintain and kills caching. Pull static data into the system prompt where it can be cached. Pull dynamic data into explicit input tags. Never mix them.
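
What that separation can look like in practice, sketched with the Anthropic Python SDK and prompt caching; the model name, file path, and function are illustrative, not a prescribed setup.

import anthropic

client = anthropic.Anthropic()

# Static material (role, constraints, ICP, positioning) lives in the system
# prompt, marked cacheable; only the per-prospect <inputs> block changes.
STATIC_SYSTEM = open("prompts/outbound_research_system_v3.xml").read()

def run_research(inputs_block: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model name
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": STATIC_SYSTEM,
            "cache_control": {"type": "ephemeral"},  # reused across calls
        }],
        messages=[{"role": "user", "content": inputs_block}],
    )
    return response.content[0].text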

The hidden expectation

If your prompt says “summarize this document” and you expect “a 200-word executive summary in bullet points with three headers,” you need to say that. Claude is not a mind reader. Say the format you want.

The example-less prompt

For anything non-trivial, one or two examples of the output format you want beats a paragraph of description. Show a short input and the corresponding ideal output. Then ask for the real task.
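
In the tag convention above, that can be as small as one examples block. The content below is invented purely to show the shape against the outbound research schema.

<examples>
<example>
<input>
Company: regional logistics firm, announced a second warehouse
last month. Person: VP Operations, recently posted about hiring
dispatchers.
</input>
<ideal_output>
{"company_signal": "Opened a second warehouse last month (company news)",
 "person_signal": "VP Ops posted about hiring dispatchers (recent post)",
 "timely_hook": "warehouse opening announced last month (company news)",
 "fit_assessment": "strong",
 "recommended_drop": false,
 "reasoning": "Mid-market logistics matches the ICP. Ops hiring suggests
 the scheduling pain our product solves."}
</ideal_output>
</example>
</examples>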

The unversioned prompt

Prompts are code. They need version control. A prompt that lives in a Google Doc and is copy-pasted into scripts is a ticking outage. Check prompts into git. Tag versions. Associate eval results with specific versions.

Second example: content research brief

<role>
You are an editorial research assistant for an operator-focused
publication. You help writers prepare to write by finding the
specific claims, numbers, and examples the piece will need.
</role>

<task>
Given a content brief, produce a research kit the writer can
use to draft a piece without needing to hunt down facts mid-draft.
</task>

<inputs>
- Brief: {{brief}}
- Existing related content from this publication: {{our_content}}
- Topical map slot (pillar + cluster + sibling pages):
  {{topical_slot}}
</inputs>

<constraints>
- Do not draft the piece. Do not produce a full outline.
- Only produce the research kit.
- Flag any claim in the brief that needs a citation we do not
  have.
- Do not use statistics without a specific source and date.
- Prefer first-party data over secondary analysis when both are
  available.
</constraints>

<output_format>
- key_claims: list of 5-8 claims the piece should make, each
  with the source that supports it.
- gaps: list of claims that need primary evidence we do not yet
  have, with suggested source types.
- quotable_numbers: list of 3-5 specific numbers that could
  anchor the piece, with sources.
- internal_links: list of sibling pages in the topical map that
  should be linked from this piece.
- risks: anything the writer should verify before publishing.
</output_format>

Same structure. Different task. Same five parts.

Iterating prompts

You will iterate. The first version of any prompt is a draft. Make it work on 3 examples. Then run it on 20. Then run it on 100 with an eval harness (see the evals primer).
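
A harness does not have to be elaborate. Here is a minimal sketch, assuming cases live in a JSONL file with "inputs" and "expected" fields and that you already have a function that runs the prompt and a scorer; all names are illustrative.

import json

def run_eval(cases_path: str, run_prompt, score) -> float:
    """Run the prompt over every case and return the pass rate."""
    passed = total = 0
    with open(cases_path) as f:
        for line in f:
            case = json.loads(line)
            output = run_prompt(case["inputs"])           # one model call per case
            passed += int(score(output, case["expected"]))
            total += 1
    return passed / total if total else 0.0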

When iterating:

  • Change one thing at a time. If you change 4 things, you do not know which one helped.
  • Keep the old version. Version the prompt file. A regression is a 5-minute git revert away.
  • Add examples before you add instructions. Examples teach faster than rules.
  • Tighten the output format before tightening the task. Most “Claude gave the wrong answer” bugs are “Claude gave the right answer in the wrong format and we failed to parse it.”

Prompt libraries beat prompt heroes

The teams that succeed with Claude maintain a small library of good prompts, versioned and shared, not a list of heroes who happen to know the magic incantations. A prompt in a shared repo with a README, evals, and a version tag is infrastructure. A prompt in one person's notebook is a liability.

Minimum viable prompt library:

  • One file per prompt. Name it for the task (outbound_research.xml, content_research.xml, pentest_finding_draft.xml).
  • A README at the root explaining when to use each.
  • A small eval set per prompt (5 to 20 examples).
  • A version tag (v1, v2, v3) in the file name or git tag, so older versions remain runnable (see the loader sketch after this list).
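
Loading the newest version from such a repo is a few lines. A sketch, assuming the file naming above (task name plus _v<N>.xml); the helper is ours.

import re
from pathlib import Path

def load_latest(prompt_dir: str, task: str) -> tuple[str, str]:
    """Return (version, text) for the highest-numbered <task>_v<N>.xml file."""
    pattern = re.compile(rf"^{re.escape(task)}_v(\d+)\.xml$")
    versions = []
    for path in Path(prompt_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            versions.append((int(match.group(1)), path))
    if not versions:
        raise FileNotFoundError(f"no versioned prompt found for {task!r}")
    number, path = max(versions, key=lambda v: v[0])
    return f"v{number}", path.read_text()

version, prompt_text = load_latest("prompts", "outbound_research")
# Log `version` next to eval results so a regression maps to a specific file.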

When to talk to us

We set up prompt libraries as part of product-development engagements. If you are trying to move from ad-hoc prompt copy-paste to a real production pipeline, start a conversation.

Related guide · Medium · 28 min

Evals: a primer

Good prompts need evals. This is how we build them.