E2 Essay · Security

The pentest report is a literary form.

We have read three hundred pentest reports in the last two years. Most of them are unreadable. A surprising number of them are unreadable almost by design. Here is the taxonomy, and here is what Claude specifically changes about the form.

Security · 22 min read · 2026-04-03 · by the operator, drafting assisted by Claude
Corrections log: none yet. If you find a factual error, email hello@nexcur.ai and we will log it here, dated.
1 A report is a document a human will read

A report is a document a human will read.

Start with this sentence, because it is the sentence most pentest reports have forgotten.

The average pentest report in 2026 is written as if nobody will read it. The sentences are a sequence of nouns separated by commas. The findings are copy-pasted from CVSS boilerplate. The screenshots are annotated with red arrows that point at nothing. The executive summary is four bullet points, one of which says "Additional findings in body of report." The body of the report is eighty pages of tables.

The client pays twenty-five thousand dollars for this document. The document is filed. The findings that the document names are fixed within sixty days at a rate of about fifty percent. The findings that the document buries are not fixed at all.

The theory of the pentest report, if a pentest vendor were asked to articulate it, is something like: the testers produce a formal record of findings, the client remediates the findings, the record exists for regulatory and insurance purposes, and for the client's own internal program. All of this is true. But the theory leaves out the part where a human being has to open the document and understand what it says.

A pentest report is a literary form. It has an audience. It has a purpose. It has conventions that emerged for reasons, and conventions that emerged for no reason and became tradition. The reports that actually change behavior in client organizations are the ones that remember they are documents, written for people, intended to provoke action. The reports that do not change behavior are the ones that were written as compliance artifacts first and documents second.

This essay is about closing the gap between those two.

2 Six ways pentest reports fail

We have a taxonomy. It is not exhaustive but it covers about 90% of what we have seen.

2.1 The CVSS-is-the-analysis failure.

The report cites a CVSS base score for every finding and considers the analysis done. CVSS is a reasonable first-cut signal. It is not analysis. A "high" in CVSS can be unexploitable in context. A "medium" in CVSS can be a breach waiting to happen in a specific architecture. A report that lists CVSS scores without contextualizing them is a spreadsheet pretending to be a document.

2.2 The findings-without-narrative failure.

Each finding is a self-contained card with no reference to any other finding. In reality, findings are rarely independent. A weak IAM policy plus a privileged service account plus a misconfigured S3 bucket is not three findings - it is one attack path. Reports that present findings as atomic lose the narrative, and the reader loses the ability to prioritize.

2.3 The passive-voice failure.

"The application was found to permit unauthenticated access to the /admin endpoint." Who found it? What did they do? What happened next? Passive voice is a safety blanket. It sounds professional and it says nothing. The good report says "We sent a GET to /admin without any cookies or auth headers. The server returned the admin dashboard. We captured the request in reports/admin-open.har."

2.4 The remediation-as-ticket failure.

"Implement proper access controls." "Enforce input validation." "Apply defense in depth." These are not remediation steps. These are Jira ticket titles. The remediation paragraph should be an engineer talking to another engineer about how to fix the thing. "Add a middleware at the router level that rejects any request to /admin/* without a valid session cookie. Here is a seven-line Express snippet you can paste. Here is the corresponding test you should add to confirm the fix."

2.5 The screenshots-as-proof failure.

A screenshot of a Burp Suite pane is not reproducible evidence. It is a picture of reproducible evidence. Good reports include the actual request, the actual response, and the actual command that produced them. Screenshots are supplements, not primary sources.

2.6 The executive-summary-as-abstract failure.

Many executive summaries are just abstracts - two paragraphs that restate the scope and say "14 findings were identified, including 2 critical, 4 high, 5 medium, and 3 low." This is the least informative possible summary. It tells the CEO nothing about what happened, what is at stake, or what to do next. The executive summary should be the part of the report you would be willing to stake the engagement on. Most are the part you would be least willing.

3 What the shape of a good report looks like

A good pentest report has four major sections and one minor one.

The minor one is the cover page, which carries the metadata - client, scope, dates, testers, authorship, version - and the four major sections are:

Executive summary. Six to eight pages. Readable by a CEO with no security background in twenty minutes. Explains what happened, what is at risk, what needs to happen next, at what cost, and within what timeline. Does not hide behind severity tables.

Attack paths. Ten to thirty pages. This is the narrative section. Each attack path is a numbered story: starting position, observed behaviors, lateral moves, escalation, end state. Attack paths reference findings by ID but present them in context. A path may include five findings. A finding may appear in three paths.

Findings. Thirty to eighty pages depending on scope. One page per finding. Each finding has a stable ID, a title, severity (with justification), a description, reproduction steps, evidence references, remediation, and cross-references to attack paths. Findings are the reference section. Nobody reads them front-to-back.

Appendices. Raw artifacts - request/response dumps, exploit code (sanitized), scanner outputs, tooling notes, scope exceptions. This is where the reproducibility lives.

Four major sections and a cover page. A table of contents that reads like a table of contents, not a spreadsheet. A font size that does not require squinting. Margins that accommodate a reader making notes. This is a document, not a submission.

4 What Claude actually contributes

People ask the wrong question here. The question is not "can Claude write a pentest report." The question is "what does a pentest report look like when writing is no longer the expensive part."

Claude's contribution to the pentest report, in our pipeline, is not in the finding of vulnerabilities. A model of any kind is bad at novel exploitation and we do not ask it to be good at it. Our human operators find the vulnerabilities. They write the initial finding stubs. They verify the exploits. Claude's contribution begins after the testing phase closes. [1]

Specifically:

Claude drafts the narrative sections - the attack paths, the executive summary, the cross-reference prose - from a structured finding database that the human testers maintain during the engagement. The testers provide a per-finding JSON blob with title, evidence, command, response, and context. Claude turns a cluster of findings into an attack path that reads like an incident report. This takes the testers about forty minutes to review per path, rather than the four to six hours it used to take to write one.
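To make the blob concrete, here is one plausible shape, populated from the F-07 finding shown later in this essay. The five fields the paragraph names are real; the exact key names, and the id and severity fields, are our illustration.

{
  "id": "F-07",
  "title": "Session cookie lacks SameSite attribute on /billing/*",
  "severity": "medium",
  "command": "curl -b \"sid=VALID\" -X POST https://app.cloudwrit.io/billing/refund -d \"amount=500\"",
  "response": "HTTP/1.1 200 OK (full dump in reports/F-07/request.har)",
  "evidence": ["reports/F-07/request.har", "reports/F-07/session-dump.txt"],
  "context": "Billing endpoints are state-changing, so a missing SameSite attribute makes a CSRF chain plausible."
}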

Claude rewrites the remediation paragraphs into operator-grade prose. The tester submits a terse remediation line ("enforce session cookie on /admin"). Claude expands this into the full engineer-to-engineer remediation paragraph, referencing the actual framework the client uses (this is in the engagement corpus), suggesting a specific middleware location, writing a candidate code snippet, and naming the test case that would verify the fix. The tester reviews, edits where Claude was wrong, and ships.

Claude enforces voice. Every finding in every report comes through an eval harness that checks for passive voice, CVSS-as-analysis, remediation-as-ticket, and the other failure modes above. If a draft fails an eval, it goes back to the drafter with annotations. We built this because our human reviewers missed these failures too often when reading fast. Claude catches them first, and then the reviewer gets to spend their attention on what matters.
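A sketch of what one of those checks might look like, using the exact phrases this essay calls out - the regexes below are deliberately crude illustrations, not our production rules:

type Violation = { rule: string; hit: string };

const rules = [
  // Passive-voice tells: "was found to", "were observed to", and kin.
  { rule: "passive-voice", pattern: /\b(?:was|were)\s+(?:found|observed|identified|noted)\s+to\b/i },
  // Remediation-as-ticket: the boilerplate phrases from section 2.4.
  { rule: "remediation-as-ticket", pattern: /\b(?:implement proper|enforce input validation|apply defense in depth)\b/i },
  // CVSS-as-analysis: a line that ends at a bare score, no context after it.
  { rule: "cvss-as-analysis", pattern: /\bCVSS\b[^.\n]*\d{1,2}\.\d\s*$/im },
];

function evalDraft(draft: string): Violation[] {
  const violations: Violation[] = [];
  for (const { rule, pattern } of rules) {
    const match = draft.match(pattern);
    if (match) violations.push({ rule, hit: match[0] });
  }
  return violations;
}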

None of this replaces the testers. The testers spend their time doing what they are expensive for - finding bugs, validating exploits, understanding architecture. The documentation tax that used to eat half the engagement is gone. The report is better because the testers are less tired when they write it.

5 The finding as a unit of prose

If you read nothing else in this essay, read this section, because the finding is where 80% of the document's failures live.

A finding should answer seven questions in order, every time:

  1. What is the finding? One sentence, present tense, active voice. "The /admin endpoint is accessible without authentication."
  2. Where is it? The file, the URL, the service, the commit SHA. Specific.
  3. What is the impact? Not CVSS alone. In-context impact: what can an attacker do with this, in this environment, against this client.
  4. How do I reproduce it? A command or a sequence of actions. Copy-pastable.
  5. What is the evidence? A file reference, a request dump, a log line. Not a screenshot of the evidence.
  6. How do I fix it? Engineer-grade remediation with a suggested implementation.
  7. How do I verify the fix? A test, a command, a check. Without this, the finding is not closeable.

Seven questions. One page. Every finding. When a finding is missing one of these, our eval harness rejects the draft. When a finding answers all seven in fewer than seven sentences, we recognize a pattern: either the finding is not deep enough to justify its inclusion, or it is not really a finding at all but a symptom of a deeper finding somewhere else.
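In code, the rejection rule is mechanical. A sketch, with field names that are our own mapping of the seven questions:

interface Finding {
  statement: string;     // 1. what it is - one active-voice sentence
  location: string;      // 2. where - file, URL, service, commit SHA
  impact: string;        // 3. in-context impact, not CVSS alone
  reproduce: string;     // 4. copy-pastable command or steps
  evidence: string[];    // 5. file references, not screenshots
  remediation: string;   // 6. engineer-grade fix
  verification: string;  // 7. test or command that confirms the fix
}

function missingAnswers(draft: Partial<Finding>): Array<keyof Finding> {
  const required: Array<keyof Finding> = [
    "statement", "location", "impact", "reproduce",
    "evidence", "remediation", "verification",
  ];
  return required.filter((key) => {
    const value = draft[key];
    if (value === undefined) return true;
    return Array.isArray(value) ? value.length === 0 : value.trim() === "";
  });
}

// The harness rejects any draft where missingAnswers(draft) is non-empty.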

The template looks like this in practice - here is a sanitized finding from the Cloudwrit engagement, the one we published in the handbook sample:

F-07: Session cookie lacks SameSite attribute on /billing/*

Severity: Medium. Exploitable via CSRF in a chained attack but requires user interaction.
Where: src/server/middleware/session.ts:42
Reproduce: curl -b "sid=VALID" -X POST https://app.cloudwrit.io/billing/refund -d "amount=500"
Evidence: see reports/F-07/request.har and reports/F-07/session-dump.txt
Fix: Add SameSite=Strict to the cookie opts in session.ts:42. See reports/F-07/patch.diff for the two-line change we tested.
Verify: Test reports/F-07/verify.test.ts asserts the Set-Cookie header contains SameSite=Strict. Merge into the CI suite.

That is one finding, one page. A client engineer can open ticket SEC-F07, implement the change, run the verify test, close the ticket. Seven minutes of work after the document lands, assuming the client's router architecture matches what we described. If it does not match, the remediation paragraph would have said so, because the remediation paragraph was written with the client's code in context.
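For the curious, here is a sketch of what a verify.test.ts like the one F-07 names might contain. The cookie name and the SameSite assertion come from the finding; the node:test runner, the /login route, and the credentials are our placeholder assumptions:

import { test } from "node:test";
import assert from "node:assert/strict";

// Asserts the fix for F-07: the session cookie must be issued with
// SameSite=Strict. The login route and credentials are placeholders
// for however the client's app actually issues the sid cookie.
test("session cookie carries SameSite=Strict", async () => {
  const res = await fetch("https://app.cloudwrit.io/login", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ user: "test-user", pass: "test-pass" }),
  });
  const setCookie = res.headers.get("set-cookie") ?? "";
  assert.match(setCookie, /sid=[^;]+.*SameSite=Strict/i);
});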

6 The evidence problem

Every bad pentest report has an evidence problem. The evidence is either missing, unreadable, or unreproducible.

The fix for the evidence problem is to treat the evidence as a first-class artifact, not a supplement. Every finding gets an evidence directory. Every evidence directory contains the raw request, the raw response, the command that produced them, the scope exception (if any), and the metadata about when the test ran. The report references the directory. The directory ships with the report as a zip file.

Clients occasionally push back on this. They do not want the raw exploit request in the evidence because if the report leaks, the exploit is documented. We have two answers. First, if the report leaks, you have bigger problems than a documented exploit - you have a breach of confidentiality. Second, the raw evidence does not contain the novel exploit; it contains the reproduction. The exploit is in the tester's head and the tester's notes. The report documents what the client needs to verify and close.

Screenshots have a role. They are useful for showing a rendered page, a visual indicator, an admin panel the attacker reached. They are not useful for showing a request. For a request, ship the request.

The evidence zip structure we now ship with every engagement:

evidence/
  F-01/
    request.har
    response.txt
    command.sh
    notes.md
  F-02/
    ...
  paths/
    path-01-admin-takeover/
      timeline.md
      requests/
      session-dump.txt
  tooling/
    burp-state.xml
    nuclei-output.json
    custom-scripts/
  scope.md
  engagement-timeline.md
7 The remediation paragraph is not a ticket

The remediation paragraph is where pentest reports most often earn their bad reputation. Most remediation paragraphs could be produced by reading the OWASP cheat sheet for the vulnerability class. That is not remediation. That is citation.

A remediation paragraph should do four things. It should name the specific location in the client's code (or config, or infra) where the fix belongs. It should describe the fix at a level of detail that an engineer can start implementing within five minutes of reading. It should note any dependencies or side effects the fix has on other parts of the system. And it should tell you how to verify that the fix is correct.

The reason most remediation paragraphs fail this test is that the testers did not read the client's code. They found a vulnerability, wrote a generic fix description, and moved on. At NexcurAI, the testers read the code. When we find a vulnerability in a router handler, we open that handler, trace its callers, understand its auth flow, and write the remediation with that specific architecture in view. This is extra work. It is the work that makes the report actually fixable.

Claude's contribution here is significant. The client codebase is in the engagement corpus. When the tester writes a remediation stub, Claude can expand it with the specific file path, the specific framework conventions the client uses, a candidate patch, and a candidate test. The tester reviews for correctness. About 70% of the Claude-expanded remediations are correct as drafted. The remaining 30% need tester edits, which is fine - the cost of a correction is ten minutes, and the cost of writing from scratch would have been forty.
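As a concrete example, the F-07 remediation from section 5 would come out of this pipeline looking something like the sketch below - assuming, for illustration, that the client configures cookies through express-session. The sameSite line is the change the finding calls for; every other option is scaffolding we invented:

import express from "express";
import session from "express-session";

const app = express();

// Sketch of the change at src/server/middleware/session.ts:42 -
// the surrounding options are illustrative, not the client's code.
app.use(session({
  secret: process.env.SESSION_SECRET ?? "dev-only-secret",
  name: "sid",
  resave: false,
  saveUninitialized: false,
  cookie: {
    httpOnly: true,
    secure: true,
    sameSite: "strict",  // F-07 fix: block cross-site sends of the cookie
  },
}));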

The client engineer, on the receiving end, opens the remediation section and finds a paragraph that reads like a code review comment written by someone who knows their codebase. This is the effect we are after. A code review comment, not a compliance artifact.

8 Executive summary, done properly

The executive summary is the part a CEO will read. Maybe a board. Probably an auditor. The body of the report - the technical findings - will be read by the security engineer and maybe one of the application engineers. The executive summary has to carry the engagement for everyone else.

A good executive summary has five parts, in this order:

What happened. Two pages. What we tested, how we tested, what the most important observation is. Not "we found 14 findings." The important observation, at human-readable fidelity. "We found that an attacker with a reused password could reach administrative billing functions in four steps, using no novel exploits, in under two hours."

What is at stake. One page. The business impact, stated in the language the business uses. Not "confidentiality, integrity, availability." Dollars. Customer trust. Regulatory exposure. Specific scenarios the exec can imagine.

What needs to happen. Two to three pages. The recommended remediation path, sequenced, with costs. Not a list of 14 fixes. A prioritized plan: what to fix in the first two weeks, the first quarter, the first year. What blocks what. Where the architectural changes are, and where the quick wins are.

What it will cost. Half a page. Engineering hours. Vendor costs. Downtime. Consulting follow-on. A real number, not a handwave.

What we did not test. Half a page. The scope exceptions, the known gaps, the follow-up engagements we would recommend. An honest section. The CEO should know what their next pentest should look at.

Six to eight pages, in prose. Readable in twenty minutes. Staked to the point where the testers would defend every sentence. This is the executive summary we ship. This is the executive summary very few pentest vendors ship.

9 The report as the engagement

Here is the thesis this essay defends.

The pentest engagement is not the testing phase. The pentest engagement is the document that survives it. Everyone treats the testing as the main event and the report as the writeup. The client's experience is the opposite. The client experiences two weeks of testing (during which they barely hear from you) and then a report (which they live with for a year). The report is what the engagement was.

If the report is unreadable, the engagement was unreadable. If the report is vague, the testers were vague. If the remediation paragraphs are generic, the testers did not take the time to learn the code. The report is the evidence of the work.

Claude changes the economics of the report in ways we have described. It does not change the obligation to do the work. It lowers the cost of producing the document that demonstrates the work was done. The testers who were spending half their engagement writing are now spending a fifth of it, and they are writing better because they are less exhausted when they start.

We published a full fictional sample at /samples/pentest-report-example.html - a complete Cloudwrit pentest report in the form this essay describes, with a bad-vs-good annotated comparison of two findings. Read the good side. Then read a report from your last pentest vendor. Ask which one you would prefer to hand to your CEO.

We think you already know the answer.

Notes
  1. We wrote a companion piece on the boundaries of what Claude can and cannot do in a security engagement: "Claude is not a pentester - here is what it actually is". Summary: Claude finds hypotheses, humans find exploits. Claude drafts documentation, humans validate facts.
E2.X Related work

Companion sample: A full pentest report in the form this essay defends. Fictional Cloudwrit engagement. Executive summary, attack paths, annotated bad-vs-good finding comparison.

Service line: Cybersecurity & pentesting. Every engagement ends with a Signature Security Handbook - not a deck, not a spreadsheet.

Related essay: The handbook thesis, defended. Why every engagement produces a handbook, and why the slide deck is about to die.