Your first pentest with AI in the loop

The pentest as it was, the pentest as it is

The classical pentest is a five-day black-box engagement: scanners run, a couple of senior testers poke at the results, and a PDF arrives two weeks later with findings graded critical / high / medium / low. It was always more art than process, and under time pressure the art part got cut.

The AI-native pentest is a different shape. Claude does not replace the senior tester. It replaces the bottleneck around coverage and writeup: the surfaces a classical tester did not have time to probe, the findings that were real but did not get documented because the writeup took too long. What you buy, when you buy a pentest from a firm like ours, is the senior tester's attention plus the coverage that used to be cut for time.

This guide walks through what that actually looks like, in enough detail that you can ask sharp questions in a sales call.

Scope: how to think about it

A pentest scope answers three questions. What are we testing? From what attacker perspective? What are we explicitly not testing?

What are we testing

External attack surface (the usual). Public web app, API endpoints, any exposed admin consoles, DNS, cert hygiene, subdomain takeover risk, publicly readable cloud storage.
Authenticated application surface. What can a logged-in tenant user do? What about a logged-in but unprivileged support-role user? Test both.
Cloud architecture review. IAM graph, service-to-service trust paths, privilege boundaries, secrets management. (This is where Claude earns the coverage.)
AI pipeline surface (if applicable). Prompt injection, context poisoning, output exfiltration. Most classical firms will skip this, either by habit or because they lack the muscle. If your product uses an LLM in a sensitive path, this is where the real findings live now.
Supply chain. Dependency freshness, CI/CD trust paths, secret leakage in builds.

From what attacker perspective

External unauthenticated attacker (the default).
Authenticated attacker with a low-privilege account. Also default at a growth tier.
Assumed-breach. We start with a foothold and see how far we get. Useful if you have a mature external surface and want to know about post-compromise blast radius.
Insider threat. A disgruntled engineer with repo access and some prod credentials. Rarely the first pentest you buy, but worth knowing the option exists.

What we are not testing

Every pentest has out-of-scope items. Ours is explicit about them:

Social engineering of your staff, unless separately scoped.
Denial-of-service or resource-exhaustion testing against production. Separate engagement, separate tooling.
Physical security. We do not send anyone to your office.
Third-party SaaS vendors you use. We can review configuration and privilege; we cannot test the vendor's infrastructure.

What Claude does, what we do

This is the most common question in the sales call. The answer, per phase:

Reconnaissance. Claude expands the surface map: subdomains, open ports, exposed endpoints, known patterns in your tech stack. A human operator reviews the map before any testing begins.
Vulnerability enumeration. Claude proposes candidate findings from scanner output and manual inspection notes. Every candidate is labeled with evidence. No finding ships to you without a human operator having validated the evidence independently.
Exploit validation. A human operator. No exception. Claude does not run exploits against your systems. Claude can describe an exploit path; we run it. We sign every exploited finding with the operator's name.
Writeup. Claude drafts the finding prose per the literary-form standard. The operator edits. No generic “industry best practices” filler; every paragraph refers to your system specifically.
Remediation guidance. Claude drafts the first version using a library of remediation patterns. The operator adapts to your stack and, where the fix is non-trivial, proposes three options (quick patch, proper fix, architectural redesign) with tradeoffs.

The proof of concept (PoC) process

Every critical and high severity finding ships with a PoC. The PoC is a reproducible set of steps that demonstrates the vulnerability. For web vulnerabilities, that is usually a curl command or a short script; for architectural vulnerabilities, a diagram or a test-environment demonstration.

We follow a strict disclosure protocol on the PoC:

We demonstrate in a test or staging environment whenever possible. Production-only reproduction is a last resort, with explicit written consent for each case.
We never exfiltrate real customer data to verify a vulnerability. If the vulnerability allows reading arbitrary tenant data, we prove it by planting a test record and reading it back.
We notify the client inside four hours of confirming a critical finding. Same day for high. The writeup follows; the notification does not wait for the writeup.
PoCs are redacted in any version of the report shared outside the immediate security function (enterprise-prospect version, SOC 2 auditor version).

The ninety-day re-test window

Included in every Signature Security engagement: we re-test any finding you tell us is remediated, within ninety days of the original engagement close, for no additional fee. You remediate on your schedule; we validate when you ask.

The point of this is to close the feedback loop that most pentest firms leave open. In a classical engagement, you fix what you can, you tell the auditor you fixed it, and nobody re-tests until the next annual pentest. Half the “fixed” items turn out to be partially fixed or fixed only in the code path the original tester saw. The ninety-day window catches that.

How to read the report

A good pentest report has the following properties, in this order of importance:

Every finding is falsifiable. It names a system, a version, a code path, a configuration value. Nothing generic.
Every severity rating has an argument. You can disagree with it. You can ask us to re-rank. We will defend the rating or change it.
Every remediation is actionable. “Adopt defense in depth” is not a remediation. “Add HttpOnly, Secure, and SameSite=Strict on the session cookie set at auth/login/route.ts:142” is.
The executive summary names the two or three things you must do first. A flat list of 18 findings is not useful to a board. A sequenced wave is.
The report is signed. The operator puts their name at the foot of the findings section. If a finding later turns out to be wrong, you know whom to call.

Things that go wrong

In order of probability:

Scope creep. You discover mid-engagement that there is a whole second API service that was not in the scope. Either we renegotiate scope (we will; fair, fixed price) or we note the gap and put a finding on it that says “unscoped surface; recommend follow-on engagement.”
Environment issues. Your staging is down, or test accounts are locked out, or a firewall rule blocks our IPs. We bake two contingency days into every growth-tier engagement for this.
A finding turns out to be a design decision. You look at finding F-11 and say “that is how it is supposed to work.” Good. We discuss, rewrite the finding as an “accepted risk” with the rationale recorded, and move on. No arguing.
A severity rating is contested. Also good. We walk through the rating criteria, listen, sometimes agree, sometimes hold. Documented either way.
An exploit does not reproduce. Rare, but it happens - a transient state, a patched dependency. We record the non-reproduction, keep the finding as “conditionally exploitable” or downgrade.

Questions to ask us in the sales call

Who will be the lead operator on my engagement, and how many critical findings have they personally validated in the last year?
What does your writeup look like for a prompt-injection finding, if my product has an LLM pipeline?
What is your disclosure protocol if you find something truly critical on day one?
Can I see a redacted report from a previous engagement?
What does the ninety-day re-test cover, exactly, and how do I invoke it?

We answer all five in the first call. If any of these questions would be a problem for a firm you are evaluating, that is a signal.

What our engagement contains, specifically

For reference, the Signature Security tier at NexcurAI contains:

External pentest (2 weeks).
Authenticated / internal pentest (2 weeks).
Cloud architecture and IAM review (1 week).
AI pipeline review if applicable (additional, inline).
Threat model of one priority data flow (STRIDE plus LLM extensions).
Signature Handbook with findings, architecture notes, ninety-day roadmap.
Sanitized report suitable for sharing with enterprise prospects under NDA.
Handover call with the operator who ran the test.
Ninety-day re-test window.
Quarterly refresh option.

Fixed price. No hourly billing. No change orders for items inside the scope above.

Series A security readiness - the broader framework this pentest lives inside.
IAM hardening field manual - the tactical manual for the cloud architecture part of the engagement.
Essay: the pentest report as a literary form - why our report shape looks the way it does.
Essay: Claude is not a pentester - the division of labor, argued.
Sample: pentest report example - what the deliverable looks like.
Service page: cybersecurity
Case study: Cloudwrit (fictional template)

Your first pentest with AI in the loop.