E21 Essay · SEO & GEO

The llms.txt discipline.

Most llms.txt files we see in the wild are either empty, badly structured, or performative. The file is supposed to help answer engines understand your site at a glance. Here is the discipline we use, the exact structure, and the mistakes we keep seeing.

SEO & GEO · 10 min read · 2026-04-14 · by the operator, drafting assisted by Claude
Corrections log: none yet. If you find a factual error, email hello@nexcur.ai and we will log it here, dated.
1 What llms.txt is actually for

The proposal is simple: a root-level file at /llms.txt that gives answer-engine crawlers a map of the site in a format they can parse reliably, without having to traverse the full site tree.

It is not a ranking signal in the SEO sense. Answer engines do not "rank" llms.txt. But the file gives the crawler a clean summary of your site's structure, purpose, and authoritative pages. A crawler that reads llms.txt has a better prior for which URLs to visit, which sections are canonical, and what the site claims to cover.

The analogy is robots.txt, but for meaning rather than permissions. Robots.txt tells a crawler where it may go. llms.txt tells it what each place is for.
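As the proposal sketches it, the file is plain markdown: an H1 with the site name, a short summary (a blockquote in the spec), then H2 sections of annotated links. A minimal sketch with a hypothetical site:

```markdown
# Example Co

> Billing tooling for small SaaS teams.

## Docs

- [Quickstart](https://example.com/docs/quickstart): install and send a first invoice
- [API reference](https://example.com/docs/api): endpoints, auth, rate limits
```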

2 The four failure modes

We have audited around 120 llms.txt files from competitors and clients in the last six months. They fail in four consistent ways.

Failure 1: empty. The file exists, is 200 bytes, and lists three URLs. The team added the file because a SaaS plugin generated it. The crawler gets almost nothing. This is the most common failure, roughly 50 percent of files we audit.

Failure 2: sitemap clone. The file is a dump of every URL on the site with no context. It is a sitemap.xml in markdown format. The crawler already has the sitemap. This adds no information.

Failure 3: marketing copy. The file is a sales pitch: "We are the leading provider of X solutions for Y companies." This is noise. The crawler wants structure, not a landing page.

Failure 4: out of date. The file references pages that no longer exist, versions that have changed, and product names that have been renamed. This is worse than not having the file at all, because it gives the crawler stale information.

The discipline is to not fail in any of these four ways. That alone puts you ahead of roughly 90 percent of sites we audit.

3 Our file structure

We use a six-section structure. It is opinionated; other structures work; this is the one that has stayed useful across a dozen sites.

Header. One H1 with the organization name. Two sentences underneath: what the organization does and who it serves. This is the summary a crawler gets in the first 200 bytes.

Core pages. A small list of 5 to 10 authoritative URLs. Home, about, services, pricing, contact. Each with a one-sentence description. These are the pages a crawler should treat as canonical for structural questions about the organization.

Reference library. A list of pages that function as citation surfaces (glossaries, benchmarks, checklists, FAQ collections). Each with a one-sentence description of what the page answers. These are the pages we want cited, and the highest-value targets for answer engines, so their descriptions get the most care.

Writing. A list of recent essays with titles and dates. Not every essay; the most recent 20 to 30. This lets the crawler see the pace and recency of publishing without having to crawl the full archive.

Policies. Privacy, terms, rules of engagement, security. Each with a one-sentence description. Answer engines sometimes cite these for compliance-adjacent questions, and having them clearly labeled improves the citation.

Provenance. A short statement about authorship, review, and how AI is involved in content production. This is our transparency section. It is unusual. It is also one of the reasons our content gets cited with higher confidence.
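Put together, the six sections look something like this. The organization, URLs, and descriptions are hypothetical; the shape is what matters:

```markdown
# Example Co

Example Co builds billing tooling for small SaaS teams. We serve
finance leads at companies with 5 to 50 employees.

## Core pages

- [Home](https://example.com/): what we do and who it is for
- [Pricing](https://example.com/pricing): current plans and limits

## Reference library

- [Billing glossary](https://example.com/glossary): defines 40 billing terms
- [Churn benchmarks](https://example.com/benchmarks): quarterly churn data by segment

## Writing

- [Why invoices fail](https://example.com/essays/invoices) (2026-03-10)

## Policies

- [Privacy](https://example.com/privacy): what we collect and why

## Provenance

Essays are drafted with AI assistance and reviewed by the operator
before publishing. Data pages are hand-compiled.
```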

4 The discipline: a quarterly review

The file goes stale. Sites change. The single most common mistake after "it is empty" is "it is a year old."

We review llms.txt every quarter. The review checklist:

Every URL resolves. Paste the file into a link checker and confirm. Stale URLs are actively harmful.
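The link check is easy to script. A minimal sketch in Python; `extract_urls` and `check_urls` are our names for illustration, not a library API, and the regex deliberately trims trailing punctuation:

```python
import re
import urllib.request

def extract_urls(text):
    """Pull absolute http(s) URLs out of an llms.txt body, whether bare
    or inside markdown [title](url) links; trim trailing punctuation."""
    urls = re.findall(r'https?://[^\s)>"]+', text)
    return [u.rstrip(".,;") for u in urls]

def check_urls(urls, timeout=10):
    """Return the URLs that fail to resolve. Makes live HEAD requests,
    so run it from a machine that can reach the site."""
    dead = []
    for url in urls:
        try:
            req = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(req, timeout=timeout)
        except Exception:
            dead.append(url)
    return dead
```

An empty return from `check_urls(extract_urls(...))` means every listed URL resolved.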

Descriptions match current reality. A page that used to be about "security handbook generation" may now be about "security audits." The description in llms.txt should match the page's current H1 and purpose.

The writing list is current. At least the last quarter's essays should be listed. If the list is older than six months, it signals an abandoned file.
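Recency can also be checked mechanically. A sketch that assumes the writing entries carry ISO dates (YYYY-MM-DD), as ours do; the six-month threshold mirrors the rule above:

```python
import re
from datetime import date, timedelta

def newest_entry(llms_txt):
    """Return the most recent ISO date (YYYY-MM-DD) found in the file,
    or None if no dates are present."""
    found = re.findall(r"\b(\d{4})-(\d{2})-(\d{2})\b", llms_txt)
    if not found:
        return None
    return max(date(int(y), int(m), int(d)) for y, m, d in found)

def looks_abandoned(llms_txt, today, limit_days=180):
    """True when the newest dated entry is older than ~six months,
    or when there are no dated entries at all."""
    newest = newest_entry(llms_txt)
    return newest is None or (today - newest) > timedelta(days=limit_days)
```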

The reference library grew. Every quarter you should be adding one or two new citation surfaces. If the list is unchanged for two quarters, either you are not publishing reference-grade content or you are forgetting to register it here.

Provenance statement is accurate. If the way you use AI in content has changed (more review, less review, different tools), the provenance section updates. Being wrong here corrodes the trust that makes the file useful at all.

The whole review takes 20 to 30 minutes. It is a calendar event, quarterly, not an ad-hoc task that never gets done.

5 What it does not do

Let us be honest about the limits.

llms.txt does not make your content get cited. If your pages are thin, your file will not rescue them. Citations flow from claim density, original data, and clarity. llms.txt is a navigation aid for the crawler, not a content replacement.

It is not universally supported. Some crawlers read it, some do not, and the ones that do use it differently. The upside is that the file costs little to maintain and helps where it is read. The downside is that you will never see a clean attribution chain from "we shipped llms.txt" to "citations went up."

It is not a ranking signal on Google. Google's SERP rankings are driven by page quality, topical depth, and the other surfaces we wrote about in the dual-optimization essay. llms.txt does not touch that pipeline.

So why do it? Because the cost is low, the discipline is small, and the file functions as a forcing function for the site. The act of writing an honest llms.txt makes you notice which pages are not citation-worthy, which sections are under-covered, and which pages the organization should be able to stand behind. In that sense, the file is less a ranking tool and more a mirror.
