llms.txt — A Draft Convention for Guiding AI Engines, Checked at Your Origin

llms.txt is a draft, low-adoption convention proposed in September 2024 by Jeremy Howard of Answer.AI. pseolint therefore runs this as a low-confidence, informational site-level check: it fetches /llms.txt once at your origin and verifies three shape rules. A missing file is treated as a missed opportunity worth roughly an hour of work, never a defect.

Findings from this check are tagged `aeo/llms-txt`.

What it detects

This is a site-level check, not a per-page one: it runs exactly once against your origin. pseolint takes the source URL, derives its origin, and, for http and https targets only, requests `${origin}/llms.txt` with a 10-second timeout. If the request fails, times out, or returns a non-200 status, the file is treated as absent.
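The fetch step described above can be sketched in a few lines. This is an illustrative Python sketch, not pseolint's own code: the function names `llms_txt_url`, `unique_targets`, and `fetch_llms_txt` are hypothetical, and treating the origin as scheme plus network location is a simplifying assumption.

```python
from http.client import HTTPException
from typing import Iterable, List, Optional
from urllib.parse import urlsplit
from urllib.request import urlopen

TIMEOUT_SECONDS = 10  # the timeout stated in the text above


def llms_txt_url(source_url: str) -> Optional[str]:
    """Derive `${origin}/llms.txt`, or None for non-http(s) targets."""
    parts = urlsplit(source_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}/llms.txt"


def unique_targets(source_urls: Iterable[str]) -> List[str]:
    """Collapse many page URLs into one llms.txt URL per origin."""
    seen = {}
    for url in source_urls:
        target = llms_txt_url(url)
        if target is not None:
            seen.setdefault(target, None)  # dict keeps first-seen order
    return list(seen)


def fetch_llms_txt(source_url: str) -> Optional[str]:
    """Return the file body, or None when the file counts as absent."""
    target = llms_txt_url(source_url)
    if target is None:
        return None
    try:
        with urlopen(target, timeout=TIMEOUT_SECONDS) as response:
            if response.status != 200:
                return None
            return response.read().decode("utf-8", errors="replace")
    except (OSError, HTTPException, ValueError):
        # Network failure, timeout, HTTP error status, or bad URL: absent.
        return None
```

Deriving the target URL in a pure function keeps the "once per origin" behaviour easy to reason about: any number of audited pages on the same origin map to a single request.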

When the file is present, pseolint runs three deliberately lenient shape checks drawn from the llmstxt.org proposal. First, the opening non-empty line must be a `# ` H1 title (lines that start with `#` but carry no title text are skipped, not rejected). Second, the file must contain at least one `## ` section heading. Third, it must list at least one markdown link of the form `- [Title](https://...)` somewhere under a section. A file that satisfies all three passes silently.
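One way to read the three shape checks is as a short function that returns the first rule that fails. The sketch below is an interpretation under stated assumptions, not pseolint's implementation: the rule names (`h1-title`, `section-heading`, `markdown-link`) and the regular expressions are hypothetical, and accepting `http://` links alongside `https://` is a guess at how lenient the real check is.

```python
import re
from typing import Optional

H1 = re.compile(r"^# \S")          # "# " followed by title text
BARE_HASH = re.compile(r"^#+\s*$")  # hashes with no title text
SECTION = re.compile(r"^## \S")    # "## " section heading
LINK = re.compile(r"^\s*- \[[^\]]+\]\(https?://[^)\s]+\)")


def first_failed_rule(text: str) -> Optional[str]:
    """Return the name of the first failed shape rule, or None if all pass."""
    lines = [line.rstrip() for line in text.splitlines()]

    # Rule 1: the opening non-empty line is an H1; bare "#" lines are skipped.
    opening = next(
        (line for line in lines if line.strip() and not BARE_HASH.match(line)),
        None,
    )
    if opening is None or not H1.match(opening):
        return "h1-title"

    # Rule 2: at least one "## " section heading.
    section_indexes = [i for i, line in enumerate(lines) if SECTION.match(line)]
    if not section_indexes:
        return "section-heading"

    # Rule 3: at least one "- [Title](https://...)" link under a section.
    if not any(LINK.match(line) for line in lines[section_indexes[0] + 1:]):
        return "markdown-link"

    return None
```

Returning only the first failure mirrors the behaviour described in the text, where a malformed file produces a single finding that names one rule.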

A missing file and a malformed file both surface a finding at the same low-confidence, informational level. The messages differ: one tells you nothing exists at the origin, the other names which of the three rules failed. The check is intentionally forgiving because the specification is still evolving; it rejects only obvious garbage.
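The two outcomes can be pictured as one finding type with two messages. The sketch below is purely illustrative: the `Finding` fields, the `"low"` and `"info"` labels, and the message wording are all assumptions, since the text above does not show pseolint's actual output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    rule: str        # the rule tag, `aeo/llms-txt`
    confidence: str  # hypothetical label for "low confidence"
    severity: str    # hypothetical label for "informational"
    message: str


def to_finding(present: bool, failed_rule: Optional[str]) -> Optional[Finding]:
    """Map the check outcome to at most one site-level finding."""
    if not present:
        message = "No llms.txt found at the origin."
    elif failed_rule is not None:
        message = f"llms.txt is present but failed the shape rule: {failed_rule}."
    else:
        return None  # a well-formed file passes silently
    return Finding("aeo/llms-txt", "low", "info", message)
```

Both branches share the same confidence and severity, so the only thing that changes between "missing" and "malformed" is the message.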

Why it matters

Be candid about what this is: llms.txt is a draft convention with low industry adoption, not a ranking factor and not an established standard. That is exactly why pseolint reports it at low confidence and informational severity. An absent llms.txt is a missed opportunity, never a defect, and you can ship a perfectly healthy site without one.

The upside, where it applies, is editorial control. A well-formed llms.txt lets you hand an AI engine a curated map straight to your most authoritative, citable pages instead of leaving it to infer structure from a sprawling sitemap. For a project with deep, fast-moving content (release notes, an API reference, a migration guide), that curation can be the difference between an assistant quoting your current quickstart and one stitching an answer together from a two-year-old blog post.

No search engine is known to consume llms.txt as a ranking input, and pseolint makes no such claim. Treat a finding here as a roughly one-hour experiment worth trying, not a penalty to fix. The authoritative reference for the format is llmstxt.org.

A page that fails

An open-source CLI tool publishes docs at docs.example.dev and adds a /llms.txt that opens with a blockquote summary, then jumps straight into bare URLs: `> The official SDK for Example.` followed by `https://docs.example.dev/quickstart` and `https://docs.example.dev/api`. pseolint fetches it, finds no leading `# ` H1 title and no `## ` section headings, and emits a low-confidence finding naming the first failed rule — the file exists but does not match the llmstxt.org shape, so an AI engine reading it gets an unlabeled list with no hierarchy to reason about.

A page that passes

The same documentation site fixes it: `# Example SDK` as the H1, a one-line blockquote summary, then `## Getting Started` listing `- [Quickstart](https://docs.example.dev/quickstart): install and first call in 5 minutes`, followed by `## Reference` with `- [API Reference](https://docs.example.dev/api): every endpoint and type` and `## Releases` linking `- [Changelog](https://docs.example.dev/changelog): updated within the last 7 days`. All three shape checks pass (an H1 title, three `## ` sections, and several markdown links), so pseolint stays silent and an assistant gets a clean, captioned map to the SDK's most citable pages.
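To make the contrast concrete, the failing and passing files described above are written out in full below and run through a small per-rule report. This is a sketch under assumptions: `shape_report` is a hypothetical helper that approximates the three checks (it does not skip bare `#` lines), and the blockquote text in the passing file is reused from the failing example because the text does not give it.

```python
import re

FAILING = """\
> The official SDK for Example.
https://docs.example.dev/quickstart
https://docs.example.dev/api
"""

PASSING = """\
# Example SDK

> The official SDK for Example.

## Getting Started
- [Quickstart](https://docs.example.dev/quickstart): install and first call in 5 minutes

## Reference
- [API Reference](https://docs.example.dev/api): every endpoint and type

## Releases
- [Changelog](https://docs.example.dev/changelog): updated within the last 7 days
"""


def shape_report(text: str) -> dict:
    """Report each of the three shape rules separately."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    sections = [i for i, line in enumerate(lines) if re.match(r"## \S", line)]
    after_first_section = lines[sections[0] + 1:] if sections else []
    return {
        "h1_title": bool(lines) and re.match(r"# \S", lines[0]) is not None,
        "section_count": len(sections),
        "link_count": sum(
            1
            for line in after_first_section
            if re.match(r"\s*- \[[^\]]+\]\(https?://[^)\s]+\)", line)
        ),
    }


print(shape_report(FAILING))
print(shape_report(PASSING))
```

The failing file reports no H1, zero sections, and zero links; the passing file reports an H1, three sections, and three links.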

How to fix it

  1. Create a plain-text file at the root of your origin, served as /llms.txt, that opens with a single `# Project Name` H1 title on the first non-empty line.
  2. Add a short blockquote summary under the title, then break your content into `## ` sections such as Getting Started, API Reference, Guides, and Releases.
  3. Under each section, list your most citable pages as markdown links in the form `- [Quickstart](https://...): one-line description` so an engine can read both the link and its purpose.
  4. Point the links at canonical, current pages (your live quickstart, API reference, SDK guides, and changelog), not deep-archived or redirecting URLs.
  5. Keep it in sync with releases: a stale llms.txt that omits a new major version or a renamed code sample misleads engines more than having none at all.
  6. Validate against the format described at llmstxt.org and re-run the audit; a passing file is silent, so no finding means the three shape checks are satisfied.
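If your docs build already knows its key pages, the steps above can be automated so the file stays in sync with releases. The following is a minimal sketch, assuming a hypothetical `render_llms_txt` helper and sample data taken from the passing example; how you write the result to your site root depends on your build tooling.

```python
from typing import Iterable, List, Tuple

Link = Tuple[str, str, str]            # (title, url, one-line description)
Section = Tuple[str, Iterable[Link]]   # (section name, links)


def render_llms_txt(title: str, summary: str, sections: Iterable[Section]) -> str:
    """Render an H1 title, a blockquote summary, and `## ` sections of links."""
    lines: List[str] = [f"# {title}", "", f"> {summary}"]
    for name, links in sections:
        lines += ["", f"## {name}", ""]
        for link_title, url, description in links:
            lines.append(f"- [{link_title}]({url}): {description}")
    return "\n".join(lines) + "\n"


text = render_llms_txt(
    "Example SDK",
    "The official SDK for Example.",
    [
        ("Getting Started", [
            ("Quickstart", "https://docs.example.dev/quickstart",
             "install and first call in 5 minutes"),
        ]),
        ("Reference", [
            ("API Reference", "https://docs.example.dev/api",
             "every endpoint and type"),
        ]),
    ],
)
print(text)
```

Generating the file from the same data that drives your navigation or sitemap is one way to avoid the stale-file problem in step 5, since a renamed or removed page then drops out of llms.txt on the next build.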

SpamBrain context

This rule sits apart from the spam-detection family. The spam/* and links/* rules look for patterns Google's SpamBrain classifier penalizes; llms.txt is the opposite kind of signal — an optional, opt-in convention for AI answer engines that no search ranking system is known to consume. pseolint will never tell you a missing llms.txt put you at risk of a penalty, because it cannot and does not.

That framing is why the finding is low confidence and informational. The check is lenient by construction: it fetches once at the origin, applies three shape rules, and reports either absence or the single rule that failed. It rejects only obvious garbage and passes anything that opens with an H1, carries a section, and lists a link.

If you maintain an open-source tool whose documentation site ships frequent release notes and a versioned API reference, an accurate llms.txt is a cheap one-hour investment that can keep AI assistants quoting your current docs rather than a cached page from three weeks ago. If that does not describe your project, you lose nothing by skipping it: pseolint does not score a missing file against you. The format and its rationale are documented at llmstxt.org.

Frequently asked questions

Is llms.txt an official standard that affects my Google rankings?
No. llms.txt is a draft, low-adoption convention proposed at llmstxt.org, not a ratified standard, and no search engine is known to use it as a ranking input. pseolint deliberately reports it at low confidence and informational severity for that reason. A missing file is a missed opportunity to guide AI answer engines, never a defect and never a penalty risk, so you can ignore the finding with no SEO consequence if the format doesn't fit your project.
How does the rule decide my llms.txt is malformed?
It applies three lenient shape checks from the llmstxt.org proposal. The first non-empty line must be a `# ` H1 title, the file must contain at least one `## ` section heading, and it must list at least one markdown link in the `- [Title](https://...)` form under a section. If any one of those fails, the finding names that specific rule. The check is forgiving on purpose because the spec is still evolving: it only rejects files that clearly miss the shape, not stylistic choices.
I run an open-source tool's documentation site — what should my llms.txt actually contain?
Open with `# Your Tool Name`, a one-line blockquote summary, then group your highest-value pages under `## ` sections. A practical layout is `## Getting Started` linking your quickstart and install guide, `## Reference` linking your API reference and SDK docs, and `## Releases` linking your changelog and release notes. List each as `- [Page](https://...): short description`. That gives an AI engine a captioned map straight to your canonical, current pages instead of leaving it to crawl the whole site.
Why does the check only run once instead of per page?
Because llms.txt is an origin-level file, not a page attribute. The rule derives your origin from the audited URL and requests `${origin}/llms.txt` a single time with a 10-second timeout. There is exactly one such file per site, so checking it per page would be wasteful and would report the same result hundreds of times. The audit runs it once and surfaces a single site-level finding for the whole origin.
Does a missing or failed fetch count the same as a malformed file?
Both produce a low-confidence, informational finding, but the messages differ. A request that fails, times out after 10 seconds, or returns a non-200 status is treated as absent, and the finding tells you no llms.txt was found at the origin. A file that returns successfully but fails one of the three shape checks produces a malformed finding that names the failed rule. Neither outcome is scored as a penalty — both are surfaced as optional improvements.
