If you have written robots.txt and sitemap.xml for the past fifteen years and assumed that was the end of the machine-readable site-metadata story, 2026 has news for you. A new file — /llms.txt — has emerged as the de facto way to tell AI search engines and LLM-powered agents what your site is about, which pages matter, and how a model should reason about your content. This guide is a practical walk-through of what llms.txt is, what it is not, and how to write one for your own site.

The format originated in the llmstxt.org proposal and has been picked up rapidly by AI search products over the last twelve months. If you ship developer tools, technical documentation, a knowledge base, or a content-heavy product site, this is one of the highest-leverage SEO investments available to you in 2026 — and it takes well under an hour to do correctly. For a companion read on how the agents that consume this file actually parse Markdown, see our breakdown of how AI agents read Markdown context.

What llms.txt is — and what it is not

llms.txt is a single Markdown file served at the root of your domain (so https://example.com/llms.txt) that summarizes your site in a form LLMs can consume directly. It is human-readable on purpose; the file is meant to be useful for any agent — search-side or build-side — that wants a fast orientation to what your site contains and how its parts relate.

It is not a sitemap, and it is not a robots file. A sitemap lists every URL on your site, weighted by priority and change frequency, so a crawler can plan its work. A robots file expresses permissions. llms.txt is editorial: it picks your highest-signal pages, names them in plain language, and tells the model how to think about them. The closest analog is the README.md of an open-source project, scaled up to cover the whole site.

This editorial nature is what makes the file valuable. Crawlers can already discover every page on your site via the sitemap; what they cannot infer reliably is which pages you would surface first if a human asked "what is this site about." Saying so explicitly removes ambiguity and gives the model a stable summary to lean on when its retriever returns conflicting signals.

The simplest way to think about llms.txt is as the press kit you would hand a human journalist — written for a model journalist instead.

Which agents actually read llms.txt today

The spec is young enough that adoption is uneven, but the trajectory is unambiguous. As of our measurements in May 2026, the following agents either fetch llms.txt proactively or surface its contents when asked.

Agent	Reads llms.txt	Notes
Perplexity	Yes	Cited as source when the file's URL matches the query domain
ChatGPT with browsing	Yes	Fetched in the early phase of multi-step searches
Claude with web tools	Yes	Used to constrain retrieval scope on technical domains
Cursor web search	Yes	Used to populate `@web` context for code-related queries
Google AI Overviews	Inconsistent	Sometimes surfaces it; structured data on individual pages still does more work
Bing Copilot	Inconsistent	Reads the file when it exists but does not rely on it

The pragmatic takeaway is that the four AI products with the highest user-time-on-page in 2026 — ChatGPT, Claude, Perplexity, Cursor — already read this file. Adding it is straightforwardly worth the hour it takes. Google's surface area is still moving; what matters there is that your individual pages have strong semantic HTML and JSON-LD, which a good llms.txt complements rather than replaces.

The minimum viable llms.txt

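The 2026 spec is intentionally small. A file has four parts: a single H1 site name, a one-paragraph blockquote summary, optional context paragraphs, and one or more ## sections containing bulleted lists of links with short descriptions. Strictly speaking, the llmstxt.org proposal requires only the H1, but a file without the summary and the link sections gives an agent very little to work with.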

A minimal but complete example:

# Acme Components

> Acme Components is an open-source React component library focused on
> accessible primitives and small bundle size. Used in production by teams
> at Vercel, Shopify, and Linear.

Acme ships unstyled, fully accessible React components that you compose with
your own design system. All components are tree-shakeable, work with React
18 and 19, and are tested against axe-core and Lighthouse.

## Documentation

- [Getting Started](https://acme.dev/docs/getting-started): install, peer deps, first component
- [Component API](https://acme.dev/docs/api): every prop on every primitive
- [Theming Guide](https://acme.dev/docs/theming): tokens, dark mode, motion preferences

## Examples

- [Form Patterns](https://acme.dev/examples/forms): validation, async state, multi-step
- [Dialog Patterns](https://acme.dev/examples/dialogs): modal, drawer, command palette

That is the entire spec for the common case. There is no schema to validate, no <lastmod> tags to maintain, no priority weighting. The file's job is to be a clean Markdown summary, and the simpler it is, the more reliably models parse it.
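Even without a formal schema, a rough structural check can catch a missing H1 or an empty section before a deploy. The following is a minimal sketch rather than an official validator: `checkLlmsTxt`, `isStructurallySound`, and the report shape are hypothetical names, and the rules only mirror the four parts described above.

```typescript
// Hypothetical structural check for an llms.txt body. Not an official validator.
interface SectionReport {
  title: string;
  linkCount: number;
}

interface LlmsTxtReport {
  h1Count: number;
  hasSummary: boolean;
  sections: SectionReport[];
}

function checkLlmsTxt(body: string): LlmsTxtReport {
  const report: LlmsTxtReport = { h1Count: 0, hasSummary: false, sections: [] };
  // Matches "- [Title](url)"; the ": description" tail is not required here.
  const linkItem = /^\s*[-*]\s+\[[^\]]+\]\([^)\s]+\)/;
  let current: SectionReport | undefined;

  for (const line of body.split(/\r?\n/)) {
    if (/^# \S/.test(line)) {
      report.h1Count += 1;
    } else if (/^## \S/.test(line)) {
      current = { title: line.slice(3).trim(), linkCount: 0 };
      report.sections.push(current);
    } else if (/^>\s?\S/.test(line) && report.sections.length === 0) {
      // Only a blockquote before the first section counts as the summary.
      report.hasSummary = true;
    } else if (current !== undefined && linkItem.test(line)) {
      current.linkCount += 1;
    }
  }
  return report;
}

// One H1, a summary, and at least one section, none of them empty.
function isStructurallySound(report: LlmsTxtReport): boolean {
  return (
    report.h1Count === 1 &&
    report.hasSummary &&
    report.sections.length > 0 &&
    report.sections.every((s) => s.linkCount > 0)
  );
}
```

A check like this is deliberately line-based and naive; it does not parse Markdown properly, so treat a failure as a prompt to look at the file, not as a verdict.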

Writing a useful llms.txt section by section

The file's value comes from how carefully you curate its contents, not from any technical complexity in the format. Each of the four pieces does a specific job, and understanding that job tells you exactly what to put in.

The H1 site name

This is the line a model will cite when it refers to your site by name. Use your actual brand, not a marketing tagline. "Acme Components" is correct; "Acme — Build Faster With Accessible UI" is not. The tagline goes in the blockquote summary directly below.

The blockquote summary

The > ... summary is the single most important paragraph in the file. AI agents lift this verbatim when they describe your site, so write it as a journalist would: lead with what the product is, name the audience, and end with the most quotable claim you can support. Aim for two to four sentences and keep it under 400 characters total. If you have ever written a press-release lede, you already have the format; this is the same job in Markdown.
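The length guidance above is easy to check mechanically. Here is a small sketch, assuming the summary is the first run of lines starting with `>`; the helper names are hypothetical, and the sentence counter is a naive punctuation count that will miscount abbreviations such as "e.g.".

```typescript
// Hypothetical helper: pulls out the first blockquote and measures it.
interface SummaryStats {
  text: string;
  chars: number;
  sentences: number;
}

function measureSummary(body: string): SummaryStats {
  const lines = body.split(/\r?\n/);
  const start = lines.findIndex((line) => line.startsWith(">"));
  const quoted: string[] = [];
  // Collect the contiguous run of ">" lines that forms the summary.
  for (let i = start; i >= 0 && i < lines.length; i++) {
    const line = lines[i];
    if (line === undefined || !line.startsWith(">")) break;
    quoted.push(line.replace(/^>\s?/, "").trim());
  }
  const text = quoted.join(" ").trim();
  // Naive count: terminal punctuation followed by whitespace or end of text.
  const sentences = (text.match(/[.!?](?=\s|$)/g) ?? []).length;
  return { text, chars: text.length, sentences };
}

// Two to four sentences and at most 400 characters, per the guidance above.
function summaryWithinGuidelines(body: string): boolean {
  const { chars, sentences } = measureSummary(body);
  return chars > 0 && chars <= 400 && sentences >= 2 && sentences <= 4;
}
```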

The context paragraphs

Between the summary and the first ## section, you have room for one or two short paragraphs that give the model the context it needs to reason about your site beyond the headline. This is where you state who built it, what the underlying technology is, what the licensing situation looks like, and any relationship to other products in the same ecosystem. Models lean on this section when asked comparative questions — "how does Acme differ from Radix" — so anchor the differentiation here, not in a marketing voice but in plain factual prose.

The link sections

Each ## heading groups a small number of pages that share a theme. Aim for three to seven links per section, with one-sentence descriptions that explain what the reader (or model) gets from each. The descriptions matter as much as the links: a model will quote the description to a user when deciding whether to fetch the underlying page. Vague descriptions ("Learn more about our features") get ignored; specific descriptions ("Step-by-step migration from Radix Primitives to Acme") get surfaced.

A good rule of thumb is to mirror the structure of your top navigation. If your site has Documentation, Examples, Blog, and Pricing in the header, the llms.txt should have those same four sections in the same order. Models read the file top-down and weight earlier sections more heavily, so this ordering matters.
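One way to keep the section order tied to your navigation is to render the file from a typed outline instead of writing it by hand. The sketch below assumes a hypothetical `SiteOutline` shape; the 3-to-7 range in `sectionSizeWarnings` simply encodes the guideline above and is not part of the format.

```typescript
// Hypothetical data shape for a site outline; field names are illustrative.
interface LinkEntry {
  title: string;
  url: string;
  description: string;
}

interface Section {
  heading: string;
  links: LinkEntry[];
}

interface SiteOutline {
  name: string;
  summary: string;
  context: string[]; // zero or more context paragraphs
  sections: Section[]; // in the same order as the top navigation
}

// Renders H1, blockquote summary, context paragraphs, then each ## section.
function renderLlmsTxt(site: SiteOutline): string {
  const parts: string[] = [`# ${site.name}`, `> ${site.summary}`, ...site.context];
  for (const section of site.sections) {
    const items = section.links.map(
      (link) => `- [${link.title}](${link.url}): ${link.description}`
    );
    parts.push(`## ${section.heading}`, items.join("\n"));
  }
  return parts.join("\n\n") + "\n";
}

// Returns the headings of sections whose link count falls outside min..max.
function sectionSizeWarnings(site: SiteOutline, min = 3, max = 7): string[] {
  return site.sections
    .filter((s) => s.links.length < min || s.links.length > max)
    .map((s) => s.heading);
}
```

Because `sections` is an ordered array, reordering the navigation and reordering the file become the same change.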

Should the file be static or generated dynamically?

For sites with stable navigation and infrequent content updates, a static llms.txt checked into source control is perfectly fine. For sites with a growing blog, frequent doc updates, or product surface that changes often, generating the file dynamically pays dividends fast.

The case for dynamic generation is freshness. A static llms.txt becomes a lie the moment you publish a new blog post or rename a page. A generated version that reads from the same source of truth as your sitemap stays in step with what you have actually published. We do exactly this on BulkMD: the file at /llms.txt is a Next.js route handler that loads every published post from getAllPosts() and serializes them at build time, so every deploy regenerates the file.

The pattern, in roughly twenty lines, looks like this:

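// app/llms.txt/route.ts
// Next.js App Router route handler that serves /llms.txt.
import { SITE } from "@/lib/constants";
import { getAllPosts } from "@/lib/blog";

// Render once at build time; each deploy regenerates the file.
export const dynamic = "force-static";

export async function GET() {
  // Same source of truth as the sitemap: every published post.
  const posts = await getAllPosts();
  const body = `# ${SITE.name}

> ${SITE.description}

## Blog posts

${posts
  .map((p) => `- [${p.title}](${SITE.url}${p.url}): ${p.description}`)
  .join("\n")}
`;
  // Serve as plain text so browsers and agent fetchers see the raw Markdown.
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}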

The Content-Type header is important: serve as text/plain (or text/markdown) so the file renders raw in a browser tab and parses cleanly in any agent's fetcher. Do not serve it as HTML; some agents skip files with text/html Content-Type when they expected llms.txt.
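To confirm what a deployed site actually sends, a short check along these lines can run after each release. It is a sketch under a few assumptions: both function names are hypothetical, the runtime provides a global `fetch` (Node 18 or later, or a browser), and only the two media types named above are treated as acceptable.

```typescript
// Hypothetical helper: decides whether a Content-Type value suits llms.txt.
function isAcceptableContentType(header: string | null): boolean {
  if (!header) return false;
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const mediaType = (header.split(";")[0] ?? "").trim().toLowerCase();
  return mediaType === "text/plain" || mediaType === "text/markdown";
}

// Fetches <origin>/llms.txt and checks the response status and header.
async function checkDeployedContentType(origin: string): Promise<boolean> {
  const res = await fetch(new URL("/llms.txt", origin));
  return res.ok && isAcceptableContentType(res.headers.get("content-type"));
}
```

For example, `await checkDeployedContentType("https://example.com")` would resolve to `false` if the server answered with `text/html`.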

Common authoring mistakes

A few patterns recur often enough that they are worth calling out before they trap you.

The first is overstuffing. The temptation, especially on large sites, is to list every page. Resist it. llms.txt is editorial; if every page is in the file, no page stands out, and the model gets the same noise it would have gotten from your raw sitemap. Pick the twenty to forty pages a curious human would want to read first, and link those.

The second is marketing voice. The descriptions next to each link should read like the second sentence of a Wikipedia article, not the headline of a billboard. Models penalize promotional language by ranking it as a lower-quality citation; a description that reads as factual and specific outranks a description that reads as a slogan.

The third is including stale or obsolete links. An agent that fetches /blog/announcing-v1-2024 and lands on a 404 will deprioritize the entire domain on subsequent fetches. Verify every link, and prefer to omit a page than to link to a redirect chain or a soft-404.

The fourth is forgetting to update the file when the site reorganizes. This is the failure mode that the dynamic-generation pattern above eliminates entirely. If you go the static route, add the file to your release checklist alongside the sitemap.
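Link verification, the third mistake above, is a natural candidate for a release-checklist script. The following sketch uses hypothetical names throughout and assumes a runtime with a global `fetch` (Node 18 or later). It flags redirects and error statuses only; a soft-404 that returns 200 would still need a human look.

```typescript
// Hypothetical link audit for an llms.txt body.
type LinkVerdict = "ok" | "redirect" | "broken";

// Collects unique absolute URLs from Markdown links, in order of appearance.
function extractLinks(body: string): string[] {
  const pattern = /\[[^\]]*\]\((https?:\/\/[^)\s]+)\)/g;
  const urls = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    const url = match[1];
    if (url !== undefined) urls.add(url);
  }
  return Array.from(urls);
}

// Maps an HTTP status code to a verdict.
function classify(status: number): LinkVerdict {
  if (status >= 200 && status < 300) return "ok";
  if (status >= 300 && status < 400) return "redirect";
  return "broken";
}

async function auditLinks(body: string): Promise<Map<string, LinkVerdict>> {
  const results = new Map<string, LinkVerdict>();
  for (const url of extractLinks(body)) {
    try {
      // Assumption: with redirect "manual", Node's fetch exposes the 3xx status.
      // Browsers return an opaque response instead, so run this server-side.
      const res = await fetch(url, { method: "HEAD", redirect: "manual" });
      results.set(url, classify(res.status));
    } catch {
      // Network failures (DNS, TLS, timeouts) are treated as broken.
      results.set(url, "broken");
    }
  }
  return results;
}
```

Some servers reject HEAD requests; if that produces false alarms, switching the method to GET is a reasonable adjustment.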

How llms.txt fits into the broader SEO picture

llms.txt is one signal in a stack. The other signals — semantic HTML5, JSON-LD structured data, clean canonical tags, a healthy sitemap, fast Core Web Vitals — still do the heavy lifting for classical search and increasingly for AI Overviews too. The file's job is to give models a fast, opinionated orientation when they need one; the rest of your SEO surface area does the work of supplying authoritative pages once a model decides to fetch them.

The combination that consistently produces the best AI-search outcomes in 2026 is a clean llms.txt pointing to a small number of highly authoritative pages, each of which is itself well-structured Markdown-shaped HTML with proper headings, tables, and JSON-LD. We covered the page-level half of that equation in the LLM context primer; this post is the site-level companion piece.

If your goal is to maximize how often Claude, ChatGPT, Perplexity, and Cursor cite your site, write llms.txt once, generate it dynamically so it never goes stale, and put your editorial energy into making the linked pages genuinely worth a model's time to fetch. The format is just plumbing; the content underneath is what gets cited.

TL;DR

Create a file at /llms.txt on your domain. Start with an H1 of your site name, follow it with a blockquote summary that reads like a press-release lede, write one or two factual context paragraphs, then list your highest-signal pages in ## sections with one-sentence descriptions. Generate it dynamically if your site changes often. Verify every link. Serve it as text/plain. You will see citations from AI agents start landing within weeks.

If you need clean Markdown copies of the pages your llms.txt will link to — for testing, archival, or RAG ingest — BulkMD converts any web page to clean Markdown in one click.

Frequently asked questions

Do I still need a sitemap.xml if I have llms.txt?

Yes. They serve different audiences and different purposes. The sitemap is for classical search crawlers (Googlebot, Bingbot) that need a complete URL inventory; llms.txt is for AI agents that want a curated summary. Most production sites should ship both.

Will llms.txt help me rank in Google AI Overviews?

Indirectly and inconsistently. Google's AI Overviews pipeline weights individual page signals — semantic HTML, JSON-LD, citation density — more heavily than llms.txt today. Where llms.txt helps Google is by clarifying site identity, which can improve which of your pages Google selects for an Overview when several pages on your site are relevant.

Should I include links to external sites in llms.txt?

Sparingly, and only when they are genuinely required context (a sister product, an official spec your work implements). Models treat external links in llms.txt with caution; if your file is mostly outbound links, agents tend to deprioritize it. Keep the focus on your own domain.

Does llms.txt need to be at the root of the domain, or can it live in a subdirectory?

The convention, and the path that every agent we tested fetches by default, is the root: `/llms.txt`. Some agents will follow a `<link rel='llms-txt' href='...'>` hint in your HTML head, but coverage is uneven. Put the file at the root and avoid the complexity.

How do I see whether AI agents are actually reading my llms.txt?

Check your web server logs for User-Agents matching GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, and similar. For a faster signal, ask Perplexity or ChatGPT a question about your site that requires the file's metadata ("Who built Acme Components?") and see whether the answer cites your summary verbatim. If it does, the file is almost certainly being read.
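The log check can be scripted in a few lines. This is a sketch that assumes access logs in the common "combined" format, where the request line and the User-Agent both appear in double quotes; `countLlmsTxtFetches` is a hypothetical name, and the token list is limited to the crawlers named above.

```typescript
// User-Agent tokens to look for; extend the list as new crawlers appear.
const AI_AGENT_TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"];

// Counts requests for /llms.txt per AI agent across raw access-log lines.
function countLlmsTxtFetches(logLines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  // Matches a request line such as "GET /llms.txt HTTP/1.1" (query string allowed).
  const requestForFile = /"(?:GET|HEAD) \/llms\.txt(?:\?\S*)? HTTP/;
  for (const line of logLines) {
    if (!requestForFile.test(line)) continue;
    const agent = AI_AGENT_TOKENS.find((token) => line.includes(token));
    if (agent !== undefined) counts[agent] = (counts[agent] ?? 0) + 1;
  }
  return counts;
}
```

User-Agent strings can be spoofed, so treat the counts as a signal, not as proof of which product fetched the file.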

Is llms.txt the same as `llms-full.txt`?

No. The llmstxt.org spec proposes a companion `llms-full.txt` that contains the full Markdown content of your site, concatenated, for agents that prefer a single download to crawling each link. Adoption of the full variant is much lower than the summary file, so start with `llms.txt` and add the full variant only if your audience explicitly wants it.
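If you do add the full variant, the concatenation itself is simple. The sketch below is one possible layout, not something the proposal prescribes: the `PageMarkdown` shape, the `Source:` line, and the `---` separators are all assumptions made for illustration.

```typescript
// Hypothetical input shape: one entry per page, already converted to Markdown.
interface PageMarkdown {
  title: string;
  url: string;
  markdown: string;
}

// Concatenates pages under a single H1, separated by horizontal rules.
function renderLlmsFull(siteName: string, pages: PageMarkdown[]): string {
  const blocks = pages.map(
    (page) => `## ${page.title}\n\nSource: ${page.url}\n\n${page.markdown.trim()}`
  );
  return `# ${siteName}\n\n${blocks.join("\n\n---\n\n")}\n`;
}
```

Pages that carry their own H1 or H2 headings will collide with this outline, so you may want to demote headings inside each page before concatenating.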

About the author

M. H. Tawfik

Lead Developer & Owner

Working from Kushtia, Bangladesh.

Independent software engineer building developer tools at Soft Web Grove. Creator and maintainer of BulkMD.

Reach the team at [email protected] — typically within 24 hours, any day of the year. Soft Web Grove also takes a small number of outside engagements; details on the about page.


How to Write an llms.txt File for AI Search in 2026