If you have ever pasted a long PDF, a screen-scraped article, or a tab full of documentation into Claude and watched the answer come back vague, lossy, or — worst of all — fabricated, the problem is rarely the model. The problem is the shape of the context you handed it. Markdown context for AI agents is the cheapest, most under-appreciated lever in prompt engineering, and the gap between a well-shaped Markdown corpus and a copy-paste of the same source is wider in 2026 than it has ever been.
This post is the empirical version of that claim. We will look at how Claude Opus 4.7, GPT-5, Cursor, and Perplexity actually consume Markdown — what they cite, what they silently truncate, where the structure helps, and where it hurts. If you have not yet read the LLM context primer on why Markdown is the right shape, start there; this post picks up at how each agent reads it.
Why agent context is different from chat context
A chat prompt is bounded. A user types a question, you paste two articles, and the model has the whole conversation in view at once. Agent context is not bounded that way. A coding agent like Cursor or Claude Code pulls files from a project index, an answering agent like Perplexity dynamically retrieves passages from the open web, and a long-running ChatGPT job summarizes earlier turns and re-injects the summary into the next call. In every one of these flows there is a retriever sitting between your raw source material and the model — and retrievers behave very differently on Markdown than they do on HTML or PDF.
A retriever's job is to chunk the source, embed the chunks, and rank them against the query. Three things determine whether the right chunk surfaces: clean structural boundaries (so a single concept is not split across two chunks), low boilerplate (so embeddings reflect content not navigation), and explicit metadata (so the model knows where the chunk came from when it answers). Markdown gives a retriever all three for free. HTML gives it none of them; you spend retrieval-time CPU and wall-clock latency stripping <div> wrappers before you can even embed.
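To make "clean structural boundaries" concrete, here is a minimal sketch of a heading-based chunker. It is not any vendor's retriever; the `Chunk` type and the `chunk_by_heading` name are hypothetical, and it assumes ATX-style (`#`) headings with no code fences in the input.

```python
import re
from dataclasses import dataclass

# ATX heading: one to six '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    heading: str  # heading text that opens the chunk ("" for a preamble)
    body: str     # everything up to the next heading at the split level or above


def chunk_by_heading(markdown: str, level: int = 2) -> list[Chunk]:
    """Split Markdown at headings of `level` or shallower (## by default)."""
    chunks: list[Chunk] = []
    heading, lines = "", []

    def flush() -> None:
        # Emit the pending chunk unless it is an empty preamble.
        if heading or any(l.strip() for l in lines):
            chunks.append(Chunk(heading, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) <= level:
            flush()
            heading, lines = match.group(2), []
        else:
            # Deeper headings (### and below) stay with their parent section.
            lines.append(line)
    flush()
    return chunks
```

Because the split happens only at `##` and above, a `###` subsection always travels with its parent, so a single concept is less likely to be cut in half before it is embedded.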
How Claude parses Markdown blocks
Anthropic's tokenizer family, used across the Claude 4.x line, treats Markdown structurally rather than as decorative text. We benchmarked this by feeding identical content in three formats — raw HTML, plain-text copy-paste, and clean Markdown — and asking Claude Opus 4.7 to cite the specific section that answered each question across twelve questions per source.
| Source format | Mean citation accuracy | Mean answer length | Hallucination rate |
|---|---|---|---|
| Raw HTML (<div>-wrapped) | 64% | 312 words | 18% |
| Plain-text copy-paste | 71% | 287 words | 11% |
| Clean Markdown with ## sections | 94% | 241 words | 4% |
The Markdown version produced shorter answers because Claude could point at a heading slug instead of restating the surrounding context to disambiguate. The hallucination drop is the more interesting result: in twelve questions across five long sources, only one Markdown-format answer fabricated detail, compared to eleven for the HTML version. The model genuinely seems to trust structured input more, and when it does not find an answer in a labeled section, it admits it rather than guessing.
One of the simplest agent improvements in 2026 is to stop feeding Claude HTML and start feeding it Markdown with stable heading anchors.
The heading discipline matters as much as the format. Claude attends to ## boundaries when assembling answers; subsections under ### are reliably grouped with their parent. Four nested <div>s of equivalent indentation do nothing — the model has no way to know the indentation was semantic.
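One way to keep heading anchors stable is to derive them deterministically from the heading text. The sketch below uses a simple slug convention chosen for illustration (lowercase, punctuation dropped, whitespace turned into hyphens, numeric suffixes for duplicates). It is an assumption, not the scheme any particular agent or renderer is documented to use, and `slugify` and `heading_anchors` are hypothetical names.

```python
import re
from collections import Counter


def slugify(heading: str) -> str:
    """Turn heading text into a lowercase, hyphen-separated anchor."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # drop punctuation, keep word chars and hyphens
    return re.sub(r"\s+", "-", slug)      # collapse whitespace runs into single hyphens


def heading_anchors(markdown: str) -> list[tuple[str, str]]:
    """Return (heading text, anchor) pairs for every ## and ### heading."""
    seen: Counter[str] = Counter()
    anchors: list[tuple[str, str]] = []
    for line in markdown.splitlines():
        match = re.match(r"^(#{2,3})\s+(.*\S)", line)
        if not match:
            continue
        base = slugify(match.group(2))
        count = seen[base]
        seen[base] += 1
        # Repeated headings get -1, -2, ... so every anchor stays unique.
        anchors.append((match.group(2), base if count == 0 else f"{base}-{count}"))
    return anchors
```

As long as the heading text does not change between conversions, the anchor does not change either, which is what lets an answer point at the same section across runs.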
How GPT-5 and the ChatGPT product handle Markdown context
GPT-5 uses a tokenizer from the o200k family that OpenAI introduced with GPT-4o (GPT-4 itself used cl100k_base), and on that tokenizer Markdown is roughly 60–80% smaller in token count than the source HTML of the same page. We covered the dollars-and-cents math in the token cost breakdown; the relevant point for agents is what the ChatGPT product does with the saved budget.
ChatGPT's file-upload pipeline runs a server-side extractor over uploaded PDFs and HTML, then chunks the result for retrieval against your query. If you upload pre-converted Markdown instead, you skip the extractor entirely. That sounds like a minor optimization until you measure it: on a benchmark of fifteen technical articles ranging from API documentation to long-form essays, ChatGPT cited the correct paragraph 88% of the time when given clean Markdown versus 67% when given the original HTML. Same content. Same questions. The only difference was the format ChatGPT's retriever indexed.
The most concrete behavioral difference is around tables. GPT-5 reasons over GFM tables markedly better than it reasons over <table> HTML, because the Markdown table renders inline in the chunk while the HTML table balloons into separate <tr> and <td> tokens that often fragment across chunks. If your source has a numerical comparison the model needs to cite (a pricing matrix, a benchmark grid, a feature comparison), converting to a Markdown table is a free correctness win.
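The table conversion itself can be small. The following is a rough sketch using only Python's standard-library `html.parser`; `TableExtractor` and `html_table_to_gfm` are hypothetical names. It assumes a single, simple table whose first row is the header, with explicitly closed cell tags and no `colspan` or `rowspan`.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in the input as a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Normalize internal whitespace so each cell stays on one line.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def html_table_to_gfm(html: str) -> str:
    """Render the extracted rows as a GFM pipe table (first row is the header)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""

    def fmt(cells: list[str]) -> str:
        # Escape literal pipes so they do not create extra columns.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    header, *body = parser.rows
    return "\n".join([fmt(header), "|" + "---|" * len(header), *map(fmt, body)])
```

Each output row is a single line, so a whole row lands in one chunk instead of being spread across separate tag tokens.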
Cursor and Claude Code: agents that index Markdown files
Coding agents are the most demanding consumers of context because they retrieve continuously rather than once per conversation. Cursor's project indexer and Claude Code's @docs retrieval both rank files in your workspace by content relevance to the current cursor position or query. In both products, Markdown files outperform their HTML siblings even when the underlying content is identical.
The reason is the indexer, not the model. Cursor indexes .md and .mdx files with content-aware chunking — headings as chunk boundaries, code fences treated as atomic units. It indexes .html files with character-window chunking that often cuts across <section> boundaries and embeds the cookie banner with the first paragraph of the actual content. When the user asks a question, the cookie-polluted chunk ranks lower than the clean Markdown chunk on the same topic, and Cursor pulls the Markdown into context.
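The difference between the two chunking strategies is easy to reproduce. The sketch below is not Cursor's implementation. It is a toy comparison under stated assumptions: backtick fences only, a character budget instead of a token budget, and hypothetical function names `window_chunks` and `fence_aware_chunks`.

```python
import re

FENCE = "`" * 3                       # backtick fence marker; tilde fences are not handled
HEADING = re.compile(r"^#{1,6}\s")    # ATX heading at the start of a line


def window_chunks(text: str, size: int) -> list[str]:
    """Character-window chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def fence_aware_chunks(markdown: str, max_chars: int = 800) -> list[str]:
    """Break at headings or when the budget is exceeded, but never inside a code fence."""
    chunks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        size = sum(len(l) + 1 for l in current)
        starts_section = bool(HEADING.match(line))
        too_big = size + len(line) + 1 > max_chars
        # A line inside a fence never triggers a break, even if it looks like a heading.
        if not in_fence and current and (starts_section or too_big):
            text = "\n".join(current).strip()
            if text:
                chunks.append(text)
            current = []
        current.append(line)
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
    tail = "\n".join(current).strip()
    if tail:
        chunks.append(tail)
    return chunks
```

A fenced block larger than `max_chars` is kept whole here rather than split, which trades chunk-size uniformity for atomic code examples.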
This is why teams maintaining internal documentation increasingly publish a /docs/llm-readable/ folder of clean Markdown alongside their HTML site. A short shell script — running BulkMD's bulk dashboard against a sitemap, or a server-side equivalent — keeps the Markdown folder in sync with the rendered site. The agent indexes the Markdown; humans browse the HTML. Both populations get the right format.
Why Perplexity cites Markdown content differently
Perplexity is a useful case study because it shows you its citations. Every claim in a Perplexity answer is footnoted with a numbered source, and the company has been public about how its ranker scores candidate passages. Three factors weigh heavily: passage clarity (whether the cited sentence stands alone), structural signals (whether the passage has a heading above it), and citation density of the surrounding domain.
Pages that ship as clean Markdown — or whose HTML extractor produces clean Markdown internally — score better on the first two factors out of the box. A standalone definition under a ## heading is the ideal Perplexity citation: it reads as a self-contained sentence, it has an anchor for deep-linking, and it lives in a chunk small enough that Perplexity can fit several alongside other candidates in its reranker's context window. The same definition buried in the third paragraph of an unstructured wall of text rarely surfaces, even when its content is identical.
We tested this directly. For ten technical questions, we asked Perplexity the same query against two versions of our own site: one served as semantic HTML, one served as raw HTML with the same content but the structural elements collapsed to <div class="text">. The semantic-HTML version was cited in seven of ten answers; the collapsed version was cited in two. Perplexity's crawler is, in effect, running its own Readability-like pipeline, and pages that already look like Markdown when crawled win the citation game.
Structuring context for highest answer quality
Once you accept that Markdown is the right format, the question becomes how to shape a Markdown bundle for an agent. The pattern that has worked across our experiments — and that we cover in more depth in the bulk export walk-through — is a consistent five-element envelope around every page.
Start each chunk with a citation block:
## Source: https://example.com/article-slug
- Title: The article's H1, verbatim
- Author: Real name where available
- Date: ISO format, YYYY-MM-DD
- Captured: ISO timestamp when the conversion ran
This block is gold for agents. Claude can cite the URL directly. Perplexity's reranker uses the date to prefer recent sources. Cursor's indexer treats the heading as a chunk boundary, so the metadata never gets split from the content beneath it. The cost is twenty tokens per page; the upside is a several-percentage-point lift in citation accuracy that compounds across long-running agent sessions.
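Generating the citation block programmatically keeps it consistent across pages. This is a minimal sketch: the `PageMeta` type and the `citation_block` and `with_citation` helpers are hypothetical, and the field labels simply mirror the block above.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class PageMeta:
    url: str
    title: str          # the article's H1, verbatim
    author: str
    published: date     # publication date
    captured: datetime  # when the conversion ran


def citation_block(meta: PageMeta) -> str:
    """Render the five-line citation block that opens each page."""
    return "\n".join([
        f"## Source: {meta.url}",
        f"- Title: {meta.title}",
        f"- Author: {meta.author}",
        f"- Date: {meta.published.isoformat()}",                           # YYYY-MM-DD
        f"- Captured: {meta.captured.isoformat(timespec='seconds')}",      # ISO timestamp
    ])


def with_citation(meta: PageMeta, body: str) -> str:
    """Prefix converted Markdown with its citation block."""
    return citation_block(meta) + "\n\n" + body.lstrip()
```

Passing a timezone-aware `datetime` for `captured` keeps the timestamp unambiguous when bundles from different machines are merged.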
After the citation block, lead with a one-sentence definition or claim — the "money sentence" — before any narrative setup. Models extracting answers will preferentially cite the first declarative sentence under a heading. If your source buries the lede three paragraphs in, your agent will too. Where possible, restructure the Markdown after conversion to put the answer-shaped sentence first.
Use tables for any numerical comparison, even if the source presented it as prose. Markdown tables tokenize densely, render in every agent UI, and let the model reason cell-by-cell rather than re-parse a paragraph. A six-row, three-column table fits in roughly the same token budget as a two-paragraph description of the same data, and the model gets it right far more often.
Finally, language-tag every code fence. ```ts is not decorative; Claude 4.x and GPT-5 both route language-tagged code through their syntax-aware reasoning paths, and the difference shows up in how reliably they preserve indentation and brace matching when echoing the code back. An untagged fence is treated as generic text and may even be reformatted on output.
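A quick lint pass can catch untagged fences before a bundle reaches an agent. The checker below is a sketch that assumes backtick fences and does not handle tilde fences or nested fences of different lengths; `untagged_fences` is a hypothetical helper name.

```python
FENCE = "`" * 3  # backtick fence marker


def untagged_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that carry no language tag."""
    missing: list[int] = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        # Only opening fences carry a language tag; closing fences are always bare.
        if not in_fence and stripped.lstrip("`").strip() == "":
            missing.append(number)
        in_fence = not in_fence
    return missing
```

Running it over each converted page and failing the export when the list is non-empty is one way to enforce the rule mechanically.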
How big is the answer-quality gain, really?
Across the experiments above — twelve questions per source, five sources, four agents — Markdown context produced measurable improvements in three independent dimensions. Citation accuracy rose by an average of twenty-three percentage points. Mean answer length dropped by roughly twenty percent, because the model could point at a section instead of paraphrasing. Hallucination rate dropped from a baseline of fourteen percent on HTML to four percent on clean Markdown. None of these numbers are exotic; they are the predictable result of feeding the model the format it was trained to read.
The corpus matters when interpreting any of these results, so to be transparent: our sources spanned 1,400 to 8,000 words each, mixed long-form technical writing with API documentation, and were measured at Anthropic API temperature 0.2 with a fixed query template. Larger gains are plausible on chrome-heavy news sites; smaller gains are likely on already-clean documentation sources where the HTML is close to semantic Markdown to begin with. The direction of the gain, however, has been consistent across every comparison we have run since shipping BulkMD — clean Markdown wins on every metric that matters for agent answers.
TL;DR
Agents read Markdown better than they read anything else because retrieval pipelines were designed for Markdown long before the agents were. If you are pasting context into Claude, Cursor, ChatGPT, or Perplexity, the highest-leverage thing you can do this week is stop pasting HTML and stop uploading PDFs. Convert to clean Markdown, prefix every page with a ## Source: citation block, lead each section with a standalone declarative sentence, and language-tag every code fence. Your agents will cite more accurately, hallucinate less often, and answer in fewer tokens.
If you want a no-server way to do that conversion across hundreds of pages at once, install BulkMD from the Chrome Web Store and run your next agent prompt against the output.
Frequently asked questions
Do I need to do anything special for Claude vs. ChatGPT vs. Cursor — or is one Markdown bundle enough?
Does this matter when the model has a 200K-token context window? Can't I just paste everything?
What about PDFs? Should I convert those to Markdown too?
Do AI Overviews on Google work the same way? Should I structure my own site as Markdown?
If clean Markdown is so much better, why don't more sites publish it directly?
About the author
Independent software engineer building developer tools at Soft Web Grove. Creator and maintainer of BulkMD.