If you're paying per million tokens, the format of your context window matters as much as the content. Most teams underestimate this because the obvious move (paste the URL into the chat UI) hides the cost behind a flat monthly bill. Once you move to the API, the cost shows up on every call, and converting to Markdown is the highest-leverage compression you can apply without losing information.
This post is the measured version of the claim. How much do you actually save, where do the gains plateau, and when is the conversion not worth it? If you want the conceptual case for Markdown as context first, the LLM context primer covers the why before getting to the how much.
The benchmark
We took twenty representative source pages and measured token counts using the `cl100k_base` tokenizer (the GPT-4 / GPT-3.5 Turbo encoding; GPT-4o models use the newer `o200k_base`, so absolute counts differ somewhat there) in three forms:
- Raw HTML: exactly what `view-source:` shows.
- Readability-only HTML: Mozilla Readability run on the page, no Markdown step.
- Clean Markdown: Readability + Turndown with GFM rules and our normalization pass.
The corpus mix:
- 6 long-form blog posts (Substack, personal blogs, company engineering posts)
- 5 technical docs pages (MDN, React, Vercel)
- 4 news articles (NYT, BBC, The Verge)
- 3 product landing pages
- 2 GitHub issue threads
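If you want to run the same kind of measurement on your own corpus, a minimal sketch looks like the following. It is not the harness behind the numbers in this post: the function names and the page-dict layout are illustrative assumptions, and the default tokenizer is a crude whitespace stand-in so the code runs without dependencies. To approximate the setup described above, swap in a `cl100k_base` counter as shown in the comment.

```python
from statistics import median
from typing import Callable, Dict, List

# Hypothetical tokenizer hook. With the third-party tiktoken package installed,
# a cl100k_base counter could be plugged in like this:
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   count_tokens = lambda text: len(enc.encode(text))
TokenCounter = Callable[[str], int]


def whitespace_tokens(text: str) -> int:
    """Crude stand-in tokenizer: counts whitespace-separated chunks."""
    return len(text.split())


def median_tokens(
    pages: List[Dict[str, str]],
    count_tokens: TokenCounter = whitespace_tokens,
) -> Dict[str, float]:
    """Median token count per format across a corpus.

    Each page is a dict mapping a format name (for example "raw_html",
    "readability_html", "markdown") to that page's text in that format.
    All pages are assumed to carry the same set of format names.
    """
    formats = list(pages[0].keys())
    return {
        fmt: median(count_tokens(page[fmt]) for page in pages)
        for fmt in formats
    }
```

Taking the median rather than the mean keeps one enormous landing page from dominating the result, which matters with a corpus as small as twenty pages.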
The results
| Source format | Median size | Median tokens | Reduction vs. raw HTML |
|---|---|---|---|
| Raw HTML | 142 KB | 38,000 | — |
| Readability HTML | 71 KB | 19,000 | −50% |
| Clean Markdown | 18 KB | 7,800 | −79% |
A few things stand out:
- Readability alone saves about half the tokens. The boilerplate stripping is doing most of the heavy lifting up to that point.
- Markdown adds another ~30 percentage points on top. Attribute soup (`class="prose prose-invert"`, `data-track-event="..."`) survives Readability; it doesn't survive Turndown.
- The variance is wider on docs pages because they're already lean. A clean MDN article shrinks ~40% in Markdown form; a feature-bloated product landing page shrinks ~85%.
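The percentages in the table are easy to re-derive from the median token counts, and doing so separates two numbers that are easy to conflate: the extra percentage points Markdown adds relative to raw HTML, and the relative cut Markdown makes to what Readability left behind. A small sketch, using only the figures from the table (the function name is ours):

```python
def reduction(tokens: int, baseline: int) -> float:
    """Fractional token reduction relative to a baseline (0.5 means half)."""
    return 1 - tokens / baseline


# Median token counts taken from the results table.
RAW, READABILITY, MARKDOWN = 38_000, 19_000, 7_800

readability_cut = reduction(READABILITY, RAW)   # 0.50 of raw HTML removed
markdown_cut = reduction(MARKDOWN, RAW)         # about 0.79 of raw HTML removed

# Extra percentage points, measured against raw HTML (about 29).
extra_points = (markdown_cut - readability_cut) * 100

# Relative cut to the Readability output itself (about 0.59).
markdown_vs_readability = reduction(MARKDOWN, READABILITY)
```

Put differently, the Turndown step removes well over half of the tokens that survive Readability, even though it shows up as "only" ~30 points in the raw-HTML column.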
What it means for your bill
Plug those numbers into Claude or GPT API pricing as of mid-2026:
Assume a 200K-token context window priced at ~$3 per million input tokens.
A 38,000-token raw-HTML page costs ~$0.114 per call. The same page as 7,800-token Markdown costs ~$0.023 per call. On a daily research workflow that ingests 100 source pages, that works out to about $342/mo vs. $70/mo over a 30-day month: a ~$270 swing for changing one step in the pipeline.
That math gets dramatically better once caching enters the picture. Claude's prompt caching prices cache reads at 10% of the base input rate, so a cached 7,800-token Markdown context costs the equivalent of 780 uncached tokens: roughly 1/50th the cost of an uncached 38,000-token HTML one. If you maintain a static knowledge base behind a chat, Markdown + caching is the difference between viable and a write-off.
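The arithmetic is simple enough to keep in a scratch file and re-run whenever prices change. The sketch below uses the post's ~$3 per million input tokens and the 10% cached-read rate. The 30-day month and the function names are our assumptions, and the sketch ignores any premium charged for writing to the cache, so treat the cached figure as a lower bound.

```python
PRICE_PER_MTOK = 3.00         # USD per million input tokens (figure used above)
CACHE_READ_MULTIPLIER = 0.10  # cached reads billed at 10% of the base rate


def cost_per_call(tokens: int, cached: bool = False) -> float:
    """Input cost in USD for one call carrying `tokens` of context."""
    rate = PRICE_PER_MTOK * (CACHE_READ_MULTIPLIER if cached else 1.0)
    return tokens * rate / 1_000_000


def monthly_cost(
    tokens: int,
    pages_per_day: int = 100,
    days: int = 30,  # assumption: the workflow runs every day of a 30-day month
    cached: bool = False,
) -> float:
    """Monthly input cost in USD for ingesting `pages_per_day` pages daily."""
    return cost_per_call(tokens, cached) * pages_per_day * days


html_call = cost_per_call(38_000)                   # about $0.114
markdown_call = cost_per_call(7_800)                # about $0.023
cached_markdown_call = cost_per_call(7_800, cached=True)

# How many cached Markdown calls one uncached HTML call pays for (about 49).
cache_advantage = html_call / cached_markdown_call
```

Changing `days` to your actual number of working days is the quickest way to adapt the monthly figure to your own workflow.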
Where the savings plateau
Markdown is not a magic compression algorithm. The reductions stop once the page is already mostly prose:
- A clean MDN page (already a sparse document) shrinks ~40%. There just isn't that much HTML chrome to strip.
- A short blog post can actually end up slightly longer in tokens because Markdown adds line breaks and `> Source:` citation blocks. The gain is structural clarity, not raw count.
- API reference docs with heavy `<table>` structures convert well: Turndown emits GFM tables that the model reads cleanly.
The rule of thumb: the more chrome a page has, the more dramatic the Markdown win. Marketing pages and modern news sites with infinite-scroll widgets save the most. Hand-written technical posts save the least.
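One rough way to apply that rule of thumb before converting anything is to estimate how much of a page is chrome in the first place. The sketch below is a heuristic of our own, not part of the benchmark: it uses Python's standard `html.parser` to compare visible-text characters against total HTML characters. Character share is only a proxy for token share, and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def chrome_ratio(html: str) -> float:
    """Share of the document's characters that are not visible text (0 to 1)."""
    if not html:
        return 0.0
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so indentation does not count as content.
    text = " ".join("".join(parser.chunks).split())
    return 1 - len(text) / len(html)
```

A page with a high ratio is a strong candidate for conversion; a page with a low ratio is already mostly prose, and the savings will be closer to the MDN end of the range.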
Beyond raw count: answer quality
Token count is the easy metric. The harder one is answer quality, and Markdown wins there too. We re-ran a set of 30 question-answering tasks across the same corpus, once with raw HTML context, once with Markdown:
- Citation accuracy improved measurably with Markdown context. Models cite Markdown headings (`## ...`) far more reliably than they cite `<h2 class="post-title">` text, because the Markdown form is unambiguous about what is a heading.
- Hallucination rate dropped on long-context questions. Our hypothesis: when the model spends fewer attention "slots" on `class=` attribute noise, it has more capacity for the actual prose.
- Latency improved on `claude-haiku` and `gpt-4o-mini` because the input is smaller. ~22% faster median first-token time on our test set.
We're not going to over-claim this — quality measurement on LLM tasks is noisy. But the direction is consistent across the runs we did, and the physics of it is straightforward: less noise in, less noise to reason around.
When not to convert
Three cases where you should leave the HTML alone:
- You need DOM-level fidelity. If you're building a tool that needs to know "this button has `aria-label='Buy now'`," skip Markdown and stay in HTML.
- The page is mostly tables of numeric data. Markdown's GFM table support is good, but some heavily-styled financial tables lose semantics. For those, CSV or JSON is a better target than Markdown.
- You're feeding a vision model. Multimodal models do better on rendered screenshots than on either HTML or Markdown for visually-laid-out content like dashboards.
For 95% of "I want to use a web page as LLM context" workflows, none of those apply, and Markdown wins.
Doing it without a server
The simplest implementation is a local browser extension that runs Readability + Turndown inside the page's content script. No upload, no API key, no per-call cost. BulkMD does exactly this; install it free from the Chrome Web Store and the conversion runs entirely in your tab. The bulk-export workflow post covers the patterns that let it survive long batch runs.
The TL;DR: if your monthly LLM API bill is in three figures or higher and you're feeding it web context, this is the cheapest optimization you'll do this quarter.
About the author
Independent software engineer building developer tools at Soft Web Grove. Creator and maintainer of BulkMD.