BulkMD

Cut LLM Token Costs by 60–80% with Clean Markdown Context

A measured breakdown of how converting source pages to Markdown reduces prompt tokens, the math behind the savings, and where the gains plateau — with real numbers from twenty benchmark pages.

M. H. Tawfik · 8 min read

If you're paying per million tokens, the format of your context window matters as much as the content. Most teams underestimate this because the obvious move, pasting the URL into a chat UI, hides the cost behind a flat monthly bill. Once you move to the API, that cost shows up on every call, and converting pages to Markdown is the highest-leverage compression you can apply without losing content.

This post is the measured version of that claim: how much you actually save, where the gains plateau, and when the conversion isn't worth it. If you want the conceptual case for Markdown as context first, the LLM context primer covers the why before this post gets to the how much.

The benchmark

We took twenty representative source pages and measured token counts with the cl100k_base tokenizer (used by GPT-4 and GPT-3.5-turbo; GPT-4o uses the newer o200k_base) in three forms:

  1. Raw HTML — exactly what view-source: shows.
  2. Readability-only HTML — Mozilla Readability run on the page, no Markdown step.
  3. Clean Markdown — Readability + Turndown with GFM rules and our normalization pass.
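The measurement itself is easy to reproduce. Below is a minimal sketch of a harness that takes each page in all three forms and reports the median token count per form. `median_tokens` and the format keys are hypothetical names, and the tokenizer is passed in as a plain function so the sketch runs without extra dependencies.

```python
from statistics import median
from typing import Callable, Dict, List

# Hypothetical shape: one dict per page, keyed by format name, e.g.
# {"raw_html": "...", "readability_html": "...", "markdown": "..."}.
PageForms = Dict[str, str]


def median_tokens(
    pages: List[PageForms],
    count_tokens: Callable[[str], int],
) -> Dict[str, float]:
    """Return the median token count for each format across all pages."""
    if not pages:
        raise ValueError("need at least one page")
    formats = pages[0].keys()
    # statistics.median sorts its input, so passing a generator is fine.
    return {fmt: median(count_tokens(page[fmt]) for page in pages) for fmt in formats}
```

To count with cl100k_base, pass something like `lambda s: len(enc.encode(s, disallowed_special=()))` with `enc = tiktoken.get_encoding("cl100k_base")`; `disallowed_special=()` stops tiktoken from raising if a page happens to contain a special-token string such as `<|endoftext|>`.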

The corpus mix:

  • 6 long-form blog posts (Substack, personal blogs, company engineering posts)
  • 5 technical docs pages (MDN, React, Vercel)
  • 4 news articles (NYT, BBC, The Verge)
  • 3 product landing pages
  • 2 GitHub issue threads

The results

| Source format | Median size | Median tokens | Reduction vs. raw HTML |
|---|---|---|---|
| Raw HTML | 142 KB | 38,000 | (baseline) |
| Readability HTML | 71 KB | 19,000 | −50% |
| Clean Markdown | 18 KB | 7,800 | −79% |
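The reduction column is the percentage change from the raw-HTML median. This quick sketch reproduces it from the median token counts in the table; unrounded, the Markdown figure is about −79.5%.

```python
def reduction_vs_baseline(tokens: int, baseline: int) -> float:
    """Percentage change relative to the baseline (negative means fewer tokens)."""
    return (tokens - baseline) / baseline * 100


# Median token counts from the benchmark table.
MEDIANS = {"Raw HTML": 38_000, "Readability HTML": 19_000, "Clean Markdown": 7_800}

for name, tokens in MEDIANS.items():
    change = reduction_vs_baseline(tokens, MEDIANS["Raw HTML"])
    print(f"{name:<17} {tokens:>7,} tokens  {change:+.1f}%")
```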

A few things stand out:

  • Readability alone saves about half the tokens. Boilerplate stripping does most of the heavy lifting: 50 of the 79 percentage points come from this step.
  • Markdown adds another ~30 percentage points on top. Attribute soup (class="prose prose-invert", data-track-event="...") survives Readability; it doesn't survive Turndown.
  • Savings vary widely by page type, and docs pages save the least because they're already lean. A clean MDN article shrinks ~40% in Markdown form; a feature-bloated product landing page shrinks ~85%.

What it means for your bill

Plug those numbers into Claude or GPT API pricing as of mid-2026, assuming a model with a 200K-token context window priced at ~$3 per million input tokens.

A 38,000-token raw-HTML page costs ~$0.114 per call (38,000 × $3 / 1,000,000). The same page as 7,800-token Markdown costs ~$0.023 per call. A daily research workflow that ingests 100 source pages makes ~3,000 calls in a 30-day month: roughly $342/mo vs. $70/mo, a ~$270 swing for changing one step in the pipeline.
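A minimal sketch of the per-call and monthly arithmetic. It counts input tokens only, uses the ~$3/M price assumed above, and assumes a 30-day month; `days` is a parameter you should set to your own billing period.

```python
PRICE_PER_MILLION_INPUT = 3.00  # USD; the ~$3/M assumption from the text


def cost_per_call(tokens: int, price_per_million: float = PRICE_PER_MILLION_INPUT) -> float:
    """Input-token cost of one call, in USD."""
    return tokens * price_per_million / 1_000_000


def monthly_cost(tokens_per_page: int, pages_per_day: int, days: int = 30) -> float:
    """Input-token cost of ingesting `pages_per_day` pages every day for `days` days."""
    return cost_per_call(tokens_per_page) * pages_per_day * days


html = monthly_cost(38_000, pages_per_day=100)
md = monthly_cost(7_800, pages_per_day=100)
print(f"HTML ${html:,.2f}/mo, Markdown ${md:,.2f}/mo, saving ${html - md:,.2f}/mo")
```

Output tokens and caching are deliberately left out here; both change the totals but not the input-side comparison.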

That math gets dramatically better once caching enters the picture. Claude's prompt caching bills cache reads at 10% of the base input price (writing to the cache costs extra). A cached Markdown context is therefore roughly 1/50th the cost of an uncached HTML one: 7,800 × 10% ≈ 780 token-equivalents vs. 38,000. If you maintain a static knowledge base behind a chat, Markdown + caching can be the difference between viable and a write-off.
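The caching ratio can be checked the same way. This sketch assumes every call hits the cache, bills cache reads at 10% of the base input price, and ignores the surcharge for writing the cache, so treat it as a best case.

```python
BASE_PRICE = 3.00 / 1_000_000  # USD per input token (~$3/M assumption)
CACHE_READ_MULTIPLIER = 0.10   # cache reads billed at 10% of the base input price


def input_cost(tokens: int, cached: bool = False) -> float:
    """Input cost of one call; cache-write surcharges are ignored."""
    multiplier = CACHE_READ_MULTIPLIER if cached else 1.0
    return tokens * BASE_PRICE * multiplier


uncached_html = input_cost(38_000)
cached_md = input_cost(7_800, cached=True)
print(f"uncached HTML costs {uncached_html / cached_md:.1f}x a cached Markdown call")
```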

Where the savings plateau

Markdown is not a magic compression algorithm. The gains shrink as a page gets closer to pure prose:

  • A clean MDN page (already a sparse document) shrinks ~40%. There just isn't that much HTML chrome to strip.
  • A short blog post can actually end up slightly longer in tokens because Markdown adds line breaks and > Source: citation blocks. The gain is structural clarity, not raw count.
  • API reference docs with heavy <table> structures convert well — Turndown emits GFM tables that the model reads cleanly.

The rule of thumb: the more chrome a page has, the more dramatic the Markdown win. Marketing pages and modern news sites with infinite-scroll widgets save the most. Hand-written technical posts save the least.

Beyond raw count: answer quality

Token count is the easy metric. The harder one is answer quality, and Markdown wins there too. We re-ran a set of 30 question-answering tasks across the same corpus, once with raw HTML context, once with Markdown:

  • Citation accuracy improved measurably with Markdown context. Models cite Markdown headings (## ...) far more reliably than they cite <h2 class="post-title"> text, because the Markdown form is unambiguous about what is a heading.
  • Hallucination rate dropped on long-context questions. Our hypothesis: when the model spends fewer attention "slots" on class= attribute noise, it has more capacity for the actual prose.
  • Latency improved on claude-haiku and gpt-4o-mini because the input is smaller: median time to first token was ~22% faster on our test set.

We're not going to over-claim this — quality measurement on LLM tasks is noisy. But the direction is consistent across the runs we did, and the physics of it is straightforward: less noise in, less noise to reason around.

When not to convert

Three cases where you should leave the HTML alone:

  1. You need DOM-level fidelity. If you're building a tool that needs to know that a button has aria-label="Buy now", skip the Markdown step and stay in HTML.
  2. The page is mostly tables of numeric data. Markdown's GFM table support is good, but some heavily styled financial tables lose semantics. For those, CSV or JSON is a better target than Markdown.
  3. You're feeding a vision model. Multimodal models do better on rendered screenshots than on either HTML or Markdown for visually-laid-out content like dashboards.
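For table-heavy pages, one stdlib-only way to target CSV instead of Markdown is to pull cell text straight out of the <table> markup. This is a hypothetical helper, not part of BulkMD, and a sketch under simplifying assumptions: it ignores colspan/rowspan, flattens nested tables, and expects explicit closing </td> and </tr> tags, which real financial tables don't always use.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the row being read
        self._cell: Optional[List[str]] = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (<span>, <b>, ...) still belongs to the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse the whitespace that HTML formatting leaves behind.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Convert the rows of any tables in `html` to CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```

If JSON suits your pipeline better, `json.dumps(parser.rows)` on the same parsed rows works just as well.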

For 95% of "I want to use a web page as LLM context" workflows, none of those apply, and Markdown wins.

Doing it without a server

The simplest implementation is a local browser extension that runs Readability + Turndown inside the page's content script. No upload, no API key, no per-call cost. BulkMD does exactly this; install it free from the Chrome Web Store and the conversion runs entirely in your tab. The bulk-export workflow post covers the patterns that let it survive long batch runs.

The TL;DR: if your monthly LLM API bill is in three figures or higher and you're feeding it web context, this is the cheapest optimization you'll do this quarter.

Frequently asked questions

Does the 79% saving hold for Claude as well as GPT-4?

Yes, within ~3 percentage points. Both tokenizers are byte-pair encodings with similar vocabularies for Latin-script content. Claude's tokenizer is marginally less efficient on HTML attribute syntax, which makes the Markdown win slightly larger for Claude users.

What about output tokens? Does Markdown context affect generation cost too?

Indirectly. Output tokens are billed separately and depend on what you ask the model to produce. But cleaner input tends to produce shorter, more focused output because the model isn't trying to summarize away the noise — we see ~10–15% shorter median responses on the same prompts. That's a real but smaller win than the input-token saving.

How does this compare with embedding-based RAG instead of putting the whole page in context?

They're complementary, not competing. RAG is the right call when your corpus is too big to fit in any context window. But the chunks RAG retrieves are still better as Markdown than as HTML — you want the retrieved passage to be high signal-to-noise. We feed RAG indexes from clean Markdown for exactly this reason.
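Part of what makes Markdown a good RAG source is that headings give you natural chunk boundaries. A minimal sketch, assuming ATX-style headings (`## ...`) and that splitting at h1/h2 is the boundary you want; `chunk_by_headings` is a hypothetical helper, not BulkMD's indexing code, and it does not skip `#` lines inside fenced code blocks.

```python
import re
from typing import List


def chunk_by_headings(markdown: str, max_level: int = 2) -> List[str]:
    """Split Markdown into chunks that each begin at an ATX heading of level <= max_level.

    Deeper headings stay inside their parent chunk, and any text before the
    first heading becomes its own chunk.
    """
    heading = re.compile(rf"^#{{1,{max_level}}}[ \t]", re.MULTILINE)
    starts = [m.start() for m in heading.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [chunk for chunk in chunks if chunk]
```

Keeping the heading at the top of each chunk means the retrieved passage carries its own label, which is exactly the citation behavior described in the answer-quality section.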

Is the conversion step itself a meaningful cost?

Browser-side conversion (Readability + Turndown) runs in tens of milliseconds and costs nothing — it happens in the user's tab. Server-side conversion APIs charge per call and add a network round-trip. For 100 pages a day, doing the conversion locally is the difference between $0 and tens of dollars in operational cost, on top of the inference savings.

What's the catch?

Three real ones: pages where layout encodes meaning (heavy financial tables, design-heavy dashboards) lose semantics in Markdown; SPA-rendered content needs the converter to wait for hydration; and aggressively scripted pages (Cloudflare-protected, infinite-scroll feeds) can take longer to settle than the conversion timeout. None of these affect the common case of pasting articles or docs into an LLM.

About the author

M. H. Tawfik

Lead Developer & Owner

Working from Kushtia, Bangladesh.

Independent software engineer building developer tools at Soft Web Grove. Creator and maintainer of BulkMD.

Reach the team at [email protected] — typically within 24 hours, any day of the year. Soft Web Grove also takes a small number of outside engagements; details on the about page.

Tagged: Tokens, Cost optimization, Prompt engineering, RAG