If you have read the extractor comparison post, you know that the first half of HTML-to-Markdown is deciding what to throw away. The second half — turning the surviving HTML subtree into valid Markdown — is the serializer's job, and the three serializers that dominate in 2026 each behave differently on real-world content. This post is the benchmark counterpart for the serializer step.

The three tools we measured (Turndown, Pandoc, and the family of reverse-direction marked-style libraries) represent three different architectures: a focused JavaScript library, a universal document converter, and a tree-walker built atop a parser originally meant for the other direction. Each shines on a different axis, and the choice for any given pipeline depends as much on your runtime constraints as on your output preferences. We made this exact choice when building BulkMD, and the decision tree below is the one we wish we had had at the start.

What each serializer is, architecturally

Turndown is a 1,300-line JavaScript library that takes a Node (typically the article subtree returned by Readability) and walks it depth-first, emitting Markdown tokens as it goes. Its rules system is fully pluggable; the companion GFM plugin (a separate package, turndown-plugin-gfm) adds tables, strikethrough, task lists, and fenced code blocks with language hints. It runs in any JavaScript environment (browser content script, Node, Deno, Bun) and ships as roughly 45 KB minified.

Pandoc is a universal document converter, written in Haskell, that supports forty-plus input and output formats. For HTML-to-Markdown, you invoke it on the command line: `pandoc input.html -t gfm -o output.md`. It is a separate binary, not a library, which makes integration a different shape: you spawn a process per document or batch them in a pipeline. The install is ~140 MB on disk and the cold-start overhead is the dominant per-document cost on small inputs.
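To make the process-per-document shape concrete, here is a minimal Node sketch. It assumes a `pandoc` binary is on the PATH; `pandocArgs` and `htmlToGfm` are illustrative names, not part of any library. The flags used (`--from`, `--to`, `--wrap`) are standard Pandoc options, and `--wrap=none` is included only because it keeps line breaks stable for diffing.

```javascript
const { spawnSync } = require("node:child_process");

// Build the argument list for an HTML -> GFM conversion that reads stdin
// and writes stdout. Helper names are hypothetical.
function pandocArgs({ wrap = "none" } = {}) {
  return ["--from=html", "--to=gfm", `--wrap=${wrap}`];
}

// Spawn one Pandoc process for one document. Assumes `pandoc` is on PATH.
function htmlToGfm(html) {
  const result = spawnSync("pandoc", pandocArgs(), {
    input: html,
    encoding: "utf8",
  });
  if (result.error) throw result.error; // e.g. binary not found
  if (result.status !== 0) {
    throw new Error(`pandoc exited ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

Every call to `htmlToGfm` pays the full process start-up cost, which is why the per-document figure is dominated by cold start on small inputs.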

The marked-style reverse libraries, including turndown-rs (a Rust port of Turndown), node-html-markdown, and a handful of others, reuse parsers originally designed for Markdown-to-HTML and run them in reverse. They are typically faster and smaller than Turndown, at the cost of correctness on edge cases the original parser was never designed to handle.

Benchmark methodology

We used the same 50-page corpus and ground-truth annotations as the extractor benchmark. For each page, we ran the chosen extractor (Mozilla Readability with default settings) to produce a cleaned HTML subtree, then fed that subtree to each of the three serializers in turn. We compared the resulting Markdown against a hand-edited Markdown reference using a token-level F1 score that ignored trivial whitespace differences but penalized lost or extra structural tokens (headings, list markers, code-fence delimiters).
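The scoring idea can be sketched as a multiset F1 over tokens, where structural markers are turned into tagged tokens so that losing one costs recall. This is an illustrative reconstruction, not the benchmark's actual scorer: the tokenizer, the tag names, and the function names are assumptions, and it does not special-case lines inside code fences.

```javascript
// Split Markdown into tokens. Structural markers become tagged tokens;
// everything else is split on whitespace, so spacing differences vanish.
function tokenize(markdown) {
  const tokens = [];
  for (const line of markdown.split(/\r?\n/)) {
    let rest = line.trim();
    const heading = rest.match(/^(#{1,6})\s+/);
    const listItem = rest.match(/^([-*+]|\d+\.)\s+/);
    const fence = rest.match(/^(`{3,}|~{3,})/);
    if (heading) {
      tokens.push(`<h${heading[1].length}>`);
      rest = rest.slice(heading[0].length);
    } else if (listItem) {
      tokens.push("<li>");
      rest = rest.slice(listItem[0].length);
    } else if (fence) {
      tokens.push("<fence>");
      rest = rest.slice(fence[0].length); // a language hint stays as a token
    }
    tokens.push(...rest.split(/\s+/).filter(Boolean));
  }
  return tokens;
}

function countTokens(tokens) {
  const map = new Map();
  for (const t of tokens) map.set(t, (map.get(t) || 0) + 1);
  return map;
}

// Multiset F1: precision and recall over token counts.
function tokenF1(candidate, reference) {
  const cand = countTokens(tokenize(candidate));
  const ref = countTokens(tokenize(reference));
  let overlap = 0;
  for (const [token, n] of cand) overlap += Math.min(n, ref.get(token) || 0);
  const candTotal = [...cand.values()].reduce((a, b) => a + b, 0);
  const refTotal = [...ref.values()].reduce((a, b) => a + b, 0);
  if (candTotal === 0 || refTotal === 0) return candTotal === refTotal ? 1 : 0;
  const precision = overlap / candTotal;
  const recall = overlap / refTotal;
  return overlap === 0 ? 0 : (2 * precision * recall) / (precision + recall);
}
```

With this sketch, a candidate that drops a single heading marker from a four-token reference keeps perfect precision but loses a quarter of its recall, which is the kind of penalty the methodology describes.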

Runtime measurements ran single-threaded on a 2024 M3 Pro, with each tool warmed once before the timed run. Pandoc was invoked as one process per page rather than through a long-running pipeline to keep the comparison apples-to-apples; in production you would amortize cold-start across batches.

How big is the difference, really?

Metric	Turndown + GFM	Pandoc (gfm)	node-html-markdown
Median F1 vs reference	0.96	0.97	0.91
Pages within 1% of reference	92%	94%	74%
Code-block language preservation	Yes (with rule)	Yes	Inconsistent
GFM table coverage	Full	Full	Partial
Definition list support	With plugin	Yes	No
Lazy-loaded image resolution	With rule	No (HTML-side)	No
Median runtime per page	9 ms	78 ms	4 ms
Cold-start cost	Negligible	~1.2 s (process spawn)	Negligible
Bundle / install size	~45 KB	~140 MB	~25 KB
Browser-compatible	Yes	No	Yes

Turndown and Pandoc sit one point apart on median F1 (0.96 versus 0.97) and two percentage points apart on pages within 1% of the reference (92% versus 94%), which is closer than the discourse around them suggests. Pandoc's small edge comes from its handling of nested edge cases (multi-paragraph list items, footnotes, definition lists with multiple terms) that Turndown will miss unless you write custom rules for them.

The runtime gap is the more decisive difference. At 78 ms per page versus 9 ms for Turndown, Pandoc is roughly nine times slower in like-for-like single-document conversion, and the gap widens when cold-start is included. For pipelines processing thousands of pages this is acceptable in a batch job; for interactive use it is not.

Where each serializer breaks

Every tool has a failure mode, and a benchmark average will not reveal which one bites you. We catalog the worst patterns we encountered below.

Turndown's failure modes

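With its default options Turndown emits indented code blocks, which have nowhere to carry a language hint, so the hint is dropped. Setting `codeBlockStyle: 'fenced'` recovers it for the common `<pre><code class="language-ts">` shape; markup that puts the class on the `<pre>` or uses a different naming convention still needs a custom fence rule. This is the single most common Turndown bug in shipped extensions; the fix is either that option or one ~15-line rule, and once you have it, code blocks come through with their syntax marker intact.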
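A custom fence rule of the kind described above might look like the following sketch. It is a plain rule object in the shape Turndown's `addRule` accepts; the rule name, the set of class prefixes it recognizes (`language-` and `lang-`), and the choice to also check the `<pre>` element are assumptions for illustration.

```javascript
// Illustrative rule object; register it with
// turndownService.addRule("fencedCodeWithLanguage", fencedCodeWithLanguage).
const fencedCodeWithLanguage = {
  // Match <pre> elements whose first child is a <code> element.
  filter: (node) =>
    node.nodeName === "PRE" &&
    !!node.firstChild &&
    node.firstChild.nodeName === "CODE",

  replacement: (content, node) => {
    const code = node.firstChild;
    // Look for language-xxx or lang-xxx on the <code>, then on the <pre>.
    const classes = `${code.getAttribute("class") || ""} ${node.getAttribute("class") || ""}`;
    const match = classes.match(/\b(?:language|lang)-([\w+#-]+)/);
    const language = match ? match[1] : "";
    const text = code.textContent.replace(/\n$/, "");
    // Use a fence longer than any backtick run inside the code itself.
    const runs = (text.match(/`+/g) || []).map((run) => run.length);
    const fence = "`".repeat(Math.max(3, Math.max(0, ...runs) + 1));
    return `\n\n${fence}${language}\n${text}\n${fence}\n\n`;
  },
};
```

Reading `textContent` rather than the already-converted `content` argument keeps characters inside the code from being Markdown-escaped.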

Turndown also struggles with <figure>/<figcaption> pairs. By default it emits the image and the caption as separate paragraphs, losing the semantic linkage. Again, a custom rule fixes it, but you have to write the rule. The GFM plugin covers tables, strikethrough, and task lists; everything beyond that is in your hands.
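One way to keep the pair together is a rule that emits the image and an emphasized caption line in the same paragraph. This is a sketch under assumptions: Markdown has no native caption syntax, so the "image followed by an italic line" convention, the rule name, and the escaping choices are all illustrative rather than anything Turndown prescribes.

```javascript
// Illustrative rule object for turndownService.addRule("figureWithCaption", ...).
const figureWithCaption = {
  filter: "figure",
  replacement: (content, node) => {
    const img = node.querySelector("img");
    const caption = node.querySelector("figcaption");
    // Figures without an image (e.g. code or quotes) fall back to their content.
    if (!img) return `\n\n${content.trim()}\n\n`;

    const alt = (img.getAttribute("alt") || "").replace(/[\[\]]/g, "");
    const src = img.getAttribute("src") || "";
    const captionText = caption
      ? caption.textContent
          .trim()
          .replace(/\s+/g, " ") // collapse newlines and runs of spaces
          .replace(/([*_])/g, "\\$1") // keep emphasis markers literal
      : "";

    const image = `![${alt}](${src})`;
    // Image and caption share one paragraph so they stay linked downstream.
    return captionText
      ? `\n\n${image}\n*${captionText}*\n\n`
      : `\n\n${image}\n\n`;
  },
};
```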

Finally, Turndown does not resolve relative image sources or srcset attributes. If a page lazy-loads images with data-src, you get a broken link in the output unless you preprocess the HTML before handing it to Turndown. The BulkMD pipeline does exactly that preprocessing pass, which is why our shipped output has no broken image references; downstream consumers of Turndown who do not preprocess often complain about this and blame the serializer.
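The resolution step can be sketched as a pure function over an image's attributes. This is not BulkMD's actual code: the list of lazy-loading attribute names is an assumed, incomplete one, the function names are hypothetical, and the `srcset` handling simply takes the first candidate and does not cope with URLs that contain commas.

```javascript
// Attribute names commonly used by lazy-loading scripts (an assumed list;
// extend it for your corpus).
const LAZY_ATTRS = ["data-src", "data-lazy-src", "data-original"];

// "a.jpg 1x, b.jpg 2x" -> "a.jpg"
function firstSrcsetUrl(srcset) {
  const first = (srcset || "").split(",")[0].trim();
  return first.split(/\s+/)[0] || "";
}

// Pick the most plausible real source and make it absolute.
// `attrs` is a plain object of attribute name -> value.
function resolveImageSrc(attrs, baseUrl) {
  const candidates = [
    ...LAZY_ATTRS.map((name) => attrs[name]),
    firstSrcsetUrl(attrs["data-srcset"] || attrs.srcset),
    attrs.src,
  ];
  // Skip empty values and inline placeholder images.
  const chosen = candidates.find((value) => value && !value.startsWith("data:"));
  if (!chosen) return null;
  try {
    return new URL(chosen, baseUrl).href;
  } catch {
    return null; // unparseable URL: leave the decision to the caller
  }
}
```

In a preprocessing pass you would call this for each `<img>` and write the result back to `src` before handing the subtree to the serializer.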

Pandoc's failure modes

Pandoc's most common failure is over-translation. Pandoc tries to map every HTML construct to its closest Markdown equivalent, which means a <details> element becomes a definition list, a <kbd> becomes literal text, and inline <span> styles become explicit emphasis. Sometimes this is what you want; sometimes you wanted the Markdown to carry the original HTML through untouched for a downstream renderer. There is no middle setting.

Pandoc also rewrites your line breaks to match its preferred output style, which complicates diff-based testing. If you are using Pandoc inside a CI pipeline that compares outputs across versions, a Pandoc version bump can produce a noisy diff across every file even when no content changed.

The cold-start cost is its own failure mode at small batch sizes. A single-document conversion that takes 1.2 seconds is fine; a thousand single-document conversions take twenty minutes for work that Turndown would have done in nine seconds. The fix is to batch. Pandoc accepts several input files in one invocation, but it concatenates them into a single document, so a batch needs its own delimiter to split the output afterwards, and the integration shape is markedly different from a per-page Turndown call.
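The arithmetic behind those figures is simple enough to write down. The constants come from the benchmark table; the batched estimate is an extrapolation that assumes the warm 78 ms per-page figure carries over to batched input, which the benchmark did not measure.

```javascript
// Figures from the benchmark table, in milliseconds.
const PANDOC_SPAWN_MS = 1200; // ~1.2 s per single-document process
const PANDOC_WARM_MS = 78; // median per page once running
const TURNDOWN_MS = 9; // median per page

// One Pandoc process per page: every page pays the spawn cost.
function perPageSpawnMs(pages) {
  return pages * PANDOC_SPAWN_MS;
}

// Assumed model for batching: pay the spawn cost once per batch,
// then the warm per-page cost for every page.
function batchedMs(pages, batches = 1) {
  return batches * PANDOC_SPAWN_MS + pages * PANDOC_WARM_MS;
}

function turndownMs(pages) {
  return pages * TURNDOWN_MS;
}

// For 1,000 pages:
//   perPageSpawnMs(1000) / 60000 -> 20 minutes
//   turndownMs(1000) / 1000      -> 9 seconds
//   batchedMs(1000)              -> 79200 ms under the assumed model
```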

marked-style serializer failure modes

The reverse-direction libraries lose two things consistently. The first is code-block language hints — most of them emit ``` fences with no language, regardless of what class the source <code> element had. The second is anything that requires DOM-aware handling: lazy-loaded image sources, ARIA-derived emphasis, embedded SVG. They treat the HTML as a string of characters rather than a tree, and they get away with it on simple inputs.

For corpora that are uniformly clean (a single source, predictable shape) these serializers can be the right answer because they are fast and small. For a generic pipeline that has to handle the variety of real-world HTML, they sit at the lower end of fidelity.

Which serializer belongs in a browser extension?

For an MV3 Chrome extension, the choice is again forced by environment. Pandoc is a native binary and does not run in a browser without a large WebAssembly build, which is impractical inside an extension. Reverse-direction libraries are size-competitive with Turndown but lose fidelity in the places that matter most for LLM context: code-block languages, image sources, table structure.

This is why BulkMD pairs Mozilla Readability with Turndown plus the GFM plugin plus a stack of custom rules (code-block language detection, figure/caption joining, lazy-image resolution, link absolutization, heading demotion, and markdownlint-compliant output). The full pipeline ships as roughly 90 KB minified — small enough to fit alongside the rest of the extension's logic in a single content-script bundle, and small enough that the cold-start of injecting the script into a page is negligible compared to the page's own JavaScript.

The deeper point is that the serializer rarely matters in isolation. The combination we ship (Readability + Turndown + GFM + custom rules) is the result of measuring each layer and choosing what gets fixed where. A rule that resolves lazy-loaded image sources, for instance, could equally live in the extractor step or the serializer step; we put it in the serializer because it has the DOM node in scope and the regex is simpler there.

A decision tree

If you are picking a serializer today, the tree is short.

You ship a browser extension or any JavaScript-only environment that needs HTML-to-Markdown — choose Turndown plus the GFM plugin. Add custom rules for the patterns your corpus requires; do not expect the defaults to cover code-block languages or lazy-loaded images. The runtime, bundle size, and ecosystem make it the only practical choice for browser-side work.

You run server-side conversion in Python or as a native command and you care about edge cases — choose Pandoc. The runtime cost is the price you pay for the broadest GFM coverage in the field, and batching mitigates the cold-start penalty.

You run server-side conversion in Node, your corpus is uniform, and per-page speed matters — pick one of the marked-style reverse libraries, but add a code-block language rule manually. The fidelity gap closes if you have predictable input.

You run a mixed pipeline with hand-tuning per document — combine Pandoc for the universal cases with a per-document rule overlay. This is the architecture you arrive at when you have a year of production data to fix.
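For readers who prefer code to prose, the four branches above can be written as a small function. The option names and return labels are made up for this sketch, and the ordering of the checks is one reasonable reading of the tree rather than a rule.

```javascript
// Hypothetical encoding of the decision tree above.
// runtime: "browser" | "node" | "native"
function chooseSerializer({
  runtime,
  uniformCorpus = false,
  speedCritical = false,
  handTunedPerDocument = false,
}) {
  // Browser-side work: Turndown is the only practical option.
  if (runtime === "browser") return "turndown+gfm";
  // Mixed pipelines with per-document tuning layer rules over Pandoc.
  if (handTunedPerDocument) return "pandoc+rule-overlay";
  if (runtime === "node") {
    // Fast-but-uniform niche; remember to add a code-block language rule.
    return uniformCorpus && speedCritical ? "marked-style" : "turndown+gfm";
  }
  // Server-side native or Python callers that care about edge cases.
  return "pandoc";
}
```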

TL;DR

Turndown is the right answer for browser-side work, Pandoc is the right answer for server-side batch conversion when edge cases matter, and marked-style libraries fill a fast-but-uniform niche. The differences are smaller in F1 than the discourse suggests; the gaps are largest in runtime, bundle size, and which environments each tool can actually run in.

If you want a shipped pipeline that has already paid the cost of building the custom rules around Turndown — code-block languages, image resolution, markdownlint compliance — BulkMD is the Chrome extension that runs that pipeline in your browser, free, with no setup. For pages it cannot reach (server-side batch jobs against a fixed URL list), Pandoc remains the right tool.

Frequently asked questions

Do I need both an extractor and a serializer, or does one tool do both?

You need both, logically, but some tools bundle them. Trafilatura outputs Markdown directly (its own extractor plus an internal serializer). Pandoc has no extractor — give it the full page HTML and it tries to convert everything, including nav and ads, which is rarely what you want. The cleanest separation is to run an extractor (Readability) and a serializer (Turndown) as separate steps so each can be tuned independently.

Why does code-block language preservation matter for LLM context?

A language tag tells the model which syntax to expect instead of leaving it to infer the language from the code itself. A ` ```ts ` fence is materially more useful to Claude or GPT than a bare ` ``` ` fence; the model preserves indentation and brace matching more reliably when it knows the language. Losing the tag costs you correctness on code-heavy content.

Can I run Pandoc in a browser via WebAssembly?

Yes, there are WASM builds of Pandoc, but the compiled runtime is roughly 25 MB and the cold-start cost is meaningful. For occasional in-browser conversion of pasted HTML it can work; for an extension that converts on every popup click, the latency is noticeable enough that we would not ship it. Turndown remains the practical choice.

How do I customize Turndown's output to match my downstream renderer?

Turndown's rules API lets you intercept any HTML node and decide how to serialize it. The shape is `turndownService.addRule('name', { filter: 'tagName', replacement: (content, node) => ... })`, where `filter` is a tag name, an array of tag names, or a predicate function rather than a CSS selector. A typical custom-rules file is 50–150 lines for a serious pipeline and lives next to the extension's content script. The Turndown docs include a worked example for tables that generalizes to any custom output.
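As a small illustration of that shape, the sketch below defines one rule that passes `<kbd>` elements through as raw HTML and a helper that registers a whole rules object. The rule, the `customRules` object, and `registerRules` are hypothetical names; the only API assumed is `addRule(name, rule)` on the service.

```javascript
// Keep <kbd> as inline HTML so a downstream renderer can style it.
const keepKbd = {
  filter: "kbd",
  replacement: (content) => `<kbd>${content}</kbd>`,
};

// A rules file can export one object keyed by rule name.
const customRules = { keepKbd };

// Register every rule on anything that exposes addRule(name, rule),
// such as a TurndownService instance.
function registerRules(service, rules) {
  for (const [name, rule] of Object.entries(rules)) {
    service.addRule(name, rule);
  }
  return service;
}
```

Keeping rules as plain objects means each `replacement` can be unit-tested with a stub node, without loading the serializer at all.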

What's the right way to test serializer output regressions?

Hand-curate 10–20 reference Markdown files for known-tricky pages — code-heavy, table-heavy, image-heavy, multilingual — and run the serializer against them on every change. Use a diff that ignores trailing whitespace and line-ending style, but flags any token differences. This catches Turndown rule regressions instantly and avoids the trap of relying on the per-page F1 average (which can stay constant while individual outputs degrade).
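A comparison of that kind can be sketched in a few lines. The function names are illustrative, and note one trade-off of the approach: stripping trailing whitespace also hides Markdown's two-space hard line breaks, so a corpus that relies on them would need a stricter normalizer.

```javascript
// Normalize away differences the regression suite should not care about.
function normalizeMarkdown(text) {
  return text
    .replace(/\r\n?/g, "\n") // unify line endings
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // drop trailing whitespace
    .join("\n")
    .replace(/\n*$/, "\n"); // exactly one trailing newline
}

// Return the first differing line (1-based), or null when the outputs match.
function firstDifference(actual, expected) {
  const a = normalizeMarkdown(actual).split("\n");
  const e = normalizeMarkdown(expected).split("\n");
  const length = Math.max(a.length, e.length);
  for (let i = 0; i < length; i++) {
    if (a[i] !== e[i]) {
      return { line: i + 1, actual: a[i] ?? null, expected: e[i] ?? null };
    }
  }
  return null;
}
```

Failing on the first differing line per reference file, rather than on an aggregate score, is what surfaces a single degraded page that an average would hide.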

About the author

M. H. Tawfik

Lead Developer & Owner

Working from Kushtia, Bangladesh.

Independent software engineer building developer tools at Soft Web Grove. Creator and maintainer of BulkMD.


