If you are building a tool that puts web content into an LLM context window, you eventually want to show the user how many tokens their text is and what it will cost — before they paste it into ChatGPT or fire an API call. The obvious way is to run the real tokenizer. The problem is that bundling a full tokenizer into a browser extension or client-side app is expensive: the tiktoken WASM build and its cl100k_base vocabulary run to several megabytes, which is a lot to ship for a number that only needs to be approximately right. This post covers how to estimate LLM token count client-side with a character-based heuristic, where that heuristic is accurate, where it breaks, and a JavaScript implementation you can drop into a content script.

We will start with why the naive "divide by four" rule is wrong for technical content, build a piecewise estimator that accounts for content shape, bound its error against a real tokenizer, and tie it back to the live token and cost counter that BulkMD shows on every converted page.

Why not just bundle the real tokenizer

The accurate answer to "how many tokens is this string" is to run the exact byte-pair encoder the model uses. For the GPT-3.5 and GPT-4 family that is cl100k_base; for GPT-4o and the o-series it is o200k_base; Claude ships its own tokenizer. Each is a deterministic algorithm plus a learned vocabulary of merges, and the vocabulary is the heavy part — a hundred thousand or more entries that map character sequences to integer IDs.

In a server context this is free to run — pip install tiktoken, encode, count, done. In a browser the calculus changes:

The tiktoken WASM bundle plus the cl100k_base ranks file is on the order of several megabytes uncompressed. For a Manifest V3 extension whose entire job is converting pages, that can dwarf the rest of the code, and Chrome Web Store reviewers and users alike notice an extension that ships an order of magnitude more than its features justify.
Loading and initializing the WASM module adds startup latency and memory that you pay for on every tab, for a feature most users glance at. A content script that lazy-loads megabytes of vocabulary before it can render a number is a poor trade for a readout that just needs to be in the right ballpark.
You would still only have one model's tokenizer. The moment you want to show GPT-4o and Claude estimates side by side, you are bundling multiple vocabularies, and the size problem multiplies.

For a live UI readout — a number that updates as the user toggles options or edits text — being within roughly 10% is good enough to inform a decision ("this is ~8K tokens, comfortably inside a 200K window") and a near-zero bundle cost is worth far more than the last few percent of precision. The rule we apply throughout: estimate in the browser for feedback, verify exact counts server-side when money depends on it. The estimate answers "will this fit and roughly what does it cost"; the exact count answers "what will I be invoiced", and only the second one needs the real encoder.
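
One way to act on that rule is to pad the estimate by its expected error before comparing it with a context window. The sketch below is illustrative only: `fitsContext` and its 10% default margin are assumptions made for this example, not part of any library.

```javascript
// Hypothetical helper: pad a heuristic estimate by its expected error
// before comparing it against a model's context window.
// marginPercent is an assumed safety margin; tune it to your own error bounds.
function fitsContext(estimatedTokens, contextWindow, marginPercent = 10) {
  // Integer arithmetic keeps the padded figure free of float noise.
  const worstCase = Math.ceil((estimatedTokens * (100 + marginPercent)) / 100);
  return {
    worstCase,
    fits: worstCase <= contextWindow,
    headroom: contextWindow - worstCase,
  };
}

const check = fitsContext(8000, 200000);
console.log(check.fits, check.worstCase, check.headroom);
```

Comparing the padded figure rather than the raw estimate means a payload near the limit triggers the warning early instead of late.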

The chars-per-token heuristic and why one divisor fails

The tokenizers behind the major hosted models are byte-pair encoders. A byte-pair encoder repeatedly applies learned merges that fuse frequent adjacent sequences into single tokens, so the average number of characters represented by one token depends on how well the text matches the sequences the tokenizer learned during training. That training corpus was overwhelmingly English web text, so common English words and word fragments merge efficiently, while unusual punctuation, mixed-case identifiers, and tag soup do not. A word like "the" is a single token; an identifier like getUserById splits into several because that exact run was never frequent enough to earn its own merge.
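
A toy version makes the effect visible. The sketch below is a deliberately simplified merge loop, and `TOY_MERGES` is a made-up stand-in for a learned vocabulary rather than real cl100k_base data; it only shows why text that matches learned merges collapses into fewer tokens.

```javascript
// Toy byte-pair merge, for illustration only. The merge list is invented;
// a real vocabulary holds on the order of a hundred thousand entries.
const TOY_MERGES = [["t", "h"], ["th", "e"], ["e", "r"], ["U", "s"], ["Us", "er"]];

function toyTokenize(word, merges = TOY_MERGES) {
  let parts = [...word]; // start from single characters
  for (const [left, right] of merges) { // apply merges in learned order
    const next = [];
    for (let i = 0; i < parts.length; i++) {
      if (parts[i] === left && parts[i + 1] === right) {
        next.push(left + right);
        i++; // skip the piece that was just merged
      } else {
        next.push(parts[i]);
      }
    }
    parts = next;
  }
  return parts;
}

console.log(toyTokenize("the"));         // fully merged
console.log(toyTokenize("getUserById")); // mostly unmerged characters
```

With this toy table, "the" ends up as one piece while the identifier stays mostly split into single characters, which is the same asymmetry the real tokenizers show between prose and code.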

That single fact is why the widely-repeated "~4 characters per token" rule is only right for one kind of content. Here is the spread you actually see on the cl100k_base tokenizer, drawn from the per-content-type math in our token-by-content-type breakdown:

Content type	Typical chars / token	Implied divisor
English prose (Markdown)	~3.6	3.6
Bullet / numbered lists	~3.7	3.7
Markdown tables	~3.2	3.2
Code (Python)	~2.4	2.4
Code (TypeScript)	~2.2	2.2
Minified JSON	~3.0	3.0
URLs and hashes	~2.0	2.0

A global divisor of 4 under-counts almost everything technical. Apply it to a page whose characters are half TypeScript and half prose and you will report roughly 30% fewer tokens than the model will actually charge for. That is exactly the wrong direction for a cost warning, because it tells the user a payload is cheaper and smaller than it is. The fix is not a better single number; it is a divisor that adapts to the shape of the text.
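
The size of the gap follows directly from the ratios in the table. The short calculation below is a sketch that assumes a page made only of prose and TypeScript, mixed by character count; `undercount` is a name introduced here for illustration.

```javascript
// Chars-per-token ratios taken from the table above.
const PROSE_RATIO = 3.6;
const TYPESCRIPT_RATIO = 2.2;

// Fraction by which a flat divisor under-reports tokens for a page whose
// characters are `codeShare` TypeScript and the remainder prose.
function undercount(codeShare, flatDivisor = 4) {
  const tokensPerChar = codeShare / TYPESCRIPT_RATIO + (1 - codeShare) / PROSE_RATIO;
  const flatTokensPerChar = 1 / flatDivisor;
  return 1 - flatTokensPerChar / tokensPerChar;
}

for (const share of [0, 0.5, 1]) {
  const pct = (undercount(share) * 100).toFixed(0);
  console.log(`${share * 100}% code: flat divisor reports ${pct}% too few tokens`);
}
```

Under these assumptions the flat divisor is about 10% low on pure prose, about 32% low on a half-code page, and 45% low on pure TypeScript.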

A piecewise estimator that reads content shape

The strategy: split the input into spans by content type, apply the right characters-per-token ratio to each span, and sum. For Markdown — which is what a web-to-Markdown tool produces — the spans are easy to detect with cheap regular expressions, because Markdown marks its own structure. Fenced code blocks start with three backticks. Tables have pipe characters and a separator row. Everything else is prose-like. This is one of the underrated advantages of working in Markdown rather than raw HTML: the format you get from a web-page-to-Markdown converter already labels its own structure, so the estimator does not have to parse a DOM to know what it is looking at.

The estimator below does exactly that. It carves out fenced code blocks first (the highest-divergence content), then treats the remainder as a blend of prose, list, and table lines, applying a calibrated ratio to each line class. It is intentionally small, dependency-free, and synchronous, so it can run on every keystroke or option toggle without blocking.

// token-estimate.js — heuristic LLM token estimator, no tokenizer required.
// Tuned against cl100k_base; o200k_base and Claude run within a few percent.

const RATIOS = {
  code: 2.4,   // fenced code blocks: identifiers + punctuation soup
  table: 3.2,  // markdown pipe tables
  list: 3.7,   // bullet / numbered list lines
  prose: 3.6,  // ordinary English paragraphs
  url: 2.0,    // bare URLs, hashes, long tokens with no spaces
};

const FENCE = /^```/;
const TABLE_ROW = /^\s*\|.*\|\s*$/;
const LIST_ITEM = /^\s*(?:[-*+]|\d+\.)\s+/;
const URL_LIKE = /https?:\/\/\S+|[A-Za-z0-9+/=]{32,}/g;

// Estimate tokens for one line of NON-code text.
function estimateTextLine(line) {
  let chars = line.length;
  let tokens = 0;

  // Pull out URL-like runs first; they tokenize densely.
  const urls = line.match(URL_LIKE) || [];
  for (const u of urls) {
    tokens += u.length / RATIOS.url;
    chars -= u.length;
  }
  if (chars < 0) chars = 0;

  let ratio = RATIOS.prose;
  if (TABLE_ROW.test(line)) ratio = RATIOS.table;
  else if (LIST_ITEM.test(line)) ratio = RATIOS.list;

  tokens += chars / ratio;
  return tokens;
}

export function estimateTokens(text) {
  if (!text) return 0;
  const lines = text.split("\n");
  let tokens = 0;
  let inCode = false;

  for (const line of lines) {
    if (FENCE.test(line)) {
      inCode = !inCode;
      tokens += line.length / RATIOS.prose; // the fence line itself
      continue;
    }
    if (inCode) {
      tokens += line.length / RATIOS.code;
    } else {
      tokens += estimateTextLine(line);
    }
    tokens += 1 / RATIOS.prose; // account for the newline character
  }
  return Math.round(tokens);
}

The logic worth calling out: code is detected by fence state, not by guessing per line, so an indented comment inside a code block is still costed at the code ratio. URLs and long base64-like runs are extracted before the prose ratio is applied, because a 60-character URL costed at 3.6 chars/token would badly under-count — those runs have almost no learned merges. Everything else collapses to a prose, list, or table ratio per line. No vocabulary, no async load, no WASM. The whole function is a single pass over the lines, so it is effectively free to call on input.
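
When an estimate looks off, it helps to see which class each line was assigned to. The helper below is a hypothetical debugging aid, not part of the estimator: `classBreakdown` and the `_RE` constant names are introduced here, and the regular expressions simply mirror the ones used above.

```javascript
// Hypothetical debugging helper: report how many characters the detection
// rules assign to each content class. Regexes mirror the estimator's.
const FENCE_RE = /^```/;
const TABLE_ROW_RE = /^\s*\|.*\|\s*$/;
const LIST_ITEM_RE = /^\s*(?:[-*+]|\d+\.)\s+/;

function classBreakdown(markdown) {
  const chars = { code: 0, table: 0, list: 0, prose: 0 };
  let inCode = false;
  for (const line of markdown.split("\n")) {
    if (FENCE_RE.test(line)) {
      inCode = !inCode; // fence lines toggle state and are not counted here
      continue;
    }
    if (inCode) chars.code += line.length;
    else if (TABLE_ROW_RE.test(line)) chars.table += line.length;
    else if (LIST_ITEM_RE.test(line)) chars.list += line.length;
    else chars.prose += line.length;
  }
  return chars;
}

const sample = [
  "Intro paragraph.",
  "- first item",
  "| a | b |",
  "```js",
  "  // indented comment",
  "```",
].join("\n");
console.log(classBreakdown(sample));
```

The indented comment lands in the code bucket because classification follows fence state, not the look of the individual line.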

Turning tokens into a cost figure

Once you have a token estimate, cost is a lookup and a multiply. Keep the price table as plain data so it is easy to update as model pricing changes, and always label the result as an estimate.

// Prices are USD per 1M input tokens. Update as vendors change them.
const INPUT_PRICE_PER_M = {
  "gpt-4o": 2.5,
  "gpt-4o-mini": 0.15,
  "claude-sonnet": 3.0,
  "claude-haiku": 0.8,
};

export function estimateCost(tokens, model) {
  const price = INPUT_PRICE_PER_M[model];
  if (price == null) return null;
  const usd = (tokens / 1_000_000) * price;
  return usd; // format with toFixed at the call site
}

For a 7,800-token Markdown article, a representative clean-Markdown size from our token-cost benchmark, this reports about $0.0195 on gpt-4o input pricing. That is the kind of number a user can act on: a single page costs about two cents and a 100-page batch roughly two dollars, which is the decision the readout exists to support. Keeping the prices as a plain object also means the table is the only thing you touch when a vendor changes pricing; the estimator itself never has to know.
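
The same arithmetic, spelled out for one page and for a batch. This is a standalone sketch: `costUsd` is a hypothetical helper, and the price is the gpt-4o input figure from the table above, which may be out of date by the time you read this.

```javascript
// Worked arithmetic for the example above.
const PRICE_PER_M_USD = 2.5;  // gpt-4o input price used in this post
const TOKENS_PER_PAGE = 7800; // representative clean-Markdown article

function costUsd(tokens, pricePerMillion) {
  return (tokens / 1_000_000) * pricePerMillion;
}

const perPageUsd = costUsd(TOKENS_PER_PAGE, PRICE_PER_M_USD);
const batchUsd = costUsd(TOKENS_PER_PAGE * 100, PRICE_PER_M_USD);

console.log(`one page:  $${perPageUsd.toFixed(4)} (${(perPageUsd * 100).toFixed(2)} cents)`);
console.log(`100 pages: $${batchUsd.toFixed(2)}`);
```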

How accurate is this, really

The honest framing is bounds, not a single accuracy figure, because the error depends on content shape. Tested against tiktoken's cl100k_base on representative inputs, the piecewise estimator above lands in these ranges:

Input type	Typical estimator error vs. cl100k_base
Long-form prose article	within ~5%
Mixed article (prose + a table + one code block)	within ~10%
Code-heavy doc / README	within ~10–15%
Pathological: dense non-English, emoji, or unusual symbols	can exceed 25%

A few things drive the residual error. The heuristic ignores that token boundaries do not align with line boundaries, so very short lines accumulate small rounding errors. It assumes English; CJK and other scripts that were sparsely represented in the tokenizer's training run at roughly 1.5–2 chars/token and will be under-counted by a prose ratio of 3.6. Emoji and rare Unicode can each cost multiple tokens despite being one or two characters, because they encode as several UTF-8 bytes and the tokenizer has no merge for them. For the workflow this estimator targets — English-language web pages converted to Markdown — none of those pathological cases dominate, which is why mixed articles stay within roughly 10%.
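
A cheap guard against those cases is to measure how much of the text falls outside ASCII and flag the readout when the share is high. The sketch below is an assumption-laden example: `nonAsciiShare`, `estimateConfidence`, and the 5% threshold are all introduced here and are not measured cutoffs.

```javascript
// Hypothetical guard: flag text where English-tuned ratios are likely to
// under-count. The 5% default threshold is an assumption, not a measurement.
function nonAsciiShare(text) {
  if (!text) return 0;
  let total = 0;
  let nonAscii = 0;
  for (const ch of text) { // iterates by code point, so an emoji counts once
    total++;
    if (ch.codePointAt(0) > 0x7f) nonAscii++;
  }
  return nonAscii / total;
}

function estimateConfidence(text, threshold = 0.05) {
  return nonAsciiShare(text) > threshold ? "low" : "normal";
}

// UTF-8 byte length shows why one visible character can cost several tokens.
function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

console.log(estimateConfidence("plain English text"));
console.log(estimateConfidence("こんにちは"), utf8Bytes("😀"));
```

A "low" result does not fix the estimate; it only tells the UI to widen the stated error or suggest an exact count.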

If you need to tighten the bounds for your own corpus, the calibration loop is straightforward: encode a sample of your real documents with the actual tokenizer once, offline, then adjust the per-class ratios until the estimator's totals match. You are not re-deriving a tokenizer; you are fitting four or five constants to the kind of text you actually process. That one-time, server-side calibration is the only place a real encoder needs to appear in the whole pipeline.
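
A minimal version of that fitting step might look like the following. It assumes each sample is single-class text that was counted once, offline, with the real tokenizer; the sample numbers are placeholders for illustration, not measurements, and `fitRatios` is a name introduced here.

```javascript
// Fit one chars-per-token ratio per content class from offline samples.
// The figures below are illustrative placeholders, not real measurements.
const calibrationSamples = [
  { cls: "prose", chars: 3600, tokens: 1000 },
  { cls: "prose", chars: 7400, tokens: 2000 },
  { cls: "code", chars: 2400, tokens: 1000 },
];

function fitRatios(rows) {
  const totals = {};
  for (const { cls, chars, tokens } of rows) {
    if (!totals[cls]) totals[cls] = { chars: 0, tokens: 0 };
    totals[cls].chars += chars;
    totals[cls].tokens += tokens;
  }
  const ratios = {};
  for (const [cls, t] of Object.entries(totals)) {
    ratios[cls] = t.chars / t.tokens; // pooled ratio across all samples
  }
  return ratios;
}

console.log(fitRatios(calibrationSamples));
```

Pooling characters and tokens before dividing weights long documents more heavily than averaging per-document ratios would, which suits a corpus where long pages account for most of the spend.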

There is also a cross-model question. The ratios above are calibrated to cl100k_base. The newer o200k_base (GPT-4o, o-series) has a larger vocabulary and tends to tokenize slightly more efficiently, so a cl100k-tuned estimate runs a few percent high for those models — a harmless direction for a cost warning, since it over-warns rather than under-warns. Claude's tokenizer is closely related for Latin-script text and stays within a few percent as well. For English prose the three are approximately comparable, which is what lets one set of ratios serve a multi-model readout.

The headline result: a dependency-free, sub-millisecond character heuristic estimates token count within ~10% of a real byte-pair tokenizer for typical English Markdown articles, at roughly zero bundle cost. That is close enough to drive live UI and budget warnings, while exact billing is still verified server-side.

Wiring it into a live counter

In a content script or popup, the estimator is cheap enough to call directly on input. The only discipline that matters is debouncing in editable fields so you are not re-estimating a 50KB document on every keystroke, and clearly labeling the output as approximate so nobody treats it as an invoice.

import { estimateTokens, estimateCost } from "./token-estimate.js";

const APPROX = "~"; // prefix that marks a number as an estimate

function renderReadout(markdown, model) {
  const tokens = estimateTokens(markdown);
  const usd = estimateCost(tokens, model);
  // Only prefix real figures; "n/a" means the model has no price entry.
  const cost = usd == null ? "n/a" : `${APPROX}$${usd.toFixed(4)}`;
  return `${APPROX}${tokens.toLocaleString()} tokens · ${cost} (${model})`;
}

// Debounce so large documents do not re-estimate on every keypress.
function debounce(fn, ms) {
  let t;
  return (...args) => {
    clearTimeout(t);
    t = setTimeout(() => fn(...args), ms);
  };
}

const update = debounce((md, model, el) => {
  el.textContent = renderReadout(md, model);
}, 120);

This is the pattern BulkMD uses. The conversion already produces clean Markdown locally, so the token estimate runs on the converted output — the number the user actually cares about, since that is what they will send — and it runs entirely offline. No string ever leaves the tab to count its tokens. The token readout, the cost figure, and the conversion itself are all client-side, which keeps the tool consistent with being local-only with no telemetry: there is no network call hiding behind the counter.

A small but real benefit of estimating on the Markdown rather than the source HTML: it lets the readout double as a savings indicator. Show the estimated tokens for the cleaned Markdown next to a rough estimate for the original, and the user sees the 60–80% reduction directly, in the unit they pay in. For a boilerplate-heavy page that figure can climb toward 90%, and showing it in tokens rather than bytes keeps the comparison honest, because tokens are what the model bills.
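
The savings figure itself is a one-line calculation. The helper below is a hypothetical sketch; `savingsPercent` is a name introduced here, and the 30,000-token figure in the usage line is a made-up input, not a benchmark result.

```javascript
// Hypothetical helper for a savings readout: percent reduction in estimated
// tokens between the original page and the converted Markdown.
function savingsPercent(originalTokens, markdownTokens) {
  if (!(originalTokens > 0)) return null; // nothing to compare against
  const saved = 1 - markdownTokens / originalTokens;
  return Math.round(saved * 100);
}

// Made-up inputs for illustration only.
console.log(`~${savingsPercent(30000, 7800)}% fewer tokens`);
```

Returning null for a missing or zero original lets the UI hide the indicator instead of showing a meaningless percentage.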

TL;DR

To estimate LLM token count and cost client-side, skip the multi-megabyte tokenizer and use a piecewise characters-per-token heuristic: detect code fences, tables, lists, and URLs, apply a calibrated ratio to each, and sum. Calibrated to cl100k_base, it lands within roughly 10% for typical English Markdown articles and within a few percent of o200k_base and Claude for prose — close enough to power a live counter and budget warnings at near-zero bundle cost. The actionable next step: drop the estimateTokens function above into your content script or popup, wire it to a debounced readout, and label the output as an estimate. When real money depends on the number, verify the exact count with tiktoken server-side. If you would rather not build it, install BulkMD free from the Chrome Web Store — it shows a live, offline token and cost readout on every page it converts.

Frequently asked questions

How accurate is a character-based token estimate compared to a real tokenizer?

For typical English Markdown articles the piecewise heuristic in this post lands within about 10% of cl100k_base, and within about 5% for pure prose. Code-heavy documents drift to roughly 10–15%. The large errors come from non-English scripts, emoji, and unusual symbols, which a prose ratio under-counts. That precision is fine for live UI feedback and budget warnings, but you should verify exact counts server-side before billing on them.

Why not just use a single 'four characters per token' rule?

Because four chars per token is only roughly right for English prose. Code runs closer to 2.2–2.4 chars/token, URLs and hashes near 2.0, and tables around 3.2. A global divisor of 4 under-counts technical content substantially, which is the wrong direction for a cost warning. A divisor that adapts to content shape fixes most of that error.

Does this estimator work for GPT-4o and Claude, not just GPT-4?

The ratios are calibrated to cl100k_base, which powered GPT-3.5 and GPT-4. GPT-4o and the o-series use o200k_base, which tokenizes English a few percent more efficiently, so a cl100k-tuned estimate runs slightly high for them — it over-warns rather than under-warns. Claude uses its own tokenizer that stays within a few percent for Latin-script prose. One set of ratios is adequate across all three for English text.

Why estimate in the browser instead of calling an API to count tokens?

Two reasons. First, a network round-trip to count tokens defeats the point of an instant, on-keystroke readout and adds latency. Second, for a privacy-respecting, local-only tool, sending the user's content to a server just to count it would introduce a network call where there was none. A local heuristic keeps the counter offline and instant.

How do I get the exact token count when I need it?

Run OpenAI's tiktoken library server-side: load the cl100k_base or o200k_base encoding, encode the string, and take the length of the result. Anthropic provides a token-counting endpoint for Claude. Each is the authoritative source for its own models; use them for reconciliation and billing, and use the browser heuristic only for live feedback.

About the author

M. H. Tawfik

Lead Developer & Owner

Working from Kushtia, Bangladesh.

Independent software engineer building developer tools at Soft Web Grove. Creator and maintainer of BulkMD.



Estimating LLM Token Count and Cost Client-Side