BulkMD

Obsidian Frontmatter for Web Clipping, Done Right

Clip web pages into Obsidian with clean YAML frontmatter and Properties: which fields to keep, the obsidian://new URI scheme, and Dataview-friendly metadata.

M. H. Tawfik · 13 min read

If you clip web pages into Obsidian, the quality of your vault is decided less by what you capture than by the metadata you attach to it. A clean Markdown body with no frontmatter is a dead note: searchable on full text but invisible to Properties, Dataview, and every query you will eventually want to run. Getting Obsidian frontmatter for web clipping right — the exact fields, the YAML types Properties expects, and a folder convention that survives a thousand clips — is what turns a pile of saved articles into something you can actually query. This post is the field-by-field spec we use, plus the BulkMD export that produces it automatically.

We will cover which frontmatter fields earn their place, how Obsidian Properties map YAML types to its UI, the obsidian://new URI scheme for programmatic capture, a folder and template convention, and the Dataview patterns that pay all of this off. The companion piece on building an Obsidian knowledge base from web pages covers folder discipline and migration; this one goes deep on the metadata itself.

What is frontmatter, and why Properties changed the rules

Frontmatter is a block of YAML at the very top of a Markdown file, fenced by three dashes above and below. Obsidian has read it for years, but the 1.4 release reframed it as Properties — a typed, editable UI that sits above the note body. The YAML is still the source of truth on disk; Properties is just a structured editor over it. The practical consequence is that the types you write now matter, because Obsidian infers a property's type from its YAML shape and remembers that type across the whole vault.

Obsidian offers six property types: text, list, number, checkbox, date, and date & time (called datetime below). Clipped notes mostly need four of them: text, list, date, and datetime. A scalar string becomes text; a YAML sequence becomes list; an ISO date becomes date; an ISO date with a time component becomes datetime. Get the shape wrong once, for example by writing tags: reference, llm as a comma string instead of a list, and Obsidian types that property as text vault-wide, which quietly breaks tag-based queries everywhere. Consistency is not a style preference here; it is a correctness requirement.
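To make the shape-to-type rule concrete, here is a rough JavaScript approximation of it. This is not Obsidian's code: the function name and regexes are our own assumptions, it assumes dates arrive as raw ISO strings rather than parsed Date objects, and Obsidian's real inference handles more cases.

```javascript
// Rough approximation of how a YAML value's shape maps to an Obsidian
// property type. Assumes dates arrive as raw ISO strings, not Date objects.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;
const ISO_DATETIME = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?$/;

function inferPropertyType(value) {
  if (Array.isArray(value)) return "list";
  if (typeof value === "boolean") return "checkbox";
  if (typeof value === "number") return "number";
  if (typeof value === "string" && ISO_DATE.test(value)) return "date";
  if (typeof value === "string" && ISO_DATETIME.test(value)) return "datetime";
  return "text"; // anything else, including a comma-separated "list"
}

console.log(inferPropertyType(["reference", "llm"])); // list
console.log(inferPropertyType("reference, llm"));     // text
console.log(inferPropertyType("2026-05-30"));         // date
console.log(inferPropertyType("2026-06-02T14:32:00")); // datetime
```

The second call is the failure mode described above: the same two tags, written as one string, come out as text.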

The reserved tags property deserves special care. Obsidian treats it as a first-class concept and merges YAML tags with inline #tags in the same note. Keep it a YAML list, write the values lowercase and hyphenated and without the leading #, and your tag pane and Dataview stay in agreement.
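If your clipper or an import script hands you tags in mixed shapes, a small normalizer can coerce them into that form before anything is written. A minimal sketch, assuming input is either an array or a comma-separated string; normalizeTags and tagsToYaml are hypothetical helper names, not part of Obsidian or any plugin.

```javascript
// Hypothetical helper: coerce whatever a clipper produced into the shape the
// tags property wants: lowercase, hyphenated values with no leading "#".
function normalizeTags(input) {
  const raw = Array.isArray(input) ? input : String(input ?? "").split(",");
  const seen = new Set();
  for (const tag of raw) {
    const clean = String(tag)
      .trim()
      .replace(/^#/, "")     // YAML tags are written without the leading #
      .toLowerCase()
      .replace(/\s+/g, "-"); // "LLM Context" -> "llm-context"
    if (clean) seen.add(clean); // drop empties and duplicates, keep first-seen order
  }
  return [...seen];
}

// Render as a YAML sequence, one tag per line.
function tagsToYaml(tags) {
  return ["tags:", ...tags.map((t) => `  - ${t}`)].join("\n");
}

console.log(tagsToYaml(normalizeTags("Reference, #LLM Context, reference")));
// tags:
//   - reference
//   - llm-context
```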

Which frontmatter fields actually earn their place

The temptation with a fresh clipper is to capture everything the page exposes — Open Graph image, reading time, word count, canonical URL, twelve meta tags. Most of it is noise you will never query. The minimal set that pays for itself across a real vault is small.

| Field | YAML type | Source | Why it earns its place |
|---|---|---|---|
| source | text | Page URL | The join key: answers "do I already have this?" and anchors backlinks |
| title | text | Page heading / page title tag | Human-readable label; survives renames of the file itself |
| author | text | Byline / meta author | Lets you query everything by one writer |
| published | date | Article date | The page's own date, distinct from when you clipped it |
| captured | datetime | Clip time | When it entered your vault; the timeline of your own reading |
| tags | list | You, at clip time | The query surface for Dataview and the tag pane |

source is the single load-bearing field. Title, author, and published can usually be recovered from it later by re-fetching the page (captured and your own tags cannot), but source is what makes deduplication and backlinking tractable. The split between published and captured is the one people skip and regret: the article's publication date and the date you read it answer completely different questions, and collapsing them into one date field destroys both.
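Using source as a dedup key works best if two spellings of the same page map to the same key. A minimal sketch in JavaScript; the normalization rules (dropping the fragment, a leading www., a trailing slash, and a few tracking parameters) are our own assumptions about what counts as "the same page", so adjust them to your sources.

```javascript
// Hypothetical normalization rules for using `source` as a dedup key.
// Which query parameters count as noise is an assumption; adjust to taste.
const TRACKING_PARAMS = /^(utm_\w+|fbclid|gclid)$/i;

function sourceKey(url) {
  const u = new URL(url); // the URL parser also lowercases the hostname
  u.hash = ""; // #fragments point into the same page
  for (const name of [...u.searchParams.keys()]) {
    if (TRACKING_PARAMS.test(name)) u.searchParams.delete(name);
  }
  u.hostname = u.hostname.replace(/^www\./, "");
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1); // treat /post and /post/ as one page
  }
  return u.toString();
}

// "Do I already have this?" against the source fields of existing clips.
function alreadyClipped(existingSources, candidate) {
  const keys = new Set(existingSources.map(sourceKey));
  return keys.has(sourceKey(candidate));
}
```

Keep the stored source exactly as clipped and compute the key on the fly; that way a change to the rules never requires rewriting frontmatter.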

Here is the canonical frontmatter block, with the YAML shapes Properties expects:

---
source: "https://example.com/an-article"
title: "The article's headline, verbatim"
author: "Real name when available"
published: 2026-05-30
captured: 2026-06-02T14:32:00
tags:
  - reference
  - llm-context
  - obsidian
---

Note that published is a bare ISO date (Properties reads it as date) while captured carries a time component (read as datetime). The source URL is quoted as a precaution: strictly, a URL like this one is a valid unquoted YAML scalar, because its colon is not followed by a space, but quoting is the safe default for any value containing ": " or " #", or starting with an indicator character such as [, {, or *.
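Quoting every text field unconditionally is the simplest policy. If you generate YAML for other fields and prefer to quote only when needed, a conservative check might look like the JavaScript sketch below. The trigger list is a deliberately cautious subset of the YAML rules, not the full specification, and needsQuotes and yamlScalar are hypothetical names.

```javascript
// Conservative check: should this frontmatter value be double-quoted?
// The triggers are a cautious subset of YAML's rules, not the full spec.
function needsQuotes(value) {
  return (
    value === "" ||
    /: |:$|\s#/.test(value) ||                     // ": " or trailing ":" reads as a mapping, " #" starts a comment
    /^[\[\]{}&*!|>'"%@`#,?:-]/.test(value) ||      // indicator characters at the start
    /^\s|\s$/.test(value) ||                       // leading/trailing whitespace would be trimmed
    /^(true|false|yes|no|null|~)$/i.test(value) || // would parse as boolean or null
    /^[-+]?\d/.test(value)                         // could parse as a number or date
  );
}

function yamlScalar(value) {
  if (!needsQuotes(value)) return value;
  // Double-quoted style: escape backslashes first, then double quotes.
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}
```

The URL in the canonical block passes unquoted under these rules, while a title such as Part 1: The Setup gets quoted.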

The obsidian://new URI scheme for programmatic capture

Obsidian registers an obsidian:// URI handler, and its new action creates a note from the parameters in the URI. A clipper, a shortcut, or a shell script can construct one of these URIs and hand it to the OS, and Obsidian opens with the note created. The relevant parameters:

obsidian://new
  ?vault=<vault name>
  &file=<folder/Note Title>
  &content=<URL-encoded markdown, including frontmatter>
  &overwrite=    (optional: replace if the file exists)
  &append=       (optional: append instead of replace)
  &silent=       (optional: create without focusing the note)

The key detail for frontmatter: there is no dedicated frontmatter parameter. You put the YAML block at the start of the content value, exactly as it appears on disk, and URL-encode the whole thing. A worked example in JavaScript:

function yamlQuote(value) {
  // YAML double-quoted scalar: escape backslashes first, then quotes and newlines
  return `"${String(value)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")}"`;
}

function buildObsidianUri({ vault, folder, title, source, captured, tags, body }) {
  const frontmatter = [
    "---",
    `source: ${yamlQuote(source)}`,
    `title: ${yamlQuote(title)}`,
    `captured: ${captured}`, // e.g. 2026-06-02T14:32:00
    "tags:",
    ...tags.map((t) => `  - ${t}`),
    "---",
    "",
  ].join("\n");

  const content = frontmatter + body;
  const file = `${folder}/${sanitizeFilename(title)}`;

  // Percent-encode each value with encodeURIComponent. URLSearchParams would
  // write spaces as "+", which Obsidian does not reliably decode back to spaces.
  const params = { vault, file, content, silent: "true" };
  const query = Object.entries(params)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `obsidian://new?${query}`;
}

function sanitizeFilename(name) {
  // Strip characters that are illegal in filenames on common OSes (\ / : * ? " < > |)
  // or that break Obsidian links (# ^ [ ]), then cap the length
  const cleaned = name.replace(/[\\/:*?"<>|#^[\]]/g, "").trim().slice(0, 200);
  return cleaned || "Untitled";
}

Two gotchas. First, every value must be percent-encoded, or newlines and ampersands will corrupt the note. encodeURIComponent handles that; URLSearchParams is tempting, but it serializes spaces as +, while Obsidian's URI documentation asks for spaces encoded as %20, so build the query string as above. Second, very long pages can run into URI length limits, which vary by operating system and by how the URI is launched; Windows tends to be the most restrictive. For full-article clips, writing the .md file directly to the vault folder is more reliable than the URI scheme; reserve obsidian://new for short captures, highlights, and quick notes where you want Obsidian to open the result immediately.
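A minimal Node.js sketch of that fallback: check the URI length, and for long clips write the Markdown straight into the vault folder instead. The 2,000-character threshold and the function names are assumptions for illustration; pick a limit that suits the platforms you actually clip from.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed threshold, chosen to stay under conservative URI length limits.
const MAX_URI_LENGTH = 2000;

// Write a clip straight into the vault. The "wx" flag makes the write fail
// if the file already exists, so an earlier clip is never silently overwritten.
function writeClipToVault(vaultDir, folder, fileName, markdown) {
  const dir = path.join(vaultDir, folder);
  fs.mkdirSync(dir, { recursive: true });
  const target = path.join(dir, `${fileName}.md`);
  fs.writeFileSync(target, markdown, { encoding: "utf8", flag: "wx" });
  return target;
}

// Decide per clip: short captures go through obsidian://new, long ones to disk.
function captureMethod(uri) {
  return uri.length <= MAX_URI_LENGTH ? "uri" : "file";
}
```

Obsidian watches the vault folder, so a file written this way shows up in the app without the URI round trip.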

A folder and template convention that scales

A clipper that drops every note into the vault root produces chaos by clip fifty. The convention that holds up: every web clip lands in a single inbox folder, and one shared template stamps the frontmatter so its shape is identical on every note.

vault/
├── Clippings/        # every web clip lands here first
├── Notes/            # your own atomic writing
├── Reference/        # clips you have cited and curated
└── _templates/
    └── web-clip.md   # the frontmatter skeleton

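The template itself is just the frontmatter block with placeholders for your clipper to fill. Placeholder syntax varies by tool (the core Templates plugin uses {{date:FORMAT}}, while Templater uses <% %> tags), so adapt the tokens below to whatever does the filling: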

---
source: "{{url}}"
title: "{{title}}"
author: "{{author}}"
published: {{published}}
captured: {{date:YYYY-MM-DDTHH:mm:ss}}
tags:
  - clipping
---

> Clipped from [{{title}}]({{url}}) on {{date:YYYY-MM-DD}}.

{{content}}

The leading blockquote is a deliberate choice: it gives every clip a visible citation line at the top of the rendered note, separate from the frontmatter, so the provenance is obvious even when Properties is collapsed. A single starting tag, clipping, is enough. Resist the urge to auto-tag by topic at capture time: automated topic tags are almost always wrong, and they pollute the namespace you will later want to curate by hand.
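If you fill the template yourself rather than through a plugin, the substitution is simple. A minimal JavaScript sketch, assuming the {{key}} and {{date:FORMAT}} tokens shown above; it is not how the Templates plugin or any clipper works internally, and its date formatter only understands YYYY, MM, DD, HH, mm, and ss.

```javascript
// Hypothetical filler for {{key}} and {{date:FORMAT}} placeholders.
// Only the tokens YYYY, MM, DD, HH, mm, ss are supported.
function formatDate(date, format) {
  const pad = (n) => String(n).padStart(2, "0");
  const parts = {
    YYYY: String(date.getFullYear()),
    MM: pad(date.getMonth() + 1),
    DD: pad(date.getDate()),
    HH: pad(date.getHours()),
    mm: pad(date.getMinutes()),
    ss: pad(date.getSeconds()),
  };
  return format.replace(/YYYY|MM|DD|HH|mm|ss/g, (token) => parts[token]);
}

function fillTemplate(template, values, now = new Date()) {
  return template.replace(/\{\{([^}]+)\}\}/g, (match, inner) => {
    if (inner.startsWith("date:")) return formatDate(now, inner.slice(5));
    // Leave unknown placeholders intact so missing data is visible, not silently blank.
    return Object.prototype.hasOwnProperty.call(values, inner) ? String(values[inner]) : match;
  });
}
```

Values are inserted verbatim, so run anything that lands inside quotes (a title containing a double quote, for example) through a YAML escaping step first.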

The discipline that makes the inbox work is migration: a clip moves from Clippings/ to Reference/ only when you actually link to it from another note. The act of citing is the signal that a clip has earned a permanent place, and it means the high-signal subset curates itself without you deciding up front what matters.
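The migration rule is mechanical enough to script. A minimal JavaScript sketch, assuming notes arrive as hypothetical { path, body } objects, that citations use [[wikilinks]] by note name rather than by path, and (our own design choice) that links from one clip to another do not count as a citation.

```javascript
// Hypothetical sketch: find clips that have earned a move from Clippings/ to Reference/.
// Captures the note name in [[Name]], [[Name|alias]], [[Name#Heading]], [[Name^block]].
const WIKILINK = /\[\[([^\]|#^]+)(?:[#^][^\]|]*)?(?:\|[^\]]*)?\]\]/g;

function linkedNames(body) {
  const names = new Set();
  for (const match of body.matchAll(WIKILINK)) names.add(match[1].trim());
  return names;
}

function noteName(notePath) {
  return notePath.split("/").pop().replace(/\.md$/, "");
}

function clipsToPromote(notes) {
  const cited = new Set();
  for (const note of notes) {
    if (note.path.startsWith("Clippings/")) continue; // links between clips don't count
    for (const name of linkedNames(note.body)) cited.add(name);
  }
  return notes
    .filter((n) => n.path.startsWith("Clippings/") && cited.has(noteName(n.path)))
    .map((n) => n.path);
}
```

The output is a list of candidates rather than an automatic move, which keeps the final call in your hands.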

Making frontmatter Dataview-friendly

Frontmatter that nobody queries is just decoration. The payoff arrives through Dataview, the plugin that treats every note's Properties as a queryable database. Once a few hundred clips share a consistent schema, you can write queries that surface work you forgot you had.

TABLE author, published, captured
FROM "Clippings"
WHERE contains(tags, "llm-context")
SORT captured DESC

That query returns every clip tagged llm-context, newest first, with author and both dates as columns. It works reliably because tags is a real YAML list (so contains checks for a whole tag rather than a substring), captured is a real datetime (so Dataview compares it as a date, which any date arithmetic depends on), and the field names are identical across every note. A single note that wrote tag instead of tags silently drops out of the result set, and one that wrote creator instead of author shows up with an empty author column.
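The difference between list membership and a substring check is easy to see in plain JavaScript, whose includes method behaves differently on arrays and strings. The sketch below is an analogue of the query's FROM, WHERE, and SORT steps over parsed frontmatter objects, not Dataview itself; the note shape (a folder field plus the frontmatter keys) is an assumption.

```javascript
// Plain-JavaScript analogue of the Dataview query above (not Dataview itself).
// Notes whose tags is not a real list are skipped.
function clipsTagged(notes, tag) {
  return notes
    .filter((n) => n.folder === "Clippings" && Array.isArray(n.tags) && n.tags.includes(tag))
    .sort((a, b) => new Date(b.captured) - new Date(a.captured)) // newest first
    .map(({ title, author, published, captured }) => ({ title, author, published, captured }));
}

const asList = ["reference", "llm-context"];
const asString = "reference, llm-context";
console.log(asList.includes("llm"));   // false: list membership compares whole tags
console.log(asString.includes("llm")); // true: a substring match, i.e. a false positive
```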

This is the concrete reason the type discipline from the Properties section matters. Dataview's operators are type-aware: date math (captured >= date(today) - dur(30 days)) needs a real date, list membership (contains(tags, ...)) needs a real list, and numeric comparisons need a real number. A vault where every clip obeys the same six-field schema is a database; a vault where each clip improvises its own fields is a folder of text you can only grep.
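A cheap way to keep the schema honest is to lint frontmatter before (or after) it lands in the vault. A minimal JavaScript sketch over already-parsed frontmatter objects: the field rules follow the six-field schema in this post, while the regexes, the choice to treat author and published as optional, and the function names are our own assumptions. It also shows the "last 30 days" date math in plain JavaScript.

```javascript
// Hypothetical lint pass flagging the drift that knocks notes out of type-aware queries.
const SCHEMA = {
  source: (v) => typeof v === "string" && /^https?:\/\//.test(v),
  title: (v) => typeof v === "string" && v.length > 0,
  author: (v) => v === undefined || typeof v === "string", // optional
  published: (v) => v === undefined || /^\d{4}-\d{2}-\d{2}$/.test(String(v)), // optional
  captured: (v) => /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?$/.test(String(v)),
  tags: (v) => Array.isArray(v) && v.every((t) => typeof t === "string"),
};

function lintFrontmatter(fm) {
  const problems = [];
  for (const [field, isValid] of Object.entries(SCHEMA)) {
    if (!isValid(fm[field])) problems.push(`bad or missing ${field}`);
  }
  for (const key of Object.keys(fm)) {
    // Misspelled or stray keys (tag, Tags, creator) are reported, not guessed at.
    if (!Object.prototype.hasOwnProperty.call(SCHEMA, key)) problems.push(`unexpected field ${key}`);
  }
  return problems;
}

// "captured within the last N days"; an unparseable date yields NaN and returns false.
function capturedWithinDays(fm, days, now = new Date()) {
  return now - new Date(fm.captured) <= days * 24 * 60 * 60 * 1000;
}
```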

The same metadata travels if you ever move to Notion, though the mechanics differ — Notion uses database columns rather than YAML, so the field names become column headers on import. We cover that mapping in the guide to importing Markdown into Notion; the conceptual schema is identical, only the storage layer changes.

How BulkMD shapes its Obsidian export

When you export to Obsidian from BulkMD, the Markdown arrives with the frontmatter block already built to the spec above: source quoted, title from the page heading, published as a bare date when the page exposes one, captured stamped at conversion time as a datetime, and tags as a YAML list seeded with a single clipping tag for you to refine. The body below is the Readability-extracted article converted with Turndown, so boilerplate navigation and ad chrome are gone before the note ever reaches your vault — which is also why a clipped note costs 60–80% fewer tokens than the raw HTML page if you later feed it to an LLM.
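The two date fields need different formatting, and the obvious JavaScript call is a trap: Date.prototype.toISOString() returns UTC with milliseconds and a trailing Z, not the local, seconds-precision form used in this post's examples. A sketch of one way to produce both values follows. It is not BulkMD's code; the input shapes (a Date for the conversion time, an optional raw date string taken from the page) are assumptions, and toPublished only recognizes ISO-style dates.

```javascript
const pad = (n) => String(n).padStart(2, "0");

// Local wall-clock time as YYYY-MM-DDTHH:mm:ss (no timezone suffix).
function formatCaptured(date) {
  return (
    `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}` +
    `T${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}`
  );
}

// Reduce the page's date string to a bare YYYY-MM-DD, or null to omit the field.
// Taking the leading date keeps the page's own calendar date instead of
// converting it into the reader's timezone.
function toPublished(raw) {
  const match = /^(\d{4}-\d{2}-\d{2})/.exec(String(raw ?? "").trim());
  return match ? match[1] : null;
}
```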

Because BulkMD runs entirely in your browser with no account, no server, and no telemetry, clips of paywalled or logged-in pages work: the extension reads the authenticated tab you already have open, which a server-side fetch never can. For batch work, the bulk dashboard converts up to 10 tabs in parallel and retains roughly the last 500 results, so turning a research session of thirty tabs into a consistently formatted set of vault-ready notes is one operation rather than thirty. The trade-off between this and server fetchers is covered in server scrapers versus browser extensions.
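Converting a batch of pages with a cap on how many run at once is a standard bounded-concurrency pattern. A generic JavaScript sketch follows; it is not BulkMD's implementation, convert stands in for whatever per-page work you do, and the default limit of 10 simply mirrors the figure above.

```javascript
// Run `convert` over every item with at most `limit` calls in flight.
// Results keep input order; a failed item records its error instead of
// aborting the whole batch.
async function convertAll(items, convert, limit = 10) {
  const results = new Array(items.length);
  let next = 0;

  async function worker() {
    while (next < items.length) {
      const index = next++; // claiming an item is synchronous, so workers never collide
      try {
        results[index] = { ok: true, value: await convert(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

A fixed pool of workers pulling from a shared index keeps exactly limit conversions busy until the queue drains, which smooths out the uneven sizes of real pages better than processing fixed chunks of ten.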

The money paragraph

A web clip is only as queryable as its weakest frontmatter field. Across a vault of a few hundred notes, six consistently typed fields (source, title, author, published, captured, and a tags list) are enough to answer every question worth asking, while a single inconsistent field (tag for tags, a comma string for a tags list) silently removes a note from every Dataview query that should have found it. The schema is small on purpose; its value comes from being identical on every note.

TL;DR and next step

Treat frontmatter as the structured layer of every web clip, not an afterthought. Use the six-field schema — quote source, separate published from captured, and keep tags a real YAML list — so Obsidian Properties type each field correctly and Dataview can query them. Land every clip in a flat Clippings/ inbox with one shared template, and migrate to Reference/ only when you cite a note. The next step is to make the capture itself produce this shape automatically: install BulkMD from the Chrome Web Store, point its Obsidian export at your vault, and your clips arrive frontmatter-first and Dataview-ready from the first one.

Frequently asked questions

What is the difference between Obsidian frontmatter and Properties?

They are two views of the same data. Frontmatter is the YAML block stored at the top of the .md file on disk; Properties is the typed UI Obsidian renders above the note body to edit that YAML. Since Obsidian 1.4, the type you write in YAML (a list, a date, a number) determines how Properties displays and validates the field across the whole vault.

Why should I separate 'published' and 'captured' dates?

They answer different questions. 'published' is the article's own publication date, useful for judging how current the source is. 'captured' is when the note entered your vault, useful for reconstructing the timeline of your own reading and research. Collapsing both into one 'date' field destroys both signals, and you cannot recover them later.

Can the obsidian://new URI scheme set frontmatter directly?

Not through a dedicated parameter. You include the YAML frontmatter block at the start of the 'content' parameter, exactly as it appears on disk, and URL-encode the whole value. There is no separate frontmatter argument, so the clipper is responsible for assembling the YAML before encoding.

Why do my tags not show up in Dataview queries?

Almost always a type mismatch. If 'tags' was written as a comma-separated string instead of a YAML list, Obsidian types the property as text vault-wide, and Dataview sees one string rather than a list of tags: contains() falls back to a substring check (so 'llm' also matches 'llm-context') and list-aware queries stop behaving. Rewrite the field as a YAML sequence (each tag on its own line with a leading dash) and the queries match as intended.

Does BulkMD's Obsidian export work with the Dataview plugin?

Yes. BulkMD emits frontmatter with correctly typed fields: quoted source URL, bare-date published, datetime captured, and a real YAML list for tags, which is exactly what Dataview's type-aware operators need. The clips drop into your vault already queryable, with no manual reshaping of the metadata.

About the author

M. H. Tawfik

Lead Developer & Owner

Working from Kushtia, Bangladesh.

Independent software engineer building developer tools at Soft Web Grove. Creator and maintainer of BulkMD.

Reach the team at [email protected] — typically within 24 hours, any day of the year. Soft Web Grove also takes a small number of outside engagements; details on the about page.

Tagged: Obsidian, Markdown, Bulk export, Notion