If you have ever bookmarked an article, told yourself you would read it later, and never opened the bookmark again, the problem was never the bookmark. The problem was that a URL by itself is unsearchable, unannotatable, and invisible to your future self. An Obsidian vault built from clean Markdown captures of those same pages is a fundamentally different artifact: searchable on full text, annotatable with your own notes, and linkable into the rest of your knowledge graph. This post is the workflow we use, and recommend, for getting from a pile of bookmarks to a vault that compounds in value over time.

If you are the developer flavor of knowledge worker — building RAG pipelines, indexing docs for Claude Code, instrumenting agent context — the parallel workflow is in the Claude Code knowledge base post. This one is for the research-and-write side of the same problem: capturing web content into a vault you will actually browse, search, and write from. The mechanics are nearly identical; the framing is human-facing rather than agent-facing.

Why Obsidian is the natural target

Obsidian's storage format is one Markdown file per note, in plain folders on disk. There is no proprietary container, no required cloud service, no schema migration when the app version changes. A file you drop in your vault today will still be readable in twenty years by any tool that can open .md. This is not glamorous, but it is the single most important property for a knowledge base you intend to use for the rest of your professional life.

Because the storage format is plain Markdown, the import workflow has zero friction. A .md file produced by BulkMD is already a valid Obsidian note. There is no "import wizard," no conversion pass, no waiting for an indexing job. Drop the file into the vault folder; Obsidian sees it on the next file-system tick.

Notion, by contrast, accepts Markdown imports but converts the content into its proprietary block format on the way in. Code blocks survive; complex tables sometimes lose alignment; long footnoted references can lose their structure. We cover the workarounds for Notion at the end of this post; the rest of the workflow targets Obsidian because that is where the import path is cleanest.

The folder shape that survives

A common mistake when starting an Obsidian vault is to spend the first weekend designing a deeply nested taxonomy — Reference/Engineering/Languages/TypeScript/Generics/... — and then never actually file anything in it because nothing fits cleanly. The vault that compounds in value over time uses almost the opposite shape:

vault/
├── Inbox/                # raw captures land here
├── Notes/                # your own writing, atomic notes
├── Projects/             # active work, one folder per project
├── Reference/            # curated, stabilized captures
└── _attachments/         # images, PDFs (set as the attachment folder in Obsidian's settings)

The discipline that makes this work is that every web capture starts life in Inbox/. You do not pre-decide what category it goes in; you let it sit until you have a reason to move it (you cite it in a Note, you start a Project that needs it, you read it enough times that it deserves to be in Reference). This avoids the analysis paralysis that kills most knowledge-base projects in week two.

A reasonable target for a maturing vault is that Inbox/ holds about 20% of total files, Reference/ holds about 50%, and the rest are split between Notes/ and Projects/. If Inbox/ has six hundred files and nothing has migrated, something is wrong with the migration discipline, not with the folder shape.
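To check those proportions against a real vault, a small script can count the Markdown files under each top-level folder. The following is a minimal sketch, not part of BulkMD or Obsidian: the function name `folder_shares` is hypothetical, and it assumes the vault is an ordinary directory containing the folders shown above.

```python
from pathlib import Path


def folder_shares(vault: Path) -> dict[str, float]:
    """Return each top-level folder's share of the vault's .md files, in percent."""
    counts: dict[str, int] = {}
    for note in vault.rglob("*.md"):
        parts = note.relative_to(vault).parts
        # Notes sitting directly in the vault root are grouped under "(root)".
        top = parts[0] if len(parts) > 1 else "(root)"
        counts[top] = counts.get(top, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return {}  # empty vault: nothing to report
    return {folder: round(100 * n / total, 1) for folder, n in counts.items()}
```

Calling `folder_shares(Path("~/vault").expanduser())` returns a mapping such as `{"Inbox": ..., "Reference": ...}`. An Inbox/ share that keeps growing is the signal described above.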

The frontmatter that pays for itself

Obsidian reads YAML frontmatter at the top of every Markdown file. This is the structured metadata layer of your vault, and it is the highest-leverage authoring decision you will make. Our recommended minimum:

---
source: https://example.com/article-slug
title: "The article's H1, verbatim"
author: "Real name when available"
date: "2026-05-26"
captured: "2026-05-26T14:32:00Z"
tags:
  - reference
  - llm-context
---

Only source is genuinely load-bearing. Everything else is convenient but recoverable from source later if you ever need to re-derive it. The reason source is the one field you do not skip is that it is the join key — if you find a related article six months from now, you want to ask the vault "do I already have this URL captured?" and get a yes/no answer, and that lookup only works if source is consistently filled.
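The "do I already have this URL captured?" lookup can be sketched outside Obsidian in a few lines. This is an illustration under stated assumptions: `read_source` and `already_captured` are hypothetical helper names, the frontmatter is assumed to have the shape shown above with `source` on a single line, and URLs are compared only after trimming a trailing slash.

```python
from pathlib import Path
from typing import Optional


def read_source(note: Path) -> Optional[str]:
    """Return the `source` value from a note's YAML frontmatter, or None."""
    lines = note.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; never scan the note body
        if line.startswith("source:"):
            return line[len("source:"):].strip().strip("\"'")
    return None


def already_captured(vault: Path, url: str) -> list[Path]:
    """Return every note whose frontmatter `source` matches `url`."""
    wanted = url.rstrip("/")
    return [
        note
        for note in sorted(vault.rglob("*.md"))
        if (src := read_source(note)) is not None and src.rstrip("/") == wanted
    ]
```

An empty list is the "no" answer; a non-empty list tells you which note already holds the capture. Notes without a `source` field are skipped, which is why the lookup only works if the field is filled consistently.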

Obsidian's Dataview plugin and the built-in property search both treat YAML frontmatter as queryable. Once your vault has even a hundred notes with consistent source and tags fields, you can write queries like "every note tagged llm-context captured in 2026" and get useful results. The frontmatter discipline pays off slowly and then suddenly when the vault crosses some threshold of size.
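Dataview has its own query language, which is not reproduced here. As a plugin-independent illustration of the same idea, the sketch below answers "every note tagged llm-context captured in 2026" by reading the frontmatter directly. The parser is deliberately minimal and is an assumption of this example: it handles only the small YAML subset shown above (single-line scalars and `- item` lists), not inline lists or nested mappings. The names `parse_frontmatter` and `notes_tagged_in_year` are hypothetical.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Parse a small YAML subset: `key: value` scalars and `- item` lists."""
    lines = text.splitlines()
    meta: dict = {}
    if not lines or lines[0].strip() != "---":
        return meta  # no frontmatter block
    key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of frontmatter
        if stripped.startswith("- ") and key is not None:
            # A list item belongs to the most recent key (for example, tags).
            if not isinstance(meta.get(key), list):
                meta[key] = []
            meta[key].append(stripped[2:].strip().strip("\"'"))
        elif ":" in stripped:
            # Split on the first colon only, so URLs and timestamps stay intact.
            key, _, value = stripped.partition(":")
            key = key.strip()
            meta[key] = value.strip().strip("\"'")
    return meta


def notes_tagged_in_year(vault: Path, tag: str, year: int) -> list[Path]:
    """Return notes carrying `tag` whose `captured` timestamp falls in `year`."""
    hits = []
    for note in sorted(vault.rglob("*.md")):
        meta = parse_frontmatter(note.read_text(encoding="utf-8"))
        tags = meta.get("tags", [])
        captured = str(meta.get("captured", ""))
        if isinstance(tags, list) and tag in tags and captured.startswith(f"{year}-"):
            hits.append(note)
    return hits
```

For example, `notes_tagged_in_year(vault, "llm-context", 2026)` returns the matching paths. The query is only as good as the consistency of the `tags` and `captured` fields it reads.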

The capture workflow

The five-step capture flow we recommend, which works equally well for one article or a hundred:

1. Browse normally. Read on the web. When you find something worth keeping, leave the tab open.
2. Capture in bulk. Open BulkMD, paste your accumulated URLs, and run the conversion. Each page becomes a single .md file with the page title as filename and a citation block as the first content.
3. Drop into Inbox/. Unzip BulkMD's output directly into your vault's Inbox/ folder. Obsidian indexes the new files within a few seconds.
4. Add minimal frontmatter. For each note you actually intend to use, add the YAML frontmatter above. The BulkMD output already includes the source citation block, so you usually only need to add tags and (optionally) author.
5. Migrate when you cite. When you reference an inbox note from another note (using Obsidian's [[wikilink]] syntax), move it from Inbox/ to Reference/. The act of linking is the signal that the note has earned its place in the long-term graph.
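The "add minimal frontmatter" step is easy to script for notes that do not have any yet. The sketch below is an assumption-laden illustration: `add_frontmatter` is a hypothetical helper, the source URL is passed in explicitly because the exact layout of BulkMD's citation block is not specified here, and `captured` is simply set to the current UTC time.

```python
from datetime import datetime, timezone
from pathlib import Path


def add_frontmatter(note: Path, source: str, tags: list[str]) -> bool:
    """Prepend minimal YAML frontmatter; return False if the note already has some."""
    body = note.read_text(encoding="utf-8")
    if body.startswith("---\n"):
        return False  # leave existing frontmatter untouched
    captured = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header = ["---", f"source: {source}", f'captured: "{captured}"', "tags:"]
    header += [f"  - {tag}" for tag in tags]
    header.append("---")
    # Keep the original capture intact below the new header.
    note.write_text("\n".join(header) + "\n\n" + body, encoding="utf-8")
    return True
```

Refusing to touch notes that already start with `---` keeps the helper safe to re-run over the whole Inbox/ folder.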

Steps 1 and 2 might happen once a week in a batch; steps 3 and 4 take under a minute per session; step 5 happens organically as you write. The whole flow is designed so that no step requires a heavy time investment when you are tired, which is the real failure mode of every personal-knowledge-management system.
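The "migrate when you cite" rule can also be checked mechanically: find Inbox/ notes that some note outside Inbox/ links to, and move them to Reference/. The following is a rough sketch rather than an Obsidian feature. The function names are hypothetical, links are matched by note name only, and it assumes note names are unique across the vault so that name-based wikilinks still resolve after the move.

```python
import re
import shutil
from pathlib import Path

# Matches [[Note]], [[Note|alias]] and [[Note#heading]]; captures the target only.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def cited_inbox_notes(vault: Path) -> list[Path]:
    """Return Inbox notes that a note outside Inbox/ links to by name."""
    inbox = vault / "Inbox"
    linked: set[str] = set()
    for note in vault.rglob("*.md"):
        if inbox in note.parents:
            continue  # links between inbox notes do not count as citing
        text = note.read_text(encoding="utf-8")
        # Reduce path-style targets such as "Inbox/Note" to the bare note name.
        linked.update(m.strip().split("/")[-1] for m in WIKILINK.findall(text))
    return sorted(n for n in inbox.glob("*.md") if n.stem in linked)


def migrate_cited(vault: Path, dry_run: bool = True) -> list[Path]:
    """Move cited Inbox notes to Reference/; with dry_run, only report them."""
    cited = cited_inbox_notes(vault)
    if not dry_run:
        reference = vault / "Reference"
        reference.mkdir(exist_ok=True)
        for note in cited:
            shutil.move(str(note), str(reference / note.name))
    return cited
```

Running `migrate_cited(vault)` first with the default `dry_run=True` lists what would move without touching any files. Moving notes from inside Obsidian remains the safer route when links are written with full paths.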

How big does this need to get to be useful?

A reasonable concern when starting any knowledge base is "when does this actually pay off?" We sampled our own vault and a few colleagues' vaults and asked what fraction of notes had been searched, opened, or linked from another note within 90 days of capture:

| Vault size | Note utilization at 90 days | Notes searched but not opened | Notes never accessed |
|---|---|---|---|
| 50 notes | 28% | 18% | 54% |
| 200 notes | 42% | 22% | 36% |
| 800 notes | 61% | 19% | 20% |
| 2,500 notes | 73% | 15% | 12% |

The pattern is clear: small vaults under-perform because they do not have enough content to surface useful results from a search, and you forget you have the relevant note. Vaults past a few hundred notes start producing genuine "I am glad I captured that six months ago" moments, and past a thousand they consistently surface useful prior work in searches you did not know to do.

This is the argument for capturing more aggressively than feels sensible at the start. The marginal cost of one more capture is near-zero (a paste into BulkMD and a drop into Inbox/); the marginal benefit compounds only after the vault crosses a few hundred entries. Front-load the captures, accept that some will never be useful, and trust the threshold.

Where Notion users diverge

If your knowledge base lives in Notion rather than Obsidian, the workflow above adapts with three changes. First, Notion imports .md files through its Import menu (the "Text & Markdown" option); the conversion runs once and is final, so any post-conversion edits live in Notion rather than syncing back to the source .md. Second, Notion does not have YAML frontmatter; its database properties replace the frontmatter pattern. Create a database with Source, Author, Date, and Tags columns and fill them on import. Third, complex GFM tables and footnotes sometimes lose structure on Notion import; check those manually for any note where structure matters.

The conceptual shape — capture into an inbox, migrate when cited — works identically. Notion's strength is in shared workspaces and rich block editing; Obsidian's strength is in personal local storage and graph-based navigation. Pick by the property that matters more to you and the workflow above will fit either.

TL;DR

An Obsidian vault built from clean Markdown captures is the highest-leverage personal-knowledge artifact you can ship to your future self. Use a flat folder shape (Inbox/, Notes/, Projects/, Reference/), capture with minimal effort, add YAML frontmatter only to the notes you intend to use, and migrate from Inbox/ to Reference/ when you link a note from somewhere. Pass a few hundred notes and the vault starts paying for itself; pass a thousand and it becomes the most useful research tool you own.

If you want a frictionless way to convert web pages — bookmarks, documentation, longreads — into clean Markdown that drops straight into Obsidian, BulkMD is the Chrome extension that runs the conversion in your browser, with no account, no server, and no ongoing cost. The output is already formatted exactly the way Obsidian expects, including the source-URL citation block that anchors the rest of the workflow.

Frequently asked questions

Why not use the official Obsidian Web Clipper?

The official Web Clipper works well for individual captures but is single-page-at-a-time. For batch workflows — clearing 30 tabs at once after a research session, or importing a whole documentation site — a bulk converter is dramatically faster. We use both: the Web Clipper for in-the-moment captures, a bulk tool when working through a queue.

Should I deduplicate captures, or capture the same URL multiple times?

Deduplicate by URL. The standard trick is a simple Obsidian script (or a manual property search) that flags any note whose `source` field already appears elsewhere in the vault. Duplicate captures pollute search results and make linking ambiguous. The frontmatter `source` field is what makes deduplication tractable.
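One way to flag duplicates is to group notes by a normalized `source` value and report every group with more than one member. The sketch below runs outside Obsidian and is only an illustration: the helper names are hypothetical, and the normalization rule (lower-case scheme and host, no fragment, no trailing slash) is an assumption you may want to tighten or loosen.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)
SOURCE_LINE = re.compile(r"^source:\s*(.+)$", re.MULTILINE)


def source_of(note: Path) -> Optional[str]:
    """Return the `source` value from the note's frontmatter block, if any."""
    block = FRONTMATTER.match(note.read_text(encoding="utf-8"))
    if block is None:
        return None
    found = SOURCE_LINE.search(block.group(1))
    return found.group(1).strip().strip("\"'") if found else None


def normalize(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def duplicate_sources(vault: Path) -> dict[str, list[Path]]:
    """Map each normalized URL captured more than once to the notes that share it."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for note in sorted(vault.rglob("*.md")):
        src = source_of(note)
        if src:
            groups[normalize(src)].append(note)
    return {url: notes for url, notes in groups.items() if len(notes) > 1}
```

The result lists each duplicated URL with the notes that share it, so you can decide by hand which copy to keep and which annotations to merge.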

What's the right tagging strategy?

Start with a small flat tag set (5–10 tags) representing your highest-level interests. Add tags only when you have a reason to query for them. The mistake we and everyone else have made is to invent dozens of granular tags up front; you forget which ones you have, you tag inconsistently, and the tagging becomes pure overhead. Less is more.

Does this workflow work for paywalled or logged-in pages?

Only with browser-side capture tools. A server-side conversion API cannot see your logged-in state, so paywalled and behind-auth pages are out of reach. BulkMD and other extension-based tools run inside your authenticated tab, so anything you can read in your browser, you can capture into your vault. This is one of the main reasons we use an extension rather than a server tool.

How do I keep my vault from becoming a junk drawer?

The migration discipline (move from `Inbox/` to `Reference/` only when you cite a note) is the single most important habit. It means notes you never actually use stay in `Inbox/`, where they are easy to ignore and easy to delete in periodic cleanups. The graph that emerges in `Reference/` is the high-signal subset, and you do not have to decide what is high-signal up front.
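For the periodic cleanup, a short script can list the `Inbox/` notes that have gone untouched for a while so you can review them before deleting anything. This is a minimal sketch: `stale_inbox_notes` is a hypothetical name, the 90-day default is an arbitrary assumption, and file modification time is only a rough proxy for "never used".

```python
import time
from pathlib import Path


def stale_inbox_notes(vault: Path, max_age_days: int = 90) -> list[Path]:
    """Return Inbox notes whose file modification time is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    inbox = vault / "Inbox"
    return sorted(n for n in inbox.glob("*.md") if n.stat().st_mtime < cutoff)
```

The function only reports candidates; deleting them stays a manual decision.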

About the author

M. H. Tawfik

Lead Developer & Owner

Working from Kushtia, Bangladesh.

Independent software engineer building developer tools at Soft Web Grove. Creator and maintainer of BulkMD.

Reach the team at [email protected] — typically within 24 hours, any day of the year. Soft Web Grove also takes a small number of outside engagements; details on the about page.


Tagged: Obsidian, Notion, Markdown, Bulk export

Building an Obsidian Knowledge Base from Web Pages