markfetch

Reader View for AI agents and your shell. Fetch any URL, get back clean markdown — with a real Chrome's request fingerprint, not curl's.

The built-in fetch tools that ship with AI coding agents too often return raw HTML, broken markdown, or a 403 from Cloudflare. markfetch sends HTTP/2 with a coherent Chrome header set so bot-detection systems see a real browser, then runs the response through the same Reader View pipeline your browser uses (Mozilla's Readability → turndown). The output is markdown indistinguishable from what a human would get from "Save as Markdown", even on sites that would block a naive curl.

One command, two surfaces:

CLI — pass a URL. Print to stdout or -o to a file.

npm i -g markfetch

markfetch https://en.wikipedia.org/wiki/Markdown

MCP stdio server — bare invocation. Drop into Claude Desktop / Claude Code / Cursor / Goose / any stdio-MCP client.

{
  "mcpServers": {
    "markfetch": {
      "command": "npx",
      "args": ["-y", "markfetch"]
    }
  }
}

That snippet is the whole MCP setup — or jump to CLI usage to drive the same command from a shell.

MCP install commands

Claude Code

claude mcp add --scope user markfetch -- npx -y markfetch

Codex

codex mcp add markfetch -- npx -y markfetch

Gemini CLI

gemini mcp add -s user markfetch npx -y markfetch

Why markfetch?

| Tool | Real-browser fingerprint | Reader-View extraction | Structured errors | Zero config |
| --- | --- | --- | --- | --- |
| Built-in agent fetch tools | – | – | – | ✓ |
| Generic Playwright / Puppeteer | ✓ | – | – | – |
| `mcp-server-fetch` (Python) | – | basic | – | – |
| Cloudflare `/markdown` | ✓ | ✓ | – | paid |
| `markfetch` | ✓ | ✓ | ✓ (8 codes) | ✓ |

Real-browser HTTP/2 + Chrome fingerprint. ALPN-negotiated h2, User-Agent, Sec-CH-UA-*, Sec-Fetch-*, Accept-*. A Chrome UA with no client hints is a stronger automation signal than curl — markfetch sends the full coherent set, derived from the UA at startup so an override stays internally consistent.
Reader-View-quality extraction. linkedom → @mozilla/readability → turndown with GFM tables, strikethrough, and task lists. Code fences preserve language-X hints. Sphinx-style bare <pre> blocks render as code, not escaped prose. Intraword underscores stay un-escaped — no more list\_tools.
One tool, one shape (MCP). fetch_markdown(url, savePath?) returns markdown in content[0].text. No structuredContent, no frontmatter, no metadata fields. Several major MCP clients (Claude Code CLI, VS Code/Copilot) forward only structuredContent to the model and drop content[] when both are present — markfetch deliberately stays on the channel your LLM can actually read.
savePath / -o escape valve. Pass an absolute path (MCP savePath) or -o <path> (CLI) and the markdown lands on disk instead of the response channel. Use it when your client's inline tool-result cap would truncate large responses, or to redirect output from a shell pipeline. The file is only ever the markdown of the URL — fetch errors return a [code] string and never touch the disk.
Whole document or honest failure. No pagination, no truncation. If the document doesn't fit in MARKFETCH_MAX_BYTES, you get too_large — never a half-truth.
Stdio-clean. Stdout is reserved for MCP frames. Stderr is fatal-only. No log spam, no ANSI escapes — keeping stderr parseable for shell consumers.
Pure Node, no subprocesses. No Playwright, no headless Chromium, no Python hop: a single Node process whether you invoke it as an MCP server or from the shell.
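
To make the "derived from the UA" point concrete, here is a minimal TypeScript sketch of how client hints could be computed from a Chrome User-Agent string. It is not markfetch's actual code: the function name `deriveClientHints`, the brand list, and the placeholder `"Not?A_Brand"` entry are assumptions made for illustration.

```typescript
// Hypothetical helper, not markfetch's real implementation.
// Derives a coherent set of client-hint headers from one Chrome UA string,
// so that overriding the UA cannot leave the hints out of sync.
function deriveClientHints(userAgent: string): Record<string, string> {
  const match = /Chrome\/(\d+)\./.exec(userAgent);
  if (match === null) {
    // Mirrors the documented "non-Chrome strings fail fast" behavior.
    throw new Error("expected a Chrome User-Agent string");
  }
  const major = match[1];

  // Assumed platform detection: only three desktop platforms are handled.
  let platform = "Linux";
  if (userAgent.includes("Windows NT")) platform = "Windows";
  else if (userAgent.includes("Macintosh")) platform = "macOS";

  return {
    "user-agent": userAgent,
    // The third brand is a placeholder; real Chrome varies this entry by version.
    "sec-ch-ua": `"Chromium";v="${major}", "Google Chrome";v="${major}", "Not?A_Brand";v="99"`,
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": `"${platform}"`,
  };
}

const linuxHints = deriveClientHints(
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
);
console.log(linuxHints);
```

Because every header comes from the same input string, the version in `sec-ch-ua` always agrees with the version in `user-agent`.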

CLI usage

markfetch doubles as a shell tool: when invoked with at least one argument it parses argv as a CLI instead of starting the MCP server. Bare invocation (zero args) keeps the existing MCP-server behavior — every MCP client config in the wild keeps working unchanged.

# Print clean markdown to stdout
npx -y markfetch https://example.com/article

# Save to a file (absolute or relative path)
npx -y markfetch https://example.com/article -o article.md

# Pipe into another tool
npx -y markfetch https://example.com/article | pandoc -o article.pdf

For repeat use, install once:

npm i -g markfetch         # then anywhere: markfetch <url>
# or, as a project devDependency
npm i -D markfetch         # then in package.json scripts: "markfetch <url>"

Flags:

| Flag | Purpose |
| --- | --- |
| `-o, --output <path>` | Save markdown to a file (absolute or relative path). Default is stdout. |
| `-V, --version` | Print version and exit. |
| `-h, --help` | Print usage and exit. |
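
The dispatch rule and the flag table above can be sketched as a small pure function. This is an illustration under stated assumptions, not markfetch's actual parser: the `parseInvocation` name, the `Invocation` type, and the error messages are hypothetical, and how the real CLI treats unknown or repeated arguments is not specified in this README.

```typescript
// Hypothetical model of the argv-discriminated dispatch described above.
type Invocation =
  | { mode: "mcp" }
  | { mode: "help" }
  | { mode: "version" }
  | { mode: "fetch"; url: string; output?: string };

function parseInvocation(args: string[]): Invocation {
  // Zero arguments: start the MCP stdio server.
  if (args.length === 0) return { mode: "mcp" };

  const queue = [...args];
  let url: string | undefined;
  let output: string | undefined;

  while (queue.length > 0) {
    const arg = queue.shift();
    if (arg === undefined) break;
    if (arg === "-h" || arg === "--help") return { mode: "help" };
    if (arg === "-V" || arg === "--version") return { mode: "version" };
    if (arg === "-o" || arg === "--output") {
      const path = queue.shift();
      if (path === undefined) throw new Error(`missing path after ${arg}`);
      output = path;
    } else if (arg.startsWith("-")) {
      // Assumption: unknown options are rejected rather than ignored.
      throw new Error(`unknown option: ${arg}`);
    } else if (url === undefined) {
      url = arg;
    } else {
      throw new Error(`unexpected argument: ${arg}`);
    }
  }

  if (url === undefined) throw new Error("missing URL");
  return output === undefined ? { mode: "fetch", url } : { mode: "fetch", url, output };
}

console.log(parseInvocation(["https://example.com/article", "-o", "article.md"]));
```

In a real entry point the input would be `process.argv.slice(2)`; keeping the parser free of side effects makes the zero-argument MCP path easy to test.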

Errors go to stderr with the same [code] message shape the MCP tool returns (see the table below), and the process exits with a non-zero status. The same env vars (MARKFETCH_TIMEOUT_MS, MARKFETCH_MAX_BYTES, MARKFETCH_USER_AGENT) apply in both modes. MARKFETCH_ALLOWED_WRITE_ROOTS is MCP-only — see Write sandbox.

Errors carry one of eight deterministic codes:

| Code | Meaning |
| --- | --- |
| `network_error` | DNS / TCP / TLS failure, or an unexpected internal error from the fetcher. |
| `http_error` | Upstream returned a non-2xx status. |
| `timeout` | Per-request budget `MARKFETCH_TIMEOUT_MS` exceeded. |
| `unsupported_content_type` | Response was not `text/html` or `application/xhtml+xml`. |
| `extraction_failed` | Readability returned no article content (typical for pure client-rendered SPAs). |
| `too_large` | Response body or extracted markdown exceeded `MARKFETCH_MAX_BYTES`. |
| `save_failed` | `savePath` was given but `writeFile` failed (parent directory missing, permission denied, etc.). |
| `save_forbidden` | `savePath` resolves outside the allowed write roots; see Write sandbox. MCP-only; the CLI has no sandbox. |
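
A consumer that wants to branch on these codes could parse the `[code] message` shape along the following lines. This is a sketch that assumes the code appears in square brackets at the start of the string, followed by free-form text; the `parseMarkfetchError` helper, the `MarkfetchError` type, and the sample message are hypothetical.

```typescript
// The eight codes listed in the table above.
const ERROR_CODES = [
  "network_error",
  "http_error",
  "timeout",
  "unsupported_content_type",
  "extraction_failed",
  "too_large",
  "save_failed",
  "save_forbidden",
] as const;

type ErrorCode = (typeof ERROR_CODES)[number];

interface MarkfetchError {
  code: ErrorCode;
  message: string;
}

// Hypothetical helper: returns null when the text is not a recognized error,
// which a caller can treat as "this is ordinary markdown output".
function parseMarkfetchError(text: string): MarkfetchError | null {
  const match = /^\[([a-z_]+)\]\s*([\s\S]*)$/.exec(text.trim());
  if (match === null) return null;
  const code = ERROR_CODES.find((candidate) => candidate === match[1]);
  if (code === undefined) return null;
  return { code, message: match[2] ?? "" };
}

// The message text here is invented for the example.
const parsedError = parseMarkfetchError("[http_error] upstream returned 403");
if (parsedError !== null && parsedError.code === "http_error") {
  console.log("HTTP failure:", parsedError.message);
}
```

Typing the code as a union lets the compiler flag a `switch` that forgets one of the eight cases.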

What it is not

Not a crawler. No recursion, no robots.txt parsing, no rate-limit orchestration. One URL in, one document out.
Not authenticated. Anonymous fetch only — no cookie jar, no auth headers, no session reuse. Pages behind login walls return whatever the public response is, usually surfaced as http_error.
Not a JS renderer. Pure client-rendered SPAs with no static content return extraction_failed. SPAs with server-rendered or SEO-prerendered HTML will extract whatever static content they ship.

Configuration

| Variable | Default | Purpose |
| --- | --- | --- |
| `MARKFETCH_TIMEOUT_MS` | `30000` | Per-request timeout in ms |
| `MARKFETCH_MAX_BYTES` | `5000000` | Cap on response body and extracted markdown |
| `MARKFETCH_USER_AGENT` | Pinned Chrome 130 string | Override the UA. Must be a Chrome UA: `Sec-CH-UA-*` client hints are derived from it at startup; non-Chrome strings fail fast |
| `MARKFETCH_ALLOWED_WRITE_ROOTS` | `os.tmpdir()` + `process.cwd()` | MCP-only. Path-delimiter-separated list of absolute paths permitted as MCP `savePath` write roots. Replaces the defaults entirely; see Write sandbox |

Pass overrides via the env block of your MCP client config:

{
  "mcpServers": {
    "markfetch": {
      "command": "npx",
      "args": ["-y", "markfetch"],
      "env": {
        "MARKFETCH_TIMEOUT_MS": "60000"
      }
    }
  }
}
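
As a rough model of how the numeric variables and their defaults fit together, the sketch below reads them from an environment-like object. The `loadConfig` and `readPositiveInt` names and the `MarkfetchConfig` shape are hypothetical, and rejecting non-integer values with an error is an assumption; this README does not say how markfetch treats malformed numbers.

```typescript
interface MarkfetchConfig {
  timeoutMs: number;
  maxBytes: number;
  // undefined means "use the pinned default UA".
  userAgent: string | undefined;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: falls back to the documented default when unset.
function readPositiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    // Assumed behavior: fail loudly instead of silently using the default.
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): MarkfetchConfig {
  return {
    timeoutMs: readPositiveInt(env, "MARKFETCH_TIMEOUT_MS", 30000),
    maxBytes: readPositiveInt(env, "MARKFETCH_MAX_BYTES", 5000000),
    userAgent: env["MARKFETCH_USER_AGENT"],
  };
}

// Equivalent to the env block in the JSON config above.
console.log(loadConfig({ MARKFETCH_TIMEOUT_MS: "60000" }));
```

Environment values always arrive as strings, which is why the JSON config quotes `"60000"` rather than passing a number.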

Write sandbox

MCP savePath writes are confined to a set of allowed root directories. By default the allowed set is os.tmpdir() ∪ process.cwd() (each resolved via fs.realpath once at startup). A savePath outside that set returns save_forbidden and no file is created.

Override the default set with MARKFETCH_ALLOWED_WRITE_ROOTS — a list of absolute paths separated by the platform's path delimiter (: on POSIX, ; on Windows). When set, the override replaces the defaults entirely — it does not merge. To keep os.tmpdir() or process.cwd() accessible, list them yourself; the example below shows /tmp for that reason. A malformed value (non-absolute entry, or a directory that doesn't exist) fails fast on stderr at startup.

{
  "mcpServers": {
    "markfetch": {
      "command": "npx",
      "args": ["-y", "markfetch"],
      "env": {
        "MARKFETCH_ALLOWED_WRITE_ROOTS": "/Users/me/markfetch-out:/tmp"
      }
    }
  }
}

On Windows, use backslashes and ; as the delimiter:

{
  "mcpServers": {
    "markfetch": {
      "command": "npx",
      "args": ["-y", "markfetch"],
      "env": {
        "MARKFETCH_ALLOWED_WRITE_ROOTS": "C:\\Users\\me\\markfetch-out;C:\\Users\\me\\AppData\\Local\\Temp"
      }
    }
  }
}
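
The parsing rules for the override (platform delimiter, absolute entries only) could look roughly like this. It is a sketch rather than markfetch's own code: `parseWriteRoots` is a hypothetical name, the absolute-path test is simplified (drive-letter paths only on Windows, no UNC paths), and the directory-existence check described above is left out because it needs filesystem access.

```typescript
type Platform = "posix" | "win32";

// Hypothetical helper: splits the variable and rejects non-absolute entries.
function parseWriteRoots(value: string, platform: Platform): string[] {
  const delimiter = platform === "win32" ? ";" : ":";
  const isAbsolute = (entry: string): boolean =>
    platform === "win32" ? /^[A-Za-z]:[\\/]/.test(entry) : entry.startsWith("/");

  const roots = value.split(delimiter);
  for (const root of roots) {
    if (!isAbsolute(root)) {
      // An empty entry (for example from a trailing delimiter) also lands here.
      throw new Error(`MARKFETCH_ALLOWED_WRITE_ROOTS entry is not absolute: "${root}"`);
    }
  }
  return roots;
}

console.log(parseWriteRoots("/Users/me/markfetch-out:/tmp", "posix"));
console.log(
  parseWriteRoots("C:\\Users\\me\\markfetch-out;C:\\Users\\me\\AppData\\Local\\Temp", "win32"),
);
```

The delimiter differs per platform because `:` is part of every Windows drive-letter path.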

Notes:

The sandbox is MCP-only by design. The CLI is unrestricted — a human at the shell is the security boundary, and the markfetch CLI doesn't run any sandbox check at all. The asymmetry exists because the MCP tool is driven by a language model, which may be steered by content from a page it just fetched.
Symlinks pointing outside are blocked. Each candidate savePath is resolved via fs.realpath to its real destination before the containment check, so a symlink planted inside the sandbox cannot be used to escape.
Containment is case-insensitive on Windows (C:\Users\Bob and c:\users\bob are the same path).
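
The containment rule in these notes can be illustrated with a small string-level check. The sketch assumes both paths have already been resolved to their real locations (the `fs.realpath` step described above) and that Windows paths use backslashes; `isContained` is a hypothetical name, not a markfetch export.

```typescript
// Hypothetical check: is realTarget strictly inside realRoot?
// Both arguments are assumed to be absolute, symlink-free paths.
function isContained(realTarget: string, realRoot: string, windows: boolean): boolean {
  const separator = windows ? "\\" : "/";
  // Windows compares case-insensitively; POSIX compares exactly.
  const normalize = (p: string): string => (windows ? p.toLowerCase() : p);

  const target = normalize(realTarget);
  let root = normalize(realRoot);
  // Append a separator so "/tmp/out" does not match "/tmp/outside".
  if (!root.endsWith(separator)) root += separator;

  return target.startsWith(root);
}

console.log(isContained("/tmp/out/article.md", "/tmp/out", false));
console.log(isContained("/tmp/outside/article.md", "/tmp/out", false));
console.log(isContained("c:\\users\\bob\\notes.md", "C:\\Users\\Bob", true));
```

Resolving symlinks before this comparison is what matters: a plain prefix check on unresolved paths would accept a symlink inside the sandbox that points elsewhere.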

Develop

Requires Node.js ≥ 24. Tested on Linux, macOS, and Windows in CI.

When iterating on CLI changes, tsx src/index.ts <url> and tsx src/index.ts --help route through the same argv-discriminated dispatcher as the built dist/index.js — no rebuild needed between edits.

To point an MCP client at a local source build, swap npx for node + an absolute path to dist/index.js:

{
  "mcpServers": {
    "markfetch": {
      "command": "node",
      "args": ["/absolute/path/to/markfetch/dist/index.js"]
    }
  }
}

Responsible use

markfetch is a per-call fetch tool, not a crawler. Use it on URLs whose targets you have permission to fetch, and respect the terms of service of any site you query. The maintainer assumes no liability for misuse — see LICENSE.

License

MIT
