fix(wiki-explorer): prevent path-traversal SSRF bypass in URL validation #648
Open
sebastiondev wants to merge 1 commit into modelcontextprotocol:main from
Replace raw regex matching on the URL string with parsed URL validation using the URL constructor. This prevents path-traversal attacks where URLs like `/wiki/../../w/api.php` pass the regex but resolve to paths outside `/wiki/` after URL normalization. Also set `redirect: "error"` on fetch() to prevent following redirects to non-Wikipedia domains. CWE-918
Summary
The `get-first-degree-links` tool in `examples/wiki-explorer-server` validates the user-supplied `url` argument with a regex applied to the raw string.

A URL like `https://en.wikipedia.org/wiki/../../w/api.php?action=query&list=allusers` matches this regex, but `fetch()` performs WHATWG URL normalization, which collapses the `../..` segments. The request that actually leaves the server targets `https://en.wikipedia.org/w/api.php?...`, the MediaWiki API, which the `/wiki/` filter was meant to keep out of reach.

This was confirmed in a Node REPL.
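The normalization step can be reproduced with a minimal sketch like the one below. It relies only on the standard WHATWG `URL` class. The regex is a hypothetical stand-in for a string-level check and is not the pattern used in `server.ts`; the names `rawStringCheck`, `crafted`, and `normalized` are likewise illustrative.

```typescript
// Hypothetical stand-in for a string-level check; NOT the regex from server.ts.
const rawStringCheck = /^https?:\/\/[a-z-]+\.wikipedia\.org\/wiki\/.+/;

// The crafted URL from the description above.
const crafted =
  "https://en.wikipedia.org/wiki/../../w/api.php?action=query&list=allusers";

// The raw string contains "/wiki/" right after the host, so the regex accepts it.
const passesRawCheck = rawStringCheck.test(crafted);

// The WHATWG URL parser removes the ".." segments while parsing the path.
const normalized = new URL(crafted);

console.log(passesRawCheck);      // true
console.log(normalized.pathname); // "/w/api.php"
console.log(normalized.href);     // "https://en.wikipedia.org/w/api.php?action=query&list=allusers"
```

The string the regex sees and the URL that is actually requested differ, which is why a check on the parsed `pathname` is the more reliable place to enforce the `/wiki/` prefix.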
- Affected code: `examples/wiki-explorer-server/server.ts`, `get-first-degree-links` tool handler
- Data flow: `arguments.url` (attacker-controlled) → regex check (string-level only) → `fetch(url)` (URL-normalized) → response body returned to the caller

Impact (honest scoping)
The host regex itself is correct and unchanged, so the request still lands on `*.wikipedia.org`. This is not a classic SSRF that reaches `127.0.0.1` or cloud metadata. The realistic impact is that an LLM-supplied argument can cause the tool to hit MediaWiki API endpoints (e.g. `action=query&list=allusers`) and surface their JSON/HTML response as if it were a Wikipedia article, bypassing the namespace allowlist the tool tries to enforce and producing unexpected data in the model's context. It's a defensive hardening fix rather than a critical SSRF, but the bypass of the explicit `/wiki/` filter is real.

Fix
Two small, targeted changes in `server.ts`:

- Add a helper (`isValidWikipediaUrl`) that validates `parsed.protocol`, `parsed.hostname`, and crucially `parsed.pathname.startsWith("/wiki/")` after normalization. This closes the path-traversal bypass.
- Pass `{ redirect: "error" }` to `fetch()` so a Wikipedia redirect cannot be used to send the request to a different host or path that the validator never saw.

The host allowlist (`<lang>.wikipedia.org`) is preserved exactly.

Tests
Added `examples/wiki-explorer-server/server.test.ts` with cases that:

- reject a non-Wikipedia host (`https://evil.com/wiki/Test`)
- reject the path-traversal URL (`https://en.wikipedia.org/wiki/../../w/api.php?...`)
- accept ordinary `/wiki/` paths on a valid host

The traversal test fails against the old code and passes against the patched code.
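The validator and the cases described above can be sketched roughly as follows. Only the name `isValidWikipediaUrl` and the three checked properties come from this PR; the function body, the accepted protocols, and the `WIKIPEDIA_HOST` pattern are assumptions made for illustration and may differ from the patched `server.ts`.

```typescript
// Assumed approximation of the "<lang>.wikipedia.org" host allowlist.
const WIKIPEDIA_HOST = /^[a-z-]+\.wikipedia\.org$/;

// Sketch of a parsed-URL validator; the body is an assumption, not the PR's code.
function isValidWikipediaUrl(raw: string): boolean {
  let parsed: URL;
  try {
    // Parsing normalizes the path, so ".." segments are already resolved here.
    parsed = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Assumption: both http and https are accepted.
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") return false;
  if (!WIKIPEDIA_HOST.test(parsed.hostname)) return false;
  // Checked on the normalized pathname, not on the raw string.
  return parsed.pathname.startsWith("/wiki/");
}

console.log(isValidWikipediaUrl("https://evil.com/wiki/Test")); // false
console.log(
  isValidWikipediaUrl(
    "https://en.wikipedia.org/wiki/../../w/api.php?action=query&list=allusers"
  )
); // false
console.log(isValidWikipediaUrl("https://en.wikipedia.org/wiki/TypeScript")); // true
```

In the tool handler, a URL that passes this check would then be requested with `fetch(url, { redirect: "error" })`, so a redirect fails the request instead of being followed to a location the validator never inspected.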
Adversarial review
Before submitting we tried to talk ourselves out of this. The host check is unchanged and correct, so reachable targets are still limited to public Wikipedia mirrors; there's no path to internal services or cloud metadata. The server is an example MCP server typically run locally over stdio, so the attacker model is "a malicious tool-call argument coming from the LLM/client", not a remote network attacker. We still think it's worth fixing: the `/wiki/` namespace filter is a security boundary the tool explicitly tries to enforce, the bypass is trivial to trigger, and the fix is two lines plus tests with no behaviour change for legitimate URLs.

cc @lewiswigmore