feat(docs): serve per-page markdown variants for LLMs #22712

Draft
critesjosh wants to merge 3 commits into next from critesjosh/docs-markdown-variants

@critesjosh
Contributor

Summary

  • Generate clean .md siblings for every versioned developer/operate docs page during build.
  • Serve them two ways: append .md to the URL, or send Accept: text/markdown on the canonical URL.
  • Content negotiation follows acceptmarkdown.com and RFC 9110.

What's included

  • docs/scripts/generate_markdown_variants.js — walks developer_versioned_docs/ and network_versioned_docs/, strips frontmatter/imports, resolves partial imports, flattens <Tabs>/<TabItem> to ## headings, rewrites <Image img={require(...)} /> to markdown image syntax, renders <General.*>/<Fees.*>/top-level snippets via esbuild + react-dom/server, and drops <DocCardList/>. A drift check fails the build if any unhandled <CapitalizedTag> remains, so new MDX components surface immediately.
  • docs/netlify/edge-functions/accept-markdown.ts — RFC 9110 content negotiation. Passes through by default; rewrites to the .md sibling only when text/markdown is strictly preferred over text/html; returns 406 when the client explicitly rejects both.
  • docs/netlify.toml — sets Content-Type: text/markdown; charset=utf-8, Vary: Accept, and X-Robots-Tag: noindex for /*.md (avoids duplicate-content penalty while keeping CDN caching correct).
  • docs/package.json — runs the generator after docusaurus build, before append_api_docs_to_llms.js.
  • docs/docs-developers/ai_tooling.md — documents both access paths for LLM tools.
  • docs/docs-words.txt — adds acceptmarkdown for cspell.
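
For reviewers, the drift check described for generate_markdown_variants.js could look roughly like the sketch below. This is not the PR's code: `findUnhandledTags` is a hypothetical name, and skipping fenced code blocks and inline code spans is an assumption about how false positives might be avoided.

```typescript
// Hypothetical sketch of a drift check: report any <CapitalizedTag> that
// survives in generated markdown, outside code fences and inline code.
function findUnhandledTags(markdown: string): string[] {
  const found = new Set<string>();
  let inFence = false;

  for (const rawLine of markdown.split("\n")) {
    // Toggle on ``` or ~~~ fence markers and skip everything inside a fence.
    if (/^\s*(```|~~~)/.test(rawLine)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;

    // Drop inline code spans so `<Tabs>` mentioned in prose is not flagged.
    const line = rawLine.replace(/`[^`]*`/g, "");

    // Match opening, closing, or self-closing tags whose name starts with
    // an uppercase letter, including dotted names such as General.Foo.
    const tagPattern = /<\/?([A-Z][A-Za-z0-9.]*)(?=[\s/>])/g;
    let match: RegExpExecArray | null;
    while ((match = tagPattern.exec(line)) !== null) {
      const name = match[1];
      if (name) found.add(name);
    }
  }
  return Array.from(found).sort();
}
```

A build script could call this on each generated file and exit non-zero when the returned list is not empty, which is one way to make new MDX components surface immediately.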

Test plan

  • yarn build completes without drift-check failures
  • build/developers/<slug>.md exists for every mainnet developer doc; same for build/operate/
  • Testnet variants land under build/developers/testnet/ and build/operate/testnet/
  • Deploy preview: curl https://<preview>/developers/<slug>.md returns markdown with Content-Type: text/markdown
  • Deploy preview: curl -H 'Accept: text/markdown' https://<preview>/developers/<slug> returns the same markdown
  • Deploy preview: curl https://<preview>/developers/<slug> still returns HTML (default browser request)
  • Deploy preview: curl -H 'Accept: image/png' https://<preview>/developers/<slug> returns 406
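
The three negotiation outcomes exercised by the curl checks above can be modelled as a small pure function. This is a minimal sketch under assumptions, not the edge function from this PR: `negotiate`, `parseAccept`, and `qualityFor` are hypothetical names, and it resolves each media type against the most specific matching range (exact type, then `type/*`, then `*/*`) in the spirit of RFC 9110. Note that curl sends `Accept: */*` by default, which is why the plain request still gets HTML.

```typescript
type Outcome = "html" | "markdown" | "not-acceptable";

interface MediaRange {
  type: string; // e.g. "text/html", "text/*", "*/*"
  q: number;    // quality value in [0, 1], defaults to 1
}

// Parse an Accept header value into media ranges with their q-values.
function parseAccept(header: string): MediaRange[] {
  const ranges: MediaRange[] = [];
  for (const part of header.split(",")) {
    const segments = part.split(";");
    const type = (segments[0] ?? "").trim().toLowerCase();
    if (type === "") continue;
    let q = 1;
    for (const param of segments.slice(1)) {
      const eq = param.indexOf("=");
      if (eq === -1) continue;
      const key = param.slice(0, eq).trim().toLowerCase();
      const value = Number.parseFloat(param.slice(eq + 1).trim());
      if (key === "q" && !Number.isNaN(value)) {
        q = Math.min(1, Math.max(0, value));
      }
    }
    ranges.push({ type, q });
  }
  return ranges;
}

// Quality for a concrete media type: the most specific matching range wins;
// a type that no range matches is treated as not acceptable (q = 0).
function qualityFor(ranges: MediaRange[], mediaType: string): number {
  const major = mediaType.split("/")[0] ?? "";
  for (const candidate of [mediaType, `${major}/*`, "*/*"]) {
    const match = ranges.find((r) => r.type === candidate);
    if (match) return match.q;
  }
  return 0;
}

function negotiate(accept: string | null): Outcome {
  // No Accept header: pass through to the default HTML response.
  if (accept === null || accept.trim() === "") return "html";
  const ranges = parseAccept(accept);
  const md = qualityFor(ranges, "text/markdown");
  const html = qualityFor(ranges, "text/html");
  if (md > html) return "markdown"; // markdown must be strictly preferred
  if (html > 0) return "html";      // ties and wildcards stay on HTML
  return "not-acceptable";          // neither representation is acceptable
}
```

Under this sketch, `negotiate("text/markdown")` yields `"markdown"`, `negotiate("*/*")` yields `"html"`, and `negotiate("image/png")` yields `"not-acceptable"`, which would map to the 406 response.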

Generate clean .md siblings for every versioned developer/operate page at
build time and serve them either via `.md` URL suffix or `Accept: text/markdown`
content negotiation (per acceptmarkdown.com / RFC 9110), so LLMs and CLI tools
can consume docs without scraping rendered HTML.

- Strip trailing /index from slugs so /developers/docs/aztec-js.md resolves
  to the section landing page instead of being missing.
- Honour CONTEXT env for the operate instance: mainnet routes to /operate/ in
  production and /operate/alpha/ elsewhere, matching docusaurus.config.js.
- Rewrite Markdown images that use the @site/static alias so the generated
  .md doesn't leak a Docusaurus-only path.
- Make snippet replacement context-aware: inline inside list items, padded
  with blank lines only when the tag stands alone.
- Fix snippet HTML -> markdown so <p>...<ul>...</ul></p> doesn't collapse
  into "text:- item".
- Skip snippet exports that throw during render instead of failing the build.
- Tidy dead code (KNOWN_BENIGN_TAGS, unused args in outputPathFor) and
  align the Accept tie-breaker comment with the RFC-compatible behaviour.
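
As an illustration of the first fix in this list, stripping a trailing /index from a slug might look like the sketch below. `slugFor` is a hypothetical helper and the example paths are illustrative only; the PR's actual `outputPathFor` may differ.

```typescript
// Hypothetical sketch: derive the URL slug for a source file so that a
// section's index page maps to the section path itself.
function slugFor(relativePath: string): string {
  // Drop the .md or .mdx extension.
  const withoutExt = relativePath.replace(/\.mdx?$/, "");
  // A top-level index file maps to the root slug.
  if (withoutExt === "index") return "";
  // Strip a trailing "/index" so "docs/aztec-js/index" becomes "docs/aztec-js".
  return withoutExt.replace(/\/index$/, "");
}
```

With this, a file such as `docs/aztec-js/index.md` would yield the slug `docs/aztec-js`, so appending `.md` gives a path that matches the section landing page URL.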
react-dom/server emits `&#x27;` for apostrophes, which the prior literal
entity table missed. Handle any decimal or hex numeric reference so snippets
don't leak `protocol&#x27;s` into generated markdown.
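
A minimal sketch of decoding decimal and hex numeric references, as described above. `decodeNumericEntities` is a hypothetical name, and leaving out-of-range or surrogate code points untouched is an assumption rather than something this PR states.

```typescript
// Hypothetical sketch: replace numeric character references such as
// &#39; or &#x27; with the characters they denote.
function decodeNumericEntities(text: string): string {
  return text.replace(
    /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g,
    (whole: string, hex: string | undefined, dec: string | undefined): string => {
      const codePoint =
        hex !== undefined ? parseInt(hex, 16) : parseInt(dec ?? "", 10);
      // Leave the reference as-is when it is not a usable code point
      // (NUL, surrogates, or beyond the Unicode range).
      const invalid =
        codePoint === 0 ||
        codePoint > 0x10ffff ||
        (codePoint >= 0xd800 && codePoint <= 0xdfff);
      return invalid ? whole : String.fromCodePoint(codePoint);
    }
  );
}
```

Named entities such as `&amp;` are not touched by this function and would still need a separate lookup table.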