feat(docs): make docs.rapidata.ai discoverable to AI agents #637
The ora Agent Readiness scan scored docs.rapidata.ai 15/100 because the site exposes nothing at its root for crawlers/agents: /robots.txt and /llms.txt both 404, and / is a contentless JS redirect to /latest/. mike publishes everything under per-version subdirectories, so the llms.txt the build already generates is buried at /latest/llms.txt where agents never look.

Publish curated root files (robots.txt, llms.txt, a content+JSON-LD landing page) and copy the generated llms-full.txt to the site root via a post-mike step in the deploy workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: lino <lino@rapidata.ai>
Code Review
This PR addresses a real discoverability gap: 🔴 Bug —
Why
ora's Agent Readiness scan gave docs.rapidata.ai 15/100 (F) — Discovery 1/22, Identity 2/22, Access 8/34. The agent review said it "couldn't locate any public API endpoints, documentation, or a developer portal."
Root cause is discoverability, not missing content. The docs, examples and auth instructions all exist — but nothing is reachable where a crawler/agent looks:
- docs.rapidata.ai/robots.txt → 404
- docs.rapidata.ai/llms.txt → 404 (the build does generate one, but mike buries it at /latest/llms.txt)
- docs.rapidata.ai/ → a contentless JS redirect to /latest/ (no title, description, links or structured data), which is why Identity/Discovery score near zero.

What
Adds a site_root/ directory with the files that must live at the site root, plus a post-mike step in deploy_doc.yml that copies them onto the gh-pages root (mike only manages per-version subdirs, so root files are otherwise lost):
- robots.txt: allows all crawlers, lists AI crawlers (GPTBot, ClaudeBot, PerplexityBot, …) explicitly, points to the sitemap.
- llms.txt: a curated index covering what Rapidata is, how to install/authenticate the SDK, and links to every guide/example/the API reference.
- index.html: root landing page with a real <h1>, description, link list and JSON-LD (Organization + WebSite + SoftwareApplication) for crawlers, keeping the JS redirect to /latest/ for humans.
- llms-full.txt: the per-version generated file is mirrored to the root by the workflow.

Scope / deliberately not done
- llms.txt points agents there instead of leaking internal endpoints.
- llms.txt and the landing page now make the token/auth path discoverable.

Effect & verification
Takes effect on the next Deploy Documentation run (manual workflow_dispatch). After deploy, /robots.txt, /llms.txt and /llms-full.txt resolve at the root and / serves crawlable content. YAML and JSON-LD validated locally. All linked /latest/… URLs verified to return 200.

🔗 Session: https://session-b4f9bfe9.poseidon.rapidata.internal/
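
The post-deploy check described under "Effect & verification" could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the function names (`check_root_paths`, `blocked_crawlers`, `http_status`) are hypothetical, the crawler list only covers the three agents named in the description, and the status fetcher is passed in so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List
from urllib.robotparser import RobotFileParser

BASE_URL = "https://docs.rapidata.ai"
# Root paths the PR description expects to resolve after deploy.
ROOT_PATHS = ("/robots.txt", "/llms.txt", "/llms-full.txt", "/")
# Subset of AI crawlers named in the PR description (the full list is elided there).
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def check_root_paths(
    fetch_status: Callable[[str], int],
    paths: Iterable[str] = ROOT_PATHS,
) -> Dict[str, bool]:
    """Map each root path to True if its URL answered with HTTP 200."""
    return {path: fetch_status(BASE_URL + path) == 200 for path in paths}


def blocked_crawlers(
    robots_txt: str,
    agents: Iterable[str] = AI_CRAWLERS,
    url: str = BASE_URL + "/latest/",
) -> List[str]:
    """Return the agents that the given robots.txt body would not allow to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


def http_status(url: str) -> int:
    """Live fetcher: status code of a GET, reporting HTTP errors as their code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Against the live site, `check_root_paths(http_status)` should report `True` for every path once the deploy has run, and `blocked_crawlers(...)` on the fetched robots.txt body should return an empty list. Injecting the fetcher is a design choice for testability; a plain `curl` loop in the workflow would serve the same purpose.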