RapidataAI · LinoGiger · Jun 26, 2026
diff --git a/.github/workflows/deploy_doc.yml b/.github/workflows/deploy_doc.yml
@@ -56,3 +56,29 @@ jobs:
           else
             uv run mike deploy --push ${{ github.event.inputs.version_alias }}
           fi
+      - name: Publish site-root files (llms.txt, robots.txt, landing page)
+        # mike serves every version under a subdirectory, so the generated
+        # llms.txt/robots.txt live under /latest/ and are invisible at the site
+        # root where crawlers and AI agents look. Mirror the curated root files
+        # (and the generated llms-full.txt) into the gh-pages root after mike has
+        # written the version. mike rewrites the root index.html as a bare
+        # redirect on set-default, so this must run last to keep our richer one.
+        if: hashFiles('site_root/**') != ''
+        run: |
+          set -euo pipefail
+          git fetch origin gh-pages
+          git worktree add ghpages gh-pages
+          cp site_root/robots.txt ghpages/robots.txt
+          cp site_root/llms.txt   ghpages/llms.txt
+          cp site_root/index.html ghpages/index.html
+          if [ -f ghpages/latest/llms-full.txt ]; then
+            cp ghpages/latest/llms-full.txt ghpages/llms-full.txt
+          fi
+          cd ghpages
+          git add robots.txt llms.txt index.html; if [ -f llms-full.txt ]; then git add llms-full.txt; fi
+          if git diff --cached --quiet; then
+            echo "Site-root files already up to date."
+          else
+            git commit -m "chore(docs): publish root llms.txt, robots.txt and landing page"
+            git push origin gh-pages
+          fi
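The file handling in this step can be sketched in Python to make the optional `llms-full.txt` case explicit. This is a minimal illustration under stated assumptions, not the workflow's implementation: `mirror_root_files` and `demo` are hypothetical names, the directories are throwaway temp folders, and no git commands are run. The function returns only the names that exist after copying, which is the list a later `git add` could safely receive.

```python
import shutil
import tempfile
from pathlib import Path

# Curated files kept in site_root/ (names taken from the workflow step).
CURATED = ("robots.txt", "llms.txt", "index.html")


def mirror_root_files(site_root: Path, ghpages: Path) -> list[str]:
    """Copy root files into a gh-pages checkout and return the names to stage."""
    staged = []
    for name in CURATED:
        shutil.copyfile(site_root / name, ghpages / name)
        staged.append(name)
    # llms-full.txt is generated per build, so it may be absent.
    generated = ghpages / "latest" / "llms-full.txt"
    if generated.is_file():
        shutil.copyfile(generated, ghpages / "llms-full.txt")
        staged.append("llms-full.txt")
    return staged


def demo(with_full: bool) -> list[str]:
    """Run the mirror against temporary directories (hypothetical layout)."""
    with tempfile.TemporaryDirectory() as tmp:
        site_root = Path(tmp, "site_root")
        ghpages = Path(tmp, "ghpages")
        site_root.mkdir()
        (ghpages / "latest").mkdir(parents=True)
        for name in CURATED:
            (site_root / name).write_text(f"placeholder for {name}\n")
        if with_full:
            (ghpages / "latest" / "llms-full.txt").write_text("full docs\n")
        return mirror_root_files(site_root, ghpages)


if __name__ == "__main__":
    print(demo(with_full=False))
    print(demo(with_full=True))
```

Returning the staged names keeps the "copy" and "stage" decisions in one place, so a file that was never generated is never passed on to a staging command.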
diff --git a/site_root/README.md b/site_root/README.md
@@ -0,0 +1,20 @@
+# `site_root/`
+
+Files published to the **root** of the documentation site (`docs.rapidata.ai/`),
+outside of `mike`'s per-version subdirectories.
+
+`mike` serves each docs version under its own path (`/latest/`, `/3.x/`, …), so
+files generated by the build — including the per-version `llms.txt` and
+`llms-full.txt` — are only reachable under `/latest/…`. Crawlers and AI agents
+look for `llms.txt` and `robots.txt` at the site root, so the `Deploy
+Documentation` workflow copies these files into the gh-pages root after `mike`
+has written the version (see `.github/workflows/deploy_doc.yml`).
+
+| File | Purpose |
+|------|---------|
+| `robots.txt`  | Allows all crawlers (AI crawlers listed explicitly) and points to the sitemap. |
+| `llms.txt`    | Curated [llms.txt](https://llmstxt.org/) index of the docs and how to integrate. |
+| `index.html`  | Root landing page: real content and structured data for crawlers, plus a JS redirect (with a `<noscript>` meta-refresh fallback) to `/latest/` for humans. |
+
+`llms-full.txt` is **not** stored here — it is generated per build and copied to
+the root from `/latest/llms-full.txt` by the workflow.
diff --git a/site_root/index.html b/site_root/index.html
@@ -0,0 +1,88 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <title>Rapidata Python SDK — Documentation</title>
+  <meta name="description" content="Documentation for the Rapidata Python SDK: request human feedback at scale — crowd-sourced labeling, model evaluation, ranking, and preference data — directly from Python.">
+  <link rel="canonical" href="https://docs.rapidata.ai/latest/">
+
+  <meta property="og:type" content="website">
+  <meta property="og:site_name" content="Rapidata Documentation">
+  <meta property="og:title" content="Rapidata Python SDK — Documentation">
+  <meta property="og:description" content="Request human feedback at scale — labeling, model evaluation, ranking, and preference data — directly from Python.">
+  <meta property="og:url" content="https://docs.rapidata.ai/">
+  <meta name="twitter:card" content="summary">
+  <meta name="twitter:title" content="Rapidata Python SDK — Documentation">
+  <meta name="twitter:description" content="Request human feedback at scale, directly from Python.">
+
+  <script type="application/ld+json">
+  {
+    "@context": "https://schema.org",
+    "@graph": [
+      {
+        "@type": "Organization",
+        "@id": "https://www.rapidata.ai/#organization",
+        "name": "Rapidata",
+        "url": "https://www.rapidata.ai",
+        "logo": "https://docs.rapidata.ai/media/rapidata.svg",
+        "sameAs": [
+          "https://github.com/RapidataAI",
+          "https://pypi.org/project/rapidata/",
+          "https://www.linkedin.com/company/rapidata-ai"
+        ]
+      },
+      {
+        "@type": "WebSite",
+        "@id": "https://docs.rapidata.ai/#website",
+        "name": "Rapidata Python SDK Documentation",
+        "url": "https://docs.rapidata.ai/",
+        "publisher": { "@id": "https://www.rapidata.ai/#organization" }
+      },
+      {
+        "@type": "SoftwareApplication",
+        "name": "Rapidata Python SDK",
+        "applicationCategory": "DeveloperApplication",
+        "operatingSystem": "OS Independent",
+        "url": "https://docs.rapidata.ai/latest/",
+        "downloadUrl": "https://pypi.org/project/rapidata/",
+        "softwareHelp": "https://docs.rapidata.ai/latest/quickstart/",
+        "publisher": { "@id": "https://www.rapidata.ai/#organization" },
+        "offers": { "@type": "Offer", "price": "0", "priceCurrency": "USD" }
+      }
+    ]
+  }
+  </script>
+
+  <!-- Send human visitors to the default (latest) version. Clients that neither run
+       JavaScript nor follow meta refresh still get the content and links below. -->
+  <script>
+    window.location.replace("latest/" + window.location.search + window.location.hash);
+  </script>
+  <noscript><meta http-equiv="refresh" content="0; url=latest/"></noscript>
+</head>
+<body>
+  <main>
+    <h1>Rapidata Python SDK</h1>
+    <p>
+      Rapidata provides human feedback at scale — crowd-sourced labeling, model
+      evaluation, ranking, and preference data collected from real people. The
+      supported way to access it programmatically is the Rapidata Python SDK,
+      documented here.
+    </p>
+    <p>Install with <code>pip install rapidata</code>, then authenticate with a token from
+      <a href="https://app.rapidata.ai/settings/tokens">app.rapidata.ai/settings/tokens</a>.</p>
+    <ul>
+      <li><a href="latest/">Documentation home</a></li>
+      <li><a href="latest/quickstart/">Quick Start</a></li>
+      <li><a href="latest/starting_page/">Overview &amp; core concepts</a></li>
+      <li><a href="latest/api/">API reference</a></li>
+      <li><a href="latest/ai_agents/">Use Rapidata from your AI agent</a></li>
+      <li><a href="llms.txt">llms.txt</a> · <a href="llms-full.txt">llms-full.txt</a></li>
+      <li><a href="https://github.com/RapidataAI/rapidata-python-sdk">Source on GitHub</a></li>
+      <li><a href="https://pypi.org/project/rapidata/">Package on PyPI</a></li>
+    </ul>
+    <p>Redirecting to the <a href="latest/">latest documentation</a>…</p>
+  </main>
+</body>
+</html>
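One way to sanity-check the structured data in a page like this is to extract the `application/ld+json` block and confirm it parses as JSON. The sketch below uses only the Python standard library; `JsonLdExtractor`, `jsonld_types`, and the abridged `SAMPLE` markup are illustrative assumptions, not part of this PR, and the check covers JSON syntax only, not schema.org validity.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def jsonld_types(html: str) -> list[str]:
    """Return the @type of each node; raises ValueError on malformed JSON."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types = []
    for block in parser.blocks:
        doc = json.loads(block)
        nodes = doc.get("@graph", [doc])
        types.extend(node["@type"] for node in nodes)
    return types


# Abridged stand-in for site_root/index.html (assumption: same graph layout).
SAMPLE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@graph": [{"@type": "Organization", "name": "Rapidata"},
            {"@type": "WebSite", "url": "https://docs.rapidata.ai/"},
            {"@type": "SoftwareApplication", "name": "Rapidata Python SDK"}]}
</script>
<script>window.location.replace("latest/");</script>
</head><body></body></html>"""

if __name__ == "__main__":
    print(jsonld_types(SAMPLE))
```

The plain redirect script is ignored because it has no `type` attribute, so only the JSON-LD payload is parsed.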
diff --git a/site_root/llms.txt b/site_root/llms.txt
@@ -0,0 +1,54 @@
+# Rapidata
+
+> Rapidata provides human feedback at scale: crowd-sourced labeling, model
+> evaluation, ranking, and preference data (RLHF-style) collected from real
+> people. The supported way to access it programmatically is the Rapidata
+> Python SDK, documented on this site (docs.rapidata.ai).
+
+Integrate via the Python SDK (`pip install rapidata`). Authenticate either with
+an interactive browser login on first run, or with a client ID/secret token
+created at https://app.rapidata.ai/settings/tokens.
+
+## Guides
+
+- [Overview](https://docs.rapidata.ai/latest/starting_page/): what Rapidata does and its core concepts
+- [Quick Start](https://docs.rapidata.ai/latest/quickstart/): install, authenticate, and create your first order
+- [Custom Audiences](https://docs.rapidata.ai/latest/audiences/): target responses by country, language, and qualification
+- [Signals](https://docs.rapidata.ai/latest/signals/)
+- [Parameter Reference](https://docs.rapidata.ai/latest/job_definition_parameters/)
+- [Understanding Results](https://docs.rapidata.ai/latest/understanding_the_results/)
+- [Early Stopping](https://docs.rapidata.ai/latest/confidence_stopping/)
+- [Instruction Design](https://docs.rapidata.ai/latest/human_prompting/)
+- [Error Handling](https://docs.rapidata.ai/latest/error_handling/)
+- [Logging & Config](https://docs.rapidata.ai/latest/config/)
+
+## Examples
+
+- [Classification](https://docs.rapidata.ai/latest/examples/classify_job/)
+- [Comparison](https://docs.rapidata.ai/latest/examples/compare_job/)
+- [Locate](https://docs.rapidata.ai/latest/examples/locate_job/)
+- [Draw](https://docs.rapidata.ai/latest/examples/draw_job/)
+- [Select Words](https://docs.rapidata.ai/latest/examples/select_words_job/)
+- [Free Text](https://docs.rapidata.ai/latest/examples/free_text_job/)
+- [Ranking](https://docs.rapidata.ai/latest/examples/ranking_job/)
+
+## Model ranking & benchmarks
+
+- [Getting Started](https://docs.rapidata.ai/latest/mri/)
+- [Advanced](https://docs.rapidata.ai/latest/mri_advanced/)
+
+## AI agents & API
+
+- [Use Rapidata from your AI agent](https://docs.rapidata.ai/latest/ai_agents/): an official skill that teaches coding agents (Claude Code, Cursor, Copilot, and others) to write Rapidata integrations
+- [API reference](https://docs.rapidata.ai/latest/api/): the `RapidataClient` class and its managers
+
+## Access
+
+- Install: `pip install rapidata`
+- API tokens: https://app.rapidata.ai/settings/tokens
+- Source: https://github.com/RapidataAI/rapidata-python-sdk
+- PyPI: https://pypi.org/project/rapidata/
+
+## Optional
+
+- [llms-full.txt](https://docs.rapidata.ai/llms-full.txt): the full documentation concatenated into a single file
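For a sense of how a consumer might read this file, the sketch below groups the Markdown links under each `##` heading. It is a rough illustration only: `parse_llms_txt` is a hypothetical helper, the regular expression handles just the simple `- [title](url)` bullets used here, and `SAMPLE` is an abridged copy of the file above rather than the full text.

```python
import re

# Matches bullets of the form "- [Title](https://example/...)" with optional trailing text.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def parse_llms_txt(text: str) -> dict:
    """Map each '## Section' heading to its list of (title, url) links."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                sections[current].append((match["title"], match["url"]))
    return sections


# Abridged sample (assumption: same structure as site_root/llms.txt).
SAMPLE = """\
# Rapidata

> Rapidata provides human feedback at scale.

## Guides

- [Quick Start](https://docs.rapidata.ai/latest/quickstart/): install, authenticate, and create your first order
- [Signals](https://docs.rapidata.ai/latest/signals/)

## Access

- Install: `pip install rapidata`

## Optional

- [llms-full.txt](https://docs.rapidata.ai/llms-full.txt): the full documentation concatenated into a single file
"""

if __name__ == "__main__":
    for section, links in parse_llms_txt(SAMPLE).items():
        print(section, links)
```

Bullets without a link, such as the install command under `Access`, are skipped, so a section can legitimately come back empty.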
diff --git a/site_root/robots.txt b/site_root/robots.txt
@@ -0,0 +1,38 @@
+# Rapidata SDK documentation — https://docs.rapidata.ai
+# All crawlers, including AI/agent crawlers, are welcome.
+User-agent: *
+Allow: /
+
+# Named AI/agent crawlers, listed explicitly so operators that only honour
+# their own user-agent block still see an Allow.
+User-agent: GPTBot
+Allow: /
+
+User-agent: OAI-SearchBot
+Allow: /
+
+User-agent: ChatGPT-User
+Allow: /
+
+User-agent: ClaudeBot
+Allow: /
+
+User-agent: Claude-User
+Allow: /
+
+User-agent: anthropic-ai
+Allow: /
+
+User-agent: PerplexityBot
+Allow: /
+
+User-agent: Google-Extended
+Allow: /
+
+User-agent: Applebot-Extended
+Allow: /
+
+User-agent: CCBot
+Allow: /
+
+Sitemap: https://docs.rapidata.ai/sitemap.xml
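The rules above can be checked locally with Python's standard `urllib.robotparser` before they are published. This is a small sketch, not part of the PR: `load_rules` is a hypothetical helper, `ROBOTS` is an abridged copy of the file, `UnlistedBot` is a made-up agent name, and `site_maps()` requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Abridged copy of site_root/robots.txt (assumption: same rule shape).
ROBOTS = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://docs.rapidata.ai/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS)
    url = "https://docs.rapidata.ai/llms.txt"
    # A named crawler and an unlisted one should both be allowed.
    for agent in ("GPTBot", "ClaudeBot", "UnlistedBot"):
        print(agent, rules.can_fetch(agent, url))
    print(rules.site_maps())
```

An unlisted agent falls through to the `User-agent: *` block, which is why it is also reported as allowed.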