diff --git a/.github/workflows/deploy_doc.yml b/.github/workflows/deploy_doc.yml
index 8db2dd493..aadf72c55 100644
--- a/.github/workflows/deploy_doc.yml
+++ b/.github/workflows/deploy_doc.yml
@@ -56,3 +56,30 @@ jobs:
           else
             uv run mike deploy --push ${{ github.event.inputs.version_alias }}
           fi
+      - name: Publish site-root files (llms.txt, robots.txt, landing page)
+        # mike serves every version under a subdirectory, so the generated
+        # llms.txt/robots.txt live under /latest/ and are invisible at the site
+        # root where crawlers and AI agents look. Mirror the curated root files
+        # (and the generated llms-full.txt) into the gh-pages root after mike has
+        # written the version. mike rewrites the root index.html as a bare
+        # redirect on set-default, so this must run last to keep our richer one.
+        if: hashFiles('site_root/**') != ''
+        run: |
+          set -euo pipefail
+          git fetch origin gh-pages
+          git worktree add ghpages gh-pages
+          cp site_root/robots.txt ghpages/robots.txt
+          cp site_root/llms.txt ghpages/llms.txt
+          cp site_root/index.html ghpages/index.html
+          if [ -f ghpages/latest/llms-full.txt ]; then
+            cp ghpages/latest/llms-full.txt ghpages/llms-full.txt
+            git -C ghpages add llms-full.txt
+          fi
+          cd ghpages
+          git add robots.txt llms.txt index.html
+          if git diff --cached --quiet; then
+            echo "Site-root files already up to date."
+          else
+            git commit -m "chore(docs): publish root llms.txt, robots.txt and landing page"
+            git push origin gh-pages
+          fi
diff --git a/site_root/README.md b/site_root/README.md
new file mode 100644
index 000000000..319bde718
--- /dev/null
+++ b/site_root/README.md
@@ -0,0 +1,20 @@
+# `site_root/`
+
+Files published to the **root** of the documentation site (`docs.rapidata.ai/`),
+outside of `mike`'s per-version subdirectories.
+
+`mike` serves each docs version under its own path (`/latest/`, `/3.x/`, …), so
+files generated by the build — including the per-version `llms.txt` and
+`llms-full.txt` — are only reachable under `/latest/…`. Crawlers and AI agents
+look for `llms.txt` and `robots.txt` at the site root, so the `Deploy
+Documentation` workflow copies these files into the gh-pages root after `mike`
+has written the version (see `.github/workflows/deploy_doc.yml`).
+
+| File | Purpose |
+|------|---------|
+| `robots.txt` | Allows all crawlers (AI crawlers listed explicitly) and points to the sitemap. |
+| `llms.txt` | Curated [llms.txt](https://llmstxt.org/) index of the docs and how to integrate. |
+| `index.html` | Root landing page: real content + structured data for crawlers, JS redirect to `/latest/` for humans. |
+
+`llms-full.txt` is **not** stored here — it is generated per build and copied to
+the root from `/latest/llms-full.txt` by the workflow.
diff --git a/site_root/index.html b/site_root/index.html
new file mode 100644
index 000000000..53c82696c
--- /dev/null
+++ b/site_root/index.html
@@ -0,0 +1,88 @@
+ Rapidata Python SDK — Documentation
+ Rapidata Python SDK
+ Rapidata provides human feedback at scale — crowd-sourced labeling, model
+ evaluation, ranking, and preference data collected from real people. The
+ supported way to access it programmatically is the Rapidata Python SDK,
+ documented here.
+ Install with pip install rapidata, then authenticate with a token from
+ app.rapidata.ai/settings/tokens.
+ Redirecting to the latest documentation
diff --git a/site_root/llms.txt b/site_root/llms.txt
new file mode 100644
index 000000000..d2b80e284
--- /dev/null
+++ b/site_root/llms.txt
@@ -0,0 +1,54 @@
+# Rapidata
+
+> Rapidata provides human feedback at scale: crowd-sourced labeling, model
+> evaluation, ranking, and preference data (RLHF-style) collected from real
+> people. The supported way to access it programmatically is the Rapidata
+> Python SDK, documented on this site (docs.rapidata.ai).
+
+Integrate via the Python SDK (`pip install rapidata`). Authenticate either with
+an interactive browser login on first run, or with a client ID/secret token
+created at https://app.rapidata.ai/settings/tokens.
+
+## Guides
+
+- [Overview](https://docs.rapidata.ai/latest/starting_page/): what Rapidata does and its core concepts
+- [Quick Start](https://docs.rapidata.ai/latest/quickstart/): install, authenticate, and create your first order
+- [Custom Audiences](https://docs.rapidata.ai/latest/audiences/): target responses by country, language, and qualification
+- [Signals](https://docs.rapidata.ai/latest/signals/)
+- [Parameter Reference](https://docs.rapidata.ai/latest/job_definition_parameters/)
+- [Understanding Results](https://docs.rapidata.ai/latest/understanding_the_results/)
+- [Early Stopping](https://docs.rapidata.ai/latest/confidence_stopping/)
+- [Instruction Design](https://docs.rapidata.ai/latest/human_prompting/)
+- [Error Handling](https://docs.rapidata.ai/latest/error_handling/)
+- [Logging & Config](https://docs.rapidata.ai/latest/config/)
+
+## Examples
+
+- [Classification](https://docs.rapidata.ai/latest/examples/classify_job/)
+- [Comparison](https://docs.rapidata.ai/latest/examples/compare_job/)
+- [Locate](https://docs.rapidata.ai/latest/examples/locate_job/)
+- [Draw](https://docs.rapidata.ai/latest/examples/draw_job/)
+- [Select Words](https://docs.rapidata.ai/latest/examples/select_words_job/)
+- [Free Text](https://docs.rapidata.ai/latest/examples/free_text_job/)
+- [Ranking](https://docs.rapidata.ai/latest/examples/ranking_job/)
+
+## Model ranking & benchmarks
+
+- [Getting Started](https://docs.rapidata.ai/latest/mri/)
+- [Advanced](https://docs.rapidata.ai/latest/mri_advanced/)
+
+## AI agents & API
+
+- [Use Rapidata from your AI agent](https://docs.rapidata.ai/latest/ai_agents/): an official skill that teaches coding agents (Claude Code, Cursor, Copilot, and others) to write Rapidata integrations
+- [API reference](https://docs.rapidata.ai/latest/api/): the `RapidataClient` class and its managers
+
+## Access
+
+- Install: `pip install rapidata`
+- API tokens: https://app.rapidata.ai/settings/tokens
+- Source: https://github.com/RapidataAI/rapidata-python-sdk
+- PyPI: https://pypi.org/project/rapidata/
+
+## Optional
+
+- [llms-full.txt](https://docs.rapidata.ai/llms-full.txt): the full documentation concatenated into a single file
diff --git a/site_root/robots.txt b/site_root/robots.txt
new file mode 100644
index 000000000..2bfda14fe
--- /dev/null
+++ b/site_root/robots.txt
@@ -0,0 +1,38 @@
+# Rapidata SDK documentation — https://docs.rapidata.ai
+# All crawlers, including AI/agent crawlers, are welcome.
+User-agent: *
+Allow: /
+
+# Named AI/agent crawlers, listed explicitly so operators that only honour
+# their own user-agent block still see an Allow.
+User-agent: GPTBot
+Allow: /
+
+User-agent: OAI-SearchBot
+Allow: /
+
+User-agent: ChatGPT-User
+Allow: /
+
+User-agent: ClaudeBot
+Allow: /
+
+User-agent: Claude-User
+Allow: /
+
+User-agent: anthropic-ai
+Allow: /
+
+User-agent: PerplexityBot
+Allow: /
+
+User-agent: Google-Extended
+Allow: /
+
+User-agent: Applebot-Extended
+Allow: /
+
+User-agent: CCBot
+Allow: /
+
+Sitemap: https://docs.rapidata.ai/sitemap.xml
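The llms.txt format proposed at llmstxt.org expects an H1 title (the only required element), an optional blockquote summary, and H2 sections whose list items are `[name](url)` links with optional `: notes`. Below is a minimal Python sketch of a structural check for `site_root/llms.txt`. `parse_llms_txt` and `LINK_ITEM` are hypothetical helpers, not part of this change. The check is deliberately loose rather than a full Markdown parser, and `SAMPLE` is an abridged copy of the file.

```python
import re

# A "file list" item as described at llmstxt.org: "- [name](url)", optionally ": notes".
LINK_ITEM = re.compile(r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$")


def parse_llms_txt(text: str) -> dict:
    """Hypothetical helper: split llms.txt into title, summary, and H2 link sections."""
    lines = [line.rstrip() for line in text.splitlines()]
    non_blank = [line for line in lines if line]
    # The H1 title is the only required element of the format.
    if not non_blank or not non_blank[0].startswith("# "):
        raise ValueError("llms.txt must start with an H1 title")

    summary_parts: list[str] = []
    sections: dict[str, list[tuple[str, str]]] = {}
    unlinked: list[str] = []  # list items that are not [name](url) links
    current = None
    for line in lines:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is None and line.startswith(">"):
            # Blockquote lines before the first H2 form the summary.
            summary_parts.append(line[1:].strip())
        elif current is not None and line.startswith("- "):
            match = LINK_ITEM.match(line)
            if match:
                sections[current].append((match["name"], match["url"]))
            else:
                unlinked.append(line)

    return {
        "title": non_blank[0][2:].strip(),
        "summary": " ".join(summary_parts),
        "sections": sections,
        "unlinked": unlinked,
    }


# Abridged copy of site_root/llms.txt from this change.
SAMPLE = """\
# Rapidata

> Rapidata provides human feedback at scale: crowd-sourced labeling, model
> evaluation, ranking, and preference data (RLHF-style) collected from real
> people.

## Guides

- [Quick Start](https://docs.rapidata.ai/latest/quickstart/): install, authenticate, and create your first order
- [Signals](https://docs.rapidata.ai/latest/signals/)

## Access

- Install: `pip install rapidata`

## Optional

- [llms-full.txt](https://docs.rapidata.ai/llms-full.txt): the full documentation concatenated into a single file
"""

print(parse_llms_txt(SAMPLE))
```

With this sample, the plain `- Install:` item under `## Access` ends up in `unlinked` because it is a bare value rather than a link. Whether that matters depends on how strictly a consumer follows the file-list convention.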
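Python's standard-library `urllib.robotparser` can evaluate the `robots.txt` groups locally to confirm they behave as the comments describe. In the sketch below, `audit_robots` is a hypothetical helper, `ROBOTS_TXT` is an abridged copy of the file, and `OVERRIDE` is an illustrative rule set that is not part of this change. It assumes Python 3.9 or newer for the type hints and `site_maps()`. The stdlib parser is a simplified matcher, so it approximates how each named crawler reads the file rather than reproducing it exactly.

```python
from urllib.robotparser import RobotFileParser

# Abridged copy of site_root/robots.txt from this change.
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

Sitemap: https://docs.rapidata.ai/sitemap.xml
"""

# Hypothetical rules: a named group replaces the "*" group for that crawler.
OVERRIDE = "User-agent: *\nAllow: /\n\nUser-agent: GPTBot\nDisallow: /\n"


def audit_robots(text: str, agents: list[str], url: str) -> tuple[dict[str, bool], list[str]]:
    """Hypothetical helper: report which agents may fetch `url`, plus declared sitemaps."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without a network read().
    parser.parse(text.splitlines())
    allowed = {agent: parser.can_fetch(agent, url) for agent in agents}
    # site_maps() returns None when no Sitemap line is present.
    return allowed, parser.site_maps() or []


print(audit_robots(ROBOTS_TXT, ["GPTBot", "ClaudeBot", "SomeOtherBot"], "https://docs.rapidata.ai/latest/quickstart/"))
print(audit_robots(OVERRIDE, ["GPTBot", "ClaudeBot"], "https://docs.rapidata.ai/"))
```

The second call shows why the explicit groups all say `Allow: /`. A crawler that finds a group naming it uses only that group and ignores `*`, so a `Disallow: /` there would block that agent even though `*` allows everything.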