9 changes: 6 additions & 3 deletions README.md
@@ -39,7 +39,7 @@ Unlike benchmarks that test coding ability or factual recall, ResearchClawBench
<td align="center" width="25%">🔄<br/><b>Two-Stage Pipeline</b><br/><sub>Autonomous research + rigorous peer-review-style evaluation</sub></td>
<td align="center" width="25%">🧪<br/><b>40 Real-Science Tasks</b><br/><sub>10 disciplines, curated datasets from published papers</sub></td>
<td align="center" width="25%">👁️<br/><b>Expert-Annotated Data</b><br/><sub>Tasks, checklists & datasets curated by domain experts</sub></td>
<td align="center" width="25%">🤖<br/><b>Multi-Agent Support</b><br/><sub>Claude Code, Codex CLI, OpenClaw, Nanobot, EvoScientist, ResearchHarness & custom agents</sub></td>
<td align="center" width="25%">🤖<br/><b>Multi-Agent Support</b><br/><sub>Claude Code, Codex CLI, OpenClaw, Nanobot, EvoScientist, ResearchClaw, ResearchHarness & custom agents</sub></td>
</tr>
<tr>
<td align="center">🚀<br/><b>Re-Discovery to New-Discovery</b><br/><sub>50 = match the paper, 70+ = surpass it</sub></td>
@@ -60,11 +60,12 @@ Most AI benchmarks evaluate what models **know**. We evaluate what agents can **do**.
- **Real science, not toy problems.** 40 tasks sourced from published papers across 10 disciplines, each with curated experimental datasets.
- **Two-stage pipeline.** Autonomous research first, rigorous evaluation second — just like peer review.
- **Fine-grained, multimodal scoring.** A weighted checklist with text and image criteria, judged by an LLM acting as a strict peer reviewer.
- **Agent-agnostic.** Ships with built-in support for Claude Code, Codex CLI, OpenClaw, Nanobot, EvoScientist, and a lightweight ResearchHarness baseline. Bring your own agent in one line.
- **Agent-agnostic.** Ships with built-in support for Claude Code, Codex CLI, OpenClaw, Nanobot, EvoScientist, ResearchClaw, and a lightweight ResearchHarness baseline. Bring your own agent in one line.
- **From Re-Discovery to New-Discovery.** Scoring above 50 means matching the original paper; above 70 means *surpassing* it. The frontier is wide open.

### 📢 News

- **2026-04-10** 🔬 Added built-in [ResearchClaw](https://github.com/researchclaw/researchclaw) support, an agent-powered research assistant with built-in skills for paper search, literature review, and data analysis.
- **2026-04-07** 🧪 Added built-in [ResearchHarness](https://github.com/black-yt/ResearchHarness) support as a lightweight baseline agent for testing different LLMs under the same ResearchClawBench workflow.
- **2026-03-30** 🧬 Added built-in [EvoScientist](https://github.com/EvoScientist/EvoScientist) support and clarified multimodal judge prompting so the first attached image is explicitly treated as the ground-truth figure.
- **2026-03-27** 🤗 Released a Hugging Face dataset mirror at [InternScience/ResearchClawBench](https://huggingface.co/datasets/InternScience/ResearchClawBench), including 10 additional tasks from ResearchClawBench-Self and a task downloader script.
@@ -350,6 +351,7 @@ Install whichever agent(s) you plan to benchmark. You do not need all of them.
| **OpenClaw** | [OpenClaw](https://openclaw.ai/) | Official website and setup entry |
| **Nanobot** | [HKUDS/nanobot](https://github.com/HKUDS/nanobot) | Official GitHub repository |
| **EvoScientist** | [EvoScientist/EvoScientist](https://github.com/EvoScientist/EvoScientist) | Official GitHub repository |
| **ResearchClaw** | [researchclaw/researchclaw](https://github.com/researchclaw/researchclaw) | `pip install researchclaw` |
| **ResearchHarness** | [black-yt/ResearchHarness benchmark README](https://github.com/black-yt/ResearchHarness/blob/main/benchmarks/ResearchClawBench/README.md) | Lightweight baseline harness for testing different LLMs; replace `/abs/path/to/ResearchHarness` in `agents.json` |

#### 5. Launch
@@ -366,7 +368,7 @@ After a run completes, switch to the **Evaluation** tab and click **Score**. The

### 🤖 Supported Agents

ResearchClawBench ships with built-in support for five frontier coding agents plus a lightweight ResearchHarness baseline:
ResearchClawBench ships with built-in support for five frontier coding agents, ResearchClaw, plus a lightweight ResearchHarness baseline:
**Review comment (medium):** The count of supported frontier coding agents should be updated to "six" now that ResearchClaw has been added. The current phrasing "five frontier coding agents, ResearchClaw," is grammatically inconsistent and less concise than the previous format.

Suggested change:

Old: ResearchClawBench ships with built-in support for five frontier coding agents, ResearchClaw, plus a lightweight ResearchHarness baseline:
New: ResearchClawBench ships with built-in support for six frontier coding agents plus a lightweight ResearchHarness baseline:


| Agent | Command | Notes |
|:------|:--------|:------|
@@ -375,6 +377,7 @@ ResearchClawBench ships with built-in support for five frontier coding agents plus a lightweight ResearchHarness baseline:
| <img src="evaluation/static/logos/openclaw.svg" width="16" /> **OpenClaw** | `openclaw agent ...` | Self-hosted gateway, 3600s timeout |
| <img src="evaluation/static/logos/nanobot.svg" width="16" /> **Nanobot** | `nanobot agent -m ...` | Ultra-lightweight, reliable tool execution |
| <img src="evaluation/static/logos/evo.svg" width="16" /> **EvoScientist** | `evosci --ui cli ...` | Self-evolving AI Scientists |
| <img src="evaluation/static/logos/researchclaw.svg" width="16" /> **ResearchClaw** | `researchclaw agent -m ...` | AI research assistant with built-in skills |
| <img src="evaluation/static/logos/rh.svg" width="16" /> **ResearchHarness** | `python3 /abs/path/to/ResearchHarness/run_agent.py ...` | Lightweight baseline harness for testing different LLMs |

#### 🔧 Add Your Own Agent
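The body of this section is collapsed in the diff. Based on the built-in entries shown in the `evaluation/agents.json` change below, a custom registration would plausibly follow the same shape; the field names mirror the diff, while the agent name, logo path, and command here are hypothetical placeholders:

```json
{
  "myagent": {
    "label": "MyAgent",
    "icon": "M",
    "logo": "/static/logos/myagent.svg",
    "cmd": "myagent run -m <PROMPT> -w <WORKSPACE>"
  }
}
```

The `<PROMPT>` and `<WORKSPACE>` placeholders are substituted by the benchmark launcher at run time, matching the `cmd` templates of the built-in agents.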
6 changes: 6 additions & 0 deletions evaluation/agents.json
@@ -34,5 +34,11 @@
"icon": "H",
"logo": "/static/logos/rh.svg",
"cmd": "python3 /abs/path/to/ResearchHarness/run_agent.py <PROMPT> --workspace-root <WORKSPACE> --role-prompt-file /abs/path/to/ResearchHarness/benchmarks/ResearchClawBench/role_prompt.md --trace-dir <WORKSPACE>"
},
"researchclaw": {
"label": "ResearchClaw",
"icon": "R",
"logo": "/static/logos/researchclaw.svg",
"cmd": "researchclaw agent -m <PROMPT> -w <WORKSPACE>"
}
}
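Each `cmd` entry above is a shell-style template with `<PROMPT>` and `<WORKSPACE>` placeholders. The sketch below shows how such a template could be expanded into an argv list; the `expand_cmd` helper and the shlex-based tokenization are illustrative assumptions, not the benchmark's actual launcher code:

```python
import json
import shlex

# The "researchclaw" entry from evaluation/agents.json, as added in this diff.
AGENTS_JSON = """
{
  "researchclaw": {
    "label": "ResearchClaw",
    "icon": "R",
    "logo": "/static/logos/researchclaw.svg",
    "cmd": "researchclaw agent -m <PROMPT> -w <WORKSPACE>"
  }
}
"""

def expand_cmd(template: str, prompt: str, workspace: str) -> list[str]:
    # Tokenize first, then substitute, so a prompt containing spaces
    # stays a single argv element instead of being re-split by a shell.
    return [
        tok.replace("<PROMPT>", prompt).replace("<WORKSPACE>", workspace)
        for tok in shlex.split(template)
    ]

agents = json.loads(AGENTS_JSON)
argv = expand_cmd(
    agents["researchclaw"]["cmd"],
    "Reproduce the paper's main result",
    "/tmp/bench-ws",
)
print(argv)
```

Substituting after tokenization keeps a multi-word prompt as a single `-m` argument, which is why the templates can be run without extra quoting.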
4 changes: 4 additions & 0 deletions evaluation/static/logos/researchclaw.svg