diff --git a/README.md b/README.md
index 4d03736..ca85496 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ Unlike benchmarks that test coding ability or factual recall, ResearchClawBench
| Two-Stage Pipeline | Autonomous research + rigorous peer-review-style evaluation |
| 🧪 40 Real-Science Tasks | 10 disciplines, curated datasets from published papers |
| Expert-Annotated Data | Tasks, checklists & datasets curated by domain experts |
-| 🤖 Multi-Agent Support | Claude Code, Codex CLI, OpenClaw, Nanobot, EvoScientist, ResearchHarness & custom agents |
+| 🤖 Multi-Agent Support | Claude Code, Codex CLI, OpenClaw, Nanobot, EvoScientist, ResearchClaw, ResearchHarness & custom agents |
| Re-Discovery to New-Discovery | 50 = match the paper, 70+ = surpass it |
@@ -60,11 +60,12 @@ Most AI benchmarks evaluate what models **know**. We evaluate what agents can **
- **Real science, not toy problems.** 40 tasks sourced from published papers across 10 disciplines, each with curated experimental datasets.
- **Two-stage pipeline.** Autonomous research first, rigorous evaluation second, just like peer review.
- **Fine-grained, multimodal scoring.** A weighted checklist with text and image criteria, judged by an LLM acting as a strict peer reviewer.
-- **Agent-agnostic.** Ships with built-in support for Claude Code, Codex CLI, OpenClaw, Nanobot, EvoScientist, and a lightweight ResearchHarness baseline. Bring your own agent in one line.
+- **Agent-agnostic.** Ships with built-in support for Claude Code, Codex CLI, OpenClaw, Nanobot, EvoScientist, ResearchClaw, and a lightweight ResearchHarness baseline. Bring your own agent in one line.
- **From Re-Discovery to New-Discovery.** Scoring above 50 means matching the original paper; above 70 means *surpassing* it. The frontier is wide open.
### 📢 News
+- **2026-04-10** 🔬 Added built-in [ResearchClaw](https://github.com/researchclaw/researchclaw) support: an agent-powered research assistant that ships with skills for paper search, literature review, and data analysis.
- **2026-04-07** 🧪 Added built-in [ResearchHarness](https://github.com/black-yt/ResearchHarness) support as a lightweight baseline agent for testing different LLMs under the same ResearchClawBench workflow.
- **2026-03-30** 🧬 Added built-in [EvoScientist](https://github.com/EvoScientist/EvoScientist) support and clarified multimodal judge prompting so the first attached image is explicitly treated as the ground-truth figure.
- **2026-03-27** 🤗 Released a Hugging Face dataset mirror at [InternScience/ResearchClawBench](https://huggingface.co/datasets/InternScience/ResearchClawBench), including 10 additional tasks from ResearchClawBench-Self and a task downloader script.
@@ -350,6 +351,7 @@ Install whichever agent(s) you plan to benchmark. You do not need all six.
| **OpenClaw** | [OpenClaw](https://openclaw.ai/) | Official website and setup entry |
| **Nanobot** | [HKUDS/nanobot](https://github.com/HKUDS/nanobot) | Official GitHub repository |
| **EvoScientist** | [EvoScientist/EvoScientist](https://github.com/EvoScientist/EvoScientist) | Official GitHub repository |
+| **ResearchClaw** | [researchclaw/researchclaw](https://github.com/researchclaw/researchclaw) | Official GitHub repository; install with `pip install researchclaw` |
| **ResearchHarness** | [black-yt/ResearchHarness benchmark README](https://github.com/black-yt/ResearchHarness/blob/main/benchmarks/ResearchClawBench/README.md) | Lightweight baseline harness for testing different LLMs; replace `/abs/path/to/ResearchHarness` in `agents.json` |
#### 5. Launch
@@ -366,7 +368,7 @@ After a run completes, switch to the **Evaluation** tab and click **Score**. The
### 🤖 Supported Agents
-ResearchClawBench ships with built-in support for five frontier coding agents plus a lightweight ResearchHarness baseline:
+ResearchClawBench ships with built-in support for five frontier coding agents, the ResearchClaw research assistant, and a lightweight ResearchHarness baseline:
| Agent | Command | Notes |
|:------|:--------|:------|
@@ -375,6 +377,7 @@ ResearchClawBench ships with built-in support for five frontier coding agents pl
| **OpenClaw** | `openclaw agent ...` | Self-hosted gateway, 3600s timeout |
| **Nanobot** | `nanobot agent -m ...` | Ultra-lightweight, reliable tool execution |
| **EvoScientist** | `evosci --ui cli ...` | Self-evolving AI Scientists |
+| **ResearchClaw** | `researchclaw agent -m ...` | AI research assistant with built-in paper-search, literature-review, and data-analysis skills |
| **ResearchHarness** | `python3 /abs/path/to/ResearchHarness/run_agent.py ...` | Lightweight baseline harness for testing different LLMs |
#### 🔧 Add Your Own Agent
diff --git a/evaluation/agents.json b/evaluation/agents.json
index 85de0ed..afc68e4 100644
--- a/evaluation/agents.json
+++ b/evaluation/agents.json
@@ -34,5 +34,11 @@
"icon": "H",
"logo": "/static/logos/rh.svg",
"cmd": "python3 /abs/path/to/ResearchHarness/run_agent.py --workspace-root --role-prompt-file /abs/path/to/ResearchHarness/benchmarks/ResearchClawBench/role_prompt.md --trace-dir "
+ },
+ "researchclaw": {
+ "label": "ResearchClaw",
+ "icon": "R",
+ "logo": "/static/logos/researchclaw.svg",
+ "cmd": "researchclaw agent -m -w "
}
}
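The new `researchclaw` entry follows the same four-field shape as the existing ResearchHarness entry (`label`, `icon`, `logo`, `cmd`). A small pre-flight check can catch a malformed entry, or an agent CLI that is not installed, before a run is launched. The sketch below is hypothetical: ResearchClawBench does not ship these helpers, the function names are illustrative, and the one-character icon and `/static/logos/` prefix rules are assumptions inferred from the two entries visible in this diff, not documented requirements.

```python
import json
import shlex
import shutil

# Fields carried by every agent entry shown in this diff's evaluation/agents.json.
REQUIRED_FIELDS = ("label", "icon", "logo", "cmd")
# Assumption: logos are served from this prefix, as for rh.svg and researchclaw.svg.
LOGO_PREFIX = "/static/logos/"


def check_agents(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means every entry passed these checks."""
    problems = []
    for key, entry in config.items():
        if not isinstance(entry, dict):
            problems.append(f"{key}: entry is not an object")
            continue
        # Every required field must be a non-blank string.
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{key}: missing or empty '{field}'")
        # Assumption: the icon is a one-character badge, like "H" and "R".
        icon = entry.get("icon")
        if isinstance(icon, str) and icon.strip() and len(icon) != 1:
            problems.append(f"{key}: icon should be one character, got {icon!r}")
        # Assumption: logo paths point into the static logos directory.
        logo = entry.get("logo")
        if isinstance(logo, str) and logo.strip() and not logo.startswith(LOGO_PREFIX):
            problems.append(f"{key}: logo {logo!r} is not under {LOGO_PREFIX}")
    return problems


def missing_executables(config: dict) -> list[str]:
    """List agents whose command's first word is not found on PATH."""
    missing = []
    for key, entry in config.items():
        cmd = entry.get("cmd", "") if isinstance(entry, dict) else ""
        words = shlex.split(cmd)
        # shutil.which returns None when the program cannot be located.
        if words and shutil.which(words[0]) is None:
            missing.append(key)
    return missing


def check_agents_file(path: str) -> list[str]:
    """Load agents.json from disk and run the structural checks."""
    with open(path, encoding="utf-8") as fh:
        return check_agents(json.load(fh))
```

Calling `check_agents_file("evaluation/agents.json")` lists any entries that break these assumed rules, and `missing_executables` would flag `researchclaw` on a machine where the ResearchClaw CLI is not yet installed.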
diff --git a/evaluation/static/logos/researchclaw.svg b/evaluation/static/logos/researchclaw.svg
new file mode 100644
index 0000000..aa98b37
--- /dev/null
+++ b/evaluation/static/logos/researchclaw.svg
@@ -0,0 +1,4 @@
+