feat(studio): run evals from Studio with suite filter, test-id filter, and target override by christso · Pull Request #947 · EntityProcess/agentv

christso · 2026-04-06T01:50:37Z

feat: Run Evals from Studio (#945)

Adds a lightweight "Run Eval" flow to Studio that maps to existing CLI args, so users can launch evals without switching to the terminal.

What's New

Backend (apps/cli/src/commands/results/eval-runner.ts)

GET /api/eval/discover — finds eval files in the project via glob
GET /api/eval/targets — discovers target definitions from targets.yaml
POST /api/eval/run — spawns agentv eval ... as a child process
GET /api/eval/status/:id — polls process state (stdout/stderr streaming)
POST /api/eval/preview — returns the CLI command that would be executed
GET /api/eval/runs — lists active and recent runs
All endpoints are also available project-scoped under /api/projects/:projectId/eval/*
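
As a rough illustration of how the preview endpoint could map modal fields to CLI args, here is a minimal sketch. It is not the code in eval-runner.ts: the `RunEvalRequest` shape, the helper names, and the flag spellings (`--test-id`, `--target`, `--threshold`, `--workers`, `--dry-run`) are all assumptions made for the example and should be checked against the actual `agentv eval` CLI.

```typescript
// Hypothetical request shape; field names are assumptions, not the PR's types.
interface RunEvalRequest {
  suiteFilter?: string; // eval file path or glob
  testIds?: string[];   // repeatable test-id filter
  target?: string;      // target override
  threshold?: number;
  workers?: number;
  dryRun?: boolean;
}

// Build an argv array rather than a shell string, so values are never
// interpreted by a shell when handed to a process spawner.
// Flag names below are assumed for illustration only.
function buildEvalArgs(req: RunEvalRequest): string[] {
  const args: string[] = ["eval"];
  if (req.suiteFilter) args.push(req.suiteFilter);
  for (const id of req.testIds ?? []) args.push("--test-id", id);
  if (req.target) args.push("--target", req.target);
  if (req.threshold !== undefined) args.push("--threshold", String(req.threshold));
  if (req.workers !== undefined) args.push("--workers", String(req.workers));
  if (req.dryRun) args.push("--dry-run");
  return args;
}

// Quote only for display in the preview; plain tokens are left untouched.
function quoteForDisplay(arg: string): string {
  if (/^[A-Za-z0-9_\/.:=@-]+$/.test(arg)) return arg;
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// The string a preview endpoint might return for the modal to render.
function previewCommand(req: RunEvalRequest): string {
  return ["agentv", ...buildEvalArgs(req)].map(quoteForDisplay).join(" ");
}
```

Keeping the argv array as the source of truth and deriving the display string from it means the preview and the spawned process cannot drift apart.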

Frontend (apps/studio/src/components/RunEvalModal.tsx)

Two-step wizard modal:
- Step 1: Suite filter (with file discovery suggestions), test-id pill input, target dropdown (auto-populated)
- Step 2: Advanced options (threshold, workers, dry-run)
CLI command preview before launch
Live status view with stdout/stderr streaming, exit code display
Auto-refreshes runs list on completion
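
The status view and auto-refresh behavior described above can be thought of as a small decision function over each polled status. The sketch below is an assumption-laden illustration, not the component's real logic: the `RunStatus` fields, the `RunState` values, and the one-second default delay are hypothetical.

```typescript
// Hypothetical status payload; the real /api/eval/status/:id shape may differ.
type RunState = "running" | "finished" | "failed";

interface RunStatus {
  state: RunState;
  exitCode: number | null; // null while the child process is still running
  stdout: string;
  stderr: string;
}

type Step =
  | { kind: "poll"; delayMs: number }
  | { kind: "done"; label: string; refreshRuns: true };

// Decide what the modal should do after each status response:
// keep polling while the run is active, otherwise show a final label
// and signal that the runs list should be refreshed.
function nextStep(status: RunStatus, delayMs: number = 1000): Step {
  if (status.state === "running") {
    return { kind: "poll", delayMs };
  }
  const code = status.exitCode ?? "unknown";
  const label =
    status.state === "finished"
      ? `Finished (exit code ${code})`
      : `Failed (exit code ${code})`;
  return { kind: "done", label, refreshRuns: true };
}
```

Keeping this as a pure function makes the polling loop itself trivial and lets the terminal-state handling be unit tested without a running server.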

Entry Points (3 pages × 2 scopes = 6 route files)

Home page → "▶ Run Eval" button
Run detail → "▶ Re-run with Filters" (prefills target from current run)
Eval detail → "▶ Run this Test" (prefills test-id and target)
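
One way to picture the three entry points and two scopes is as a prefill mapping plus a scoped path helper. This is a sketch under stated assumptions: the `EntryPoint` and `Prefill` types and both function names are hypothetical; only the two URL prefixes come from the endpoint list above.

```typescript
// Hypothetical description of where the modal was opened from.
type EntryPoint =
  | { page: "home" }
  | { page: "run-detail"; target: string }
  | { page: "eval-detail"; testId: string; target: string };

// Hypothetical initial values for the modal's form fields.
interface Prefill {
  testIds: string[];
  target?: string;
}

// Map each entry point to the fields it prefills.
function prefillFor(entry: EntryPoint): Prefill {
  switch (entry.page) {
    case "home":
      return { testIds: [] };
    case "run-detail":
      return { testIds: [], target: entry.target };
    case "eval-detail":
      return { testIds: [entry.testId], target: entry.target };
  }
}

// Build the API path for either scope: unscoped when no project id is
// given, project-scoped otherwise.
function evalApiPath(endpoint: string, projectId?: string): string {
  const base = projectId
    ? `/api/projects/${encodeURIComponent(projectId)}/eval`
    : "/api/eval";
  return `${base}/${endpoint}`;
}
```

For example, `evalApiPath("run")` gives `/api/eval/run`, while passing a project id routes the same request under `/api/projects/:projectId/eval/`.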

E2E Validation (agent-browser)

All flows validated in headless Chrome:

✅ "Run Eval" button visible on home page
✅ Modal opens with suite filter, test-id input, target dropdown (18 targets discovered)
✅ Eval file suggestions clickable, test-id pills work
✅ CLI preview renders correctly
✅ Advanced options expand/collapse
✅ Failed run (bad filter) shows error with exit code 1
✅ Successful dry-run shows "Finished" with stdout streaming, exit code 0
✅ Runs list auto-refreshes with new run visible
✅ "Re-run with Filters" button on run detail page
✅ "Run this Test" button on eval detail page

Files Changed

New: apps/cli/src/commands/results/eval-runner.ts
New: apps/studio/src/components/RunEvalModal.tsx
Modified: apps/cli/src/commands/results/serve.ts (route registration)
Modified: apps/studio/src/lib/types.ts (eval runner types)
Modified: apps/studio/src/lib/api.ts (query hooks + mutations)
Modified: 6 route files (entry point buttons)

Closes #945

…d target override

- Add eval-runner.ts: Hono API endpoints for discovery, launch, and status polling
  - GET /api/eval/discover: discovers eval files in project
  - GET /api/eval/targets: lists available target names
  - POST /api/eval/run: spawns CLI eval process with validated args
  - GET /api/eval/status/:id: polls running eval status
  - POST /api/eval/preview: generates CLI command preview
  - All endpoints also available project-scoped under /api/projects/:projectId/eval/*
- Add RunEvalModal component: two-step wizard modal
  - Step 1: suite filter (text input with discovered file suggestions), test-id pills (repeatable with glob support), target override (searchable dropdown)
  - Step 2: advanced options (threshold, workers, dry-run) collapsed by default
  - Live CLI preview before launch
  - Run status view with stdout/stderr streaming after launch
- Add entry points on every relevant page:
  - Home page (both single-project and multi-project): 'Run Eval' button
  - Run detail page: 'Re-run with Filters' (prefilled with current target)
  - Eval detail page: 'Run this Test' (prefilled with test ID and target)
  - All project-scoped variants included

Closes #945

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

cloudflare-workers-and-pages · 2026-04-06T01:51:06Z

Deploying agentv with Cloudflare Pages

Latest commit:	`23b369d`
Status:	✅ Deploy successful!
Preview URL:	https://fa1e5850.agentv.pages.dev
Branch Preview URL:	https://feat-945-studio-run-eval.agentv.pages.dev


christso marked this pull request as ready for review April 6, 2026 01:57

christso merged commit b81e456 into main Apr 6, 2026
4 checks passed

christso deleted the feat/945-studio-run-eval branch April 6, 2026 02:17

christso merged 1 commit into main from feat/945-studio-run-eval
