
feat(studio): run evals from Studio with suite filter, test-id filter, and target override #947

Merged: christso merged 1 commit into main from feat/945-studio-run-eval on Apr 6, 2026


@christso commented Apr 6, 2026

feat: Run Evals from Studio (#945)

Adds a lightweight "Run Eval" flow to Studio that maps to existing CLI args, so users can launch evals without switching to the terminal.

What's New

Backend (apps/cli/src/commands/results/eval-runner.ts)

  • GET /api/eval/discover — finds eval files in the project via glob
  • GET /api/eval/targets — discovers target definitions from targets.yaml
  • POST /api/eval/run — spawns agentv eval ... as a child process
  • GET /api/eval/status/:id — polls process state (stdout/stderr streaming)
  • POST /api/eval/preview — returns the CLI command that would be executed
  • GET /api/eval/runs — lists active and recent runs
  • All endpoints are also available project-scoped under /api/projects/:projectId/eval/*
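
To illustrate how a run request could map onto CLI arguments for POST /api/eval/run and POST /api/eval/preview, here is a minimal sketch. The request field names, the flag names (`--test-id`, `--target`, `--threshold`, `--workers`, `--dry-run`), and both helper functions are assumptions made for illustration; the PR does not show the actual request shape or flags.

```typescript
// Hypothetical request shape; field names are assumptions, not taken from the PR.
interface RunEvalRequest {
  suiteFilter?: string; // eval file path or glob
  testIds?: string[];   // repeatable test-id filters
  target?: string;      // target override
  threshold?: number;
  workers?: number;
  dryRun?: boolean;
}

// Build an argv array rather than a shell string, so that user-supplied
// values are passed to the child process without shell interpretation.
// The flag names below are assumed for illustration.
function buildEvalArgs(req: RunEvalRequest): string[] {
  const args: string[] = ["eval"];
  if (req.suiteFilter) args.push(req.suiteFilter);
  for (const id of req.testIds ?? []) args.push("--test-id", id);
  if (req.target) args.push("--target", req.target);
  if (req.threshold !== undefined) args.push("--threshold", String(req.threshold));
  if (req.workers !== undefined) args.push("--workers", String(req.workers));
  if (req.dryRun) args.push("--dry-run");
  return args;
}

// Render the same argv as a copy-pasteable command for the preview.
// Arguments containing characters outside a conservative safe set are single-quoted.
function previewCommand(req: RunEvalRequest): string {
  return ["agentv", ...buildEvalArgs(req)]
    .map((a) => (/^[\w@%+=:,./-]+$/.test(a) ? a : `'${a.replace(/'/g, `'\\''`)}'`))
    .join(" ");
}
```

Deriving the preview string and the spawned argv from one builder keeps the previewed command and the executed command from drifting apart.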

Frontend (apps/studio/src/components/RunEvalModal.tsx)

  • Two-step wizard modal:
    • Step 1: Suite filter (with file discovery suggestions), test-id pill input, target dropdown (auto-populated)
    • Step 2: Advanced options (threshold, workers, dry-run)
  • CLI command preview before launch
  • Live status view with stdout/stderr streaming, exit code display
  • Auto-refreshes runs list on completion
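
The live status view could be driven by a small pure mapping from the polled status to what the modal displays. The sketch below assumes a hypothetical status payload (`EvalRunStatus`) and helper names; only the "Finished" label and the exit codes 0 and 1 come from the validation notes in this PR, and the "Running" and "Failed" labels are illustrative.

```typescript
// Hypothetical status payload; the real response shape is not shown in the PR.
interface EvalRunStatus {
  id: string;
  state: "running" | "exited";
  exitCode: number | null;
  stdout: string;
  stderr: string;
}

interface StatusView {
  label: string;
  done: boolean;
  isError: boolean;
}

// Map a polled status to display state for the modal.
function toStatusView(s: EvalRunStatus): StatusView {
  if (s.state === "running") return { label: "Running", done: false, isError: false };
  const ok = s.exitCode === 0;
  return {
    label: ok ? "Finished" : `Failed (exit code ${s.exitCode ?? "unknown"})`,
    done: true,
    isError: !ok,
  };
}

// Return the delay before the next poll, or null when polling should stop.
// Stopping is also the natural moment to refresh the runs list.
function nextPollDelayMs(view: StatusView, intervalMs = 1000): number | null {
  return view.done ? null : intervalMs;
}
```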

Entry Points (3 pages × 2 scopes = 6 route files)

  • Home page → "▶ Run Eval" button
  • Run detail → "▶ Re-run with Filters" (prefills target from current run)
  • Eval detail → "▶ Run this Test" (prefills test-id and target)
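
One way to keep the six route files thin is to centralize the per-page prefill and the per-scope API base path. This is a sketch only: the `EntryPoint` and `RunEvalPrefill` types and both function names are hypothetical, while the two base paths follow the endpoint list above.

```typescript
// Hypothetical description of where the modal was opened from.
type EntryPoint =
  | { page: "home" }
  | { page: "run-detail"; target: string }
  | { page: "eval-detail"; testId: string; target: string };

// Hypothetical initial form values for the modal.
interface RunEvalPrefill {
  testIds: string[];
  target?: string;
}

// Prefill rules as described for the three entry points.
function prefillFor(entry: EntryPoint): RunEvalPrefill {
  switch (entry.page) {
    case "home":
      return { testIds: [] };
    case "run-detail":
      return { testIds: [], target: entry.target };
    case "eval-detail":
      return { testIds: [entry.testId], target: entry.target };
  }
}

// Resolve the API base for the unscoped or project-scoped variant.
function evalApiBase(projectId?: string): string {
  return projectId
    ? `/api/projects/${encodeURIComponent(projectId)}/eval`
    : "/api/eval";
}
```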

E2E Validation (agent-browser)

All flows validated in headless Chrome:

  1. ✅ "Run Eval" button visible on home page
  2. ✅ Modal opens with suite filter, test-id input, target dropdown (18 targets discovered)
  3. ✅ Eval file suggestions clickable, test-id pills work
  4. ✅ CLI preview renders correctly
  5. ✅ Advanced options expand/collapse
  6. ✅ Failed run (bad filter) shows error with exit code 1
  7. ✅ Successful dry-run shows "Finished" with stdout streaming, exit code 0
  8. ✅ Runs list auto-refreshes with new run visible
  9. ✅ "Re-run with Filters" button on run detail page
  10. ✅ "Run this Test" button on eval detail page

Files Changed

  • New: apps/cli/src/commands/results/eval-runner.ts
  • New: apps/studio/src/components/RunEvalModal.tsx
  • Modified: apps/cli/src/commands/results/serve.ts (route registration)
  • Modified: apps/studio/src/lib/types.ts (eval runner types)
  • Modified: apps/studio/src/lib/api.ts (query hooks + mutations)
  • Modified: 6 route files (entry point buttons)

Closes #945

…d target override

- Add eval-runner.ts: Hono API endpoints for discovery, launch, and status polling
  - GET /api/eval/discover: discovers eval files in project
  - GET /api/eval/targets: lists available target names
  - POST /api/eval/run: spawns CLI eval process with validated args
  - GET /api/eval/status/:id: polls running eval status
  - POST /api/eval/preview: generates CLI command preview
  - All endpoints also available project-scoped under /api/projects/:projectId/eval/*

- Add RunEvalModal component: two-step wizard modal
  - Step 1: suite filter (text input with discovered file suggestions), test-id
    pills (repeatable with glob support), target override (searchable dropdown)
  - Step 2: advanced options (threshold, workers, dry-run) collapsed by default
  - Live CLI preview before launch
  - Run status view with stdout/stderr streaming after launch

- Add entry points on every relevant page:
  - Home page (both single-project and multi-project): 'Run Eval' button
  - Run detail page: 'Re-run with Filters' (prefilled with current target)
  - Eval detail page: 'Run this Test' (prefilled with test ID and target)
  - All project-scoped variants included

Closes #945

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: 23b369d
Status: ✅  Deploy successful!
Preview URL: https://fa1e5850.agentv.pages.dev
Branch Preview URL: https://feat-945-studio-run-eval.agentv.pages.dev


@christso marked this pull request as ready for review April 6, 2026 01:57
@christso merged commit b81e456 into main Apr 6, 2026
4 checks passed
@christso deleted the feat/945-studio-run-eval branch April 6, 2026 02:17