…d target override
- Add eval-runner.ts: Hono API endpoints for discovery, launch, and status polling
  - GET /api/eval/discover: discovers eval files in the project
  - GET /api/eval/targets: lists available target names
  - POST /api/eval/run: spawns a CLI eval process with validated args
  - GET /api/eval/status/:id: polls the status of a running eval
  - POST /api/eval/preview: generates a CLI command preview
  - All endpoints are also available project-scoped under /api/projects/:projectId/eval/*
- Add RunEvalModal component: two-step wizard modal
  - Step 1: suite filter (text input with discovered-file suggestions), test-id pills (repeatable, with glob support), target override (searchable dropdown)
  - Step 2: advanced options (threshold, workers, dry-run), collapsed by default
  - Live CLI preview before launch
  - Run status view with stdout/stderr streaming after launch
- Add entry points on every relevant page:
  - Home page (both single-project and multi-project): 'Run Eval' button
  - Run detail page: 'Re-run with Filters' (prefilled with the current target)
  - Eval detail page: 'Run this Test' (prefilled with the test ID and target)
  - All project-scoped variants included
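The launch and polling endpoints above fit together as an in-memory registry keyed by run ID: the run endpoint starts a child process and returns the ID, the status endpoint returns whatever output has accumulated so far, and a listing returns recent runs. The sketch below uses only Node's built-in child_process and crypto modules. The EvalRun shape and the names startRun, getStatus, and listRuns are hypothetical and are not taken from eval-runner.ts.

```typescript
import { spawn } from "node:child_process";
import { randomUUID } from "node:crypto";

// Hypothetical status payload a client could poll; field names are assumptions.
interface EvalRun {
  id: string;
  command: string[];
  status: "running" | "finished" | "failed";
  exitCode: number | null;
  stdout: string;
  stderr: string;
  startedAt: number;
}

// Module-level registry; a real server would also evict old entries.
const runs = new Map<string, EvalRun>();

// Start a process and append its output as it arrives, so each poll
// sees everything written up to that moment.
function startRun(cmd: string, args: string[], cwd: string = process.cwd()): EvalRun {
  const run: EvalRun = {
    id: randomUUID(),
    command: [cmd, ...args],
    status: "running",
    exitCode: null,
    stdout: "",
    stderr: "",
    startedAt: Date.now(),
  };
  runs.set(run.id, run);

  // shell: false passes argv directly to the executable, so user-supplied
  // values are never interpreted by a shell.
  const child = spawn(cmd, args, { cwd, shell: false });
  child.stdout.setEncoding("utf8");
  child.stderr.setEncoding("utf8");
  child.stdout.on("data", (chunk: string) => { run.stdout += chunk; });
  child.stderr.on("data", (chunk: string) => { run.stderr += chunk; });
  // Spawn failures (e.g. executable not found) arrive as 'error' events.
  child.on("error", (err: Error) => {
    run.status = "failed";
    run.stderr += String(err);
  });
  child.on("close", (code: number | null) => {
    run.exitCode = code;
    run.status = code === 0 ? "finished" : "failed";
  });
  return run;
}

// Backing logic for a status endpoint: undefined means "unknown ID" (a 404).
function getStatus(id: string): EvalRun | undefined {
  return runs.get(id);
}

// Backing logic for a runs listing: newest first.
function listRuns(): EvalRun[] {
  return [...runs.values()].sort((a, b) => b.startedAt - a.startedAt);
}
```

In a Hono route, the status handler would reduce to looking up c.req.param("id") with getStatus and returning c.json(run), or a 404 when the ID is unknown. Because the client polls, stdout and stderr are returned in full on each request; an offset parameter would be one way to send only new output.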
Closes #945
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Deploying agentv with
| Latest commit: | 23b369d |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://fa1e5850.agentv.pages.dev |
| Branch Preview URL: | https://feat-945-studio-run-eval.agentv.pages.dev |
feat: Run Evals from Studio (#945)
Adds a lightweight "Run Eval" flow to Studio that maps to existing CLI args, so users can launch evals without switching to the terminal.
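Since the modal maps onto existing CLI args, its core can be pictured as a pure function from form state to an argv array, a validation pass, and a formatter for the live preview. A minimal sketch follows. The RunEvalOptions shape is hypothetical, and the flag names (--test-id, --target, --threshold, --workers, --dry-run) are assumed to mirror the modal's fields; check them against the real agentv eval help output rather than treating them as the actual CLI surface.

```typescript
// Hypothetical form state; fields mirror the options listed in this PR.
interface RunEvalOptions {
  suiteFilter?: string; // eval file path or pattern
  testIds: string[];    // one entry per test-id pill; may contain globs
  target?: string;      // target override
  threshold?: number;
  workers?: number;
  dryRun?: boolean;
}

// Reject inputs that could be misread as flags or are out of range.
function validateOptions(opts: RunEvalOptions): string[] {
  const errors: string[] = [];
  const values = [opts.suiteFilter, opts.target, ...opts.testIds].filter(
    (v): v is string => v !== undefined,
  );
  for (const v of values) {
    if (v.startsWith("-")) errors.push(`value may not start with "-": ${v}`);
  }
  if (opts.workers !== undefined && !(Number.isInteger(opts.workers) && opts.workers > 0)) {
    errors.push("workers must be a positive integer");
  }
  if (opts.threshold !== undefined && !Number.isFinite(opts.threshold)) {
    errors.push("threshold must be a finite number");
  }
  return errors;
}

// Build argv for `agentv eval`. Flag names are ASSUMED for illustration.
function buildEvalArgs(opts: RunEvalOptions): string[] {
  const args = ["eval"];
  if (opts.suiteFilter) args.push(opts.suiteFilter);
  for (const id of opts.testIds) args.push("--test-id", id);
  if (opts.target) args.push("--target", opts.target);
  if (opts.threshold !== undefined) args.push("--threshold", String(opts.threshold));
  if (opts.workers !== undefined) args.push("--workers", String(opts.workers));
  if (opts.dryRun) args.push("--dry-run");
  return args;
}

// Single-quote any argument containing shell metacharacters (such as glob
// stars) so the preview can be pasted into a POSIX shell unchanged.
const shellQuote = (arg: string): string =>
  /^[\w@%+=:,./-]+$/.test(arg) ? arg : "'" + arg.replace(/'/g, "'\\''") + "'";

function previewCommand(args: string[]): string {
  return ["agentv", ...args].map(shellQuote).join(" ");
}
```

Keeping the argv array as the single source of truth means the preview and the launched process cannot drift apart: the server would validate, build the array once, spawn it without a shell, and format the same array for display.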
What's New
Backend (apps/cli/src/commands/results/eval-runner.ts)
- GET /api/eval/discover: finds eval files in the project via glob
- GET /api/eval/targets: discovers target definitions from targets.yaml
- POST /api/eval/run: spawns agentv eval ... as a child process
- GET /api/eval/status/:id: polls process state (stdout/stderr streaming)
- POST /api/eval/preview: returns the CLI command that would be executed
- GET /api/eval/runs: lists active and recent runs
- /api/projects/:projectId/eval/*: project-scoped variants of the endpoints above
Frontend (apps/studio/src/components/RunEvalModal.tsx)
Entry Points (3 pages × 2 scopes = 6 route files)
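With every endpoint available in both a global and a project-scoped form, client code has to choose the right base path for each request. A minimal sketch of such a choice, using a hypothetical helper named evalEndpoint (not necessarily what api.ts exports):

```typescript
// Hypothetical helper: the global eval endpoint when no project is selected,
// otherwise the project-scoped variant under /api/projects/:projectId/eval.
function evalEndpoint(path: string, projectId?: string): string {
  // Encode the project ID so IDs with spaces or slashes stay one path segment.
  const base = projectId
    ? `/api/projects/${encodeURIComponent(projectId)}/eval`
    : "/api/eval";
  // Accept both "status/abc" and "/status/abc" from callers.
  return `${base}/${path.replace(/^\/+/, "")}`;
}

console.log(evalEndpoint("discover"));             // /api/eval/discover
console.log(evalEndpoint("status/run-1", "demo")); // /api/projects/demo/eval/status/run-1
```

Centralizing the prefix this way would let the single-project and multi-project route files share the same query hooks, differing only in whether they pass a project ID.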
E2E Validation (agent-browser)
All flows validated in headless Chrome:
Files Changed
- apps/cli/src/commands/results/eval-runner.ts
- apps/studio/src/components/RunEvalModal.tsx
- apps/cli/src/commands/results/serve.ts (route registration)
- apps/studio/src/lib/types.ts (eval runner types)
- apps/studio/src/lib/api.ts (query hooks + mutations)
Closes #945