feat: transcript import pipeline — grade existing sessions offline #946
Merged
…t sessions offline (#872)

Add `agentv import` command with Claude, Codex, and Copilot subcommands that read existing AI coding sessions from disk and normalize them into a tool-agnostic transcript JSONL format.

Add `--transcript` flag to `agentv eval` that skips provider invocation and grades pre-recorded transcripts, enabling offline evaluation without re-running sessions.

Rename `agentv trace` → `agentv inspect` (kept trace as deprecated alias).

Key changes:
- New parsers: codex-parser.ts, transcript-provider.ts
- New discovery: codex-session-discovery.ts
- Updated import output to spec format (input, output, source, token_usage, etc.)
- TranscriptProvider implements Provider interface for eval pipeline integration
- Re-export copilot parser/discovery from import barrel for CLI access

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
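As a rough illustration of the "one JSON object per line" transcript format, the sketch below parses JSONL text into records. Only the field names `input`, `output`, `source`, and `token_usage` come from the commit message; their types, the required/optional split, and the names `TranscriptLineSketch` and `parseTranscriptJsonlSketch` are assumptions made for this example and are not the project's `readTranscriptJsonl()`.

```typescript
// Hypothetical record shape. Only the field names are taken from the commit
// message; the types and which fields are required are assumptions.
interface TranscriptLineSketch {
  input: unknown;
  output: unknown;
  source?: unknown;
  token_usage?: unknown;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseTranscriptJsonlSketch(text: string): TranscriptLineSketch[] {
  const records: TranscriptLineSketch[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return; // tolerate blank lines and a trailing newline
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      throw new Error(`line ${index + 1}: invalid JSON`);
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      throw new Error(`line ${index + 1}: expected a JSON object`);
    }
    const record = parsed as Record<string, unknown>;
    // Assumption: every record must carry both an input and an output.
    if (!("input" in record) || !("output" in record)) {
      throw new Error(`line ${index + 1}: missing input or output`);
    }
    records.push(record as unknown as TranscriptLineSketch);
  });
  return records;
}
```

Reporting the 1-based line number in each error makes a malformed record easy to locate in a large transcript file.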
When --transcript is used without --grader-target, the orchestrator's grader resolution would fall back to using the transcript provider as the grader, exhausting the transcript on the second invoke() call.

Fix: return undefined from resolveGraderProvider when the target is a transcript provider so LLM-based evaluators skip gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
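The failure mode described in this commit can be reproduced with a toy replay provider. This is a minimal sketch, not the project's `TranscriptProvider`: the class name `ReplayProviderSketch`, the synchronous `invoke()` signature, and the error message are all hypothetical, chosen only to show why a second call runs past the end of the recording.

```typescript
// Hypothetical replay provider: serves recorded outputs in order, one per call.
class ReplayProviderSketch {
  private cursor = 0;
  private readonly recorded: string[];

  constructor(recorded: string[]) {
    this.recorded = recorded;
  }

  // Synchronous for brevity; the prompt is ignored because nothing is generated.
  invoke(_prompt: string): string {
    if (this.cursor >= this.recorded.length) {
      throw new Error("transcript exhausted");
    }
    return this.recorded[this.cursor++];
  }
}

// One recorded answer: the first call (the eval target) succeeds, and the
// second call (the grader falling back to the same provider) fails.
function demoExhaustion(): string[] {
  const provider = new ReplayProviderSketch(["recorded answer"]);
  const events: string[] = [];
  events.push(provider.invoke("test case input"));
  try {
    provider.invoke("grade this answer");
  } catch (err) {
    events.push((err as Error).message);
  }
  return events;
}
```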
Replace the transcript-specific point check with a proper allowlist of provider kinds that can return structured JSON for LLM grading.

Previously, resolveGraderProvider would blindly fall back to using the eval target as its own grader when no grader_target was configured. This silently broke for transcript, copilot-log, cli, and any other provider that can't produce grader responses.

Now only providers in LLM_GRADER_CAPABLE_KINDS (openai, openrouter, azure, anthropic, gemini, agentv, mock) are used as fallback graders. All others return undefined, causing LLM-based evaluators to skip with a clear error rather than fail silently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
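The allowlist fallback might look roughly like the sketch below. The kind names are the ones listed in the commit message. The `ProviderSketch` interface, the function signature, and the rule that an explicitly configured grader target always wins are assumptions for illustration, not the actual `resolveGraderProvider` implementation.

```typescript
// Kinds named in the commit message as able to return structured JSON for grading.
const LLM_GRADER_CAPABLE_KINDS: ReadonlySet<string> = new Set([
  "openai",
  "openrouter",
  "azure",
  "anthropic",
  "gemini",
  "agentv",
  "mock",
]);

// Hypothetical minimal provider shape; a real provider carries far more.
interface ProviderSketch {
  kind: string;
}

function resolveGraderProviderSketch(
  target: ProviderSketch,
  graderTarget?: ProviderSketch,
): ProviderSketch | undefined {
  // Assumption: an explicitly configured grader target is used as-is.
  if (graderTarget) return graderTarget;
  // Fall back to the eval target only when its kind is on the allowlist.
  return LLM_GRADER_CAPABLE_KINDS.has(target.kind) ? target : undefined;
}
```

An allowlist fails closed: a provider kind added later is excluded from grading until someone opts it in, whereas a denylist would silently accept it.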
Delete the trace/ command directory entirely (no deprecated alias). Update all imports from trace/utils to inspect/utils.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #872
Summary
- `agentv import claude/codex/copilot`: reads existing AI coding sessions from disk and normalizes them into a tool-agnostic transcript JSONL format (`.agentv/transcripts/`)
- `agentv eval --transcript <file>`: grades pre-recorded transcripts without invoking a live provider, enabling offline evaluation and cross-client comparison
- `agentv inspect`: renamed from `agentv trace` (`trace` kept as deprecated alias)

Architecture
New files
- `packages/core/src/import/codex-parser.ts`
- `packages/core/src/import/codex-session-discovery.ts`
- `packages/core/src/import/transcript-provider.ts`
- `apps/cli/src/commands/import/codex.ts`: `agentv import codex`
- `apps/cli/src/commands/import/copilot.ts`: `agentv import copilot`
- `apps/cli/src/commands/inspect/`

Modified files
- `packages/core/src/import/types.ts`: Added `TranscriptJsonLine` wire format, `toTranscriptJsonLine()`, `readTranscriptJsonl()`
- `packages/core/src/evaluation/providers/types.ts`: Added `'transcript'` to `ProviderKind`
- `apps/cli/src/commands/eval/commands/run.ts`: Added `--transcript` CLI arg
- `apps/cli/src/commands/eval/run-eval.ts`: Transcript provider integration, target bypass
- `apps/cli/src/commands/import/claude.ts`: Updated output to spec format (full test case per line)

Terminology note
Issue #872 refers to "dataset"; that term has been renamed to "suite" per #944. The implementation follows the updated naming.
Test plan
- `agentv import claude --discover latest` produces valid transcript JSONL
- `agentv import codex --discover latest` produces valid transcript JSONL
- `agentv inspect --help` works
- `agentv trace --help` still works (deprecated alias)
- `agentv eval <file> --transcript <path>` grades transcripts (needs eval YAML + grader API key)
- `agentv import copilot --discover latest` (needs copilot sessions on disk)

🤖 Generated with Claude Code