feat: transcript import pipeline — grade existing sessions offline #946

Merged
christso merged 5 commits into main from feat/872-transcript-import-pipeline on Apr 6, 2026

@christso christso commented Apr 6, 2026

Closes #872

Summary

  • agentv import claude/codex/copilot — reads existing AI coding sessions from disk and normalizes them into a tool-agnostic transcript JSONL format (.agentv/transcripts/)
  • agentv eval --transcript <file> — grades pre-recorded transcripts without invoking a live provider, enabling offline evaluation and cross-client comparison
  • agentv inspect — renamed from agentv trace (the trace alias was initially kept as deprecated, then removed entirely by a later commit in this PR)

Architecture

Step 1: agentv import <source>  → transcript JSONL (.agentv/transcripts/)
Step 2: agentv eval <eval.yaml> --transcript <file>  → graded results (.agentv/results/)
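
The second step of the flow above can be pictured with a small sketch. This is an illustration only, not the code from this PR: `RecordedCase`, `gradeOffline`, and `containsGrader` are hypothetical names. The real pipeline reads JSONL from .agentv/transcripts/ and writes results to .agentv/results/. The point is that grading works on output that was already recorded, so no live provider is called.

```typescript
// Hypothetical shape of one recorded case; the real wire format is defined in
// packages/core/src/import/types.ts and may differ.
interface RecordedCase {
  id: string;
  input: string;
  output: string; // what the agent already produced in the recorded session
}

interface GradeResult {
  id: string;
  passed: boolean;
}

// A deterministic grader: no network call, no live provider.
type Grader = (c: RecordedCase) => boolean;

// Hypothetical helper: build a grader that checks the recorded output for a substring.
function containsGrader(expected: string): Grader {
  return (c) => c.output.includes(expected);
}

// Step 2 in miniature: grade what was recorded instead of re-running the agent.
function gradeOffline(cases: RecordedCase[], grader: Grader): GradeResult[] {
  return cases.map((c) => ({ id: c.id, passed: grader(c) }));
}

const recorded: RecordedCase[] = [
  { id: "a", input: "Add a unit test", output: "Added test in foo.test.ts" },
  { id: "b", input: "Fix the lint error", output: "No changes made" },
];

const results = gradeOffline(recorded, containsGrader("test"));
```

Because the grader only sees recorded data, the same transcript can be graded repeatedly or compared across clients without re-running any session.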

New files

| File | Purpose |
| --- | --- |
| packages/core/src/import/codex-parser.ts | Parse Codex CLI rollout JSONL → Message[] |
| packages/core/src/import/codex-session-discovery.ts | Discover Codex sessions in ~/.codex/sessions/ |
| packages/core/src/import/transcript-provider.ts | Provider that replays transcript JSONL through eval pipeline |
| apps/cli/src/commands/import/codex.ts | CLI handler for agentv import codex |
| apps/cli/src/commands/import/copilot.ts | CLI handler for agentv import copilot |
| apps/cli/src/commands/inspect/ | Renamed trace → inspect commands |

Modified files

  • packages/core/src/import/types.ts — Added TranscriptJsonLine wire format, toTranscriptJsonLine(), readTranscriptJsonl()
  • packages/core/src/evaluation/providers/types.ts — Added 'transcript' to ProviderKind
  • apps/cli/src/commands/eval/commands/run.ts — Added --transcript CLI arg
  • apps/cli/src/commands/eval/run-eval.ts — Transcript provider integration, target bypass
  • apps/cli/src/commands/import/claude.ts — Updated output to spec format (full test case per line)
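
To make the "one full test case per line" wire format concrete, here is a minimal JSONL round-trip sketch. The field names `input`, `output`, `source`, and `token_usage` come from this PR's commit message, but their nested shapes (`provider`, `session_id`, the token counters) are assumptions. `toJsonl` and `parseJsonl` are hypothetical stand-ins, not the actual signatures of toTranscriptJsonLine() or readTranscriptJsonl().

```typescript
// Assumed field set, loosely based on the fields named in this PR
// (input, output, source, token_usage); the real TranscriptJsonLine may differ.
interface TranscriptLineSketch {
  input: string;
  output: string;
  source: { provider: string; session_id: string };
  token_usage?: { input: number; output: number };
}

// Serialize cases as JSONL: one JSON object per line, newline-terminated.
function toJsonl(lines: TranscriptLineSketch[]): string {
  return lines.map((l) => JSON.stringify(l)).join("\n") + "\n";
}

// Parse JSONL, skipping blank lines and reporting the 1-based line number on bad JSON.
function parseJsonl(text: string): TranscriptLineSketch[] {
  const out: TranscriptLineSketch[] = [];
  text.split("\n").forEach((raw, i) => {
    const line = raw.trim();
    if (line === "") return;
    try {
      out.push(JSON.parse(line) as TranscriptLineSketch);
    } catch {
      throw new Error(`Invalid JSON on line ${i + 1}`);
    }
  });
  return out;
}

const sample: TranscriptLineSketch[] = [
  { input: "hi", output: "hello", source: { provider: "codex", session_id: "s1" } },
];
const roundTripped = parseJsonl(toJsonl(sample));
```

The sketch casts parsed objects without validating them; a real reader would likely check required fields before handing lines to the eval pipeline.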

Terminology note

Issue #872 references "dataset" — this has been renamed to "suite" per #944. Implementation follows the updated naming.

Test plan

  • Build passes
  • Typecheck passes
  • Lint passes
  • All 1814 tests pass
  • All 53 example YAML files validate
  • agentv import claude --discover latest produces valid transcript JSONL
  • agentv import codex --discover latest produces valid transcript JSONL
  • agentv inspect --help works
  • agentv trace --help still works (deprecated alias)
  • agentv eval <file> --transcript <path> grades transcripts (needs eval YAML + grader API key)
  • agentv import copilot --discover latest (needs copilot sessions on disk)

🤖 Generated with Claude Code

…t sessions offline (#872)

Add `agentv import` command with Claude, Codex, and Copilot subcommands
that read existing AI coding sessions from disk and normalize them into
a tool-agnostic transcript JSONL format.

Add `--transcript` flag to `agentv eval` that skips provider invocation
and grades pre-recorded transcripts, enabling offline evaluation without
re-running sessions.

Rename `agentv trace` → `agentv inspect` (kept trace as deprecated alias).

Key changes:
- New parsers: codex-parser.ts, transcript-provider.ts
- New discovery: codex-session-discovery.ts
- Updated import output to spec format (input, output, source, token_usage, etc.)
- TranscriptProvider implements Provider interface for eval pipeline integration
- Re-export copilot parser/discovery from import barrel for CLI access

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages bot commented Apr 6, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 983535f
Status: ⚡️ Build in progress...

christso and others added 4 commits April 6, 2026 02:00
When --transcript is used without --grader-target, the orchestrator's
grader resolution would fall back to using the transcript provider as
the grader, exhausting the transcript on the second invoke() call.

Fix: return undefined from resolveGraderProvider when the target is a
transcript provider so LLM-based evaluators skip gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
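
The exhaustion problem described in this commit can be reproduced in miniature. The sketch below is an assumption-laden illustration, not the TranscriptProvider from this PR: `ProviderSketch`, `ReplayProvider`, and `resolveGraderSketch` are hypothetical names, and `invoke()` is synchronous here for brevity, whereas the real interface is likely asynchronous.

```typescript
// Hypothetical minimal provider contract; the real Provider interface is richer.
interface ProviderSketch {
  kind: string;
  invoke(prompt: string): string;
}

// Replays recorded outputs in order. Each invoke() consumes one entry,
// so any extra call (for example, from a grader) runs past the end.
class ReplayProvider implements ProviderSketch {
  readonly kind = "transcript";
  private cursor = 0;
  private readonly recordedOutputs: string[];

  constructor(recordedOutputs: string[]) {
    this.recordedOutputs = recordedOutputs;
  }

  invoke(_prompt: string): string {
    const next = this.recordedOutputs[this.cursor];
    if (next === undefined) {
      throw new Error("Transcript exhausted: no recorded output left to replay");
    }
    this.cursor += 1;
    return next;
  }
}

// The fix in miniature: never hand a transcript provider to the grader.
function resolveGraderSketch(target: ProviderSketch): ProviderSketch | undefined {
  return target.kind === "transcript" ? undefined : target;
}
```

With one recorded output, the first `invoke()` succeeds and a second call throws, which is the failure a fallback grader would trigger. Returning `undefined` from the resolver lets callers skip LLM-based evaluators instead.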
Replace the transcript-specific point check with a proper allowlist of
provider kinds that can return structured JSON for LLM grading.

Previously, resolveGraderProvider would blindly fall back to using the
eval target as its own grader when no grader_target was configured. This
silently broke for transcript, copilot-log, cli, and any other provider
that can't produce grader responses.

Now only providers in LLM_GRADER_CAPABLE_KINDS (openai, openrouter,
azure, anthropic, gemini, agentv, mock) are used as fallback graders.
All others return undefined, causing LLM-based evaluators to skip with
a clear error rather than fail silently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
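
A rough sketch of the allowlist approach from this commit follows. The set contents are taken from the commit message; everything else is assumed, including the `TargetSketch` shape, the `resolveGraderProviderSketch` name, and the rule that an explicitly configured grader target is returned without consulting the allowlist.

```typescript
// Kinds named in the commit message as able to act as fallback graders.
const LLM_GRADER_CAPABLE_KINDS: ReadonlySet<string> = new Set([
  "openai",
  "openrouter",
  "azure",
  "anthropic",
  "gemini",
  "agentv",
  "mock",
]);

// Hypothetical, simplified view of a resolved target.
interface TargetSketch {
  name: string;
  kind: string;
}

// Prefer an explicit grader target (assumption); otherwise fall back to the
// eval target only when its kind is on the allowlist.
function resolveGraderProviderSketch(
  target: TargetSketch,
  graderTarget?: TargetSketch,
): TargetSketch | undefined {
  if (graderTarget !== undefined) return graderTarget;
  return LLM_GRADER_CAPABLE_KINDS.has(target.kind) ? target : undefined;
}
```

An allowlist fails closed: a provider kind added later gets no grader fallback until someone explicitly opts it in, which avoids repeating the silent breakage described above.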
Delete the trace/ command directory entirely (no deprecated alias).
Update all imports from trace/utils to inspect/utils.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@christso christso marked this pull request as ready for review April 6, 2026 02:24
@christso christso merged commit babeb19 into main Apr 6, 2026
3 of 4 checks passed
@christso christso deleted the feat/872-transcript-import-pipeline branch April 6, 2026 02:24

Development

Successfully merging this pull request may close these issues.

feat: transcript import pipeline — grade existing Claude/Codex/Copilot sessions offline