Merged
1 change: 1 addition & 0 deletions .agents/skills/e2e-tests/SKILL.md
@@ -60,6 +60,7 @@ Cassettes mock provider HTTP responses (OpenAI, Anthropic, ...) so external-prov
- Run every scenario through `withScenarioHarness(...)`.
- Keep reusable logic in `e2e/helpers/`. Keep one-off fixtures and scenario-specific files inside the scenario directory.
- Snapshot stable contracts, not raw noise. Use `normalizeForSnapshot(...)` before inline snapshots and `formatJsonFileSnapshot(...)` plus file snapshots for larger payloads or version matrices.
+- For span-tree snapshots, use `matchSpanTreeSnapshot(...)`. It writes and asserts paired `.span-tree.json` and `.span-tree.txt` files from the same normalized span tree. The JSON file is the structural contract that is easiest to parse mechanically; the TXT file is the ASCII tree that is easiest to review by eye. Keep them in sync by updating both through the e2e update/record commands, and never hand-edit only one side of the pair.
- When a scenario family already has `assertions.ts`, keep version- or provider-specific test setup in `scenario.test.ts` and reuse the shared assertions file.
- Keep the CI e2e summary up to date. If a scenario version matrix or `variantKey` changes, update `e2e/config/pr-comment-scenarios.json` in the same change and follow the established pattern used by other versioned scenarios: one summary row per version, not separate wrapped/auto rows unless that pattern already exists for the scenario family.
- Run new or updated scenarios three times in a row before considering snapshots stable.
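The pairing described in the span-tree bullet can be pictured with a minimal sketch. It assumes a much-simplified span shape; `SpanNode`, `renderSpanTreeJson`, and `renderSpanTreeText` are hypothetical names for illustration and are not the repo's `matchSpanTreeSnapshot(...)` implementation. The point is only that both files are rendered from one in-memory tree, so they cannot disagree unless one of them is edited by hand.

```typescript
// Hypothetical, simplified span shape (an assumption for this sketch).
interface SpanNode {
  name: string;
  type?: string;
  children: SpanNode[];
}

// Structural form: pretty-printed JSON that tools can parse mechanically.
function renderSpanTreeJson(root: SpanNode): string {
  return JSON.stringify(root, null, 2) + "\n";
}

// Review form: an ASCII tree rendered from the same object.
function renderSpanTreeText(root: SpanNode): string {
  const lines: string[] = [];
  const visit = (
    node: SpanNode,
    prefix: string,
    isLast: boolean,
    isRoot: boolean,
  ): void => {
    const label = node.type ? `${node.name} [${node.type}]` : node.name;
    const branch = isLast ? "`-- " : "|-- ";
    lines.push(isRoot ? label : prefix + branch + label);
    const childPrefix = isRoot ? "" : prefix + (isLast ? "    " : "|   ");
    node.children.forEach((child, index) =>
      visit(child, childPrefix, index === node.children.length - 1, false),
    );
  };
  visit(root, "", true, true);
  return lines.join("\n") + "\n";
}

// Illustrative data only.
const tree: SpanNode = {
  name: "scenario-root",
  type: "task",
  children: [
    { name: "chat.completions.create", type: "llm", children: [] },
    { name: "tool-call", type: "tool", children: [{ name: "lookup", children: [] }] },
  ],
};

const jsonSnapshot = renderSpanTreeJson(tree);
const textSnapshot = renderSpanTreeText(tree);
```

Because `jsonSnapshot` and `textSnapshot` derive from the same `tree`, regenerating both through one command keeps them consistent, which is the reason for the rule against hand-editing a single side.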
5 changes: 5 additions & 0 deletions AGENTS.md
@@ -59,9 +59,14 @@ Each scenario runs the SDK in a subprocess against a mock Braintrust server and

```bash
pnpm run test:e2e # Run all e2e scenarios (from repo root)
+pnpm run test:e2e:update # Update e2e snapshots without re-recording cassettes
pnpm run test:e2e:record # Re-record provider cassettes and update snapshots
```

+When adding or modifying e2e tests, run the relevant e2e verification twice before stopping so flakes are caught proactively. After running `pnpm run test:e2e:update` or `pnpm run test:e2e:record`, always run the normal e2e tests afterward to verify there is no snapshot drift or unstable output.
+
+Span-tree snapshots are paired: `*.span-tree.json` is the structural contract, and `*.span-tree.txt` is the human-readable ASCII tree generated from the same normalized spans. Both files are asserted and should be updated together through `pnpm run test:e2e:update` or `pnpm run test:e2e:record`; do not hand-edit only one side of the pair.
+
**From repo root:**

```bash
7 changes: 4 additions & 3 deletions e2e/README.md
@@ -34,7 +34,7 @@ Any extra files needed only by one scenario stay in that scenario folder. Anythi
- `scenario-installer.ts` - Installs optional scenario-local dependencies from a colocated `package.json` into a shared cache and links them into prepared scenario copies.
- `mock-braintrust-server.ts` - Captures requests, merged log payloads, and parsed span-like events.
- `normalize.ts` - Makes snapshots deterministic by normalizing ids, timestamps, paths, and mock-server URLs.
-- `trace-selectors.ts` / `trace-summary.ts` - Helpers for finding spans and snapshotting only the relevant shape.
+- `trace-selectors.ts` / `span-tree.ts` / `trace-summary.ts` - Helpers for finding spans and snapshotting stable, human-readable trace trees.
- `scenario-runtime.ts` - Shared runtime utilities used by scenario entrypoints.
- `openai.ts` - Shared scenario lists and assertions for OpenAI wrapper and hook coverage across v4/v5/v6.
- `wrapper-contract.ts` - Helpers for snapshotting wrapper span contracts and filtering payload rows by root span id.
@@ -66,7 +66,7 @@ The main utilities you'll use in test files:
- `events()`, `payloads()`, `requestCursor()`, `requestsAfter()` - Lower-level access for ingestion payloads and HTTP request flow assertions.
- `testRunId` - Useful when a scenario or assertion needs the exact run marker.

-Use `normalizeForSnapshot(...)` before snapshotting. It replaces timestamps and ids with stable tokens and strips machine-specific paths and localhost ports.
+Prefer `matchSpanTreeSnapshot(...)` for span snapshots. It asserts both a structural `.span-tree.json` snapshot and a human-readable `.span-tree.txt` tree beside it. Both files are generated from the same normalized span tree and include stable span attributes, input, output, expected values, scores, tags, metadata, metrics, and errors. Use `normalizeForSnapshot(...)` for non-span JSON snapshots; it replaces timestamps and ids with stable tokens and strips machine-specific paths and localhost ports.

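To make "stable tokens" concrete, here is a minimal sketch of the idea, not the actual `normalizeForSnapshot(...)` code. The function name `normalizeSketch`, the token formats `<span:N>`, `<timestamp>`, and `<port>`, and the reduced key sets are all assumptions chosen for illustration.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Reduced key sets for illustration; the real helper tracks more keys.
const ID_KEYS = new Set(["id", "span_id", "root_span_id"]);
const TIME_KEYS = new Set(["created", "start", "end"]);

function normalizeSketch(
  value: Json,
  ids = new Map<string, string>(),
  key?: string,
): Json {
  if (Array.isArray(value)) {
    return value.map((item) => normalizeSketch(item, ids));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [childKey, childValue] of Object.entries(value)) {
      out[childKey] = normalizeSketch(childValue, ids, childKey);
    }
    return out;
  }
  const isScalar = typeof value === "number" || typeof value === "string";
  if (isScalar && key !== undefined && TIME_KEYS.has(key)) {
    return "<timestamp>";
  }
  if (typeof value === "string" && key !== undefined && ID_KEYS.has(key)) {
    // The same raw id always maps to the same token, so links between
    // spans stay visible after normalization.
    let token = ids.get(value);
    if (token === undefined) {
      token = `<span:${ids.size + 1}>`;
      ids.set(value, token);
    }
    return token;
  }
  if (typeof value === "string") {
    // Replace ephemeral local ports with a placeholder.
    return value.replace(/(127\.0\.0\.1|localhost):\d+/g, "$1:<port>");
  }
  return value;
}

// Illustrative input only.
const normalized = normalizeSketch({
  id: "abc",
  root_span_id: "abc",
  span_id: "def",
  created: 1714000000,
  url: "http://127.0.0.1:54321/api",
});
```

Mapping each distinct id to a numbered token, rather than to one constant placeholder, keeps the relationship between `span_id` and `root_span_id` readable in the snapshot while still removing run-to-run noise.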
### Provider scenario cassettes

@@ -79,7 +79,7 @@ Wrapper scenarios often create a root span with `testRunId` metadata and then le
- Use `events()` rather than `testRunEvents()` to inspect the full trace tree.
- Find the scenario root span first.
- Scope raw payload snapshots by `root_span_id` using `payloadRowsForRootSpan(...)`.
-- Pair a normalized `span-events` snapshot with a normalized `log-payloads` snapshot.
+- Prefer normalized span-tree snapshots from `matchSpanTreeSnapshot(...)`. The `.json` sibling is the structural contract, and the `.txt` sibling is the ASCII tree for review; both are asserted and should be updated together.
- If the wrapper has an explicit support matrix, reuse one shared test across version-specific scenario entries instead of duplicating the assertions. The AI SDK wrapper scenario uses this for supported v3-v6 package combinations.

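The "find the root span, then scope by `root_span_id`" steps can be sketched as follows. This is a hedged illustration only: the `PayloadRow` shape, `findScenarioRoot`, and `rowsForRootSpan` are hypothetical stand-ins, not the repo's `payloadRowsForRootSpan(...)` helper, and the sketch assumes that only the root row carries `testRunId` in its metadata.

```typescript
// Hypothetical row shape (an assumption); real payload rows carry more fields.
interface PayloadRow {
  span_id: string;
  root_span_id: string;
  metadata?: { testRunId?: string };
  span_attributes?: { name?: string };
}

// Step 1: locate the scenario root by its run marker.
function findScenarioRoot(
  rows: PayloadRow[],
  testRunId: string,
): PayloadRow | undefined {
  return rows.find((row) => row.metadata?.testRunId === testRunId);
}

// Step 2: keep every row that belongs to that root's trace.
function rowsForRootSpan(rows: PayloadRow[], rootSpanId: string): PayloadRow[] {
  return rows.filter((row) => row.root_span_id === rootSpanId);
}

// Illustrative data: two traces captured by the same mock server.
const capturedRows: PayloadRow[] = [
  {
    span_id: "s1",
    root_span_id: "s1",
    metadata: { testRunId: "run-1" },
    span_attributes: { name: "scenario-root" },
  },
  { span_id: "s2", root_span_id: "s1", span_attributes: { name: "llm-call" } },
  { span_id: "s3", root_span_id: "s3", metadata: { testRunId: "run-2" } },
];

const scenarioRoot = findScenarioRoot(capturedRows, "run-1");
const scopedRows = scenarioRoot
  ? rowsForRootSpan(capturedRows, scenarioRoot.root_span_id)
  : [];
```

Filtering on `root_span_id` instead of on the run marker picks up child rows such as `s2`, which in this sketch carry no `testRunId` of their own.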
### Runner-wrapper scenario pattern
@@ -122,6 +122,7 @@ Scenario-local manifests are optional and should stay slim. They are only for sc

```bash
pnpm run test:e2e # Run all e2e tests
+pnpm run test:e2e:update # Update snapshots in cassette replay mode
pnpm run test:e2e:record # Re-record provider cassettes and update snapshots
pnpm run test:e2e:record -- <name> # Re-record one scenario from the repo root
pnpm run test:e2e:canary # Run canary e2e tests
36 changes: 33 additions & 3 deletions e2e/helpers/normalize.ts
@@ -20,12 +20,23 @@ const UUID_REGEX =
   /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
 const UUID_SUBSTRING_REGEX =
   /[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}/gi;
-const TIME_KEYS = new Set(["created", "date", "start", "end"]);
+const TIME_KEYS = new Set([
+  "completed_at",
+  "created",
+  "created_at",
+  "date",
+  "end",
+  "expires_at",
+  "start",
+  "started_at",
+  "updated_at",
+]);
 const SPAN_ID_KEYS = new Set(["id", "span_id", "root_span_id"]);
 const ZERO_NUMBER_KEYS = new Set([
   "avgLogprobs",
   "caller_lineno",
   "duration",
+  "github_copilot.context_window.current",
   "time_to_first_token",
 ]);
 const XACT_VERSION_KEYS = new Set([
@@ -46,7 +57,13 @@ const DYNAMIC_HEADER_KEYS = new Set([
   "x-ratelimit-reset-tokens",
   "x-request-id",
 ]);
-const PROVIDER_ID_KEYS = new Set(["itemId", "responseId", "toolCallId"]);
+const PROVIDER_ID_KEYS = new Set([
+  "agentId",
+  "claude_agent_sdk.task_id",
+  "itemId",
+  "responseId",
+  "toolCallId",
+]);
 const PROJECT_ID_KEYS = new Set(["project_id", "projectId"]);
 const PROJECT_NAME_KEYS = new Set(["project_name", "projectName"]);
 const HELPERS_DIR = path.dirname(fileURLToPath(import.meta.url));
@@ -219,7 +236,12 @@ function normalizeValue(
   }

   if (typeof value === "number") {
-    if (currentKey && ZERO_NUMBER_KEYS.has(currentKey)) {
+    if (
+      currentKey &&
+      (ZERO_NUMBER_KEYS.has(currentKey) ||
+        currentKey.endsWith("_ms") ||
+        currentKey.endsWith("Ms"))
+    ) {
       return 0;
     }
     if (currentKey && TIME_KEYS.has(currentKey)) {
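The effect of the new suffix rule in this hunk can be checked in isolation with a small sketch. `isVolatileNumberKey` and `zeroVolatileNumbers` are hypothetical helper names, and the key set is trimmed to two entries from the diff; the real logic lives inline in `normalizeValue(...)` and walks nested values, which this flat version does not.

```typescript
// Trimmed copy of the explicit key set, for illustration only.
const ZERO_NUMBER_KEYS = new Set(["duration", "time_to_first_token"]);

// Mirrors the condition in the hunk: explicit keys, or a millisecond-style suffix.
function isVolatileNumberKey(key: string): boolean {
  return ZERO_NUMBER_KEYS.has(key) || key.endsWith("_ms") || key.endsWith("Ms");
}

// Flat, single-level version: zero only numeric values under volatile keys.
function zeroVolatileNumbers(
  record: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    out[key] =
      typeof value === "number" && isVolatileNumberKey(key) ? 0 : value;
  }
  return out;
}

// Illustrative input only.
const zeroed = zeroVolatileNumbers({
  latency_ms: 12,
  elapsedMs: 3.5,
  duration: 9,
  items: 4,
  timeout_ms: "n/a",
});
```

Two details follow from the condition as written: the suffix match is case-sensitive, so a key such as `items` is left alone, and non-numeric values under a matching key pass through unchanged because the check sits inside the `typeof value === "number"` branch.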
@@ -240,6 +262,14 @@ function normalizeValue(
       return normalizeCallerFilename(value);
     }

+    if (currentKey === "openai_codex.working_directory") {
+      const normalizedPath = value.replace(/\\/g, "/");
+      const match = normalizedPath.match(
+        /\/braintrust-codex-e2e-[^/]+\/([^/]+)$/,
+      );
+      return match ? `<tmp>/braintrust-codex-e2e/${match[1]}` : "<tmp>";
+    }
+
     if (currentKey === "_xact_id") {
       return tokenFor(tokenMaps.xacts, value, "xact");
     }
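The working-directory branch added in this hunk is easiest to reason about as a standalone function. The sketch below copies its three statements into a hypothetical wrapper named `normalizeWorkingDirectory`; the wrapper name and the sample paths are assumptions, while the regular expression and replacement strings are taken from the diff.

```typescript
// Standalone copy of the branch body, wrapped in a hypothetical function.
function normalizeWorkingDirectory(value: string): string {
  // Treat Windows separators like POSIX ones before matching.
  const normalizedPath = value.replace(/\\/g, "/");
  // Match ".../braintrust-codex-e2e-<suffix>/<leaf>" only when <leaf> is the final segment.
  const match = normalizedPath.match(/\/braintrust-codex-e2e-[^/]+\/([^/]+)$/);
  return match ? `<tmp>/braintrust-codex-e2e/${match[1]}` : "<tmp>";
}

// Illustrative paths only.
const posixResult = normalizeWorkingDirectory(
  "/tmp/braintrust-codex-e2e-a1B2c3/workspace",
);
const windowsResult = normalizeWorkingDirectory(
  "C:\\Temp\\braintrust-codex-e2e-xyz\\repo",
);
const unrelatedResult = normalizeWorkingDirectory("/home/user/project");
const nestedResult = normalizeWorkingDirectory(
  "/tmp/braintrust-codex-e2e-abc/a/b",
);
```

With these inputs the random temp-directory suffix is dropped and the leaf directory name is kept, so `posixResult` is `<tmp>/braintrust-codex-e2e/workspace` and `windowsResult` is `<tmp>/braintrust-codex-e2e/repo`. Any path that does not end exactly one segment below the temp directory, including `unrelatedResult` and `nestedResult`, collapses to `<tmp>`.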