feat: vitest assertions #926
Description
Status: P3 — Research complete, defer implementation
Research into Vitest assertion patterns for AI eval is complete. Conclusion: AgentV's existing SDK already surpasses what Vitest matchers would offer. A Vitest bridge is a CI/DX convenience, not a capability gap.
Why AgentV Doesn't Need This Yet
AgentV already has two grading layers that cover the same ground:
@agentv/core — In-process TypeScript SDK
```typescript
await evaluate({
  tests: [{
    id: 'test-1',
    input: 'Capital of France?',
    assert: [
      { type: 'contains', value: 'Paris' },                                  // 22+ built-in types
      { type: 'llm-grader', prompt: './rubric.md' },                         // LLM-as-judge
      ({ output }) => ({ name: 'len', score: output.length > 10 ? 1 : 0 }),  // inline fn
    ],
  }],
  task: async (input) => callMyAgent(input),
});
```
@agentv/eval — Subprocess graders (stdin/stdout)
```typescript
// .agentv/assertions/sentiment.ts — auto-discovered by filename
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output }) => ({
  pass: true,
  score: 0.85,
  assertions: [{ text: 'Professional tone', passed: true }],
}));
```
Advantages over Vitest expect.extend() matchers
- Continuous scoring (0–1) — Vitest matchers return a binary `pass: boolean`
- Multi-assertion results per evaluator — an `assertions[]` array with per-criterion verdicts
- Subprocess isolation — `defineCodeGrader`/`defineAssertion` run in separate processes
- Auto-discovery — drop a `.ts` file into `.agentv/assertions/`; the assertion type name comes from the filename
- YAML-first — AI agents can read/write EVAL.yaml without TypeScript
- Composite evaluators — weighted average, threshold, and code/LLM aggregation built in
What a Vitest Bridge Would Look Like (If Implemented)
A thin @agentv/vitest package — NOT a rewrite of the assertion engine:
```typescript
import { installAgentVMatchers } from '@agentv/vitest';

installAgentVMatchers();

test('my eval', async () => {
  const { summary } = await evaluate({
    specFile: './evals/EVAL.yaml',
    task: async (input) => myAgent(input),
  });
  expect(summary.passed).toBe(summary.total);
});
```
Or wrapping individual assertions as custom matchers:
```typescript
// Uses expect.extend() under the hood
await expect(response).toPassAgentVAssertion({ type: 'llm-grader', prompt: './rubric.md' });
await expect(response).toPassAgentVAssertion({ type: 'contains', value: 'Paris' });
```
Industry Context
Frameworks that have Vitest/Jest integration:
- Promptfoo: `installMatchers()` → 4 custom matchers (`toMatchSemanticSimilarity`, `toPassLLMRubric`, etc.)
- LangSmith: `ls.describe()`/`ls.test()` wrappers around native Vitest
- vitest-evals (Sentry): `describeEval()` with multi-scorer + threshold
- Evalite (Matt Pocock): Vitest runner + web UI + trace capture
These serve users who want eval assertions inside existing Vitest test suites. AgentV's evaluate() API serves users who want a dedicated eval framework. Different audiences.
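For reference, a `toPassAgentVAssertion` matcher would be registered through Vitest's `expect.extend()`. A minimal sketch of the matcher body, where `runAssertion` is a hypothetical stand-in for AgentV's assertion engine (not a real export):

```typescript
// Sketch of a matcher body that expect.extend({ toPassAgentVAssertion })
// could register. `runAssertion` is a hypothetical stub, not AgentV's API.
type Assertion = { type: string; value?: string };
type MatcherResult = { pass: boolean; message: () => string };

function runAssertion(output: string, a: Assertion): number {
  // Stub: handles only 'contains'; the real engine covers 22+ types.
  if (a.type === 'contains') return output.includes(a.value ?? '') ? 1 : 0;
  throw new Error(`unsupported assertion type: ${a.type}`);
}

function toPassAgentVAssertion(received: string, a: Assertion): MatcherResult {
  const score = runAssertion(received, a);
  return {
    pass: score >= 0.5,
    message: () => `AgentV assertion '${a.type}' scored ${score}`,
  };
}

// With Vitest installed, registration would be:
//   expect.extend({ toPassAgentVAssertion });
```

Vitest matchers must return `{ pass, message }`, which is exactly where the continuous score gets flattened to a boolean.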
Decision
Defer. Focus on:
- feat: configurable threshold + naming taxonomy cleanup #925 — Per-assertion configurable thresholds (P0)
- Shared assertion templates (P1)
- AND/OR logic operators (P2)
Revisit the Vitest bridge if user demand emerges. The integration surface is small (~100 LOC), so it can be built quickly when needed.
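The deferred threshold and AND/OR items above can be sketched in a few lines over 0–1 scores. Everything here is hypothetical shape, not the design #925 will actually land:

```typescript
// Hypothetical sketch of per-assertion thresholds plus AND/OR composition
// over continuous scores. None of these names exist in AgentV today.
type Scored = { score: number; threshold?: number };

const passes = (a: Scored): boolean => a.score >= (a.threshold ?? 0.5);

const allOf = (...as: Scored[]): boolean => as.every(passes); // AND
const anyOf = (...as: Scored[]): boolean => as.some(passes);  // OR

const llmGrade = { score: 0.72, threshold: 0.8 }; // fails its own bar
const contains = { score: 1.0 };                  // default threshold 0.5

console.log(allOf(llmGrade, contains)); // false: llm grade misses 0.8
console.log(anyOf(llmGrade, contains)); // true
```

Because assertions already carry continuous scores, both features compose without touching the grading layers described above.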
Research
Full findings: agentevals-research/research/findings/test-framework-assertion-patterns/README.md