
feat: vitest assertions #926

@christso

Description


Status: P3 — Research complete, defer implementation

Research into Vitest assertion patterns for AI eval is complete. Conclusion: AgentV's existing SDK already surpasses what Vitest matchers would offer. A Vitest bridge is a CI/DX convenience, not a capability gap.

Why AgentV Doesn't Need This Yet

AgentV already has two grading layers that cover the same ground:

@agentv/core — In-process TypeScript SDK

await evaluate({
  tests: [{
    id: 'test-1',
    input: 'Capital of France?',
    assert: [
      { type: 'contains', value: 'Paris' },                                    // 22+ built-in types
      { type: 'llm-grader', prompt: './rubric.md' },                           // LLM-as-judge
      ({ output }) => ({ name: 'len', score: output.length > 10 ? 1 : 0 }),   // inline fn
    ],
  }],
  task: async (input) => callMyAgent(input),
});

@agentv/eval — Subprocess graders (stdin/stdout)

// .agentv/assertions/sentiment.ts — auto-discovered by filename
import { defineAssertion } from '@agentv/eval';

export default defineAssertion(({ output }) => ({
  pass: true,
  score: 0.85,
  assertions: [{ text: 'Professional tone', passed: true }],
}));

Advantages over Vitest expect.extend() matchers

  • Continuous scoring (0-1) — Vitest matchers return binary pass: boolean
  • Multi-assertion results per evaluator — assertions[] array with per-criterion verdicts
  • Subprocess isolation — defineCodeGrader/defineAssertion run in separate processes
  • Auto-discovery — drop a .ts file in .agentv/assertions/ → assertion type derived from filename
  • YAML-first — AI agents can read/write EVAL.yaml without TypeScript
  • Composite evaluators — weighted average, threshold, code/LLM aggregation built-in
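To illustrate the first and last points, here is a minimal sketch (not AgentV internals; `AssertionResult` and `weightedScore` are illustrative names) of how continuous 0-1 scores can be aggregated with a weighted average and then thresholded, something a binary `pass: boolean` matcher cannot express:

```typescript
// Illustrative sketch only — not AgentV's actual aggregation code.
// Each assertion reports a continuous 0-1 score plus a weight.
interface AssertionResult {
  name: string;
  score: number; // continuous 0-1, not just pass/fail
  weight: number;
}

// Weighted average across all per-criterion scores.
function weightedScore(results: AssertionResult[]): number {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  if (totalWeight === 0) return 0;
  return results.reduce((sum, r) => sum + r.score * r.weight, 0) / totalWeight;
}

const results: AssertionResult[] = [
  { name: 'contains-paris', score: 1.0, weight: 1 },
  { name: 'tone-rubric', score: 0.7, weight: 2 },
];

const score = weightedScore(results); // ≈ 0.8
const passed = score >= 0.75; // threshold applied after aggregation
```

A Vitest matcher would have to collapse `score` to `passed` at the boundary, losing the partial-credit information.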

What a Vitest Bridge Would Look Like (If Implemented)

A thin @agentv/vitest package — NOT a rewrite of the assertion engine:

import { installAgentVMatchers } from '@agentv/vitest';
installAgentVMatchers();

test('my eval', async () => {
  const { summary } = await evaluate({
    specFile: './evals/EVAL.yaml',
    task: async (input) => myAgent(input),
  });
  expect(summary.passed).toBe(summary.total);
});

Or wrapping individual assertions as custom matchers:

// Uses expect.extend() under the hood
await expect(response).toPassAgentVAssertion({ type: 'llm-grader', prompt: './rubric.md' });
await expect(response).toPassAgentVAssertion({ type: 'contains', value: 'Paris' });
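A hypothetical sketch of the matcher body such a bridge might hand to `expect.extend()` (the `runAssertion` helper and the `AssertionSpec` union are assumptions for illustration; only the `contains` case is stubbed in here):

```typescript
// Hypothetical bridge sketch — not a published @agentv/vitest API.
type AssertionSpec =
  | { type: 'contains'; value: string }
  | { type: 'llm-grader'; prompt: string };

// Stand-in for the real AgentV grader; only 'contains' is implemented.
async function runAssertion(output: string, spec: AssertionSpec) {
  if (spec.type === 'contains') {
    const pass = output.includes(spec.value);
    return { pass, score: pass ? 1 : 0 };
  }
  throw new Error(`grader for '${spec.type}' not implemented in this sketch`);
}

// The function shape expect.extend({ toPassAgentVAssertion }) accepts:
// return { pass, message } so Vitest can report failures.
async function toPassAgentVAssertion(received: string, spec: AssertionSpec) {
  const result = await runAssertion(received, spec);
  return {
    pass: result.pass,
    message: () =>
      `expected output to pass '${spec.type}' assertion (score: ${result.score})`,
  };
}
```

Note the collapse: the continuous `score` survives only inside the failure message, while Vitest itself sees just `pass`.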

Industry Context

Frameworks that have Vitest/Jest integration:

  • Promptfoo: installMatchers() → 4 custom matchers (toMatchSemanticSimilarity, toPassLLMRubric, etc.)
  • LangSmith: ls.describe() / ls.test() wrappers around native Vitest
  • vitest-evals (Sentry): describeEval() with multi-scorer + threshold
  • Evalite (Matt Pocock): Vitest runner + web UI + trace capture

These serve users who want eval assertions inside existing Vitest test suites. AgentV's evaluate() API serves users who want a dedicated eval framework. Different audiences.

Decision

Defer. Focus on:

  1. feat: configurable threshold + naming taxonomy cleanup #925 — Per-assertion configurable thresholds (P0)
  2. Shared assertion templates (P1)
  3. AND/OR logic operators (P2)

Revisit the Vitest bridge if user demand emerges. The integration surface is small (~100 LOC), so it can be built quickly when needed.

Research

Full findings: agentevals-research/research/findings/test-framework-assertion-patterns/README.md

Metadata

Labels: wontfix — this will not be worked on