feat: shared assertion templates (include_assertions) #948
Description
Problem
Users frequently reuse the same assertion sets across multiple tests and eval files — safety checks, format validation, tone requirements. Currently they must copy-paste assertion blocks, leading to:
- Duplication across EVAL.yaml files
- Drift when updating shared criteria
- Verbose eval files that obscure test-specific logic
Proposed Design
Reusable assertion template files
```yaml
# .agentv/templates/safe-response.yaml
assertions:
  - type: llm-grader
    prompt: ./graders/no-hallucination.md
    required: true
  - type: llm-grader
    prompt: ./graders/safe-content.md
    required: true
    min_score: 0.9
  - type: contains
    value: "disclaimer"
    negate: true
```
Reference from EVAL.yaml
```yaml
# evals/EVAL.yaml
tests:
  - id: refund-request
    input: "I want a refund"
    assertions:
      - include: safe-response # resolves .agentv/templates/safe-response.yaml
      - type: llm-grader
        prompt: ./graders/refund-quality.md # test-specific assertion
  - id: greeting
    input: "Hello"
    assertions:
      - include: safe-response # same shared template
      - type: contains
        value: "hello"
```
Suite-level includes
```yaml
# Apply to all tests
assertions:
  - include: safe-response
tests:
  - id: test-1
    input: "..."
    # inherits safe-response assertions + can add test-specific ones
```
Resolution rules
- include: name resolves to .agentv/templates/{name}.yaml
- Relative paths also work: include: ./my-templates/safety.yaml
- Templates can include other templates (max depth 3 to prevent cycles)
- Test-level assertions merge with included assertions (not replace)
- skip_defaults: true on a test still skips suite-level includes
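The rules above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the type names, `expandAssertions`, `assertionsForTest`, and the `TemplateLoader` callback are all hypothetical, and two details are assumptions the issue does not settle: depth is counted so that a template loaded directly from an eval file sits at depth 1, and suite-level assertions are placed before test-level ones.

```typescript
// Hypothetical shapes; the project's real types may differ.
type Evaluator = { type: string; [key: string]: unknown };
type IncludeEntry = { include: string };
type AssertionEntry = Evaluator | IncludeEntry;
type TestCase = { id: string; assertions?: AssertionEntry[]; skip_defaults?: boolean };

// Stand-in for "resolve the reference, read the file, parse the YAML".
type TemplateLoader = (ref: string) => AssertionEntry[] | undefined;

const MAX_INCLUDE_DEPTH = 3;

function isInclude(entry: AssertionEntry): entry is IncludeEntry {
  return typeof (entry as IncludeEntry).include === "string";
}

// Flattens include entries into a plain evaluator list, preserving order.
function expandAssertions(
  entries: AssertionEntry[],
  load: TemplateLoader,
  depth = 0,
): Evaluator[] {
  const out: Evaluator[] = [];
  for (const entry of entries) {
    if (!isInclude(entry)) {
      out.push(entry);
      continue;
    }
    // An include found at depth 3 would load a depth-4 template: reject it.
    if (depth >= MAX_INCLUDE_DEPTH) {
      throw new Error(`include "${entry.include}" exceeds max depth ${MAX_INCLUDE_DEPTH}`);
    }
    const template = load(entry.include);
    if (template === undefined) {
      throw new Error(`template not found: ${entry.include}`);
    }
    out.push(...expandAssertions(template, load, depth + 1));
  }
  return out;
}

// Suite-level entries are inherited unless the test sets skip_defaults;
// test-level entries are appended (merged), never substituted.
function assertionsForTest(
  suite: AssertionEntry[],
  test: TestCase,
  load: TemplateLoader,
): Evaluator[] {
  const inherited = test.skip_defaults ? [] : suite;
  return expandAssertions([...inherited, ...(test.assertions ?? [])], load);
}

// Example with in-memory templates instead of files on disk.
const templates: Record<string, AssertionEntry[]> = {
  "safe-response": [
    { type: "llm-grader", prompt: "./graders/safe-content.md", required: true },
    { type: "contains", value: "disclaimer", negate: true },
  ],
  loop: [{ include: "loop" }], // includes itself; stopped by the depth limit
};
const load: TemplateLoader = (ref) => templates[ref];
```

Because the depth counter increases on every nested include, a template that includes itself fails with the depth error instead of recursing forever, so no separate cycle detection is needed in this sketch.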
Template Location
Two resolution mechanisms, no config needed:
- Convention directory: include: safe-response resolves to .agentv/templates/safe-response.yaml
- Relative path: include: ./my-templates/safety.yaml resolves relative to the eval file
| Use case | How |
|---|---|
| Shared across repo | .agentv/templates/ (convention) |
| Co-located with evals | include: ./shared/safety.yaml |
| Shared across eval suites | include: ../../common/safety.yaml |
| Monorepo shared | include: ../../../packages/evals/templates/safety.yaml |
Relative paths are explicit and traceable — no ambiguity about which directory a template resolved from.
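To make the two mechanisms concrete, here is a small sketch of how an include reference might be turned into a file path. `resolveIncludePath` and `normalizePath` are hypothetical helpers, and the rule "a reference starting with ./ or ../ is a relative path, anything else is a template name" is an assumption drawn from the examples in this issue. Paths are treated as POSIX-style strings to keep the example dependency-free; real code would more likely use Node's `path.resolve`.

```typescript
// Collapses "." and ".." segments in a POSIX-style path.
function normalizePath(p: string): string {
  const absolute = p.startsWith("/");
  const stack: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (stack.length > 0 && stack[stack.length - 1] !== "..") stack.pop();
      else if (!absolute) stack.push("..");
    } else {
      stack.push(seg);
    }
  }
  return (absolute ? "/" : "") + stack.join("/");
}

// Hypothetical resolver: relative references are anchored at the eval file's
// directory; bare names map to {templatesDir}/{name}.yaml.
function resolveIncludePath(ref: string, evalFileDir: string, templatesDir: string): string {
  const isRelative = ref.startsWith("./") || ref.startsWith("../");
  const joined = isRelative ? `${evalFileDir}/${ref}` : `${templatesDir}/${ref}.yaml`;
  return normalizePath(joined);
}

console.log(resolveIncludePath("safe-response", "/repo/evals", "/repo/.agentv/templates"));
console.log(resolveIncludePath("../../common/safety.yaml", "/repo/evals/a/b", "/repo/.agentv/templates"));
```

Returning the fully normalized path also gives error messages something precise to report when a template is missing.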
Implementation
Files to modify
- packages/core/src/evaluation/validation/eval-file.schema.ts: add include variant to EvaluatorSchema
- packages/core/src/evaluation/loaders/evaluator-parser.ts: resolve include references, load template files, flatten into assertion list
- packages/core/src/evaluation/yaml-parser.ts: handle include resolution during test loading
Template discovery
```
.agentv/templates/          # convention directory
  safe-response.yaml
  json-output.yaml
  professional-tone.yaml
```
Discovery chain (closest wins): {eval-dir}/.agentv/templates/ → {repo-root}/.agentv/templates/
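A rough sketch of the "closest wins" lookup, under stated assumptions: `templateCandidates` and `discoverTemplate` are hypothetical names, the file-existence check is injected as a callback so the example runs without touching the filesystem, and paths are plain POSIX-style strings.

```typescript
// Candidate paths in priority order: the eval file's directory first,
// then the repository root. Duplicates are dropped when both are the same.
function templateCandidates(name: string, evalDir: string, repoRoot: string): string[] {
  const dirs = [evalDir, repoRoot];
  const unique = dirs.filter((d, i) => dirs.indexOf(d) === i);
  return unique.map((d) => `${d}/.agentv/templates/${name}.yaml`);
}

// Returns the first candidate that exists; otherwise fails with every
// path that was tried, so the error shows where discovery looked.
function discoverTemplate(
  name: string,
  evalDir: string,
  repoRoot: string,
  exists: (path: string) => boolean,
): string {
  const candidates = templateCandidates(name, evalDir, repoRoot);
  const hit = candidates.find((p) => exists(p));
  if (hit === undefined) {
    throw new Error(`template "${name}" not found; looked in: ${candidates.join(", ")}`);
  }
  return hit;
}

// Example: both locations define the template, so the eval-local one wins.
const files = [
  "/repo/.agentv/templates/safe-response.yaml",
  "/repo/evals/.agentv/templates/safe-response.yaml",
];
console.log(discoverTemplate("safe-response", "/repo/evals", "/repo", (p) => files.indexOf(p) !== -1));
```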
Template file format
Same as an assertion block — a YAML file with a top-level assertions array. Each entry is a standard evaluator config. This keeps templates authorable by AI agents (same schema as inline assertions).
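As an illustration of that format, the sketch below checks the shape of an already-parsed template (YAML parsing itself is left out). `parseTemplate` and the type names are hypothetical. The check that each entry carries either a string `type` or a string `include` is an assumption about the minimum a loader would need; the real schema would validate evaluator configs in full.

```typescript
// Hypothetical shapes; the project's real types may differ.
type Evaluator = { type: string; [key: string]: unknown };
type IncludeEntry = { include: string };
type AssertionEntry = Evaluator | IncludeEntry;

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Validates parsed template content: a top-level "assertions" array whose
// entries are evaluator configs or include references.
function parseTemplate(raw: unknown, source: string): AssertionEntry[] {
  if (!isRecord(raw) || !Array.isArray(raw.assertions)) {
    throw new Error(`${source}: expected a top-level "assertions" array`);
  }
  const entries: unknown[] = raw.assertions;
  return entries.map((entry, i): AssertionEntry => {
    if (!isRecord(entry)) {
      throw new Error(`${source}: assertions[${i}] must be a mapping`);
    }
    const { include, type } = entry;
    if (typeof include === "string") return { include };
    if (typeof type === "string") return { ...entry, type };
    throw new Error(`${source}: assertions[${i}] needs either "type" or "include"`);
  });
}

// Example: the object a YAML parser might produce for a small template.
const parsed = parseTemplate(
  { assertions: [{ type: "contains", value: "disclaimer", negate: true }, { include: "json-output" }] },
  ".agentv/templates/safe-response.yaml",
);
console.log(parsed.length);
```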
Research Context
Inspired by RSpec's shared_examples pattern — define reusable test behaviors that can be included with context. See test-framework-assertion-patterns research.
Dependencies
- feat: configurable threshold + naming taxonomy cleanup #925 must land first (introduces the min_score field used in template examples)
Acceptance Signals
- include: name resolves correctly, loading from .agentv/templates/{name}.yaml (verified by test)
- include: ./path resolves relative paths (verified by test)
- Suite-level includes apply to all tests unless skip_defaults: true (verified by test)
- Test-level includes merge with test-specific assertions (verified by test)
- Nested includes work up to depth 3; depth > 3 produces a clear error (verified by test)
- Missing template produces a clear error with the resolved path (verified by test)
- Schema validation accepts include entries in assertion arrays (verified by test)
- eval-schema.json regenerated with include support
- All existing tests pass, with no regressions
Non-goals
- Template parameters/variables (keep it simple — templates are static assertion sets)
- Template versioning
- Remote template registries
- Cross-repo template sharing (use relative paths with monorepo layout)
- Configurable template_dirs in config.yaml (relative paths already cover custom locations; a config-based search path creates "which directory did this resolve from?" debugging headaches)
- Template inheritance/override (templates are flat assertion lists, not class hierarchies)