Standards for building agents, better
Updated Feb 22, 2026 - TypeScript
Agentic testing for agentic codebases
The definitive benchmark for AI agents on OpenClaw. 45 tasks across 4 tiers. Powered by MyClaw.ai
Ship agents you can audit.
The pre-flight check for AI agents
lintlang by Hermes Labs — static linter for AI agent configs, tool descriptions, and system prompts. HERM v1.1 scoring.
The open-source MultiAgentOps evaluation and verification harness for any industry business workflow.
Infrastructure chaos to test the resilience of AI agents.
GitHub template for agent-testable SaaS apps. Next.js 16 + shadcn/ui + Neon Postgres + agent-browser e2e testing via accessibility tree.
Deterministic runtime for agent evaluation
A living world where agents exist as participants alongside NPCs, internal actors, real service APIs, budgets, policies, and consequences.
Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601
Playwright for AI Agents. Test what your agent DOES, not what it SAYS. YAML-first behavioral testing. Catch PII leaks, tool abuse, step explosions. 3200+ tests.
Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.
Typed Kotlin DSL framework for AI agent systems.
Intent-first unit testing framework for AI agents in Node.js and TypeScript.
Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks
Agent testing automation 🤖 by simulating users 👥 and agents 🤝 with a judge ⚖️ (langwatch-scenario)
Simulation environment for testing and validating autonomous agents
Evaluation and competition arena for testing agents, systems, or workflows in structured local-first scenarios.