Add coding agent benchmarks by himmi-01 · Pull Request #8 · Corbell-AI/evalmonkey

himmi-01 · 2026-05-20T04:37:26Z

Add coding agent benchmarks
Add category for benchmarks

…ring
- Add dedicated coding loaders for human-eval, mbpp, apps, swe-bench with code-specific rubrics (function correctness, test passing, syntax)
- Add 6 coding-agent-specific chaos profiles: code_context_strip, code_wrong_language, code_syntax_break, code_test_poison, code_incomplete_signature, code_conflicting_constraints
- Add get_benchmarks_by_category() helper for category-based filtering
- CLI: list-benchmarks --category Coding (or any category) + Category column
- CLI: run-chaos-suite now includes all coding chaos profiles
- Backend: /api/benchmarks?category=Coding optional filter
- Frontend: coding chaos filter pills shown when Coding benchmark selected
- 24 new unit tests covering all above (70 total, all passing)
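For readers who have not opened the diff, here is a minimal sketch of what category-based filtering with a helper named get_benchmarks_by_category() might look like. Only the helper name, the "Coding" category, and the four coding benchmark names come from the description above. The `Benchmark` record, the `BENCHMARKS` registry, the `example-qa` entry, the optional argument, and the case-insensitive matching are all assumptions made for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Benchmark:
    # Hypothetical record; the real registry entries may carry more fields.
    name: str
    category: str


# Illustrative registry. The four coding names come from the PR description;
# "example-qa" is a placeholder non-coding entry that is not taken from the PR.
BENCHMARKS = [
    Benchmark("human-eval", "Coding"),
    Benchmark("mbpp", "Coding"),
    Benchmark("apps", "Coding"),
    Benchmark("swe-bench", "Coding"),
    Benchmark("example-qa", "General"),
]


def get_benchmarks_by_category(category: Optional[str] = None) -> list[Benchmark]:
    """Return the benchmarks in `category`, or every benchmark if it is None.

    Matching ignores case and surrounding whitespace (an assumption), so
    "coding" and " Coding " select the same entries.
    """
    if category is None:
        return list(BENCHMARKS)
    wanted = category.strip().casefold()
    return [b for b in BENCHMARKS if b.category.casefold() == wanted]


if __name__ == "__main__":
    for bench in get_benchmarks_by_category("Coding"):
        print(f"{bench.name}\t{bench.category}")
```

A single helper like this could plausibly back both the `list-benchmarks --category Coding` CLI option and the `/api/benchmarks?category=Coding` query parameter, with a missing category meaning "no filter" in both places.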

himmi-01 force-pushed the feat/coding-agent-benchmarks branch from b320815 to a2d7c5c Compare May 20, 2026 04:39

himmi-01 merged commit 7cb5a57 into Corbell-AI:main May 20, 2026
1 check passed

himmi-01 merged 1 commit into Corbell-AI:main from himmi-01:feat/coding-agent-benchmarks

himmi-01 commented May 20, 2026
