Add coding agent benchmarks #8

Merged
himmi-01 merged 1 commit into Corbell-AI:main from himmi-01:feat/coding-agent-benchmarks on May 20, 2026

@himmi-01
Contributor

  • Add coding agent benchmarks
  • Add category for benchmarks

#7

…ring

- Add dedicated coding loaders for human-eval, mbpp, apps, swe-bench
  with code-specific rubrics (function correctness, test passing, syntax)
- Add 6 coding-agent-specific chaos profiles: code_context_strip,
  code_wrong_language, code_syntax_break, code_test_poison,
  code_incomplete_signature, code_conflicting_constraints
- Add get_benchmarks_by_category() helper for category-based filtering
- CLI: list-benchmarks accepts --category Coding (or any category) and shows a Category column
- CLI: run-chaos-suite now includes all coding chaos profiles
- Backend: /api/benchmarks?category=Coding optional filter
- Frontend: coding chaos filter pills are shown when a Coding benchmark is selected
- 24 new unit tests covering all of the above (70 total, all passing)
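
For readers who want a concrete picture of the category-based filtering described above, here is a minimal Python sketch. It is not the code from this PR: the `Benchmark` dataclass, the `BENCHMARKS` registry, the `example-qa` entry, and the case-insensitive matching are all assumptions made for illustration. Only the function name `get_benchmarks_by_category` and the four coding benchmark names come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical registry entry; the real project's fields may differ."""
    name: str
    category: str


# Illustrative registry. The four coding names come from the PR description;
# "example-qa" / "General" is a made-up placeholder for a non-coding benchmark.
BENCHMARKS = [
    Benchmark("human-eval", "Coding"),
    Benchmark("mbpp", "Coding"),
    Benchmark("apps", "Coding"),
    Benchmark("swe-bench", "Coding"),
    Benchmark("example-qa", "General"),
]


def get_benchmarks_by_category(category: Optional[str] = None) -> list[Benchmark]:
    """Return benchmarks in `category`, or every benchmark when it is None.

    Matching is case-insensitive here; that is an assumption of this sketch,
    not something the PR states.
    """
    if category is None:
        return list(BENCHMARKS)
    wanted = category.strip().lower()
    return [b for b in BENCHMARKS if b.category.lower() == wanted]


if __name__ == "__main__":
    # Roughly what a "list-benchmarks --category Coding" listing might print.
    for bench in get_benchmarks_by_category("Coding"):
        print(f"{bench.name}\t{bench.category}")
```

Treating a missing category as "no filter" mirrors the optional `?category=` query parameter on `/api/benchmarks`, so the CLI and backend could plausibly share one helper; whether the project actually does so is not stated here.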

himmi-01 force-pushed the feat/coding-agent-benchmarks branch from b320815 to a2d7c5c on May 20, 2026 04:39
himmi-01 merged commit 7cb5a57 into Corbell-AI:main on May 20, 2026
1 check passed