coding-benchmark

Here are 6 public repositories matching this topic...

wd041216-bit / ai-benchmark-kb

AI Benchmark Knowledge Base: a comprehensive collection of the benchmark question sets that major AI companies use to test model performance.

benchmark knowledge-base model-evaluation reasoning multimodal ai-benchmarks instruction-following llm long-context safety-evaluation ai-performance math-reasoning coding-benchmark benchmark-collection eval-frameworks

Updated Apr 16, 2026

redush-com / SaotriBench

Saotri Bench — coding benchmark for evaluating LLM agents on multi-phase programming tasks with hidden requirements.

python benchmark machine-learning code-generation evaluation-framework ai-agents llm llm-evaluation agent-evaluation coding-benchmark

Updated Feb 21, 2026
Python

ArturasDedinas123 / local-llm-benchmark

Benchmark of local LLMs (Qwen 3.6, Gemma 4) vs ChatGPT and Gemini on coding tasks. Apple M5, 32GB. Methodology, raw outputs, judge prompts, scores, and charts.

benchmark gemma mlx apple-silicon ai-evaluation llm local-llm ollama qwen coding-benchmark

Updated May 25, 2026
Jupyter Notebook

slappymambadoo / claude-code-local-qwen-case-study

Raw logs of Claude Code running on local Qwen3.5-27B (llama.cpp). Builds a Python todo app with 50 tests. Real-world performance data: 30 min, cache thrashing, 38 t/s generation.

Updated Apr 14, 2026
Python

spfunctions / major-model-benchmark

Open benchmark harness for latest major AI models on general reasoning, coding, tool use, and long-context tasks.

benchmark reasoning tool-use long-context llm-evaluation coding-benchmark major-models

Updated May 5, 2026
Python

dikatwoone / FluxCodeBench

🔍 Evaluate LLM agents on multi-phase programming tasks with FluxCodeBench, focusing on hidden requirements, long-context retention, and iterative refinement.

python benchmark machine-learning code-generation evaluation-framework ai-agents llm llm-evaluation agent-evaluation coding-benchmark

Updated May 26, 2026
Python
