
 This bundle is generated from `runs/official/` and includes only valid scored tasks (`passed`/`failed` with numeric reward).

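The inclusion rule stated above (status `passed` or `failed`, with a numeric reward) can be sketched as a small filter. This is only an illustration: the field names `status` and `reward`, the helper names, and the two derived figures (mean reward and pass rate) are assumptions, not the generator's actual schema or code.

```python
from typing import Any

# Statuses that count as "scored" per the bundle description.
VALID_STATUSES = {"passed", "failed"}


def is_valid_scored(task: dict[str, Any]) -> bool:
    """Return True for tasks with a passed/failed status and a numeric reward."""
    reward = task.get("reward")  # hypothetical field name
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(reward, (int, float)) and not isinstance(reward, bool)
    return task.get("status") in VALID_STATUSES and is_number


def summarize(tasks: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate only the valid scored tasks (assumed summary columns)."""
    valid = [t for t in tasks if is_valid_scored(t)]
    if not valid:
        return {"valid_tasks": 0, "mean_reward": None, "pass_rate": None}
    n = len(valid)
    return {
        "valid_tasks": n,
        "mean_reward": sum(t["reward"] for t in valid) / n,
        "pass_rate": sum(t["status"] == "passed" for t in valid) / n,
    }


# Made-up records: the last two are dropped (bad status, missing reward).
tasks = [
    {"status": "passed", "reward": 1.0},
    {"status": "failed", "reward": 0.0},
    {"status": "errored", "reward": None},
    {"status": "passed", "reward": None},
]
summary = summarize(tasks)
```

With these made-up records, only the first two tasks contribute to the summary.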
-Generated: `2026-03-01T13:28:24.423549+00:00`
+Generated: `2026-03-01T19:12:02.665318+00:00`

 ## Local Browse

@@ -19,16 +19,16 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | --- | --- | ---: | ---: | ---: | ---: | --- |
 | [ccb_build](suites/ccb_build.md) | `baseline-local-direct` | 23 | 23 | 0.580 | 0.783 | ok |
 | [ccb_build](suites/ccb_build.md) | `mcp-remote-direct` | 20 | 23 | 0.592 | 0.800 | FLAG: below minimum |
-| [ccb_debug](suites/ccb_debug.md) | `baseline-local-direct` | 16 | 20 | 0.739 | 1.000 | FLAG: below minimum |
-| [ccb_debug](suites/ccb_debug.md) | `mcp-remote-direct` | 16 | 20 | 0.559 | 0.688 | FLAG: below minimum |
-| [ccb_design](suites/ccb_design.md) | `baseline-local-direct` | 20 | 20 | 0.766 | 1.000 | ok |
-| [ccb_design](suites/ccb_design.md) | `mcp-remote-direct` | 33 | 20 | 0.741 | 1.000 | ok |
-| [ccb_document](suites/ccb_document.md) | `baseline-local-direct` | 20 | 20 | 0.890 | 1.000 | ok |
-| [ccb_document](suites/ccb_document.md) | `mcp-remote-direct` | 44 | 20 | 0.841 | 1.000 | ok |
+| [ccb_debug](suites/ccb_debug.md) | `baseline-local-direct` | 20 | 20 | 0.688 | 1.000 | ok |
+| [ccb_debug](suites/ccb_debug.md) | `mcp-remote-direct` | 25 | 20 | 0.510 | 0.720 | ok |
+| [ccb_design](suites/ccb_design.md) | `baseline-local-direct` | 20 | 20 | 0.770 | 1.000 | ok |
+| [ccb_design](suites/ccb_design.md) | `mcp-remote-direct` | 33 | 20 | 0.720 | 0.970 | ok |
+| [ccb_document](suites/ccb_document.md) | `baseline-local-direct` | 20 | 20 | 0.845 | 1.000 | ok |
+| [ccb_document](suites/ccb_document.md) | `mcp-remote-direct` | 44 | 20 | 0.875 | 1.000 | ok |
 | [ccb_feature](suites/ccb_feature.md) | `baseline-local-direct` | 20 | 20 | 0.631 | 0.850 | ok |
 | [ccb_feature](suites/ccb_feature.md) | `mcp-remote-direct` | 20 | 20 | 0.553 | 0.800 | ok |
 | [ccb_fix](suites/ccb_fix.md) | `baseline-local-direct` | 25 | 25 | 0.450 | 0.600 | ok |
-| [ccb_fix](suites/ccb_fix.md) | `mcp-remote-direct` | 70 | 25 | 0.572 | 0.714 | ok |
+| [ccb_fix](suites/ccb_fix.md) | `mcp-remote-direct` | 72 | 25 | 0.556 | 0.694 | ok |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `baseline-local-artifact` | 1 | 28 | 0.375 | 1.000 | FLAG: below minimum |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `baseline-local-direct` | 7 | 28 | 0.648 | 1.000 | FLAG: below minimum |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `mcp-remote-artifact` | 1 | 28 | 0.742 | 1.000 | FLAG: below minimum |
@@ -65,13 +65,13 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `baseline-local-direct` | 10 | 25 | 0.524 | 0.800 | FLAG: below minimum |
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `mcp-remote-artifact` | 6 | 25 | 0.792 | 1.000 | FLAG: below minimum |
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `mcp-remote-direct` | 25 | 25 | 0.719 | 1.000 | ok |
-| [ccb_refactor](suites/ccb_refactor.md) | `baseline-local-direct` | 20 | 20 | 0.755 | 1.000 | ok |
-| [ccb_refactor](suites/ccb_refactor.md) | `mcp-remote-direct` | 20 | 20 | 0.671 | 1.000 | ok |
-| [ccb_secure](suites/ccb_secure.md) | `baseline-local-direct` | 20 | 20 | 0.669 | 0.950 | ok |
-| [ccb_secure](suites/ccb_secure.md) | `mcp-remote-direct` | 24 | 20 | 0.637 | 0.917 | ok |
-| [ccb_test](suites/ccb_test.md) | `baseline-local-direct` | 20 | 20 | 0.480 | 0.750 | ok |
-| [ccb_test](suites/ccb_test.md) | `mcp-remote-direct` | 35 | 20 | 0.475 | 0.686 | ok |
-| [ccb_understand](suites/ccb_understand.md) | `baseline-local-direct` | 34 | 20 | 0.771 | 0.853 | ok |
+| [ccb_refactor](suites/ccb_refactor.md) | `baseline-local-direct` | 20 | 20 | 0.789 | 0.950 | ok |
+| [ccb_refactor](suites/ccb_refactor.md) | `mcp-remote-direct` | 20 | 20 | 0.703 | 1.000 | ok |
+| [ccb_secure](suites/ccb_secure.md) | `baseline-local-direct` | 20 | 20 | 0.712 | 1.000 | ok |
+| [ccb_secure](suites/ccb_secure.md) | `mcp-remote-direct` | 24 | 20 | 0.707 | 0.958 | ok |
+| [ccb_test](suites/ccb_test.md) | `baseline-local-direct` | 20 | 20 | 0.484 | 0.700 | ok |
+| [ccb_test](suites/ccb_test.md) | `mcp-remote-direct` | 35 | 20 | 0.468 | 0.686 | ok |
+| [ccb_understand](suites/ccb_understand.md) | `baseline-local-direct` | 34 | 20 | 0.902 | 0.971 | ok |
 | [ccb_understand](suites/ccb_understand.md) | `mcp-remote-direct` | 48 | 20 | 0.873 | 0.979 | ok |

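The status column in the table above appears to follow from comparing the first numeric column against the second: every visible row where the first count is below the second is flagged, and every other row is `ok`. A minimal sketch of that rule follows. The column meanings (valid task count versus required minimum) and the function name are assumptions inferred from the visible rows, since the table header is not part of this hunk.

```python
def coverage_status(valid_tasks: int, min_required: int) -> str:
    """Return the status string for one suite/config row (assumed rule)."""
    # Assumption: a row is flagged only when it has fewer valid scored
    # tasks than the suite's required minimum.
    return "ok" if valid_tasks >= min_required else "FLAG: below minimum"


# Count pairs copied from rows of the table above.
rows = {
    ("ccb_build", "baseline-local-direct"): (23, 23),
    ("ccb_build", "mcp-remote-direct"): (20, 23),
    ("ccb_fix", "mcp-remote-direct"): (72, 25),
    ("ccb_mcp_security", "mcp-remote-artifact"): (6, 25),
}
statuses = {key: coverage_status(*counts) for key, counts in rows.items()}
```

Under this reading, the `ccb_debug` rows move from flagged to `ok` in this diff because their counts rise from 16 to 20 and 25 against a minimum of 20.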
 <details>
@@ -317,13 +317,35 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [debug_haiku_20260228_231033](runs/debug_haiku_20260228_231033.md) | `ccb_debug` | `mcp-remote-direct` | 10 | 0.804 | 1.000 |
 | [debug_haiku_20260301_021540](runs/debug_haiku_20260301_021540.md) | `ccb_debug` | `baseline-local-direct` | 11 | 0.847 | 1.000 |
 | [debug_haiku_20260301_021540](runs/debug_haiku_20260301_021540.md) | `ccb_debug` | `mcp-remote-direct` | 11 | 0.813 | 1.000 |
+| [debug_haiku_20260301_030159](runs/debug_haiku_20260301_030159.md) | `ccb_debug` | `baseline-local-direct` | 11 | 0.837 | 1.000 |
+| [debug_haiku_20260301_030159](runs/debug_haiku_20260301_030159.md) | `ccb_debug` | `mcp-remote-direct` | 11 | 0.801 | 1.000 |
+| [debug_haiku_20260301_031844](runs/debug_haiku_20260301_031844.md) | `ccb_debug` | `baseline-local-direct` | 11 | 0.806 | 1.000 |
+| [debug_haiku_20260301_031844](runs/debug_haiku_20260301_031844.md) | `ccb_debug` | `mcp-remote-direct` | 11 | 0.750 | 1.000 |
+| [debug_haiku_20260301_033225](runs/debug_haiku_20260301_033225.md) | `ccb_debug` | `baseline-local-direct` | 9 | 0.444 | 0.889 |
+| [debug_haiku_20260301_033225](runs/debug_haiku_20260301_033225.md) | `ccb_debug` | `mcp-remote-direct` | 9 | 0.389 | 0.778 |
+| [debug_haiku_20260301_035030](runs/debug_haiku_20260301_035030.md) | `ccb_debug` | `baseline-local-direct` | 9 | 0.333 | 0.667 |
+| [debug_haiku_20260301_035030](runs/debug_haiku_20260301_035030.md) | `ccb_debug` | `mcp-remote-direct` | 9 | 0.278 | 0.556 |
+| [debug_haiku_20260301_040300](runs/debug_haiku_20260301_040300.md) | `ccb_debug` | `baseline-local-direct` | 9 | 0.500 | 1.000 |
+| [debug_haiku_20260301_040300](runs/debug_haiku_20260301_040300.md) | `ccb_debug` | `mcp-remote-direct` | 9 | 0.389 | 0.778 |
+| [debug_haiku_20260301_071226](runs/debug_haiku_20260301_071226.md) | `ccb_debug` | `baseline-local-direct` | 11 | 0.842 | 1.000 |
+| [debug_haiku_20260301_071226](runs/debug_haiku_20260301_071226.md) | `ccb_debug` | `mcp-remote-direct` | 11 | 0.841 | 1.000 |
 | [design_haiku_20260223_124652](runs/design_haiku_20260223_124652.md) | `ccb_design` | `baseline-local-direct` | 13 | 0.770 | 1.000 |
 | [design_haiku_20260223_124652](runs/design_haiku_20260223_124652.md) | `ccb_design` | `mcp-remote-direct` | 20 | 0.718 | 1.000 |
 | [design_haiku_20260301_022406](runs/design_haiku_20260301_022406.md) | `ccb_design` | `baseline-local-direct` | 20 | 0.766 | 1.000 |
 | [design_haiku_20260301_022406](runs/design_haiku_20260301_022406.md) | `ccb_design` | `mcp-remote-direct` | 20 | 0.734 | 1.000 |
+| [design_haiku_20260301_031030](runs/design_haiku_20260301_031030.md) | `ccb_design` | `baseline-local-direct` | 20 | 0.762 | 0.950 |
+| [design_haiku_20260301_031030](runs/design_haiku_20260301_031030.md) | `ccb_design` | `mcp-remote-direct` | 20 | 0.747 | 1.000 |
+| [design_haiku_20260301_031845](runs/design_haiku_20260301_031845.md) | `ccb_design` | `baseline-local-direct` | 20 | 0.807 | 1.000 |
+| [design_haiku_20260301_031845](runs/design_haiku_20260301_031845.md) | `ccb_design` | `mcp-remote-direct` | 19 | 0.701 | 1.000 |
+| [design_haiku_20260301_071227](runs/design_haiku_20260301_071227.md) | `ccb_design` | `baseline-local-direct` | 20 | 0.770 | 1.000 |
+| [design_haiku_20260301_071227](runs/design_haiku_20260301_071227.md) | `ccb_design` | `mcp-remote-direct` | 20 | 0.699 | 0.950 |
 | [document_haiku_20260223_164240](runs/document_haiku_20260223_164240.md) | `ccb_document` | `baseline-local-direct` | 19 | 0.851 | 1.000 |
 | [document_haiku_20260223_164240](runs/document_haiku_20260223_164240.md) | `ccb_document` | `mcp-remote-direct` | 20 | 0.822 | 1.000 |
 | [document_haiku_20260226_013910](runs/document_haiku_20260226_013910.md) | `ccb_document` | `baseline-local-direct` | 1 | 1.000 | 1.000 |
+| [document_haiku_20260301_031846](runs/document_haiku_20260301_031846.md) | `ccb_document` | `baseline-local-direct` | 20 | 0.875 | 1.000 |
+| [document_haiku_20260301_031846](runs/document_haiku_20260301_031846.md) | `ccb_document` | `mcp-remote-direct` | 20 | 0.908 | 1.000 |
+| [document_haiku_20260301_071228](runs/document_haiku_20260301_071228.md) | `ccb_document` | `baseline-local-direct` | 20 | 0.845 | 1.000 |
+| [document_haiku_20260301_071228](runs/document_haiku_20260301_071228.md) | `ccb_document` | `mcp-remote-direct` | 20 | 0.898 | 1.000 |
 | [feature_haiku_20260228_190114](runs/feature_haiku_20260228_190114.md) | `ccb_feature` | `baseline-local-direct` | 5 | 0.507 | 0.600 |
 | [feature_haiku_20260228_190114](runs/feature_haiku_20260228_190114.md) | `ccb_feature` | `mcp-remote-direct` | 6 | 0.550 | 0.833 |
 | [feature_haiku_20260228_211127](runs/feature_haiku_20260228_211127.md) | `ccb_feature` | `baseline-local-direct` | 17 | 0.664 | 0.941 |
@@ -343,6 +365,8 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [feature_haiku_20260301_071229](runs/feature_haiku_20260301_071229.md) | `ccb_feature` | `baseline-local-direct` | 20 | 0.631 | 0.850 |
 | [feature_haiku_20260301_071229](runs/feature_haiku_20260301_071229.md) | `ccb_feature` | `mcp-remote-direct` | 19 | 0.582 | 0.842 |
 | [feature_haiku_vscode_rerun_20260301_023018](runs/feature_haiku_vscode_rerun_20260301_023018.md) | `ccb_feature` | `baseline-local-direct` | 1 | 0.500 | 1.000 |
+| [fix_haiku_20260301_190026](runs/fix_haiku_20260301_190026.md) | `ccb_fix` | `baseline-local-direct` | 2 | 0.000 | 0.000 |
+| [fix_haiku_20260301_190026](runs/fix_haiku_20260301_190026.md) | `ccb_fix` | `mcp-remote-direct` | 2 | 0.000 | 0.000 |
 | [refactor_haiku_20260228_210652](runs/refactor_haiku_20260228_210652.md) | `ccb_refactor` | `baseline-local-direct` | 1 | 0.750 | 1.000 |
 | [refactor_haiku_20260228_210652](runs/refactor_haiku_20260228_210652.md) | `ccb_refactor` | `mcp-remote-direct` | 1 | 0.790 | 1.000 |
 | [refactor_haiku_20260228_230116](runs/refactor_haiku_20260228_230116.md) | `ccb_refactor` | `mcp-remote-direct` | 6 | 0.585 | 1.000 |
@@ -355,19 +379,33 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [refactor_haiku_20260301_023530](runs/refactor_haiku_20260301_023530.md) | `ccb_refactor` | `mcp-remote-direct` | 10 | 0.717 | 0.900 |
 | [refactor_haiku_20260301_031849](runs/refactor_haiku_20260301_031849.md) | `ccb_refactor` | `baseline-local-direct` | 20 | 0.755 | 1.000 |
 | [refactor_haiku_20260301_031849](runs/refactor_haiku_20260301_031849.md) | `ccb_refactor` | `mcp-remote-direct` | 20 | 0.671 | 1.000 |
+| [refactor_haiku_20260301_071230](runs/refactor_haiku_20260301_071230.md) | `ccb_refactor` | `baseline-local-direct` | 20 | 0.789 | 0.950 |
+| [refactor_haiku_20260301_071230](runs/refactor_haiku_20260301_071230.md) | `ccb_refactor` | `mcp-remote-direct` | 19 | 0.713 | 1.000 |
 | [secure_haiku_20260223_232545](runs/secure_haiku_20260223_232545.md) | `ccb_secure` | `baseline-local-direct` | 20 | 0.669 | 0.950 |
 | [secure_haiku_20260223_232545](runs/secure_haiku_20260223_232545.md) | `ccb_secure` | `mcp-remote-direct` | 18 | 0.705 | 1.000 |
 | [secure_haiku_20260224_011825](runs/secure_haiku_20260224_011825.md) | `ccb_secure` | `mcp-remote-direct` | 2 | 0.500 | 0.500 |
+| [secure_haiku_20260301_031850](runs/secure_haiku_20260301_031850.md) | `ccb_secure` | `baseline-local-direct` | 20 | 0.737 | 0.950 |
+| [secure_haiku_20260301_031850](runs/secure_haiku_20260301_031850.md) | `ccb_secure` | `mcp-remote-direct` | 20 | 0.728 | 1.000 |
+| [secure_haiku_20260301_071231](runs/secure_haiku_20260301_071231.md) | `ccb_secure` | `baseline-local-direct` | 20 | 0.712 | 1.000 |
+| [secure_haiku_20260301_071231](runs/secure_haiku_20260301_071231.md) | `ccb_secure` | `mcp-remote-direct` | 20 | 0.767 | 1.000 |
 | [test_haiku_20260223_235732](runs/test_haiku_20260223_235732.md) | `ccb_test` | `baseline-local-direct` | 10 | 0.492 | 0.800 |
 | [test_haiku_20260223_235732](runs/test_haiku_20260223_235732.md) | `ccb_test` | `mcp-remote-direct` | 19 | 0.495 | 0.684 |
 | [test_haiku_20260224_011816](runs/test_haiku_20260224_011816.md) | `ccb_test` | `baseline-local-direct` | 11 | 0.295 | 0.545 |
 | [test_haiku_20260224_011816](runs/test_haiku_20260224_011816.md) | `ccb_test` | `mcp-remote-direct` | 11 | 0.262 | 0.455 |
 | [test_haiku_20260228_230654](runs/test_haiku_20260228_230654.md) | `ccb_test` | `mcp-remote-direct` | 1 | 0.000 | 0.000 |
 | [test_haiku_20260228_231039](runs/test_haiku_20260228_231039.md) | `ccb_test` | `mcp-remote-direct` | 1 | 0.200 | 1.000 |
+| [test_haiku_20260301_031851](runs/test_haiku_20260301_031851.md) | `ccb_test` | `baseline-local-direct` | 17 | 0.571 | 0.824 |
+| [test_haiku_20260301_031851](runs/test_haiku_20260301_031851.md) | `ccb_test` | `mcp-remote-direct` | 8 | 0.769 | 1.000 |
+| [test_haiku_20260301_071232](runs/test_haiku_20260301_071232.md) | `ccb_test` | `baseline-local-direct` | 17 | 0.569 | 0.824 |
+| [test_haiku_20260301_071232](runs/test_haiku_20260301_071232.md) | `ccb_test` | `mcp-remote-direct` | 8 | 0.780 | 1.000 |
 | [understand_haiku_20260224_001815](runs/understand_haiku_20260224_001815.md) | `ccb_understand` | `baseline-local-direct` | 20 | 0.533 | 0.650 |
 | [understand_haiku_20260224_001815](runs/understand_haiku_20260224_001815.md) | `ccb_understand` | `mcp-remote-direct` | 20 | 0.679 | 0.850 |
 | [understand_haiku_20260225_211346](runs/understand_haiku_20260225_211346.md) | `ccb_understand` | `baseline-local-direct` | 7 | 0.789 | 1.000 |
 | [understand_haiku_20260225_211346](runs/understand_haiku_20260225_211346.md) | `ccb_understand` | `mcp-remote-direct` | 7 | 0.870 | 1.000 |
+| [understand_haiku_20260301_031852](runs/understand_haiku_20260301_031852.md) | `ccb_understand` | `baseline-local-direct` | 20 | 0.728 | 0.850 |
+| [understand_haiku_20260301_031852](runs/understand_haiku_20260301_031852.md) | `ccb_understand` | `mcp-remote-direct` | 20 | 0.832 | 0.950 |
+| [understand_haiku_20260301_071233](runs/understand_haiku_20260301_071233.md) | `ccb_understand` | `baseline-local-direct` | 20 | 0.884 | 1.000 |
+| [understand_haiku_20260301_071233](runs/understand_haiku_20260301_071233.md) | `ccb_understand` | `mcp-remote-direct` | 20 | 0.850 | 1.000 |

 </details>
