feat(agent-runtime): honor per-step gate on the coding lane (#301)#307
Merged
Conversation
The coding lane's inline post-hook gating now consults each declared verify_build / verify_lint step's `gate` through the runner's gate_status (made public from _gate_status) — one gate implementation for both lanes. A coding workflow declaring `gate: strict` now fails the task on a build failure; `informational` never gates. The three shipped coding workflows keep their exact pre-change effective gating, locked by a parity test matrix (inline == runner == legacy verdicts). The inline ordering is preserved: ensure_pr still runs after a gating verify failure so the agent's work surfaces as a reviewable PR. Refs #301
Contributor
Code Review: PR #307 — honor per-step
|
| Risk area | Verdict |
|---|---|
| Inline verdict vs. runner verdict parity | The PR's central claim is that _apply_post_hook_gates agrees with the runner's gate_status + on_failure outcome. Traced all (gate, on_failure, passed, was_passing_before, read_only) combos — they agree. ✓ |
| No behavior change for the 3 shipped workflows | Confirmed against the real YAMLs (new-task-v1/pr-iteration-v1: regression_only; pr-review-v1: informational + read_only). New code reproduces the old build_ok = passed or not build_before exactly across the full (build_passed, build_before) matrix. ✓ |
| Removed-behavior re-establishment | The deleted hardcoded block's invariants (read-only never gates, regression-only default, undeclared lint never gates, "already failing before" not counted as a regression) are all re-established. ✓ |
workflow=None fallback |
Reload failure → build keeps legacy regression-only, lint never gates; work is never stranded (preserves #296 finding #5). ✓ |
Cross-file breakage (_gate_status → gate_status rename) |
grep -rn "_gate_status" repo-wide: zero leftover references; correctly exported from workflow/__init__.py. ✓ |
Workflow | None annotation under TYPE_CHECKING |
Safe — pipeline.py has from __future__ import annotations. ✓ |
| Tests vacuous / wrong patch targets? | Ran them: 63 passed (29 new + 34 in the runner suite). TestShippedWorkflowParity loads the real YAMLs; TestRunTaskHonorsGate exercises run_task end-to-end and asserts status == "error" requires build_ok == False — not vacuous. ✓ |
Quality notes (not findings)
- The "consult
step.gateinline rather than route post-hooks through the runner" decision is the intentional feat(agent-runtime): honor per-stepgateon the coding lane (declarative steps decorative for verify_build/ensure_pr) #301 scope boundary (routing would strand committed work beforeensure_pr). It's documented in both the code andWORKFLOWS.md, with the deferred runner-unification rationale called out. Right altitude. - Gate semantics genuinely live in one place now (
gate_status), and the diff respectson_failure(advisorycontinue/skip_remainingdon't gate the verdict), matching the runner. No special-casing drift introduced. - Minor:
build_okkeeps its name even though it now also folds in lint gating — slightly under-named, but the function name (_apply_post_hook_gates) and call-site comment make intent clear. Not worth changing.
Ship it.
isadeks
approved these changes
Jun 10, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this does (plain English)
After a coding agent finishes its work, the platform runs checks (like "does the build still pass?") before opening a pull request. Workflow authors can declare how strict each check is in the workflow YAML:
The bug: on coding tasks, that setting was silently ignored — the pipeline used hardcoded regression-only logic instead. You could write
gate: strictand a build the agent broke would still pass. The three built-in workflows behaved correctly only by coincidence (their declared gates happened to match the hardcoded behavior).The fix: the coding path now actually reads and applies the declared gate, using the same single piece of logic the repo-less lane already used — one definition of gate semantics, not two copies that can drift. Even when a check fails the task, the PR is still opened so the agent's work stays visible and reviewable, just marked failed. Tests prove
strict/informationalnow work as declared and that the three shipped workflows behave exactly as before.Summary
Fixes #301. On the coding lane,
steps[].gatewas declarative-only: the post-hook block inpipeline.run_taskhardcoded regression-only build gating, ignoring eachverify_*step's declaredgate. This PR makes the coding lane honor the declared gate through the same implementation the repo-less lane uses — the runner'sgate_status(formerly_gate_status, now public) — so gate semantics live in exactly one place.Decision (documented in
_apply_post_hook_gatesand WORKFLOWS.md §"Step kinds"): the inline path consultsstep.gateviagate_statusrather than routing the post-hooks through the runner's step handlers. Routing through the runner would change failure-path side effects (a gatingverify_buildwithon_failure: failstops the runner beforeensure_pr, stranding committed work with no PR) — that is the broader half-migrated-runner unification #301 explicitly defers. The inline ordering (verify →ensure_pralways runs) is preserved; only the task verdict honors the gate.Changes
agent/src/workflow/runner.py—_gate_status→gate_status(public; exported fromworkflow).agent/src/pipeline.py— new_apply_post_hook_gates(workflow, ...)replaces the hardcodedbuild_okblock:verify_build/verify_lintgate per theirgate(strict|regression_only|informational; unset =regression_only), only whenon_failure: fail(advisorycontinue/skip_remainingnever gate, matching the runner);verify_buildkeeps legacy regression-only gating; undeclaredverify_lintnever gates (legacy);read_onlyworkflows never gate (legacy);agent/tests/test_pipeline_post_hook_gates.py— 29 new tests.docs/design/WORKFLOWS.md— documents thegatefield, the one-implementation rule, and the deferred runner unification (+ synced Starlight mirror).Acceptance criteria
verify_build/verify_linthonor declaredgatevia the samegate_statuspath the repo-less lane uses — one implementation, not two.gate: strictfails the task on a build regression (and even on a pre-existing failure);informationalnever gates — proven end-to-end throughrun_task(TestRunTaskHonorsGate).TestShippedWorkflowParityasserts inline == runner == legacy verdicts across the full(build_passed, build_before)matrix forregression_onlyandread_onlycases.step.gateinline vs route through runner) documented in code and WORKFLOWS.md.Test plan
mise //agent:quality— 1004 passed (29 new), ruff clean, coverage 76.05%mise run build— full monorepo build green (CDK 1844 tests, CLI, docs sync + build)