Skip to content

feat(agent-runtime): honor per-step gate on the coding lane (#301)#307

Merged
krokoko merged 2 commits into
mainfrom
feat/301-honor-gate-coding-lane
Jun 10, 2026
Merged

feat(agent-runtime): honor per-step gate on the coding lane (#301)#307
krokoko merged 2 commits into
mainfrom
feat/301-honor-gate-coding-lane

Conversation

@krokoko

@krokoko krokoko commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

What this does (plain English)

After a coding agent finishes its work, the platform runs checks (like "does the build still pass?") before opening a pull request. Workflow authors can declare how strict each check is in the workflow YAML:

  • strict — any failure fails the task.
  • regression_only — fails only if the agent broke something that was working before.
  • informational — reported, but never fails the task.

The bug: on coding tasks, that setting was silently ignored — the pipeline used hardcoded regression-only logic instead. You could write gate: strict and a build the agent broke would still pass. The three built-in workflows behaved correctly only by coincidence (their declared gates happened to match the hardcoded behavior).

The fix: the coding path now actually reads and applies the declared gate, using the same single piece of logic the repo-less lane already used — one definition of gate semantics, not two copies that can drift. Even when a check fails the task, the PR is still opened so the agent's work stays visible and reviewable, just marked failed. Tests prove strict/informational now work as declared and that the three shipped workflows behave exactly as before.


Summary

Fixes #301. On the coding lane, steps[].gate was declarative-only: the post-hook block in pipeline.run_task hardcoded regression-only build gating, ignoring each verify_* step's declared gate. This PR makes the coding lane honor the declared gate through the same implementation the repo-less lane uses — the runner's gate_status (formerly _gate_status, now public) — so gate semantics live in exactly one place.

Decision (documented in _apply_post_hook_gates and WORKFLOWS.md §"Step kinds"): the inline path consults step.gate via gate_status rather than routing the post-hooks through the runner's step handlers. Routing through the runner would change failure-path side effects (a gating verify_build with on_failure: fail stops the runner before ensure_pr, stranding committed work with no PR) — that is the broader half-migrated-runner unification #301 explicitly defers. The inline ordering (verify → ensure_pr always runs) is preserved; only the task verdict honors the gate.

Changes

  • agent/src/workflow/runner.py_gate_statusgate_status (public; exported from workflow).
  • agent/src/pipeline.py — new _apply_post_hook_gates(workflow, ...) replaces the hardcoded build_ok block:
  • agent/tests/test_pipeline_post_hook_gates.py — 29 new tests.
  • docs/design/WORKFLOWS.md — documents the gate field, the one-implementation rule, and the deferred runner unification (+ synced Starlight mirror).

Acceptance criteria

  • Coding lane verify_build/verify_lint honor declared gate via the same gate_status path the repo-less lane uses — one implementation, not two.
  • gate: strict fails the task on a build regression (and even on a pre-existing failure); informational never gates — proven end-to-end through run_task (TestRunTaskHonorsGate).
  • No behavior change for the three shipped coding workflows — TestShippedWorkflowParity asserts inline == runner == legacy verdicts across the full (build_passed, build_before) matrix for regression_only and read_only cases.
  • Choice (consult step.gate inline vs route through runner) documented in code and WORKFLOWS.md.

Test plan

  • mise //agent:quality — 1004 passed (29 new), ruff clean, coverage 76.05%
  • mise run build — full monorepo build green (CDK 1844 tests, CLI, docs sync + build)

The coding lane's inline post-hook gating now consults each declared
verify_build / verify_lint step's `gate` through the runner's gate_status
(made public from _gate_status) — one gate implementation for both lanes.
A coding workflow declaring `gate: strict` now fails the task on a build
failure; `informational` never gates. The three shipped coding workflows
keep their exact pre-change effective gating, locked by a parity test
matrix (inline == runner == legacy verdicts). The inline ordering is
preserved: ensure_pr still runs after a gating verify failure so the
agent's work surfaces as a reviewable PR.

Refs #301
@krokoko krokoko requested a review from a team as a code owner June 10, 2026 16:34
@isadeks

isadeks commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Code Review: PR #307 — honor per-step gate on the coding lane

Verdict: Approve. No blockers, no nits.

This is a clean, well-scoped PR. I reviewed the actual diff (7 files, +512/−51) against main, read the full enclosing functions, ran independent finder passes, and verified the key claims empirically — including running the test suite — rather than trusting the description.

What I checked

Risk area Verdict
Inline verdict vs. runner verdict parity The PR's central claim is that _apply_post_hook_gates agrees with the runner's gate_status + on_failure outcome. Traced all (gate, on_failure, passed, was_passing_before, read_only) combos — they agree. ✓
No behavior change for the 3 shipped workflows Confirmed against the real YAMLs (new-task-v1/pr-iteration-v1: regression_only; pr-review-v1: informational + read_only). New code reproduces the old build_ok = passed or not build_before exactly across the full (build_passed, build_before) matrix. ✓
Removed-behavior re-establishment The deleted hardcoded block's invariants (read-only never gates, regression-only default, undeclared lint never gates, "already failing before" not counted as a regression) are all re-established. ✓
workflow=None fallback Reload failure → build keeps legacy regression-only, lint never gates; work is never stranded (preserves #296 finding #5). ✓
Cross-file breakage (_gate_statusgate_status rename) grep -rn "_gate_status" repo-wide: zero leftover references; correctly exported from workflow/__init__.py. ✓
Workflow | None annotation under TYPE_CHECKING Safe — pipeline.py has from __future__ import annotations. ✓
Tests vacuous / wrong patch targets? Ran them: 63 passed (29 new + 34 in the runner suite). TestShippedWorkflowParity loads the real YAMLs; TestRunTaskHonorsGate exercises run_task end-to-end and asserts status == "error" requires build_ok == False — not vacuous. ✓

Quality notes (not findings)

  • The "consult step.gate inline rather than route post-hooks through the runner" decision is the intentional feat(agent-runtime): honor per-step gate on the coding lane (declarative steps decorative for verify_build/ensure_pr) #301 scope boundary (routing would strand committed work before ensure_pr). It's documented in both the code and WORKFLOWS.md, with the deferred runner-unification rationale called out. Right altitude.
  • Gate semantics genuinely live in one place now (gate_status), and the diff respects on_failure (advisory continue/skip_remaining don't gate the verdict), matching the runner. No special-casing drift introduced.
  • Minor: build_ok keeps its name even though it now also folds in lint gating — slightly under-named, but the function name (_apply_post_hook_gates) and call-site comment make intent clear. Not worth changing.

Ship it.

@krokoko krokoko requested a review from isadeks June 10, 2026 17:34
@krokoko krokoko enabled auto-merge June 10, 2026 17:36
@krokoko krokoko added this pull request to the merge queue Jun 10, 2026
Merged via the queue into main with commit 14820c8 Jun 10, 2026
6 of 7 checks passed
@krokoko krokoko deleted the feat/301-honor-gate-coding-lane branch June 10, 2026 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(agent-runtime): honor per-step gate on the coding lane (declarative steps decorative for verify_build/ensure_pr)

2 participants