Pre-scan should check certificate replay: previous-run verdict tokens in PR content #5

Description

@cschanhniem

In section 0 (Operating Context), the deterministic pre-scan checks for known injection markers including verdict tokens from section 3.8. This is correct for first-order injection. But there is a second-order vector: certificate replay.

A multi-agent verification pipeline (per section 0.1, agents B/C/D/E) is concurrent -- agents may run simultaneously or with overlapping state. A PR generated by Agent A that contains machine-readable certificate tokens or verdict strings from a previous run (copied from another PR output or leaked from a shared state store) could pass the pre-scan because the tokens are structurally valid but sourced from the wrong context.

Attack scenario:

  1. Agent A writes a PR that includes a fabricated AutoApprove verdict token inside a code comment or Markdown code fence.
  2. The pre-scan checks whether the token matches known verdict tokens (section 3.8) -- it does, but the token was inserted by Agent A, not by the verification pipeline.
  3. Agent B encounters the token. Per the untrusted-input invariant, the token sits inside pr_data fences and must not be executed. The pinning rule says that if verdict-affecting strings are detected inside PR-derived content, axis 2.8 is forced to red. But if the pre-scan classifies the token as structurally valid certificate output rather than as a suspicious injection, the pinning rule never fires and axis 2.8 stays green.
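
To make the failure mode in step 3 concrete, here is a minimal sketch of what the pinning rule would look like if it ignored structural validity entirely. The names `AxisState` and `pin_axis_2_8` are hypothetical and do not come from the spec. Only the four verdict values are taken from section 3.8.

```python
import re
from enum import Enum

# Verdict values from section 3.8.
VERDICT_TOKENS = (
    "AutoApprove",
    "HumanReviewRecommended",
    "HumanReviewRequired",
    "CannotVerify",
)
VERDICT_RE = re.compile(r"\b(" + "|".join(VERDICT_TOKENS) + r")\b")


class AxisState(Enum):
    """Hypothetical stand-in for the state of axis 2.8."""
    GREEN = "green"
    RED = "red"


def pin_axis_2_8(pr_derived_text: str) -> AxisState:
    """Return RED if any verdict token occurs in PR-derived content.

    The check deliberately does not ask whether the token looks like
    well-formed certificate output: in this sketch, PR-derived content
    can never carry a verdict, so any occurrence pins the axis.
    """
    if VERDICT_RE.search(pr_derived_text):
        return AxisState.RED
    return AxisState.GREEN
```

With a rule of this shape, a replayed token that carries plausible metadata is treated the same as a bare one, which closes the gap described in step 3.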

Suggestion: Add to the pre-scan structural anomaly check:

  • Cross-run token detection: if the diff contains tokens matching section 3.8 verdict values (AutoApprove, HumanReviewRecommended, HumanReviewRequired, CannotVerify), check whether they appear with valid certificate metadata (commit SHA, timestamp, agent identity). Any verdict token without a corresponding certificate binding is flagged as suspicious.
  • Temporal ordering check: an AutoApprove token in a PR diff that references a future or nonexistent commit SHA is a structural anomaly.

This is a narrow edge case (requires the attacker to have access to prior run outputs), but the fix is small -- a regex pass over the diff for the verdict token values followed by a commit_sha presence check -- and the invariant (certificate tokens in PR-derived content are untrusted input, not verdicts) is worth making explicit.
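
A rough sketch of that regex pass, under stated assumptions: the `commit_sha=<hex>` metadata format, the `Finding` record, the reason labels, and the `known_shas` set (commits the pipeline already knows about) are all hypothetical, since the issue does not specify how certificate metadata is serialized. A nonexistent or future commit is approximated here as "not in `known_shas`".

```python
import re
from dataclasses import dataclass

# Verdict values from section 3.8.
VERDICT_RE = re.compile(
    r"\b(AutoApprove|HumanReviewRecommended|HumanReviewRequired|CannotVerify)\b"
)
# Assumed metadata format: commit_sha=<7 to 40 hex chars> on the same line.
SHA_RE = re.compile(r"commit_sha\s*[=:]\s*\"?([0-9a-f]{7,40})\b")


@dataclass(frozen=True)
class Finding:
    line_no: int   # 1-based line number within the diff text
    token: str     # the verdict value that matched
    reason: str    # why the occurrence is flagged


def scan_diff(diff_text: str, known_shas: set[str]) -> list[Finding]:
    """Flag every verdict token found in a diff.

    Every match is reported, because tokens in PR-derived content are
    untrusted input. The reason only records which anomaly applies.
    """
    findings = []
    for line_no, line in enumerate(diff_text.splitlines(), start=1):
        for match in VERDICT_RE.finditer(line):
            sha = SHA_RE.search(line)
            if sha is None:
                reason = "unbound_token"          # no certificate binding
            elif sha.group(1) not in known_shas:
                reason = "unknown_commit"         # nonexistent or future SHA
            else:
                reason = "replayed_certificate"   # valid-looking, wrong context
            findings.append(Finding(line_no, match.group(1), reason))
    return findings
```

Note that a token bound to a known commit is still flagged. A well-formed binding found inside the diff is evidence of replay rather than of a genuine verdict.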
