Security: GitHub token handling in the open_pr worker

The feat_github_pr_worker worker (backend/workers/git_pr.py) is the only RelyLoop component that holds a long-lived GitHub credential (PAT) on disk. This doc enumerates the storage model, rotation procedures, scope requirements, and the AC-7 leak-prevention checklist that every code path must respect.

Storage model — per-repo `auth_ref` pattern

Each registered config repo has an auth_ref field that names a file at ./secrets/{auth_ref}. The worker reads the PAT from that file at job time:

./secrets/
├── postgres_password           # infra_foundation
├── database_url                # infra_foundation
├── openai_api_key              # feat_llm_judgments (optional)
├── acme-prod-search-config     # config_repo "acme-prod" — auth_ref="acme-prod-search-config"
├── acme-staging-config         # config_repo "acme-staging" — auth_ref="acme-staging-config"
└── beta-team-config            # config_repo "beta-team"   — auth_ref="beta-team-config"

Each file holds ONE PAT scoped to ONE config repo. This is the key advantage over the older single-token model:

- Blast-radius bounded. A compromised auth_ref exposes one repo; rotation touches one file, not the whole install.
- Operator-side audit clarity. GitHub's audit log shows which PAT performed which commit, so operators can map "PR opened by ghp_abc..." to "config_repo {name}" without grepping the worker source.
- Independent rotation windows. Sensitive prod repos can be rotated on a quarterly schedule while less-sensitive dev repos stay on a longer cadence, with no all-or-nothing trade-off.
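To make the per-repo lookup concrete, here is a minimal sketch of resolving an auth_ref to a token at job time. It is not the worker's actual `_read_pat`; the function name `read_pat_sketch`, the `SECRETS_DIR` constant, and the file-name validation are assumptions added for illustration.

```python
from pathlib import Path

SECRETS_DIR = Path("./secrets")  # assumed location, matching the tree above


def read_pat_sketch(auth_ref: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Return the PAT stored in secrets_dir/auth_ref (hypothetical helper)."""
    # Accept only a bare file name, so an auth_ref such as "../database_url"
    # cannot reach outside the secrets directory.
    if not auth_ref or auth_ref in {".", ".."} or Path(auth_ref).name != auth_ref:
        raise ValueError(f"invalid auth_ref: {auth_ref!r}")
    # Read on every call rather than caching, so a rotated file is picked up
    # by the next job. Strip the trailing newline that `echo` leaves behind.
    token = (secrets_dir / auth_ref).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"secret file for {auth_ref!r} is empty")
    return token
```

Because nothing is cached between calls, one repo's token never sits in memory while another repo's job runs, which keeps the blast radius at a single file.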

The legacy GITHUB_TOKEN_FILE env var from infra_foundation was retired in chore_infra_foundation_github_token_file_retirement. The API now emits a startup WARN if it's still set in env, and the Settings field has been removed — pre-retirement installs see no functional change beyond the warning, but the env var should be dropped on the next deploy.

Rotation procedures

Routine rotation (planned)

1. Generate a replacement PAT on GitHub with the scopes from the next section.
2. Overwrite the secret file in place:
   ```
   echo "<new-pat>" > ./secrets/<auth_ref>
   ```
3. No service restart is needed: backend/workers/git_pr.py:_read_pat reads the file fresh on every job.
4. Revoke the old PAT on GitHub.
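Since the worker reads the file fresh on every job, a job could in principle start while the file is being rewritten. One way to avoid that window is to write the new token to a temporary file and rename it over the old one. This is an alternative technique to the in-place `echo` above, not part of the documented procedure, and the helper name `write_secret_atomically` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_secret_atomically(path: Path, token: str) -> None:
    """Replace the secret file at `path` with `token` via a single rename."""
    # mkstemp creates the file readable and writable only by the owner on
    # POSIX, so the new PAT is never world-readable, even briefly.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(token.strip() + "\n")
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and destination share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A rename gives the file a new inode. If the secret is bind-mounted into a container as an individual file rather than as a directory, the container may keep seeing the old contents, in which case the in-place overwrite is the safer choice.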

Emergency rotation (suspected compromise)

1. Revoke first. Go to GitHub Settings → Developer settings → PATs → Delete the compromised token. Subsequent worker calls will get 401 from GitHub; pr_open_error will surface GITHUB_API_FAILED with the 401 response.
2. Wipe the local file:

       : > ./secrets/<auth_ref>   # truncate without removing

3. Generate and write the new PAT (per "Routine rotation" steps 1–2).
4. Audit recent commits with `git log --since=<compromise-window>` on each branch the worker has pushed to in the affected repo. Force-push concerns are bounded by AC-4 (the worker refuses to overwrite existing branches), but the operator should still verify that no unexpected commits landed.
5. Re-trigger any pending proposals that failed during the window.

PAT scopes required

| Scope | Why |
| --- | --- |
| `contents:write` | Push commits to the proposal branch (Step 12 of the worker contract). |
| `pull_requests:write` | Open PRs via `POST /repos/{owner}/{repo}/pulls` (Step 13). |
| `workflow:write` | OPTIONAL. Only needed if the config repo has CI that runs on the proposal branch (worker commits don't touch `.github/workflows/` directly, but some setups gate other branches via workflow files). |

Fine-grained PATs (github_pat_...) are the recommended format — the redaction regex (cycle-3 F2) covers both classic (ghp_/ghs_/gho_/ghu_/ghr_) and fine-grained prefixes.

Token-safe git invocations (cycle-1 F4)

Every git subprocess invocation in the worker uses the process-scoped env-var auth mechanism instead of embedding the PAT in argv or .git/config:

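env = {
    # Git reads GIT_CONFIG_COUNT / KEY_n / VALUE_n as per-process config entries (Git 2.31 or newer).
    "GIT_CONFIG_COUNT": "1",
    "GIT_CONFIG_KEY_0": "http.https://github.com/.extraheader",
    "GIT_CONFIG_VALUE_0": f"AUTHORIZATION: Bearer {token}",
}
# f-string so {owner} and {repo} are interpolated; the URL itself carries no token.
subprocess.run(["git", "clone", f"https://github.com/{owner}/{repo}.git", clone_dir], env=env, ...)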

This pattern mirrors GitHub Actions' actions/checkout for the same reason: the token lives ONLY in the subprocess environment (visible to git and its children, not to ps / argv inspection, not persisted on disk via .git/config).

The git clone URL is the tokenless form https://github.com/{owner}/{repo}.git. The Authorization header arrives via the GIT_CONFIG_* env vars — never in the URL.

Log-line redaction (FR-5)

Every WARN/ERROR log line passes its error string through redact_token (defined in backend.app.domain.git.redaction). The global RedactTokensProcessor (wired into backend.app.core.logging at the structlog chain) is the defense-in-depth backstop — even a future log line that forgets explicit redaction gets scrubbed before the JSON renderer serializes it.

Redacted tokens are replaced with the literal string [REDACTED-GH-TOKEN] so grep through log archives is deterministic ("did this exfiltrate?" → grep for gh[a-z]_ or github_pat_; any hit is a regression).
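The two redaction layers can be sketched as follows. This is an illustration rather than the code in backend.app.domain.git.redaction: the regex is an assumed approximation of the real pattern, and the names `redact_token_sketch` and `redact_event_dict` are hypothetical. The processor follows the structlog convention of a callable taking `(logger, method_name, event_dict)`, so it needs no structlog import to run.

```python
import re

# Assumed pattern: the classic prefixes (ghp_/ghs_/gho_/ghu_/ghr_) and the
# fine-grained github_pat_ prefix, each followed by a run of token characters.
_TOKEN_RE = re.compile(r"(?:gh[pousr]_|github_pat_)[A-Za-z0-9_]+")
PLACEHOLDER = "[REDACTED-GH-TOKEN]"


def redact_token_sketch(text: str) -> str:
    """Replace anything that looks like a GitHub token with the placeholder."""
    return _TOKEN_RE.sub(PLACEHOLDER, text)


def redact_event_dict(logger, method_name, event_dict):
    """Structlog-style processor: scrub every top-level string value."""
    for key, value in event_dict.items():
        if isinstance(value, str):
            event_dict[key] = redact_token_sketch(value)
    return event_dict
```

The placeholder does not match the pattern, so running the redaction twice is harmless. This sketch scrubs only top-level string values; nested dicts or lists in the event would need a recursive walk.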

AC-7 leak surfaces — full enumeration

The worker MUST guarantee the PAT never appears in any of the following 9 surfaces. The token-leak contract test (backend/tests/contract/test_token_never_leaks.py — Story 4.2; not yet shipped in this PR) covers each.

1. PR title: built from study.name / proposal id; no PAT input.
2. PR body: the Markdown body is composed only from safe inputs (proposal/study/digest fields, config_diff); no PAT input.
3. Commit messages: built from proposal id + cluster + template names; no PAT input. Passed via git commit -F <tempfile> (NOT -m + shell-quoted args) for additional argv safety.
4. pr_url: populated from the html_url field of GitHub's response; no PAT input.
5. pr_open_error: every write goes through _safe_set_pr_open_error, which applies redact_token to the input string before persisting.
6. Worker log lines: explicit redact_token on every error string, plus the global RedactTokensProcessor backstop on the entire event_dict.
7. Subprocess argv: git invocations use the tokenless URL form with auth supplied via env vars (cycle-1 F4); the argv passed to subprocess.run NEVER contains the PAT.
8. Subprocess stdout / stderr: captured by subprocess.run(..., capture_output=True); the worker's _redact_subprocess_error helper applies redact_token to the captured streams before any log emission.
9. .git/config: the worker NEVER calls git config http.https://github.com/.extraheader ... (which would persist the token to disk). The auth header lives only in the subprocess environment, which is gone the moment git exits.
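The subprocess stdout / stderr surface is the one most likely to leak through an error message, since the remote's text is outside the worker's control. The sketch below shows the general shape of capturing both streams and scrubbing them before they can reach a log line or an exception. It is not the worker's `_redact_subprocess_error`; the name `run_and_redact` and the regex are assumptions.

```python
import re
import subprocess
import sys
from typing import Dict, Optional, Sequence

# Assumed approximation of the redaction pattern.
_TOKEN_RE = re.compile(r"(?:gh[pousr]_|github_pat_)[A-Za-z0-9_]+")


def run_and_redact(argv: Sequence[str], env: Optional[Dict[str, str]] = None) -> str:
    """Run a command and return its stdout, scrubbed of token-like strings.

    On a non-zero exit, raise RuntimeError whose message contains only the
    scrubbed stderr, never the raw stream.
    """
    result = subprocess.run(list(argv), env=env, capture_output=True, text=True)
    if result.returncode != 0:
        detail = _TOKEN_RE.sub("[REDACTED-GH-TOKEN]", result.stderr.strip())
        raise RuntimeError(f"command exited {result.returncode}: {detail}")
    return _TOKEN_RE.sub("[REDACTED-GH-TOKEN]", result.stdout)
```

Raising a fresh exception instead of `subprocess.CalledProcessError` is a deliberate choice here: the latter keeps the raw `stderr` as an attribute, which a later traceback formatter or logger could serialize unredacted.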

Operator verification checklist

When deploying a new RelyLoop install (or auditing an existing one):

- Confirm each config_repo.auth_ref maps to a real file under ./secrets/. A missing file surfaces as a 400 AUTH_REF_NOT_FOUND response from POST /api/v1/config-repos.
- Run `grep -r 'gh[a-z]_\|github_pat_' ./logs/` against archived logs; any hit is a regression.
- Run `git -C ./data/repo-clones/<config_repo_id> config --get-all http.https://github.com/.extraheader`; it should return nothing (the header lives in env, not config).
- Verify that `ps auxf` during an active PR-open never shows the PAT in any git argv (use the production load test or staging).
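For installs where the log check runs on a schedule rather than by hand, the grep step can be expressed as a small script. This is a sketch under the assumption that logs are plain-text files under a single directory; the function name `find_token_leaks` is hypothetical, and the pattern mirrors the grep expression in the checklist.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Same expression as the checklist's grep: any gh?_ prefix or github_pat_.
_LEAK_RE = re.compile(r"gh[a-z]_|github_pat_")


def find_token_leaks(log_dir: Path) -> List[Tuple[Path, int]]:
    """Return (file, line number) for every log line that matches the pattern."""
    hits = []
    for path in sorted(log_dir.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going over stray non-UTF-8 bytes.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if _LEAK_RE.search(line):
                    hits.append((path, line_no))
    return hits
```

The function reports locations only and never echoes the matching line, so the audit output cannot itself become a tenth leak surface.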
