[Security hardening] Add automated security audit workflow#2442
[Security hardening] Add automated security audit workflow#2442PascalThuet wants to merge 36 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new GitHub Actions workflow to introduce automated security checks for the Python codebase, aiming to catch dependency advisories and high-severity static-analysis findings in CI alongside the existing test/lint workflows.
Changes:
- Add
.github/workflows/security.ymlwith a dedicatedSecurity Auditworkflow. - Run
pip-auditon pushes tomain, pull requests, a weekly cron, and manual dispatch. - Run Bandit against
src/, temporarily skippingB602pending the shell-step hardening tracked in #2440.
Show a summary per file
| File | Description |
|---|---|
.github/workflows/security.yml |
New CI workflow that adds dependency-audit and static-analysis jobs for the Python project. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 1/1 changed files
- Comments generated: 5
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback. If not applicable, please explain why
|
Addressed the Copilot feedback in the latest push:
Local validation: uv export --quiet --extra test --frozen --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll
git diff --checkResults: |
|
Added automated regression coverage for the security workflow in The new tests statically verify that:
Validation after this commit: uv run python -m pytest tests/test_security_workflow.py -q
uv export --quiet --extra test --frozen --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll
git diff --checkAll passed locally. |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
src/specify_cli/init.py:414
- This second
# nosec B602suppresses Bandit for the non-capturing branch as well, so neither path throughrun_commandwill ever raise B602 again. If the intent is only to defer the current finding until #2440, the skip needs to stay in the workflow/configuration layer rather than permanently muting this call site in source.
# shell=True is only available to callers that opt in explicitly.
subprocess.run(cmd, check=check_return, shell=shell) # nosec B602
- Files reviewed: 5/5 changed files
- Comments generated: 5
|
Please address Copilot feedback. If not applicable, please explain why. Note the shell step should indeed be ignored |
|
Addressed the follow-up review in
Validation: uv export --quiet --extra test --format requirements.txt --no-emit-project --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
uv run python -m pytest tests/test_security_workflow.py tests/test_workflows.py -q
uvx ruff check src/
git diff --checkI also verified the |
|
One more small cleanup after re-review: pushed Reason: it still audits the runtime + Revalidated locally: uv pip compile pyproject.toml --extra test --quiet --output-file /tmp/spec-kit-audit-requirements.txt
uvx --from pip-audit==2.10.0 pip-audit -r /tmp/spec-kit-audit-requirements.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
uv run python -m pytest tests/test_security_workflow.py -q
git diff --checkAll passed. |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
tests/test_security_workflow.py:91
- Bandit doesn't require the exact literal
# nosec B602to suppress this finding; plain# nosec,#nosec, and multi-ID forms also disable B602. This check leaves those suppression paths untested, so a future source-level bypass can slip in without failing CI.
)
def test_b602_is_not_suppressed_in_source(self):
source_text = "\n".join(
path.read_text(encoding="utf-8")
for path in (REPO_ROOT / "src").rglob("*.py")
)
- Files reviewed: 5/5 changed files
- Comments generated: 3
|
Please address Copilot feedback |
|
Addressed the latest Copilot review in
Validation: uv run python -m pytest tests/test_security_workflow.py -q
uv pip compile pyproject.toml --extra test --python-version 3.11 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py311.txt
uv pip compile pyproject.toml --extra test --python-version 3.12 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py312.txt
uv pip compile pyproject.toml --extra test --python-version 3.13 --generate-hashes --quiet --output-file /tmp/spec-kit-audit-py313.txt
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py311.txt --progress-spinner off
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py312.txt --progress-spinner off
uvx --from pip-audit==2.10.0 pip-audit --disable-pip --require-hashes -r /tmp/spec-kit-audit-py313.txt --progress-spinner off
uvx --from bandit==1.9.4 bandit -r src -lll --baseline .github/bandit-baseline.json
git diff --checkAll passed locally. |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 5
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback
|
Addressed the latest Copilot review round in commit aa02062:
Local validation passed:
The new GitHub workflow runs are currently waiting for fork PR approval ( |
There was a problem hiding this comment.
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 6/6 changed files
- Comments generated: 2
Polishes from the previous review pass that I noticed after pushing. - security.yml: drop the unneeded fetch-depth: 0 from the secret-scan checkout. detect-secrets-hook reads the working tree only — fetching full history slows the job without adding signal. - security.yml: add a follow-on step that surfaces the Bandit medium- severity informational pass in $GITHUB_STEP_SUMMARY. With continue-on-error: true the previous step never marks the job yellow/red, so findings were buried in the log; the summary now flags them with a⚠️ heading (or ✅ when clean) at the top of the run page. - CONTRIBUTING.md: document the new tooling and gates so contributors don't bounce off CI: - detect-secrets-hook command + how to regenerate .secrets.baseline - the bandit baseline label gate (security-baseline-change) - shellcheck --severity=error invocation - explicit note that committed security-audit-requirements.txt can drift purely from upstream package releases and needs periodic regeneration even on unrelated PRs.
Four hardening / robustness items raised during self-review of the PR. - check_bandit_baseline.py: normalize whitespace in the code-snippet hash that's part of each entry's identity. A bandit version bump that reformats the snippet (different number of context lines, different indentation) would otherwise make every baseline entry look "new", forcing the security-baseline-change label on every unrelated PR. - security.yml + check_secrets_baseline.py: symmetric growth gate on .secrets.baseline. detect-secrets-hook already blocks unknown secrets, but extending the baseline (whitelisting a new finding) was silent. Mirror the bandit gate — PR must carry secrets-baseline-change to acknowledge any new identity (filename + line + type + hashed_secret). - test_security_workflow.py: drop the brittle exact-name lookup for the blocking bandit step. The test now finds it by the baseline-arg signature, so future renames of the step don't silently bypass the --skip B602 check. Added _find_step_by_run_signature helper that insists on exactly one match. Strict assertions on OS matrix and tool version pins are kept — those are intentional security choices. - workflows/PUBLISHING.md: the shell-step NOTE in src/specify_cli/workflows/steps/shell/__init__.py points authors here for "security guidance", but the section didn't exist. Added an explicit "Security: shell steps execute arbitrary code" subsection covering the no-sandbox model, the inspect-before-install obligation, input-interpolation hygiene, and reviewer expectations.
Two items from the second self-review: - workflows/PUBLISHING.md: fix invented command name. The first draft recommended `specify workflow inspect <name>` which doesn't exist — the actual subcommand is `workflow info`, and even that only shows metadata (name/version/inputs/step IDs+types), never the shell `run` content. Replace with explicit guidance to read the raw workflow.yml directly when auditing shell steps. - tests/test_baseline_gates.py: new file. 12 unit tests covering both check_bandit_baseline.py and check_secrets_baseline.py — no PR base ref, introduction (baseline absent at base), identical baselines, growth without ack label, growth with ack label, swap attack (constant count, new identity), and (bandit-only) whitespace-only drift in the code snippet hash. The latter verifies the normalization added earlier protects against bandit reformatting its output.
Three polish items from the third self-review pass. - tests/test_upgrade.py: new TestBoundedRead class. Pins the contract that _fetch_latest_release_tag wraps the response body through read_response_limited with max_bytes=1024*1024. Protects the hardening against a silent revert to `resp.read()` in a future refactor (the extraction to _version.py during the last merge would have lost it if we hadn't caught it manually). - tests/test_baseline_gates.py: replace the two near-identical test classes with a parametrized TestSharedBaselineGate (bandit + secrets via a GateConfig dataclass and a `gate` fixture). Bandit-only quirks (no-base-ref short-circuit, whitespace-normalized identity) stay in TestBanditSpecific. Removes ~80 lines of duplication; the two scripts now exercise the same scenarios by construction so a divergence is caught the moment one drifts. - tests/test_baseline_gates.py: new shared scenario test_corrupt_json_at_base_falls_back_to_empty. Covers the except JSONDecodeError branch of _read_baseline_at — corrupt base doesn't crash the script; instead the head set becomes "all new" and the normal label gate fires. Was previously dead code from a coverage standpoint. 3009 passed (up from 3006 — 14 baseline tests now parametrized as 12 + 2 bandit-specific, plus 1 new bounded-read test).
Three micro-cleanups raised during review github#4 of my own work — no behavior change, just clarity. - Replace the __import__("specify_cli._download_security", fromlist=...) dance with a plain `import ... as _real_read_response_limited` at the top of the file. Easier to grep, no runtime difference. - Type the recorded dict explicitly and make max_bytes/label keyword- only without defaults on the spy. If a future refactor drops either argument the spy now raises TypeError immediately, instead of silently recording None and tripping the post-call assertion with a more confusing message. - Tighten the label check from fuzzy substring match ("github" in label.lower()) to exact equality ("GitHub latest release"). Both catch regressions; exact equality also catches typos.
Six items from the new Copilot pass. Three were latent bugs in the guardrails added by earlier commits, two are documentation/wording, one is parity coverage. Bugs - security.yml: the MEDIUM Bandit informational pass ran without --baseline, so the whitelisted HIGH B602 finding re-fired there on every run, turning the job summary into a permanent warning. Apply the same baseline to both passes; medium-only NEW findings now surface, as intended. - security.yml: the summary step ran with if: always() but the MEDIUM pass has the default if: success() — when the blocking HIGH step fails, the MEDIUM pass is skipped (outcome=skipped, not failure) and the summary wrote "✅ clean" anyway. Switch to a case statement that handles failure/success/skipped distinctly (⚠️ / ✅ / ⏭️). - check_bandit_baseline.py and check_secrets_baseline.py used `git show <head_ref>:` on both sides, so an unreadable/unfetched head ref returned empty results and the diff computed 0 new identities → fail-open. Read the head side from the working tree instead (CI is checked out at the PR head), fail-closed when the file is missing, and SystemExit on corrupt JSON. The base side keeps the lenient JSONDecodeError fallback because that's historical state we can't change. Wording - security.yml + CONTRIBUTING.md: both mentioned `# nosec` as a suppression mechanism, but tests/test_security_workflow.py:: test_bandit_nosec_is_not_suppressed_in_source explicitly forbids `# nosec` under src/. Replace with the actually-supported paths (bandit baseline for HIGH findings, `# noqa: S6xx` for ruff subprocess-shell rules) and flag the forbidden-comment policy. Parity coverage - tests/test_security_workflow.py: three new tests for the secret-scan job mirroring the dependency-audit / static-analysis coverage — detect-secrets-hook command, baseline path, excluded paths, growth gate env wiring (BASE only, no HEAD env), and fetch-depth: 0. - tests/test_workflows.py: regression test that WorkflowCatalog._fetch_single_catalog routes through read_response_limited with error_type=WorkflowCatalogError and label "workflow catalog". Mirrors TestBoundedRead for _fetch_latest_ release_tag and the equivalent test in test_integration_catalog.py. - tests/test_baseline_gates.py: two new fail-closed cases (head missing in working tree, head corrupt in working tree); drop the now-unused head_sha returns and the head env var from GateHandle.run. Note: Copilot also flagged "no tests on baseline gate scripts" — those tests already shipped in tests/test_baseline_gates.py (commit 2fd8071, posted before the review). Updated here with the new fail-closed cases. Tests: 3017 passed (was 3009).
- read_response_limited: read in a loop until EOF or one byte past the limit instead of a single read(max_bytes + 1). A server using chunked transfer encoding can return fewer bytes per read() than requested while streaming more than max_bytes total, defeating the single-read bound. Add regression tests for the short-read and within-limit paths. - _download_security: annotate _raise / _raise_from as NoReturn so type checkers treat call sites as unreachable. - Extract the duplicated is_https_or_localhost_http redirect-safety predicate into _download_security and import it from both _github_http and authentication/http so the rule lives in one place. - azure_devops: stop catching broad ValueError/KeyError around token acquisition; give the bounded read a dedicated _TokenResponseTooLarge type and catch only URLError, OSError, JSONDecodeError, and that type so unrelated programming errors still surface. - tests: make response mocks faithful streams (advancing cursor, b"" at EOF) so the bounded read loop terminates as it would against a real http.client.HTTPResponse. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- __init__.py preset/extension URL installs: give read_response_limited a domain error_type (PresetError / ExtensionError) and catch that instead of a blanket ValueError, so an oversized body is reported cleanly while unrelated ValueErrors surface as real errors. The extension catch now also covers install_from_zip's ValidationError (an ExtensionError). - _utils.run_command: rewrite the misleading docstring — shell=False is the only honoured mode; shell=True is rejected with ValueError, the parameter is retained only so existing keyword callers don't hit TypeError. - _download_security: document that the loopback allowance is an exact-string match (not an IP-range check), that read_response_limited's max_bytes default is the 50 MiB ceiling (callers with tighter budgets should pass an explicit value), and how _safe_zip_name handles single trailing-slash directory markers vs malformed empty segments. - authentication/http: comment the empty-hosts _StripAuthOnRedirect use as the HTTPS-downgrade guard on the unauthenticated path. - check_security_requirements: document the HEAD^ fallback failing safe (audit anyway) on shallow / single-commit checkouts. - security.yml: document the universal committed snapshot vs per-Python scheduled compile distinction. - tests: add a regression test that a symlink alongside benign members is rejected with no partial extraction to disk. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit follow-up to the download-hardening work, closing similar cases within the scope of this PR: - Add read_zip_member_limited() and use it for the inline extension.yml read in the extension *update* path (__init__.py). That read happened before install_from_zip()'s safe_extract_zip(), so a raw zf.open().read() bypassed the per-member size bound: a manifest declaring a huge file_size (few KB compressed, gigabytes uncompressed) would be fully loaded by yaml.safe_load. The helper rejects on declared size and reads bounded. - Route the Azure DevOps OAuth token request through a strict-redirect opener so a 307/308 redirect cannot forward the client_secret POST body to a non-HTTPS, non-loopback host. - Tests for the new helper and the updated ADO opener path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…P bytes Addresses the Copilot review on PR github#2442 and the same pattern elsewhere. - safe_extract_zip(): track the cumulative bytes actually written and fail past max_total_bytes, so the total-size bound holds even if member headers understate file_size (the declared-total check alone could be evaded). Mirrors the existing per-member written guard — defense-in-depth consistency. - Pass an explicit max_bytes to read_response_limited() at every JSON call site instead of inheriting the 50 MiB archive/payload default: * MAX_JSON_METADATA_BYTES (1 MiB): Azure AD token, GitHub release metadata, and the existing latest-release fetch (migrated off an inline literal). * MAX_JSON_CATALOG_BYTES (8 MiB): preset, extension, workflow and integration catalog fetches. Binary/archive downloads keep the 50 MiB ceiling. Both ceilings are centralized as documented constants in _download_security.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- pin actions/checkout to the repo-wide df4cb1c (v6.0.3) in lint.yml and security.yml - replace the ad-hoc ip_address loopback checks in the workflow add URL/catalog flows with the shared is_https_or_localhost_http predicate, so HTTP-on-loopback rules match the redirect handler - drop the empty member name from the zip dot-segment test: zipfile cannot write such an entry, the case crashed in the test itself
…gates - align the setup-uv pin in security.yml with test.yml (v8.2.0) - use is_https_or_localhost_http for the preset_add/extension_add URL checks and pass strict_redirects=True to the latest-release fetch and the release-asset resolver call sites - baseline gate scripts fail closed on unresolvable refs and git read errors instead of treating them as "baseline did not exist"; the security workflow re-runs on labeled/unlabeled so the ack label can turn the gate green without a push - regenerate the bandit baseline against HEAD (two entries referenced removed code, one had drifted); track baseline entries by file+test_id in tests so line drift no longer breaks them - raise ZIP size-limit errors outside the broad except in safe_extract_zip so an error_type subclassing OSError/RuntimeError cannot re-wrap them - tests: drop two redirect tests duplicated from test_authentication, move the downgrade test next to its siblings, assert the workflow catalog max_bytes, route OpenerDirector.open through urlopen in the modules that patch urlopen, add set -euo pipefail to the secret scan, misc cleanup (unused helper, redundant imports, EOF-less fake read)
is_https_or_localhost_http allows HTTP for localhost, 127.0.0.1 and ::1; the user-facing messages and the open_url docstring only said localhost.
The non-HTTPS redirect rejection in _StripAuthOnRedirect applies to every authenticated attempt regardless of strict_redirects; the flag only extends the same guard to the unauthenticated fallback. Document both guards on the class and correct the open_url docstring, which previously gated the whole scheme restriction under strict_redirects.
A URL without a hostname (e.g. https:///x) has no real target; reject it regardless of scheme. Folds main's hostless-HTTPS guard into the shared predicate so every download/redirect call site benefits.
|
Rebased the branch on current Pushed commit Local verification:
GitHub reports the PR as mergeable. No GitHub checks are currently reported for this branch. |
| Redirect scheme safety: every authenticated attempt goes through | ||
| ``_StripAuthOnRedirect``, which always rejects redirects to non-HTTPS | ||
| URLs (except HTTP to localhost / 127.0.0.1 / ::1, the hosts allowed by | ||
| ``is_https_or_localhost_http``). *strict_redirects* extends that same | ||
| scheme guard and the optional redirect validator to the unauthenticated | ||
| fallback; without it, the fallback follows redirects without that handler. |
| from specify_cli.authentication.http import ( | ||
| _StripAuthOnRedirect, | ||
| _validate_strict_redirect, | ||
| ) | ||
|
|
||
| # A 307/308 redirect preserves the POST body, which carries the | ||
| # client_secret. Reuse the package HTTPS-downgrade guard (empty host | ||
| # list means no auth header to strip, just the scheme check) so the | ||
| # secret can never be forwarded to a non-HTTPS, non-loopback host. | ||
| opener = urllib.request.build_opener( | ||
| _StripAuthOnRedirect((), _validate_strict_redirect) | ||
| ) |
|
Please address Copilot feedback and test & lint errors |
|
Updated on behalf of @PascalThuet by Codex (GPT-5). Commit: e3f0153 What changed:
Local validation:
GitHub created new pull_request runs for |
| from specify_cli._download_security import read_response_limited as _read_response_limited | ||
| from specify_cli.authentication.http import open_url as _open_url | ||
|
|
||
| with _open_url(from_url, timeout=60) as response: | ||
| zip_data = response.read() | ||
| zip_data = _read_response_limited( |
| try: | ||
| with self._open_url( | ||
| download_url, timeout=60, extra_headers=extra_headers | ||
| ) as response: | ||
| zip_data = response.read() | ||
| zip_data = read_response_limited( |
| try: | ||
| with self._open_url(download_url, timeout=60, extra_headers=extra_headers) as response: |
| try: | ||
| with self._open_url(entry.url, timeout=10) as response: |
| try: | ||
| import urllib.error | ||
|
|
||
| with self._open_url(catalog_url, timeout=10) as response: | ||
| catalog_data = json.loads(response.read()) | ||
| catalog_data = json.loads( | ||
| read_response_limited( |
| try: | ||
| with self._open_url(entry.url, timeout=10) as response: |
| try: | ||
| with self._open_url(catalog_url, timeout=10) as response: |
Summary
Security context
This creates a repeatable CI baseline for dependency advisories and high-severity static-analysis issues while preserving a scheduled live-resolution signal for newly published dependency problems. It also closes similar supply-chain and archive-handling gaps found during follow-up review.
Closes #2438
Validation