diff --git a/README.md b/README.md index f662a82..f486179 100644 --- a/README.md +++ b/README.md @@ -8,8 +8,10 @@ Every 6 hours, the scheduled workflow in this repo: 1. Enumerates every skill in `coder/registry` (both the in-tree `.agents/skills/` format and the future external-sources format). 2. Shallow-clones each source repo. -3. Runs [NVIDIA SkillSpector](https://github.com/NVIDIA/SkillSpector) in - `--no-llm` static mode over the upstream content. +3. Runs [NVIDIA SkillSpector](https://github.com/NVIDIA/SkillSpector) over + the upstream content. The scheduled scan uses LLM semantic analysis + when the credential secret is configured, and falls back to + `--no-llm` static-only mode otherwise. 4. Builds a per-skill verdict (`clean`, `suspicious`, `malicious`, `unknown`) from `risk_score` plus the thresholds in `config.yaml`. 5. Builds the React SPA in `site/` and ships it together with @@ -60,6 +62,26 @@ Vite's dev proxy (see `site/vite.config.ts`) forwards `latest.json`, app sees real scanner output without CORS shenanigans. SPA routes such as `/skills/coder/setup` stay client-side. +## One-time setup on the repo + +Three things have to be configured once on the GitHub repo before the +scheduled scan publishes a useful result: + +1. **Settings > Pages**: set source to "GitHub Actions". The + `publish-pages` job in `scan.yaml` will fail until this is set. +2. **Settings > Actions**: workflow permissions "Read and write" so + `publish-release` can create the rolling `latest` release. +3. **Settings > Secrets and variables > Actions**: add the LLM + credential matching the provider in `config.yaml`'s + `scanners.skillspector.llm.provider`. For the default `anthropic` + provider this is `ANTHROPIC_API_KEY` (from + [console.anthropic.com](https://console.anthropic.com)). Without + the secret the scan still runs, but SkillSpector falls back to + `--no-llm` static-only mode and precision drops from roughly 87% + to roughly 70%. 
See `docs/CALIBRATION.md` for the precision + discussion. The optional `SLACK_WEBHOOK_URL` secret enables the + `notify-slack-on-failure` job; without it that job is a no-op. + ## Repo layout ```text @@ -97,7 +119,10 @@ This scanner is data-driven. To run it against a different registry: "GitHub Actions"). 4. Set Actions workflow permissions to "Read and write" so the publish-release job can create releases. -5. Enable Actions. +5. Add the LLM credential secret matching your chosen provider + (see "One-time setup on the repo" above). Optional; static-only + mode works without it. +6. Enable Actions. No source changes required for catalogue changes. @@ -115,7 +140,8 @@ SkillSpector's `risk_score` (0-100) is the only input. The thresholds are aligned to SkillSpector's own `HIGH` and `CRITICAL` bands; [`docs/CALIBRATION.md`](./docs/CALIBRATION.md) walks through the evidence (SkillSpector source, the ClawHub paper, our in-tree -catalogue) behind the chosen numbers. +catalogue) behind the chosen numbers, and the LLM-on-vs-off precision +discussion behind running the semantic pass on every scheduled scan. The architecture keeps room for additional scanners (gitleaks, Semgrep, VirusTotal Premium, etc.); adding one is a new module under `scanner/`, diff --git a/config.yaml b/config.yaml index b20462a..9cc2328 100644 --- a/config.yaml +++ b/config.yaml @@ -39,8 +39,43 @@ scanners: # so a bumper bot lives outside the loop until the upstream # publishes to PyPI and the pin can move into pyproject.toml. pin: "skillspector @ git+https://github.com/NVIDIA/SkillSpector.git@2eb844780ab163f01468ecf142c40a2ec0fcaec0" - flags: - - "--no-llm" + # Extra CLI flags passed to every SkillSpector invocation. Empty by + # default; the scan workflow appends --no-llm dynamically when the + # LLM credential secret is not set (see llm: block below). CI runs + # do not invoke SkillSpector live. 
+ flags: [] + # SkillSpector ships a two-stage analyser: fast static rules followed + # by an optional LLM semantic pass. The LLM pass lifts precision + # from roughly 70% to roughly 87% per upstream docs by filtering + # context-aware false positives, classifying intent on prompt + # injection patterns, and producing human-readable explanations. + # + # The scheduled scan reads the credential matching the provider + # below from a repository secret. When the secret is configured, + # LLM mode is on. When the secret is missing, the workflow falls + # back to --no-llm automatically so a fresh fork is never broken + # by an unset secret. + # + # Provider options and the env var SkillSpector consumes: + # + # provider env var(s) + # anthropic ANTHROPIC_API_KEY (api.anthropic.com) + # anthropic_proxy ANTHROPIC_PROXY_API_KEY + ANTHROPIC_PROXY_ENDPOINT_URL + # openai OPENAI_API_KEY (+ OPENAI_BASE_URL for AI gateways) + # nv_build NVIDIA_INFERENCE_KEY (free; build.nvidia.com) + # + # Changing provider also requires updating the env block in + # .github/workflows/scan.yaml so the matching secret is wired in, + # and adding the secret under Settings > Secrets and variables > + # Actions. + llm: + provider: anthropic + # SkillSpector's bundled default for the anthropic provider is + # claude-opus-4-6. Sonnet 4.6 is roughly 5x cheaper than Opus and + # is well-suited for the finding-classification task the LLM pass + # actually does, so it is the better cost/quality choice for + # periodic scanning. Override here to pin a different revision. + model: "claude-sonnet-4-6" # Per-skill verdict policy. v1 has one input (SkillSpector risk_score). # When more scanners join the pipeline we add new threshold fields here diff --git a/docs/CALIBRATION.md b/docs/CALIBRATION.md index e4cf5a0..5091163 100644 --- a/docs/CALIBRATION.md +++ b/docs/CALIBRATION.md @@ -99,6 +99,42 @@ verdict: This avoids broadcasting the ~half-of-catalogue base rate that ClawHub measured. 
+## LLM semantic pass + +SkillSpector ships a two-stage analyser: fast static rules (the 64 +patterns SkillSpector documents) followed by an optional LLM semantic +pass. Upstream's published precision numbers are: + +- `--no-llm` (static only): high recall, moderate precision (~70%). + False positives on context-sensitive patterns are common; for + example, EA2 ("autonomous decision making") fires on prose that + documents safeguards as well as prose that bypasses them. +- Default (LLM on): ~87% precision. The LLM pass reads each finding's + surrounding context, classifies intent, filters context-aware false + positives, and writes a human-readable explanation that ships in the + per-finding output. + +The scheduled scan runs LLM mode when the workflow's chosen credential +secret (`ANTHROPIC_API_KEY` for the default `anthropic` provider) is +configured. The fallback to `--no-llm` is automatic when the secret is +missing, so an unset secret on a fresh fork degrades the scan rather +than breaking it. + +The LLM pass does not affect the threshold math: SkillSpector's +`risk_score` is still a 0-100 weighted sum of rule hits, and the +51/81 cutoffs above still map directly to `HIGH` and `CRITICAL` bands. +It does affect which findings reach the verdict: false positives that +the LLM filters out no longer contribute to the score. Expect verdicts +to move down (or stay the same) when LLM mode flips on, not up. + +For the five existing in-tree skills, the static-only scan placed +`coder/setup` at 100 / `malicious`. With LLM mode on we expect the +findings list to shrink (the EA2 prose hits and the asset-path MP2 +hits should be filtered) but the score will still be high. Reducing +`coder/setup`'s verdict below `suspicious` requires the upcoming +permissions-manifest layer (Phase 3 of the v3 plan), not the LLM pass +alone. + ## What we did not change (and why) - We did not raise `suspicious_risk_score` above `51`. 
SkillSpector @@ -127,6 +163,9 @@ Re-run this analysis when any of: that shifts where its bands sit. The pinned commit in `config.yaml` protects us from drifting silently; a deliberate bump should walk through this doc. +- The LLM provider changes (e.g., moving from `anthropic` to + `nv_build`). Different models filter differently; spot-check the + five in-tree skills before merging the provider swap. - We observe a real-world skill that lands in an obviously wrong bucket (false positive or false negative). Open a tracking issue, link it from this doc, and adjust with evidence in the next PR.
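
Review note: the fallback this diff describes (append `--no-llm` when the credential for the configured provider is not set) could be sketched in Python roughly as follows. This is an illustration only, not the code in `scan.yaml` or `scanner/`. The helper name `effective_flags` and the `PROVIDER_ENV_VARS` mapping are hypothetical; the provider names and env var names are copied from the table in the `config.yaml` comment. The sketch assumes every env var listed for a provider must be non-empty and treats an unknown provider as having no credential.

```python
import os
from typing import Mapping, Optional, Sequence

# Provider -> required env vars, copied from the comment table in config.yaml.
# OPENAI_BASE_URL is listed there as an optional extra, so it is not required here.
PROVIDER_ENV_VARS = {
    "anthropic": ("ANTHROPIC_API_KEY",),
    "anthropic_proxy": ("ANTHROPIC_PROXY_API_KEY", "ANTHROPIC_PROXY_ENDPOINT_URL"),
    "openai": ("OPENAI_API_KEY",),
    "nv_build": ("NVIDIA_INFERENCE_KEY",),
}


def effective_flags(
    provider: str,
    base_flags: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> list:
    """Return the SkillSpector CLI flags for one scan run (hypothetical helper).

    Appends --no-llm when any env var required by the provider is unset or
    blank, so a missing secret degrades the scan instead of breaking it.
    """
    env = os.environ if env is None else env
    required = PROVIDER_ENV_VARS.get(provider, ())
    # An unknown provider has no known credential, so it also falls back.
    has_credential = bool(required) and all(
        env.get(name, "").strip() for name in required
    )
    flags = list(base_flags)
    if not has_credential and "--no-llm" not in flags:
        flags.append("--no-llm")
    return flags
```

With `flags: []` from `config.yaml`, a run that has `ANTHROPIC_API_KEY` set keeps the flag list empty (LLM mode on), while a fresh fork with no secret gets `["--no-llm"]`. Keeping the check in one function would also make the provider-swap case easier to test than a shell conditional in the workflow.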