34 changes: 30 additions & 4 deletions README.md
@@ -8,8 +8,10 @@ Every 6 hours, the scheduled workflow in this repo:
1. Enumerates every skill in `coder/registry` (both the in-tree
`.agents/skills/` format and the future external-sources format).
2. Shallow-clones each source repo.
3. Runs [NVIDIA SkillSpector](https://github.com/NVIDIA/SkillSpector) in
`--no-llm` static mode over the upstream content.
3. Runs [NVIDIA SkillSpector](https://github.com/NVIDIA/SkillSpector) over
the upstream content. The scheduled scan uses LLM semantic analysis
when the credential secret is configured, and falls back to
`--no-llm` static-only mode otherwise.
4. Builds a per-skill verdict (`clean`, `suspicious`, `malicious`,
`unknown`) from `risk_score` plus the thresholds in `config.yaml`.
5. Builds the React SPA in `site/` and ships it together with
@@ -60,6 +62,26 @@ Vite's dev proxy (see `site/vite.config.ts`) forwards `latest.json`,
app sees real scanner output without CORS shenanigans. SPA routes such
as `/skills/coder/setup` stay client-side.

## One-time setup on the repo

Three things have to be configured once on the GitHub repo before the
scheduled scan publishes a useful result:

1. **Settings > Pages**: set source to "GitHub Actions". The
`publish-pages` job in `scan.yaml` will fail until this is set.
2. **Settings > Actions**: workflow permissions "Read and write" so
`publish-release` can create the rolling `latest` release.
3. **Settings > Secrets and variables > Actions**: add the LLM
credential matching the provider in `config.yaml`'s
`scanners.skillspector.llm.provider`. For the default `anthropic`
provider this is `ANTHROPIC_API_KEY` (from
[console.anthropic.com](https://console.anthropic.com)). Without
the secret the scan still runs, but SkillSpector falls back to
`--no-llm` static-only mode and precision drops from roughly 87%
to roughly 70%. See `docs/CALIBRATION.md` for the precision
discussion. The optional `SLACK_WEBHOOK_URL` secret enables the
`notify-slack-on-failure` job; without it that job is a no-op.
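
The fallback described in item 3 can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the function name `skillspector_extra_flags` and the `REQUIRED_ENV` table are hypothetical, and the provider-to-variable mapping is copied from the comment table in `config.yaml`. An empty value is treated as missing because GitHub Actions expands an unset secret to an empty string.

```python
import os
from typing import Mapping

# Required env vars per provider, copied from the comment table in config.yaml.
# OPENAI_BASE_URL is optional there, so it is not listed as required.
REQUIRED_ENV = {
    "anthropic": ("ANTHROPIC_API_KEY",),
    "anthropic_proxy": ("ANTHROPIC_PROXY_API_KEY", "ANTHROPIC_PROXY_ENDPOINT_URL"),
    "openai": ("OPENAI_API_KEY",),
    "nv_build": ("NVIDIA_INFERENCE_KEY",),
}


def skillspector_extra_flags(
    provider: str, env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return [] when every required credential is set, else ["--no-llm"]."""
    required = REQUIRED_ENV.get(provider)
    if required is None:
        # Assumption: an unrecognised provider degrades to static-only mode.
        return ["--no-llm"]
    if all(env.get(name, "").strip() for name in required):
        return []  # credentials present: leave LLM mode on
    return ["--no-llm"]


if __name__ == "__main__":
    # A fresh fork with no secrets configured falls back to static-only mode.
    print(skillspector_extra_flags("anthropic", env={}))
```

Passing `env` explicitly keeps the decision testable without touching the real process environment.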

## Repo layout

```text
@@ -97,7 +119,10 @@ This scanner is data-driven. To run it against a different registry:
"GitHub Actions").
4. Set Actions workflow permissions to "Read and write" so the
publish-release job can create releases.
5. Enable Actions.
5. Add the LLM credential secret matching your chosen provider
(see "One-time setup on the repo" above). Optional; static-only
mode works without it.
6. Enable Actions.

No source changes required for catalogue changes.

@@ -115,7 +140,8 @@ SkillSpector's `risk_score` (0-100) is the only input. The thresholds
are aligned to SkillSpector's own `HIGH` and `CRITICAL` bands;
[`docs/CALIBRATION.md`](./docs/CALIBRATION.md) walks through the
evidence (SkillSpector source, the ClawHub paper, our in-tree
catalogue) behind the chosen numbers.
catalogue) behind the chosen numbers, and the LLM-on-vs-off precision
discussion behind running the semantic pass on every scheduled scan.

The architecture keeps room for additional scanners (gitleaks, Semgrep,
VirusTotal Premium, etc.); adding one is a new module under `scanner/`,
39 changes: 37 additions & 2 deletions config.yaml
@@ -39,8 +39,43 @@ scanners:
# so a bumper bot lives outside the loop until the upstream
# publishes to PyPI and the pin can move into pyproject.toml.
pin: "skillspector @ git+https://github.com/NVIDIA/SkillSpector.git@2eb844780ab163f01468ecf142c40a2ec0fcaec0"
flags:
- "--no-llm"
# Extra CLI flags passed to every SkillSpector invocation. Empty by
# default; the scan workflow appends --no-llm dynamically when the
# LLM credential secret is not set (see llm: block below). CI runs
# do not invoke SkillSpector live.
flags: []
# SkillSpector ships a two-stage analyser: fast static rules followed
# by an optional LLM semantic pass. The LLM pass lifts precision
# from roughly 70% to roughly 87% per upstream docs by filtering
# context-aware false positives, classifying intent on prompt
# injection patterns, and producing human-readable explanations.
#
# The scheduled scan reads the credential matching the provider
# below from a repository secret. When the secret is configured,
# LLM mode is on. When the secret is missing, the workflow falls
# back to --no-llm automatically so a fresh fork is never broken
# by an unset secret.
#
# Provider options and the env var SkillSpector consumes:
#
# provider env var(s)
# anthropic ANTHROPIC_API_KEY (api.anthropic.com)
# anthropic_proxy ANTHROPIC_PROXY_API_KEY + ANTHROPIC_PROXY_ENDPOINT_URL
# openai OPENAI_API_KEY (+ OPENAI_BASE_URL for AI gateways)
# nv_build NVIDIA_INFERENCE_KEY (free; build.nvidia.com)
#
# Changing provider also requires updating the env block in
# .github/workflows/scan.yaml so the matching secret is wired in,
# and adding the secret under Settings > Secrets and variables >
# Actions.
llm:
provider: anthropic
# SkillSpector's bundled default for the anthropic provider is
# claude-opus-4-6. Sonnet 4.6 is roughly 5x cheaper than Opus and
# is well-suited for the finding-classification task the LLM pass
# actually does, so it is the better cost/quality choice for
# periodic scanning. Override here to pin a different revision.
model: "claude-sonnet-4-6"

# Per-skill verdict policy. v1 has one input (SkillSpector risk_score).
# When more scanners join the pipeline we add new threshold fields here
39 changes: 39 additions & 0 deletions docs/CALIBRATION.md
@@ -99,6 +99,42 @@ verdict:
This avoids broadcasting the ~half-of-catalogue base rate that
ClawHub measured.

## LLM semantic pass

SkillSpector ships a two-stage analyser: fast static rules (the 64
patterns SkillSpector documents) followed by an optional LLM semantic
pass. Upstream's published precision numbers are:

- `--no-llm` (static only): high recall, moderate precision (~70%).
False positives on context-sensitive patterns are common; for
example, EA2 ("autonomous decision making") fires on prose that
documents safeguards as well as prose that bypasses them.
- Default (LLM on): ~87% precision. The LLM pass reads each finding's
surrounding context, classifies intent, filters context-aware false
positives, and writes a human-readable explanation that ships in the
per-finding output.
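
To make the two precision figures concrete, the arithmetic below counts how many of every 100 reported findings would be false positives in each mode. It is only a worked example using the approximate upstream percentages quoted above; the helper name `expected_false_positives` is hypothetical.

```python
def expected_false_positives(reported: int, precision_pct: int) -> int:
    """Findings expected to be false positives among `reported` findings."""
    return reported * (100 - precision_pct) // 100


for mode, pct in (("--no-llm", 70), ("LLM on", 87)):
    count = expected_false_positives(100, pct)
    print(f"{mode}: ~{count} false positives per 100 reported findings")
# --no-llm: ~30 false positives per 100 reported findings
# LLM on: ~13 false positives per 100 reported findings
```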

The scheduled scan runs LLM mode when the credential secret for the
provider set in `config.yaml` (`ANTHROPIC_API_KEY` for the default
`anthropic` provider) is configured. When the secret is missing, the
workflow falls back to `--no-llm` automatically, so an unset secret on
a fresh fork degrades the scan rather than breaking it.

The LLM pass does not affect the threshold math: SkillSpector's
`risk_score` is still a 0-100 weighted sum of rule hits, and the
51/81 cutoffs above still map directly to `HIGH` and `CRITICAL` bands.
It does affect which findings reach the verdict: false positives that
the LLM filters out no longer contribute to the score. Expect verdicts
to move down (or stay the same) when LLM mode flips on, not up.
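
The relationship between filtered findings, the score, and the verdict can be sketched as below. This is an illustration under stated assumptions, not SkillSpector's or this repo's code: the rule weights are invented, capping the sum at 100 is assumed from the 0-100 range, the cutoffs are assumed to be inclusive, and `MALICIOUS_RISK_SCORE` is a hypothetical name for the 81 cutoff (only `suspicious_risk_score` is named in this doc).

```python
from typing import Optional

SUSPICIOUS_RISK_SCORE = 51  # start of the HIGH band
MALICIOUS_RISK_SCORE = 81   # start of the CRITICAL band (hypothetical name)


def risk_score(weights: list[int]) -> int:
    """Weighted sum of rule hits, capped at 100 (the cap is an assumption)."""
    return min(100, sum(weights))


def verdict(score: Optional[int]) -> str:
    """Map a risk score to a verdict; None stands for a scan with no score."""
    if score is None:
        return "unknown"
    if score >= MALICIOUS_RISK_SCORE:
        return "malicious"
    if score >= SUSPICIOUS_RISK_SCORE:
        return "suspicious"
    return "clean"


# Invented weights: three static hits, one of which the LLM pass filters out.
static_hits = [40, 30, 25]
after_llm_filter = [40, 30]

print(verdict(risk_score(static_hits)))       # malicious (score 95)
print(verdict(risk_score(after_llm_filter)))  # suspicious (score 70)
```

Because filtering only removes non-negative terms from the sum, the score after the LLM pass can never exceed the static-only score, which is why verdicts move down or stay the same.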

For the five existing in-tree skills, the static-only scan placed
`coder/setup` at 100 / `malicious`. With LLM mode on we expect the
findings list to shrink (the EA2 prose hits and the asset-path MP2
hits should be filtered) but the score will still be high. Reducing
`coder/setup`'s verdict below `suspicious` requires the upcoming
permissions-manifest layer (Phase 3 of the v3 plan), not the LLM pass
alone.

## What we did not change (and why)

- We did not raise `suspicious_risk_score` above `51`. SkillSpector
@@ -127,6 +163,9 @@ Re-run this analysis when any of:
that shifts where its bands sit. The pinned commit in `config.yaml`
protects us from drifting silently; a deliberate bump should walk
through this doc.
- The LLM provider changes (e.g., moving from `anthropic` to
`nv_build`). Different models filter differently; spot-check the
five in-tree skills before merging the provider swap.
- We observe a real-world skill that lands in an obviously wrong
bucket (false positive or false negative). Open a tracking issue,
link it from this doc, and adjust with evidence in the next PR.