Add layered plugin/skill validation tooling across agent ecosystems #219

Merged
ScriptedAlchemy merged 16 commits into master from codex/plugin-validation-tooling on Jul 3, 2026

@ScriptedAlchemy ScriptedAlchemy commented Jul 2, 2026

Owner

Summary

Adopts the strongest available validation for the plugin bundles we generate (cursor-plugin/, codex-plugin/) and the skills they ship, layered from cheapest to most end-to-end. Full architecture in the new docs/PLUGIN-VALIDATION.md.

  • Layer 1 — official schema validation (offline, cargo test): vendors Cursor's published JSON Schemas (plugin.schema.json, marketplace.schema.json from cursor/plugins@4a91a6e / 920a87f) plus doc-derived mcp.schema.json / hooks.schema.json (no official standalone schemas exist; derived from cursor.com/docs/context/mcp and cursor.com/docs/hooks, provenance recorded in each schema). Validated via a jsonschema (0.46.8, default-features = false) dev-dependency in tests/agent_suite/plugin_manifest_schema_test.rs and plugin_config_schema_test.rs, with negative cases guarding the derived schemas.
    • Real bug caught immediately: both bundle manifests declared author.url, which the official schema rejects (author allows only name/email). Fixed; the URL still ships via homepage/repository.
  • Layer 2 — skill lint (Cursor): skill_lint_cursor_test.rs ports the useful closed-rule subset of skillmark, skilldoctor, and skillkit into Rust: file hygiene, heading conventions, description quality, and reference integrity (cross-skill tracedecay:<slug> refs, /slash refs, and every tracedecay_* tool mention resolved against the live get_tool_definitions() list).
  • Layer 3 — cross-bundle sync: plugin_bundle_sync_test.rs enforces disk-level parity across all bundles through declarative, self-cleaning policy tables (undeclared divergence fails, and so does a stale exception). Bundle-count agnostic: a future claude-plugin/ joins by adding one row.
  • Layer 4 — rendered-output validation: update_plugin_test.rs now installs into temp homes and validates the rendered bundles: full draft-07 schema validation of the rendered manifest, absolute shell-quoted hook commands, version stamps, source⊆rendered file completeness, and a placeholder sweep where the only ${...} survivor allowed is the intentional mcp.json ${workspaceFolder} arg (pin shared with the serve-side fallback from #206, "fix: tolerate literal ${workspaceFolder} in serve --path").
  • Layer 5 — Claude Code portability: skill_lint_claude_test.rs validates all 65 skills against Claude Code / Agent Skills spec rules (code.claude.com/docs/en/skills, agentskills.io/specification, anthropics/skills quick_validate.py, and the .claude-plugin/ layouts Anthropic ships). Two documented conflict skips (disable-model-invocation, paths — Claude Code supports both; only the strict packaging spec rejects them) with a stale-allowlist guard. A future claude-plugin/ bundle is a re-packaging exercise.
  • Layer 6 — CI: .github/workflows/plugin-validation.yml mirrors the official cursor/plugins ajv workflow (pinned ajv-cli@5.0.0 + ajv-formats@2.1.1) and wires scripts/mcp-conformance-smoke.sh — a hermetic smoke driving tracedecay serve through the pinned MCP Inspector CLI (@modelcontextprotocol/inspector@0.22.0), adding protocol-version negotiation and SDK-side Zod validation the Rust MCP tests can't cover.
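
To make the Layer 4 placeholder sweep concrete, here is a minimal sketch of the idea. It is not the code in update_plugin_test.rs: the function names are hypothetical, and the allowlist is reduced to the single mcp.json ${workspaceFolder} survivor described above.

```rust
/// Returns every `${...}` placeholder found in `text`, in order of appearance.
fn find_placeholders(text: &str) -> Vec<&str> {
    let mut found = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("${") {
        let tail = &rest[start..];
        match tail.find('}') {
            Some(end) => {
                // Keep the whole token, including the closing brace.
                found.push(&tail[..=end]);
                rest = &tail[end + 1..];
            }
            // An unterminated `${` is not a complete placeholder; stop scanning.
            None => break,
        }
    }
    found
}

/// Hypothetical allowlist: only `mcp.json` may keep a literal `${workspaceFolder}`.
fn is_allowed(file_name: &str, placeholder: &str) -> bool {
    file_name == "mcp.json" && placeholder == "${workspaceFolder}"
}

/// Returns the placeholders in one rendered file that should fail the sweep.
fn unexpected_placeholders<'a>(file_name: &str, rendered: &'a str) -> Vec<&'a str> {
    find_placeholders(rendered)
        .into_iter()
        .filter(|p| !is_allowed(file_name, p))
        .collect()
}
```

Under these assumptions, a rendered mcp.json whose only placeholder is ${workspaceFolder} yields an empty list, while the same token in any other file, or any other token in mcp.json, is reported. A test would assert that the list is empty for every rendered file.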

All five new test modules were folded into the consolidated agent_suite binary (matching the repo's link-time convention, cf. #211), with shared helpers (SkillDoc loader, schema compile/validate, repo_path, kebab-case rule, tree walk) deduped into tests/common/mod.rs.
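
As an illustration of the kind of helper that is worth sharing, a kebab-case rule could look like the sketch below. The exact rule in tests/common/mod.rs is not shown in this PR description, so the function name and the precise definition (lowercase ASCII letters and digits, single hyphens, no empty segments) are assumptions.

```rust
/// Returns true if `name` is lowercase kebab-case under the assumed rule:
/// one or more non-empty segments of ASCII lowercase letters or digits,
/// separated by single hyphens.
fn is_kebab_case(name: &str) -> bool {
    !name.is_empty()
        && name.split('-').all(|segment| {
            // An empty segment means a leading, trailing, or doubled hyphen.
            !segment.is_empty()
                && segment
                    .chars()
                    .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit())
        })
}
```

With this definition, names such as memorize-subject pass, while Memorize-Subject, double--hyphen, and trailing- are rejected.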

Adopted vs rejected

  • Adopted: official Cursor schemas + ajv workflow; MCP Inspector CLI smoke; skillmark/skilldoctor/skillkit rules ported to Rust (offline, no node toolchain in cargo test); Claude/agentskills.io spec rules.
  • Rejected: @modelcontextprotocol/conformance (server mode is streamable-HTTP-only; tracedecay serve is stdio-only — revisit if an HTTP transport lands); running skill-tools/skillmark as npx CI steps (network/toolchain dependency for rules we can enforce natively); skillmark's script-security AST rules (bundles ship zero scripts today), NLP-ish description heuristics, and scoring-only rules (conflict with the deliberately lean skill style).

Manifest-path verdict

.cursor-plugin/plugin.json is the documented location, and this repo already conformed (docs: cursor.com/docs/reference/plugins; every official plugin and the working local install use it). The earlier "root plugin.json" claim came from an ls that missed the dot-directory. No layout change is needed; the layout was already pinned by three existing assertions.

Test plan

  • cargo test --test agent_suite: 382 passed, 0 failed (repeated runs; includes the 5 new modules + rendered-output tests)
  • cargo test --lib agents:: (135) / --lib hooks:: (23) / --test hooks_lsp_suite (104) — all pass
  • cargo check --all-targets, cargo clippy (0 warnings in repo code), cargo fmt --check — clean
  • actionlint + YAML parse on the workflow — clean; bash -n on the smoke script — clean
  • scripts/mcp-conformance-smoke.sh against a debug build — 7/7 checks pass
  • Fixed a pre-existing flake surfaced by the suite consolidation: test_cursor_healthcheck_warns_on_literal_workspace_folder_transcript_path (from #206, "fix: tolerate literal ${workspaceFolder} in serve --path") read the user-data-dir env without the suite's env lock; it now pins and serializes like the other TraceDecay::init tests (0 failures across 10+ full-suite runs after the fix).
  • mcp_suite full-parallel runs showed 10 environmental flakes on this shared dev box (dashboard port collisions with concurrently running agents); all pass in isolation and none touch this PR's surface.
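
The env-lock fix in the flake item above follows a common pattern, sketched here under stated assumptions. The suite's real lock and helper names are not shown in this description, so ENV_LOCK, PinnedEnv, and pin_env are hypothetical, as is any variable name passed to them. The unsafe blocks are there because std::env::set_var and remove_var are unsafe in the Rust 2024 edition; on earlier editions they are redundant and only trigger a lint, which is silenced.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::{Mutex, MutexGuard};

/// Hypothetical process-wide lock that every env-touching test must hold.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Holds the lock and remembers the previous value so it can be restored.
struct PinnedEnv {
    key: &'static str,
    previous: Option<OsString>,
    _guard: MutexGuard<'static, ()>,
}

/// Takes the lock, then pins `key` to `value` until the returned guard drops.
#[allow(unused_unsafe)]
fn pin_env(key: &'static str, value: &str) -> PinnedEnv {
    // A test that panicked while holding the lock poisons it; recover the guard.
    let guard = ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    let previous = env::var_os(key);
    // Callers are serialized by ENV_LOCK, so no other pinned test mutates the env here.
    unsafe { env::set_var(key, value) };
    PinnedEnv { key, previous, _guard: guard }
}

impl Drop for PinnedEnv {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        // Restore the old state first; the lock is released when `_guard` drops afterwards.
        match self.previous.take() {
            Some(value) => unsafe { env::set_var(self.key, value) },
            None => unsafe { env::remove_var(self.key) },
        }
    }
}
```

A test would call something like `let _env = pin_env("EXAMPLE_USER_DATA_DIR", "/tmp/case");` as its first statement, so that both the read and the write of the variable happen while the lock is held. This only protects tests that go through the helper, which is why a single test reading the variable without the lock was enough to cause a flake.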

Follow-ups (deliberately not in this PR)

  • Promote CODEX_SKILL_*_DIVERGENCES in src/agents/codex.rs to pub(crate) consts consumed by both the unit parity test and the sync test (single source of truth for the divergence allowlists).
  • memorize-subject vs memorizing-subject: near-duplicate explicit-invoke skill names; both are referenced so neither is stale, but the naming deserves a deliberate look.
  • If a third ecosystem bundle lands, consider generating bundles from cursor-plugin/ as canonical source (cargo xtask sync-bundles); the sync test's policy tables are the generator's spec, and today's 2-bundle/2-divergence reality doesn't justify it yet.
  • The plugin-validation workflow is path-filtered, so it reports as skipped on unrelated PRs — account for that before adding it to required checks.
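
For readers weighing the allowlist and generator follow-ups, the "self-cleaning" property of the sync policy tables can be sketched as follows. This is an illustration only: the Divergence struct, check_policy, and the entry paths used in the usage note are hypothetical and do not reflect the actual tables in plugin_bundle_sync_test.rs or src/agents/codex.rs.

```rust
use std::collections::BTreeSet;

/// One declared exception: an entry allowed to differ between bundles, and why.
struct Divergence {
    entry: &'static str,
    reason: &'static str,
}

/// Compares the declared exceptions with the divergences actually observed on disk.
/// Fails on an undeclared divergence and also on a declared one that no longer occurs.
fn check_policy(declared: &[Divergence], observed: &BTreeSet<&str>) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    // Anything that differs on disk must be listed in the table.
    for entry in observed {
        if !declared.iter().any(|d| d.entry == *entry) {
            errors.push(format!("undeclared divergence: {}", entry));
        }
    }
    // Anything listed in the table must still differ, or the row is stale.
    for d in declared {
        if !observed.contains(d.entry) {
            errors.push(format!("stale exception: {} ({})", d.entry, d.reason));
        }
    }
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```

With a table containing one row for a hypothetical skills/setup/SKILL.md, the check passes only when exactly that entry diverges. An extra divergent README.md produces "undeclared divergence: README.md", and an empty observed set produces a "stale exception" error, which is what forces the table to shrink when bundles converge.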

Merge interplay

#206 and #210 are merged and this branch is up to date with master (the #206 rendered-args pin was reconciled into the shared rendered-bundle validator). #212 (host-integration-parity) is still open and touches tests/agent_suite/main.rs + agent tests — expect a trivial mod-list merge conflict in main.rs for whichever lands second; the schema tests will re-validate any bundle content it changes (including re-flagging author.url if it gets re-added).

changeset-bot Bot commented Jul 2, 2026

⚠️ No Changeset found

Latest commit: fbfd9ec

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

@ScriptedAlchemy ScriptedAlchemy changed the title from "[codex] Add plugin validation tooling" to "Add layered plugin/skill validation tooling across agent ecosystems" Jul 2, 2026
@ScriptedAlchemy ScriptedAlchemy marked this pull request as ready for review July 2, 2026 09:08

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17b565b78b

Comment thread .github/workflows/plugin-validation.yml
ScriptedAlchemy added 14 commits July 2, 2026 10:27

  • Vendor Cursor's official plugin/marketplace JSON schemas (cursor/plugins@4a91a6e) plus doc-derived mcp.json/hooks.json schemas, and validate both source bundles offline via a jsonschema dev-dependency. Drops the schema-invalid author.url key the validation immediately caught.
  • Prove rendered Cursor/Codex installs are structurally sound: absolute quoted hook commands, version-stamped manifests, no surviving template placeholders except the intentional mcp.json workspaceFolder pin, and no source-bundle file silently dropped.
  • Port the useful closed-rule subset of skillmark/skilldoctor/skillkit (hygiene, headings, reference integrity, description quality) and the Claude Code / Agent Skills spec portability rules so every bundled skill is proven valid for each ecosystem it targets, offline in cargo.
  • Declarative bundle-count-agnostic sync policy: every top-level bundle entry and every skill is byte-synced across cursor-plugin/ and codex-plugin/ or covered by a documented, self-cleaning exception.
  • ajv schema job mirroring cursor/plugins' official validate workflow (pinned deps), plus a hermetic MCP Inspector CLI smoke script driving tracedecay serve through the official TypeScript SDK client.
  • New docs/PLUGIN-VALIDATION.md mapping each validation layer to its tests/schemas, plus a CONTRIBUTING section on validating plugins.
@ScriptedAlchemy ScriptedAlchemy force-pushed the codex/plugin-validation-tooling branch from d3055e3 to 00961eb Compare July 2, 2026 10:30
ScriptedAlchemy and others added 2 commits July 2, 2026 16:37
@ScriptedAlchemy ScriptedAlchemy merged commit 2706da2 into master Jul 3, 2026
16 checks passed
@ScriptedAlchemy ScriptedAlchemy deleted the codex/plugin-validation-tooling branch July 3, 2026 01:02