feat(review): custom reviews as Agent Skills + whole-file/general findings#955
Merged
Conversation
6333ec7 to
4a00af5
Compare
A custom review is an Agent Skill, read live from the user's global skill folders and injected as the review's focus. Skills with reference/script files are pointed at where they live, never copied. Backend plus a launch-panel picker. - Loader: discover global skills, strip the SKILL.md body (no frontmatter parse), curate via review-skills.json, map to the existing ResolvedReviewProfile. - Skills with extra files: point the agent at the real skill dir (no copy). - Endpoints: GET /api/agents/review-profiles (enabled), GET /api/agents/skills (all, for the picker), POST /api/agents/review-skills (enable one). - UI: a Review dropdown in the launch panel (Default, enabled reviews, separator, "Add new review") and a type-ahead dialog over every discovered skill. - Replaces the old ~/.plannotator/reviews JSON-profile design. Default review stays byte-identical. Bun and Pi in parity.
4a00af5 to
eef2c7c
Compare
When a user picks a custom review skill, its SKILL.md body now fully replaces the default review system prompt instead of being appended as a trailing section, so the picked review actually drives the review. The user message is stripped to git/PR context only. The built-in default review stays byte-for-byte unchanged, and an empty skill body now fails loud instead of silently running the default.
…ently drop a finding A review finding can now be placed three ways — a line, a whole file, or general (review-level). Findings that can't be pinned to a line become whole-file or general comments instead of being filtered out, and the verdict count is recomputed from what actually rendered so the card can't claim more than it shows. Server: relax both agent output schemas so location is optional; route every finding by what it carries via classifyFindingPlacement (no drop); recompute the Claude verdict from rendered findings (mirrored in Pi). UI: a General section in the review sidebar, scope-aware card badges and copy text, and general/file comments carried in the agent feedback and PR body. Validator is scope-aware and stays strict on line findings.
…equired OpenAI strict structured output (codex exec --output-schema) requires every property of an object with additionalProperties:false to appear in `required`; a field is made optional by being nullable, not by being dropped from required. ADR 004 removed code_location/line_range (Codex) from required, which made codex exec fail with a 400 invalid_json_schema before the review ran. Restore them to required and mark them nullable; apply the same nullable pattern to the Claude schema for parity. Add a strict-mode invariant test over both schemas.
The job detail panel rendered every finding as a line finding — general comments showed an empty path with 'L0' and copied ':0', file comments showed a fake line 0. Make AnnotationRow scope-aware (general badge, file path without a line, line as before) and don't open a diff file for a general comment. Extract the shared copy-prefix helper so the sidebar and the panel can't drift.
A custom review skill carries its own methodology but doesn't know Plannotator's output contract, so a verdict-style skill (e.g. interrogate) collapses good, line-locatable findings into one block. Append a short output-contract reminder to custom review prompts: one finding per issue, attach to file+line when about specific code, whole-file or general otherwise, and emit any final verdict as its own general finding in addition to the specific ones. Covers only how to report, never what to look for, and is appended only for custom skills — the default review already states its own contract and stays byte-identical.
…t Default Two ways a picked review could silently become Default: 1. Server: an unresolvable non-default reviewProfileId (renamed/removed skill, stale cookie, malformed request) fell through to the built-in default. Resolution now lives in one shared resolveRequestedReviewProfile() used by both Bun and Pi, which throws a clear error instead of downgrading. The previous F3 guard (skill found but unreadable/empty/oversized) is preserved as one of its branches. 2. Client: a saved custom pick was coerced to Default during the window before /api/agents/review-profiles loaded, so a launch in that instant ran Default. Hold launch for a custom pick until the profile list has loaded; a Default pick never waits. Adds loader tests: unknown id throws, empty-body skill throws, the reserved default id and absent id resolve to default.
Two code-quality fixes from PR review:
- launchJob's inner params type was missing reviewProfileId (the public
UseAgentJobsReturn interface and callers already had it). Add it so the
type matches the contract.
- reviewProfileLabel was threaded server→client but never rendered, so it
was dead on the wire and its comment ('the UI can show a profile tag')
overstated reality. Add the field to CodeAnnotation and render it as a
small tag on agent findings in the sidebar and job detail panel, so a
custom review's findings show which review produced them.
Replace the Claude/Codex engine dropdown in the launcher with a small two-icon toggle — tap an agent's brand mark to select it. Icons are inlined as components (new AgentIcons.tsx) so the single-file review bundle keeps no external asset dependency: Claude as a vector path, Codex as a base64 data-URI (it only ships as a PNG in the repo). Falls back to a single static icon+label when only one engine is available.
The open-in-app control in the code-review file header showed the app label inline when the header was wide. Drop showLabel so it renders icon-only like the plan/annotate side (DocBadges). The picked app's name still shows in the dropdown.
PR #942 moved the semantic diff from a dockview panel into a resizable sidebar accordion. Put it back as its own dock tab: recover the deleted ReviewSemanticDiffPanel verbatim from before #942, re-register it in the dock component/type maps, restore the App.tsx wiring (panel-open + active tracking, code-nav reference panel, hide-headers, auto-fallback, and the advert handler) and the ReviewStateContext callbacks the panel needs, and re-add the 'Semantic diff' button in the file tree. Remove the sidebar accordion. Shared pieces (semanticDiffShared, /api/semantic-diff, the data hooks) are reused unchanged; #942's other changes are untouched.
Sweep of code this PR orphaned: the .semantic-diff-accordion CSS (the accordion is gone), loadSemanticDiff's now-unnecessary export (the deleted accordion was its only external caller), the unused React import in the generated AgentIcons, and the ResolvedReviewProfile type import in review.ts and the Pi serverReview.ts mirror (orphaned when the inline type annotation gave way to resolveRequestedReviewProfile's inferred return).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Two related changes that make code review configurable and lossless.
1. Custom reviews as Agent Skills. Pick a skill and that skill becomes the review. Plannotator reads the skill's
SKILL.mdbody live from your global skill folders (~/.claude/skills,~/.codex/skills,~/.agents/skills, XDG) and uses it as the entire review prompt — the default review prompt is dropped for custom reviews, and the user message is trimmed to just the git/PR context. Nothing is copied; skills that carryreferences//scripts//assets/are read on demand from where they already live (the agent is pointed at the real folder). No frontmatter parsing: name = folder name, instructions = body with the leading---…---block stripped. The built-in default review stays byte-for-byte unchanged.2. Agent findings can be a line, a whole file, or general — and are never silently dropped. A finding that couldn't be pinned to a diff line used to be filtered out with no trace; now it becomes a whole-file or review-level (general) comment instead. The verdict count is recomputed from what actually rendered, so the card can't claim more than it shows.
UI: a Review picker in the launch panel (Default, your enabled reviews, "+ add a review") and a "General" section in the review sidebar.
Design / invariants
~/.plannotator/review-skills.json({ version: 1, enabled: [names] }). Global skills only (fork-trust boundary).required), socodex exec --output-schemaand Claude's--json-schemaboth accept them. A test asserts the strict invariant on both.vendor.sh(one shared resolver/composer, vendored to the Pi extension).Verification
Typecheck clean; 1686 tests pass / 0 fail — covering the placement classifier, the scope-aware validator, transform routing, prompt composition, the strict-schema invariant, and launch-time profile resolution.
Docs
docs/custom-reviews.md— user-facing behavior and migration from the old JSON-profile design (supersedes #913).