Skip to content

feat(review): custom reviews as Agent Skills + whole-file/general findings#955

Merged
backnotprop merged 12 commits into
mainfrom
feat/review-skills
Jun 24, 2026
Merged

feat(review): custom reviews as Agent Skills + whole-file/general findings#955
backnotprop merged 12 commits into
mainfrom
feat/review-skills

Conversation

@backnotprop

@backnotprop backnotprop commented Jun 23, 2026

Copy link
Copy Markdown
Owner

What

Two related changes that make code review configurable and lossless.

1. Custom reviews as Agent Skills. Pick a skill and that skill becomes the review. Plannotator reads the skill's SKILL.md body live from your global skill folders (~/.claude/skills, ~/.codex/skills, ~/.agents/skills, XDG) and uses it as the entire review prompt — the default review prompt is dropped for custom reviews, and the user message is trimmed to just the git/PR context. Nothing is copied; skills that carry references//scripts//assets/ are read on demand from where they already live (the agent is pointed at the real folder). No frontmatter parsing: name = folder name, instructions = body with the leading ---…--- block stripped. The built-in default review stays byte-for-byte unchanged.

2. Agent findings can be a line, a whole file, or general — and are never silently dropped. A finding that couldn't be pinned to a diff line used to be filtered out with no trace; now it becomes a whole-file or review-level (general) comment instead. The verdict count is recomputed from what actually rendered, so the card can't claim more than it shows.

UI: a Review picker in the launch panel (Default, your enabled reviews, "+ add a review") and a "General" section in the review sidebar.

Design / invariants

  • Skill becomes the prompt for custom reviews; default review byte-identical. A short output-contract reminder is appended to custom prompts so a verdict-style skill returns discrete, line-attached findings plus a general summary, instead of one block.
  • Read live, never copied, never writes your repo. Edit a skill the normal way and the next review uses it.
  • Curation: ~/.plannotator/review-skills.json ({ version: 1, enabled: [names] }). Global skills only (fork-trust boundary).
  • Placement inferred from what a finding carries: file + line → line, file only → whole-file, neither → general. Nothing is dropped.
  • Output schemas are relaxed the strict-safe way (nullable but still required), so codex exec --output-schema and Claude's --json-schema both accept them. A test asserts the strict invariant on both.
  • Selection is authoritative: a picked review that can't be resolved fails loud instead of quietly running Default; launch is held until the review list has loaded.
  • Bun/Pi parity via vendor.sh (one shared resolver/composer, vendored to the Pi extension).

Verification

Typecheck clean; 1686 tests pass / 0 fail — covering the placement classifier, the scope-aware validator, transform routing, prompt composition, the strict-schema invariant, and launch-time profile resolution.

Docs

docs/custom-reviews.md — user-facing behavior and migration from the old JSON-profile design (supersedes #913).

@backnotprop backnotprop force-pushed the feat/review-skills branch 4 times, most recently from 6333ec7 to 4a00af5 Compare June 23, 2026 03:54
A custom review is an Agent Skill, read live from the user's global skill folders
and injected as the review's focus. Skills with reference/script files are pointed
at where they live, never copied. Backend plus a launch-panel picker.

- Loader: discover global skills, strip the SKILL.md body (no frontmatter parse),
  curate via review-skills.json, map to the existing ResolvedReviewProfile.
- Skills with extra files: point the agent at the real skill dir (no copy).
- Endpoints: GET /api/agents/review-profiles (enabled), GET /api/agents/skills
  (all, for the picker), POST /api/agents/review-skills (enable one).
- UI: a Review dropdown in the launch panel (Default, enabled reviews, separator,
  "Add new review") and a type-ahead dialog over every discovered skill.
- Replaces the old ~/.plannotator/reviews JSON-profile design. Default review
  stays byte-identical. Bun and Pi in parity.
When a user picks a custom review skill, its SKILL.md body now fully
replaces the default review system prompt instead of being appended as a
trailing section, so the picked review actually drives the review. The
user message is stripped to git/PR context only. The built-in default
review stays byte-for-byte unchanged, and an empty skill body now fails
loud instead of silently running the default.
…ently drop a finding

A review finding can now be placed three ways — a line, a whole file, or
general (review-level). Findings that can't be pinned to a line become
whole-file or general comments instead of being filtered out, and the
verdict count is recomputed from what actually rendered so the card can't
claim more than it shows.

Server: relax both agent output schemas so location is optional; route
every finding by what it carries via classifyFindingPlacement (no drop);
recompute the Claude verdict from rendered findings (mirrored in Pi). UI:
a General section in the review sidebar, scope-aware card badges and copy
text, and general/file comments carried in the agent feedback and PR body.
Validator is scope-aware and stays strict on line findings.
…equired

OpenAI strict structured output (codex exec --output-schema) requires every
property of an object with additionalProperties:false to appear in `required`;
a field is made optional by being nullable, not by being dropped from required.
ADR 004 removed code_location/line_range (Codex) from required, which made
codex exec fail with a 400 invalid_json_schema before the review ran. Restore
them to required and mark them nullable; apply the same nullable pattern to the
Claude schema for parity. Add a strict-mode invariant test over both schemas.
The job detail panel rendered every finding as a line finding — general
comments showed an empty path with 'L0' and copied ':0', file comments
showed a fake line 0. Make AnnotationRow scope-aware (general badge, file
path without a line, line as before) and don't open a diff file for a
general comment. Extract the shared copy-prefix helper so the sidebar and
the panel can't drift.
A custom review skill carries its own methodology but doesn't know
Plannotator's output contract, so a verdict-style skill (e.g. interrogate)
collapses good, line-locatable findings into one block. Append a short
output-contract reminder to custom review prompts: one finding per issue,
attach to file+line when about specific code, whole-file or general
otherwise, and emit any final verdict as its own general finding in
addition to the specific ones. Covers only how to report, never what to
look for, and is appended only for custom skills — the default review
already states its own contract and stays byte-identical.
…t Default

Two ways a picked review could silently become Default:

1. Server: an unresolvable non-default reviewProfileId (renamed/removed
   skill, stale cookie, malformed request) fell through to the built-in
   default. Resolution now lives in one shared resolveRequestedReviewProfile()
   used by both Bun and Pi, which throws a clear error instead of downgrading.
   The previous F3 guard (skill found but unreadable/empty/oversized) is
   preserved as one of its branches.

2. Client: a saved custom pick was coerced to Default during the window
   before /api/agents/review-profiles loaded, so a launch in that instant ran
   Default. Hold launch for a custom pick until the profile list has loaded; a
   Default pick never waits.

Adds loader tests: unknown id throws, empty-body skill throws, the reserved
default id and absent id resolve to default.
@backnotprop backnotprop changed the title feat(review): custom reviews as Agent Skills (backend) feat(review): custom reviews as Agent Skills + whole-file/general findings Jun 23, 2026
Two code-quality fixes from PR review:
- launchJob's inner params type was missing reviewProfileId (the public
  UseAgentJobsReturn interface and callers already had it). Add it so the
  type matches the contract.
- reviewProfileLabel was threaded server→client but never rendered, so it
  was dead on the wire and its comment ('the UI can show a profile tag')
  overstated reality. Add the field to CodeAnnotation and render it as a
  small tag on agent findings in the sidebar and job detail panel, so a
  custom review's findings show which review produced them.
Replace the Claude/Codex engine dropdown in the launcher with a small
two-icon toggle — tap an agent's brand mark to select it. Icons are
inlined as components (new AgentIcons.tsx) so the single-file review
bundle keeps no external asset dependency: Claude as a vector path, Codex
as a base64 data-URI (it only ships as a PNG in the repo). Falls back to a
single static icon+label when only one engine is available.
The open-in-app control in the code-review file header showed the app
label inline when the header was wide. Drop showLabel so it renders
icon-only like the plan/annotate side (DocBadges). The picked app's name
still shows in the dropdown.
PR #942 moved the semantic diff from a dockview panel into a resizable
sidebar accordion. Put it back as its own dock tab: recover the deleted
ReviewSemanticDiffPanel verbatim from before #942, re-register it in the
dock component/type maps, restore the App.tsx wiring (panel-open + active
tracking, code-nav reference panel, hide-headers, auto-fallback, and the
advert handler) and the ReviewStateContext callbacks the panel needs, and
re-add the 'Semantic diff' button in the file tree. Remove the sidebar
accordion. Shared pieces (semanticDiffShared, /api/semantic-diff, the data
hooks) are reused unchanged; #942's other changes are untouched.
Sweep of code this PR orphaned: the .semantic-diff-accordion CSS (the
accordion is gone), loadSemanticDiff's now-unnecessary export (the deleted
accordion was its only external caller), the unused React import in the
generated AgentIcons, and the ResolvedReviewProfile type import in review.ts
and the Pi serverReview.ts mirror (orphaned when the inline type annotation
gave way to resolveRequestedReviewProfile's inferred return).
@backnotprop backnotprop merged commit affeaa0 into main Jun 24, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant