From 098596170febf12c812f1c7a99b83b2f0d22969a Mon Sep 17 00:00:00 2001 From: Mikhail Filimonov Date: Mon, 4 May 2026 18:32:35 +0200 Subject: [PATCH] Add ubrella-clickhose-review and codebase-consistency-reviewer skills Co-Authored-By: Claude Opus 4.7 (1M context) --- .../codebase-consistency-reviewer/SKILL.md | 131 +++++ .../references/lookup-worker-instructions.md | 91 ++++ .../references/output-template.md | 118 +++++ .../references/review-rubric.md | 126 +++++ .../references/search-strategy.md | 118 +++++ .../skills/ubrella-clickhose-review/SKILL.md | 470 ++++++++++++++++++ 6 files changed, 1054 insertions(+) create mode 100644 .claude/skills/codebase-consistency-reviewer/SKILL.md create mode 100644 .claude/skills/codebase-consistency-reviewer/references/lookup-worker-instructions.md create mode 100644 .claude/skills/codebase-consistency-reviewer/references/output-template.md create mode 100644 .claude/skills/codebase-consistency-reviewer/references/review-rubric.md create mode 100644 .claude/skills/codebase-consistency-reviewer/references/search-strategy.md create mode 100644 .claude/skills/ubrella-clickhose-review/SKILL.md diff --git a/.claude/skills/codebase-consistency-reviewer/SKILL.md b/.claude/skills/codebase-consistency-reviewer/SKILL.md new file mode 100644 index 000000000000..2241ee6ef80b --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/SKILL.md @@ -0,0 +1,131 @@ +--- +name: codebase-consistency-reviewer +description: Use when asked to review a PR, commit, commit range, branch, patch, or pasted diff for duplicated functionality, reinvented wheels, not-invented-here patterns, parallel abstractions, inconsistent naming, inconsistent settings/APIs/schemas/metrics/errors/logs, redundant config keys, non-standard implementations, or places where new code should reuse, generalize, extend, or align with existing codebase patterns and user-facing conventions. 
Triggers on phrases like "review for duplication", "consistency review", "is this already implemented", "does this match our conventions", "reinventing the wheel". +--- + +# Codebase Consistency Reviewer + +## Purpose + +This is **not** a correctness or bug review. The lens is: + +- Is this functionality already implemented somewhere else, possibly under another name? +- Is there an established helper, abstraction, API, setting, schema, or naming pattern to reuse? +- Does the new code follow the standard mechanism this codebase uses for this kind of job? +- Do new user-facing names/settings/fields/metrics/errors/logs match existing conventions? +- Should the change generalize an existing implementation instead of paralleling it? +- If direct reuse isn't appropriate, should it borrow concepts from related code? + +Report **actionable consistency risks**. Skip weak similarities. + +## Inputs + +Accept any of: PR number, PR URL, commit SHA, commit range, branch name, local patch/diff text, pasted changed files, repo path or URL. Ask for missing input only when the review cannot proceed. + +## Workflow + +### 1. Intake & capability detection + +Detect what's available before planning: +- Local repo? (`git`, file reads, `rg`/`grep`) +- GitHub access? (`gh` CLI, GitHub MCP, web fetch) +- Subagents available for parallel lookup? +- Single language/framework or many? + +If repo access exists, retrieve: the diff, changed files with surrounding context, relevant imports, touched public/semi-public APIs, touched settings/config schemas, touched CLI flags / env vars / table schemas / protocol fields / user-facing names, related tests, related docs, **and the PR description, linked issues, and commit messages** (the author may have justified deliberate divergence — weight stated intent before flagging). + +If repo access is unavailable, work from the supplied material and state the limitation in Scope. 
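The capability probe in step 1 can be sketched as a small helper. This is a minimal illustration, not part of the skill: the function name and returned keys are invented here, and MCP or subagent availability has no CLI probe, so a real check would go further.

```python
import shutil

def detect_capabilities() -> dict:
    """Probe which review tools are on PATH before planning the workflow.

    A sketch only: GitHub MCP connections and subagent support cannot be
    detected this way and would need a separate check.
    """
    return {
        "git": shutil.which("git") is not None,    # local repo operations
        "gh": shutil.which("gh") is not None,      # GitHub PR/issue access
        "rg": shutil.which("rg") is not None,      # fast lexical search
        "grep": shutil.which("grep") is not None,  # fallback lexical search
    }

caps = detect_capabilities()
# Degrade gracefully: prefer rg, fall back to grep; prefer gh, else web fetch.
search_tool = "rg" if caps["rg"] else ("grep" if caps["grep"] else None)
```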
+ +The main reviewer infers language/framework/conventions from the changed files. Do not assume — let the codebase tell you what "standard mechanism" means here. + +### 2. Identify review targets + +Extract **concept-bearing and convention-bearing** changes — not every changed line. + +**Cluster** related changes into one target when they cooperate to do one job (a lock spread across methods + a thread + a state field is one target, not four). + +**Skip trivial changes**: getters/setters, one-line delegations, plumbing, parameter forwarding, mechanical refactors, formatting, trivial renames, bug fixes that introduce no new concept, local helpers too small to plausibly have a canonical version. + +**Prioritize** targets that: +- create or change a contract surface (API, setting, flag, schema field, table column, metric, log field, error, persisted/wire format, documented behavior, user-visible name) +- implement a recognizable responsibility (lifecycle, retry, timeout, caching, scheduling, locking, validation, parsing, formatting, lookup, conversion, matching, resource management) +- introduce an abstraction, helper, registry entry, extension point, or composition pattern others may copy +- look like they bypass or recreate a standard codebase mechanism +- overlap semantically with concepts that may already exist under another name +- establish a name/API/setting/test/operational convention that should match precedent + +Pay special attention to "small" changes that encode durable conventions: a new column, config key, enum value, metric name, log attribute, error code, or any name users/operators/developers/downstream systems will see. 
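A crude filter for such convention-bearing hunks might look like the sketch below. The patterns are assumptions for illustration only; every codebase defines its own convention surfaces, and real triage reads the diff rather than pattern-matching it.

```python
import re

# Illustrative patterns only: what counts as a convention-bearing addition
# varies by codebase; these regexes are assumptions, not a fixed rule set.
CONVENTION_PATTERNS = {
    "config key": re.compile(r"^\+\s*[\w.]+\s*[:=]"),            # new key = value
    "metric name": re.compile(r'"[a-z_]+_(count|total|seconds|bytes)"'),
    "column/field": re.compile(r"ADD COLUMN\s+(\w+)", re.IGNORECASE),
}

def convention_bearing(added_lines):
    """Flag added diff lines that likely encode a durable, visible name."""
    hits = []
    for line in added_lines:
        for kind, pat in CONVENTION_PATTERNS.items():
            if pat.search(line):
                hits.append((kind, line.lstrip("+").strip()))
    return hits

hits = convention_bearing([
    "+max_refresh_retries = 3",
    '+    counter.increment("refresh_failures_total")',
    "+ALTER TABLE jobs ADD COLUMN last_refresh_ts DateTime",
])
```

Each of the three sample lines trips a different pattern; a hit only marks the hunk as worth a search brief, it decides nothing on its own.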
+ +For each selected target, separate: +- the **semantic concept** introduced +- the **literal identifiers** in the new code +- **likely existing names/synonyms** elsewhere +- the **contract surface** involved, if any +- **ownership/layering** boundaries that may affect reuse +- **what kind of precedent** would help: direct reuse / reusable abstraction / naming-API precedent / setting-config precedent / schema precedent / standard-mechanism precedent / related design precedent + +Filter aggressively on triviality, then investigate every remaining target. Do **not** artificially cap target count. + +See `references/review-rubric.md` for target-selection principles, clustering rule, triviality filter, severity definitions, finding categories, and false-positive guidance. + +### 3. Build search briefs + +For each target, create a concise brief. The **target is the concept**, not the names; new code may use the wrong name. Search both the new identifiers and the underlying concept. + +Brief contents and concrete query-generation tactics (synonyms, lifecycle pairs, broader/narrower terms, domain vocabulary scraped from nearby files): see `references/search-strategy.md`. + +### 4. Delegate lookup investigations to subagents + +For each non-trivial target, dispatch a **separate** subagent. Run them **in parallel** when subagent tooling supports it; otherwise sequentially (note this in the Search Appendix). + +Each subagent receives only: the target's search brief, the relevant diff excerpt, and enough surrounding context. They search for evidence and **return references and evidence, not final recommendations**. + +Subagent instructions and the compact report format: see `references/lookup-worker-instructions.md`. + +### 5. Search strategy (no vector search assumed) + +"Semantic search" here means: drive **lexical** tools (`rg`/`grep`) with semantically-generated queries, then **read candidates and judge semantic match by reasoning over the code**. 
Names are weak signals — open the file before believing the match. + +Search local-first (same file → adjacent files → module → tests → shared utilities → global). Use tests and docs as precedent sources (they reveal canonical constructors, fixtures, expected errors, user-facing terminology). Look explicitly for **bypassed standard mechanisms** (registries, factories, config schemas, error classes, logging/metrics helpers, lifecycle hooks, DI, test utilities, naming/schema conventions). + +Full strategy: `references/search-strategy.md`. + +### 6. Validate findings (do not trust subagents blindly) + +For each candidate finding, the main reviewer compares new code and precedent on: semantic purpose, behavior, inputs/outputs, side effects, lifecycle, error semantics, performance/concurrency constraints, ownership, layering, dependency direction, public-API implications, compatibility, test coverage, and whether the precedent is current vs. legacy/deprecated/special-purpose. Check whether the PR description already justifies divergence. + +Promote a target to a finding only when there's a **concrete, actionable recommendation or a meaningful consistency risk**. Park inconclusive or non-actionable matches in "Checked but not flagged". + +Do not: +- recommend reuse that creates bad coupling or violates ownership/layering +- flag superficial name similarity as duplication +- ignore naming/setting/schema/API consistency just because implementations differ +- recommend generalization for its own sake +- penalize deliberate divergence the PR description already explains + +### 7. 
Produce the report + +Use the structure in `references/output-template.md`: + +- **Scope** — inputs, changed files, capabilities used, parallel vs sequential, language/framework, limitations +- **Executive Summary** — most important findings with counts by category +- **Findings** — each with Severity, Category, Changed code refs, Existing precedent refs, Analysis, Recommendation, optional patch sketch, Confidence +- **Checked But Not Flagged** — significant targets investigated without raising a finding +- **Search Appendix** — targets, subagents (parallel or sequential), key anchors/query expansions, files/tests/docs inspected, uncertainty, limitations + +Sections may be omitted when empty. Small reviews may collapse the Search Appendix to one paragraph. + +## Style + +- Practical, specific, codebase-aware. Concrete recommendations over abstract advice. +- Don't overclaim. "Conceptually similar" stays "conceptually similar." +- Local conventions govern local code; global conventions govern shared/public surfaces. +- User-facing knobs (settings, names, APIs, schema fields, metrics, logs, errors, persisted formats) get extra attention — those inconsistencies are expensive to fix later. +- Keep the final report focused on **actionable** risks. Don't dump every weak similarity from search. 
+ +## References + +- `references/review-rubric.md` — target selection, clustering, triviality, severity, categories, FP guidance +- `references/search-strategy.md` — briefs, query generation, lexical-driven semantic search, layering, anchors +- `references/lookup-worker-instructions.md` — subagent dispatch and report format +- `references/output-template.md` — final report structure and finding format diff --git a/.claude/skills/codebase-consistency-reviewer/references/lookup-worker-instructions.md b/.claude/skills/codebase-consistency-reviewer/references/lookup-worker-instructions.md new file mode 100644 index 000000000000..1150ddd675c4 --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/references/lookup-worker-instructions.md @@ -0,0 +1,91 @@ +# Lookup Worker (Subagent) Instructions + +The main reviewer dispatches one **lookup subagent per important review target** (or per cluster). Subagents are evidence gatherers, not deciders. + +## Dispatch Rules + +- **One subagent per target.** Keep investigations independent — do not give a subagent multiple unrelated targets. +- **Run in parallel** when the available subagent tooling supports concurrent agents. If parallelism is unavailable, run sequentially and note this in the Search Appendix. +- Use **cheaper / faster** subagents (e.g., a general-purpose or Explore-style search agent) for breadth. +- Reserve the main reviewer for target selection, clustering, evidence validation, and final recommendations. 
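Under these rules, dispatch can be sketched as follows. `run_lookup` is a stand-in for whatever subagent invocation the available tooling actually provides; the brief shapes are invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def run_lookup(brief):
    """Stand-in for dispatching one lookup subagent and collecting its report.

    A real implementation would invoke the subagent tooling with the brief,
    diff excerpt, and surrounding context. Here it just echoes the target.
    """
    return {"target": brief["target"], "candidates": []}

briefs = [
    {"target": "retry loop in worker"},
    {"target": "new config key spelling"},
]

# One subagent per target, run in parallel when the tooling allows it;
# otherwise fall back to a plain loop and note that in the Search Appendix.
with ThreadPoolExecutor(max_workers=len(briefs)) as pool:
    reports = list(pool.map(run_lookup, briefs))
```

`pool.map` preserves input order, so report N always corresponds to brief N, which keeps integration and deduplication by the main reviewer simple.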
+ +## What Each Subagent Receives + +Only what is needed for the investigation: + +- the **target search brief** +- the relevant **diff excerpt** for the target +- enough **surrounding context** (imports, related symbols, neighboring files) to understand the target +- explicit instructions to **search for existing precedent** +- explicit instructions to **return references and evidence, not final recommendations** +- a pointer to `references/search-strategy.md` for how to search + +Do **not** pass other targets, the full PR, the full report scaffolding, or unrelated guidance. + +## What Subagents Should Search For + +- Exact or near-duplicate implementations of the target's responsibility. +- Existing helpers, abstractions, classes, registries, factories, or framework mechanisms that already do this job. +- Similar user-facing APIs, settings, flags, fields, columns, metrics, log keys, errors, or schema elements. +- Naming precedents (sibling settings, sibling columns, sibling metrics, sibling errors). +- Tests and docs that pin down canonical shapes or terminology. +- Patterns used by neighboring subsystems for the same kind of job. +- Semantically similar implementations under different names (use synonym/lifecycle/broader-narrower expansion). +- Standard mechanisms the new code may be **bypassing** or **recreating**. + +The strategy (lexical-driven semantic search, layered local-first searching, query expansion, anchor handling, candidate reading) is described in `search-strategy.md`. Follow it. + +## What Subagents Should NOT Do + +- Do **not** make final review decisions. +- Do **not** decide severity. +- Do **not** classify a finding as a Blocker / Strong recommendation / Suggestion / Observation. +- Do **not** recommend reuse vs. generalization vs. naming alignment as a final answer — present these as possibilities for the main reviewer. +- Do **not** invent code; if patch direction is unclear, say so. 
+ +## Subagent Report Format + +Return a compact report with: + +- **Target summary** — one or two sentences in your own words. +- **Files, symbols, or contract surfaces changed** — what you understood the new code to be. +- **Search strategy used** — query set tried, layers covered, files inspected. +- **Important search anchors** — which queries were high-yield, which were noise. +- **Possible existing implementations or related patterns** — each with: + - file path, symbol/setting/field/API, line numbers when possible + - one-sentence description of what it does + - why it may be relevant + - what would have to be true for it to be a real match (anchors) +- **Similarity classification** for each candidate, picking from: + - exact duplicate + - near duplicate + - same abstraction exists + - same standard mechanism exists + - same concept implemented differently + - naming/API inconsistency + - setting/config inconsistency + - schema/data-model inconsistency + - related design precedent only + - no meaningful precedent found +- **Confidence** — high / medium / low. +- **Why the match may matter** — coupling cost, divergence cost, user-facing inconsistency cost. +- **Reuse / generalization / naming-only signals** — for the main reviewer's judgment, not as a recommendation: + - whether direct reuse may be possible + - whether generalization may be appropriate + - whether only naming/API/setting alignment may be relevant +- **Ownership / layering concerns.** +- **Uncertainties** — reasons the evidence may be weak, areas not searched, queries not tried. + +Keep the report compact. Reference file:line. Do not paste large code blocks unless essential. + +## Confidence and Uncertainty Guidance + +- **High confidence** — read both the new code and the candidate, semantic match is clear, ownership/layering compatible, anchors hold. +- **Medium confidence** — read both, semantic match is plausible but partial, or anchors are partial. 
+- **Low confidence** — name/shape resemblance only, candidate not fully read, anchors weak. + +State uncertainties explicitly. The main reviewer needs to know what was not searched as much as what was. + +## Independence + +Each subagent works without seeing other subagents' work. The main reviewer integrates and deduplicates. If two subagents independently surface the same precedent, that's a stronger signal — the main reviewer treats it as such during validation. diff --git a/.claude/skills/codebase-consistency-reviewer/references/output-template.md b/.claude/skills/codebase-consistency-reviewer/references/output-template.md new file mode 100644 index 000000000000..75822397575e --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/references/output-template.md @@ -0,0 +1,118 @@ +# Output Template + +Produce the final review using this structure. Sections may be omitted when empty. For a small review, the Search Appendix may collapse to a single paragraph. + +--- + +# Codebase Consistency Review + +## Scope + +- **Input**: PR / commit range / diff source / patch source. +- **Changed files**: list. +- **Repository search available**: yes / no. +- **Subagents**: parallel / sequential / none. +- **Detected language(s) / framework(s)**: list. +- **Limitations**: anything that constrained the review (e.g., no repo access, partial diff, paywalled docs, no PR description). 
+ +## Executive Summary + +A short paragraph naming the most important findings, followed by counts: + +- Exact reuse opportunities: N +- Likely duplication: N +- Abstraction / generalization opportunities: N +- Naming / API consistency issues: N +- Settings / config consistency issues: N +- Schema / data-model consistency issues: N +- Standard-mechanism consistency issues: N +- Checked but not flagged: N + +## Findings + +For each actionable finding: + +### Finding N: + +- **Severity**: Blocker / Strong recommendation / Suggestion / Observation +- **Category** (one of): + - Existing function/class/helper should be reused + - Existing abstraction should be extended + - Duplicate or near-duplicate implementation + - Inconsistent naming + - Inconsistent user-facing setting/API + - Inconsistent schema/data model + - Non-standard codebase mechanism + - Consider generalizing existing code + - Borrow existing design ideas + - Related precedent exists, but no direct reuse recommended +- **Changed code**: file path, symbol/setting/field/API, line references. +- **Existing precedent**: file path, symbol/setting/field/API/test/doc, line references. +- **Analysis**: what is similar, what is different, whether the precedent is direct reuse / a stronger abstraction / naming-API precedent / standard mechanism precedent / only related design; whether ownership or layering affects the recommendation. Be specific. Do not just say "similar code exists." +- **Recommendation**: a concrete action. +- **Suggested patch direction** (optional): short pseudo-diff or replacement sketch when the surrounding context is sufficient. Do not invent exact code otherwise. +- **Confidence**: High / Medium / Low. + +## Checked But Not Flagged + +Concise list of significant targets investigated without raising a finding. For each: + +- Target. +- What was checked. 
+- Why no finding was raised (e.g., precedent only superficially similar; PR description justifies divergence; existing code is legacy/unsuitable for reuse). + +## Search Appendix + +Compact appendix (single paragraph for small reviews): + +- Key targets investigated. +- Subagents used; parallel or sequential. +- Most relevant search anchors and query expansions (synonyms, lifecycle pairs, domain terms scraped from nearby files). +- Important files / tests / docs inspected. +- Important uncertainty (areas not fully searched, queries that returned noise, candidates not opened). +- Search limitations (no repo, no docs, no PR description, language unfamiliar to local tooling). + +--- + +## Severity Labels + +- **Blocker** — Strong duplication of working code, or a user-facing inconsistency expensive to undo once shipped. Reuse / alignment is clearly correct and feasible. +- **Strong recommendation** — Clear precedent exists; not aligning will create real maintenance cost or visible divergence. +- **Suggestion** — Precedent exists; alignment would be cleaner but divergence is defensible. +- **Observation** — Related precedent worth knowing about; no action required. + +## Category Labels + +Use the canonical names above so future readers can scan by category. + +## Examples of Good Recommendations + +- "Use `RetryPolicy::withExponentialBackoff` from `core/retry.rs:42` instead of the custom loop in `worker.rs:118-156`. Same semantics, already covered by tests in `core/retry_tests.rs`." +- "Move the path-normalization logic in `ingest/upload.go:88-104` into the existing `pathutil.Normalize` (`pathutil/normalize.go:12`) and call it from both ingest and export. Current divergence already produces different behavior on trailing slashes." +- "Rename setting `query_evaluation_timeout_seconds` to `max_query_evaluation_seconds` for consistency with `max_*_seconds` settings already in `Settings.h` (`max_execution_time`, `max_session_timeout_seconds`)." 
+- "Register the new background job through `BackgroundJobRegistry` (`bg/registry.cpp:55`) instead of starting a thread directly in `Refresher::start`. Every other background job in this codebase is registered through that mechanism." +- "This code cannot directly reuse `LRUCache` because it needs TTL eviction, but it should borrow the slot-eviction pattern from `LRUCache::evict` (`cache/lru.cpp:201`) rather than the linear scan in the new code." +- "The new column `last_refresh_ts` should be named `last_refresh_time` to match `last_modification_time` and `last_load_time` in sibling system tables." +- "No direct reuse is recommended; align the error message format with `formatRetryError` (`errors/format.cpp:30`) so operator-facing logs stay consistent." + +## Examples of Weak Findings That Should Stay Out + +These belong in *Checked But Not Flagged* (or be omitted entirely): + +- "There's another function called `validate` elsewhere in the codebase." — name match without semantic match. +- "This module also uses retries somewhere." — no concrete reuse, generalization, or naming alignment proposed. +- "Could possibly be combined with X" with no reason why combining would be correct or beneficial. +- "Style differs from another file" — unless tied to a documented convention or visible user-facing surface. +- "This pattern appears in five places" — without identifying the canonical one or proposing alignment. + +## Collapsing Sections for Small Reviews + +For a small change with one or two targets: + +- Keep **Scope**, **Findings**, and a one-paragraph **Search Appendix**. +- Drop **Executive Summary** counts; replace with a single sentence. +- Drop **Checked But Not Flagged** if empty. + +For a no-finding review: + +- Keep **Scope**, a one-paragraph **Executive Summary** stating no actionable findings, **Checked But Not Flagged**, and a one-paragraph **Search Appendix**. 
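The finding format above is mechanical enough to render from structured fields. A minimal sketch, reusing the `RetryPolicy` example from this page; the `render_finding` helper and its argument shape are illustrative, not part of the template.

```python
def render_finding(n, f):
    """Render one entry in the Findings format above (sketch only)."""
    lines = [
        f"### Finding {n}: {f['title']}",
        "",
        f"- **Severity**: {f['severity']}",
        f"- **Category**: {f['category']}",
        f"- **Changed code**: {f['changed']}",
        f"- **Existing precedent**: {f['precedent']}",
        f"- **Analysis**: {f['analysis']}",
        f"- **Recommendation**: {f['recommendation']}",
        f"- **Confidence**: {f['confidence']}",
    ]
    return "\n".join(lines)

md = render_finding(1, {
    "title": "Custom retry loop duplicates RetryPolicy",
    "severity": "Strong recommendation",
    "category": "Existing function/class/helper should be reused",
    "changed": "worker.rs:118-156",
    "precedent": "core/retry.rs:42",
    "analysis": "Same semantics; precedent already covered by tests.",
    "recommendation": "Call RetryPolicy::withExponentialBackoff instead.",
    "confidence": "High",
})
```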
diff --git a/.claude/skills/codebase-consistency-reviewer/references/review-rubric.md b/.claude/skills/codebase-consistency-reviewer/references/review-rubric.md new file mode 100644 index 000000000000..44f570ecd1ba --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/references/review-rubric.md @@ -0,0 +1,126 @@ +# Review Rubric + +## Target Selection Principles + +A **review target** is any non-trivial part (or cluster) of the change that may plausibly have: + +- an existing implementation elsewhere +- an established abstraction or helper to reuse +- a standard mechanism the codebase uses for this kind of job +- a conventional way of being named, configured, exposed, tested, logged, or documented +- a precedent in neighboring subsystems +- a risk of creating a parallel API, setting, schema field, lifecycle, error model, algorithm, or operational pattern +- a future maintenance cost if implemented inconsistently + +### Prioritize targets that + +- Create or change a **contract surface**: API, public/semi-public function, setting, flag, schema field, table column, metric, log field, error type, persisted format, wire format, documented behavior, user-visible name. +- Implement a **recognizable responsibility**: lifecycle, retry, timeout, caching, scheduling, locking/synchronization, validation, parsing, formatting, lookup, selection, matching, conversion, resource management. +- Introduce an **abstraction, helper, wrapper, adapter, registry entry, extension point, integration point, or composition pattern** future code may copy. +- Appear to **bypass or recreate a standard codebase mechanism**. +- **Overlap semantically** with a concept that may already exist under another name. +- Establish a **name, API shape, setting shape, test pattern, or operational convention** that should match nearby or global precedent. +- Make a **small but durable user/operator/developer/downstream-system-facing decision**. 
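A deliberately crude sketch of this prioritization follows. The keyword sets are invented for illustration; real triage reads the change, it does not keyword-match it.

```python
# A target earns priority if it touches a contract surface or implements a
# recognizable responsibility. These sets are assumptions, not canon.
CONTRACT_SURFACES = {"api", "setting", "flag", "schema", "column",
                     "metric", "log", "error", "format", "name"}
RESPONSIBILITIES = {"lifecycle", "retry", "timeout", "caching", "scheduling",
                    "locking", "validation", "parsing", "formatting",
                    "lookup", "conversion", "matching"}

def is_priority_target(tags):
    """Return True when any tag hits a contract surface or responsibility."""
    tags = {t.lower() for t in tags}
    return bool(tags & CONTRACT_SURFACES) or bool(tags & RESPONSIBILITIES)

p1 = is_priority_target({"setting", "parsing"})  # contract surface + responsibility
p2 = is_priority_target({"rename", "private"})   # trivial: falls to the filter below
```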
+ +### Pay special attention to "small" changes that encode conventions + +- Adding a column, field, config key, enum value, metric, log attribute, or error code. +- Introducing a name users, operators, developers, or downstream systems will see. +- Adding a helper whose behavior sounds generic. +- Adding local logic that looks like parsing, normalization, validation, formatting, lookup, conversion, matching, scheduling, or selection. +- Adding a second path to do something that already has a canonical path. +- Exposing existing behavior through a new layer, interface, or name. + +## Clustering Rule + +Cluster cooperating changes into a single target when reviewing them separately would fragment the analysis. Cluster when: + +- the pieces only make sense together +- they share one user-visible / operator-visible concept +- they implement one responsibility distributed across files + +**Examples of clustered targets:** +- A lock implemented across several methods + a background thread + a state field → one target. +- A new retry policy spread across a wrapper + a config key + an error type → one target. +- A new metric defined in one place, emitted in another, documented in a third → one target. + +## Triviality Filter (skip these) + +- Getters, setters, simple accessors. +- One-line delegations / pass-throughs. +- Pure plumbing, parameter forwarding, formatting, trivial renames. +- Mechanical refactors with no new contract surface. +- Bug fixes inside existing logic that introduce no new concept. +- Local helpers too small to plausibly have a canonical version elsewhere. + +If the change is too trivial to plausibly have a "right way" elsewhere, skip it. + +## Examples + +**Concept-bearing changes (targets):** +- New retry/backoff policy. +- New cache or memoization layer. +- New scheduler, queue, or executor. +- New configuration key or CLI flag. +- New error class or error code. +- New metric, log field, or trace attribute. +- New schema field or table column. 
+- New parsing/validation/normalization helper. +- New lifecycle hook, registry, or extension point. +- New user-visible name (function in public API, setting, command, doc term). + +**Convention-bearing changes (targets):** +- A new option spelling that other options in the same subsystem already follow. +- A new column whose name diverges from sibling columns. +- A new error message style that diverges from existing style. +- A new test fixture pattern. + +**Non-targets:** +- Renaming a private variable. +- Splitting a function for readability with no new contract. +- Adding logging at a single existing call site, using existing logger. +- Tightening a type without behavior change. +- Inlining a constant. + +## Severity Definitions + +- **Blocker** — Strong duplication of working code, or a user-facing inconsistency (setting/API/schema/metric/error/log) that will be expensive to undo once shipped. Reuse or alignment is clearly correct and feasible. +- **Strong recommendation** — Clear precedent exists; not aligning will create real maintenance cost or visible divergence. Reasonable to push back on. +- **Suggestion** — Precedent exists and alignment would be cleaner, but divergence is defensible. +- **Observation** — Related precedent worth knowing about; no action required. + +## Finding Categories + +- Existing function/class/helper should be reused +- Existing abstraction should be extended +- Duplicate or near-duplicate implementation +- Inconsistent naming +- Inconsistent user-facing setting/API +- Inconsistent schema/data model +- Non-standard codebase mechanism (bypassed standard mechanism) +- Consider generalizing existing code +- Borrow existing design ideas (no direct reuse) +- Related precedent exists, but no direct reuse recommended + +## Distinguishing Match Types + +Don't collapse all similarity into "duplication". Differentiate: + +- **Direct reuse opportunity** — call existing code instead. 
+- **Abstraction opportunity** — generalize existing code; both new and old call it. +- **Naming/API precedent** — implementations may differ; align surface. +- **Setting/config precedent** — align option name/shape/parser. +- **Schema/data-model precedent** — align field/column naming and shape. +- **Standard-mechanism precedent** — register through the existing mechanism (registry, factory, hook, config schema, error/logging/metrics helper, etc.). +- **Related design precedent** — borrow the *idea*; direct reuse not appropriate. +- **Weak similarity** — surface resemblance only; do not flag. + +## Avoiding False Positives + +- Don't flag superficial name similarity. Open both files and compare semantics. +- Don't recommend reuse across ownership boundaries unless the boundary already permits it. +- Don't recommend reuse of legacy/deprecated/special-purpose code. +- Don't ignore the PR description: stated intent for divergence weighs against flagging. +- Don't recommend generalization for its own sake — only when it prevents divergent behavior or reduces real maintenance cost. +- A name match without a behavior match is *naming* precedent, not duplication. Classify accordingly. +- A behavior match with different ownership/layer may still warrant a *standard-mechanism* finding even if direct reuse is wrong. diff --git a/.claude/skills/codebase-consistency-reviewer/references/search-strategy.md b/.claude/skills/codebase-consistency-reviewer/references/search-strategy.md new file mode 100644 index 000000000000..ab981230ccc3 --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/references/search-strategy.md @@ -0,0 +1,118 @@ +# Search Strategy + +Subagents have **lexical** tools (`rg`/`grep`, file reads, repo navigation) and reasoning. There is **no vector or embedding search**. "Semantic search" in this skill means: drive lexical search with semantically generated queries, then **read candidates and judge semantic match by reasoning over the code**. 
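That pipeline (semantic expansion, lexical matching, then reading the candidates) can be sketched in miniature. The synonym table and sample files below are invented for the sketch; a real run drives `rg` over the repository instead of scanning in-memory strings.

```python
import re

# Expand the concept into a query set, then grep for leads. The synonym
# table is a sample; a real run scrapes domain vocabulary from nearby files.
SYNONYMS = {
    "retry": ["retry", "backoff", "reattempt"],
    "lock": ["lock", "mutex", "guard"],
}

def query_set(concept):
    return SYNONYMS.get(concept, [concept])

def lexical_leads(concept, files):
    """Return (path, line_no, line) leads. Each hit is a lead, not a verdict:
    the candidate still has to be read and judged semantically."""
    pattern = re.compile(r"\b(" + "|".join(query_set(concept)) + r")\b",
                         re.IGNORECASE)
    leads = []
    for path, text in files.items():
        for i, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                leads.append((path, i, line.strip()))
    return leads

leads = lexical_leads("retry", {
    "core/retry.rs": "/// Retry with exponential backoff.\npub struct RetryPolicy;",
    "worker.rs": "let mut guard = mutex.lock();",
})
```

Note the word-boundary pattern deliberately misses `RetryPolicy` on line 2: underscores and camelCase defeat `\b`, which is exactly why short or generic names need boundary-aware patterns plus human reading rather than blind matching.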
+ +## Building a Search Brief + +For each target, the brief should contain: + +- **Target summary** — one or two sentences describing the concept. +- **Changed files and symbols** (when available). +- **Concept / responsibility** introduced or changed. +- **Why this could overlap** with existing code/conventions. +- **Observable behavior** — inputs, outputs, side effects, lifecycle. +- **Contract surface involved** — user-facing names, settings, fields, flags, metrics, errors, API shapes. +- **Literal identifiers from the new code.** +- **Possible synonyms / alternative names** the codebase may use. +- **Nearby anchors** likely to reveal true precedent (sibling files, related modules, related tests/docs). +- **Required anchors vs. optional hints** — which must hold for a match to be valid, which are just clues. +- **Ownership / module / layering boundaries** relevant to reuse. +- **What would count as** an exact duplicate / near duplicate / reusable abstraction / standard mechanism / naming-API precedent / related design / weak match. + +Convert each target into one or more **consistency questions**: + +- Is there already a canonical way to implement this responsibility? +- Is there a helper, abstraction, class, policy object, lifecycle hook, or framework mechanism for this? +- Does this subsystem already expose the same concept under another name? +- Are equivalent fields/columns/settings/parameters named differently elsewhere? +- Is this error/log/metric/config/test shape consistent with similar behavior? +- Do neighboring modules solve this with a registry/factory/visitor/enum/trait/interface/hook? +- Is this code introducing a custom mechanism for something the codebase usually centralizes? +- Should this be direct reuse, generalization, naming alignment, or design alignment? + +## Generating a Wide Query Set + +The target is the **concept**, not the names. New code may use the wrong name. Expand semantically before searching. 
+ +Start from literal identifiers, then add: + +- **Synonyms / paraphrases**: + - retry → backoff, reattempt, redo, again + - lock → mutex, guard, exclusive, semaphore, latch + - cache → memoize, store, buffer, pool + - validate → check, verify, ensure, assert + - schedule → queue, dispatch, defer, plan +- **Action-noun / noun-verb forms**: schedule / scheduler / scheduling; convert / conversion / converter; refresh / refresher / refreshing. +- **Lifecycle and opposite pairs**: acquire/release, open/close, start/stop, register/unregister, subscribe/unsubscribe, init/shutdown, begin/end. +- **Broader and narrower terms**: "exponential backoff" → also try "backoff" and "retry"; "least-recently-used" → also try "eviction". +- **Domain vocabulary the codebase actually uses** — scan nearby files (sibling source, tests, docs) and add the words they use. This is the highest-yield expansion source. +- **Imports and library calls** the new code uses — they often anchor existing precedent. +- **Test names and doc strings** near the change. + +Generate the query set first, then search. Don't search one term, look at one result, and stop. + +## Driving Lexical Search + +- Use `rg` (or `grep -R`) with the query set. +- Search for symbols, type names, config keys, error text, log keys, metric names, test names. +- Search documentation, schema files, CLI definitions, settings tables. +- Use boundary-aware patterns when names are short or generic. +- Cast wide first; then narrow once a promising area is found. + +## Reading Candidates (Don't Match Names) + +A lexical hit is a **lead, not a verdict**. 
Open the candidate and judge semantic match by comparing: + +- purpose / responsibility +- shape (signature, fields, schema, flag spelling) +- behavior (inputs, outputs, side effects) +- lifecycle (when created, when destroyed, when called) +- error model and reporting +- concurrency / resource constraints +- ownership and layering + +A function called `validate_user` may or may not be the same concept as a new `check_user_input` — only reading both can tell. + +## Search in Layers, Local-First + +1. Same file and directly adjacent files. +2. Same module / package / subsystem. +3. Tests for the same subsystem. +4. Shared utilities, framework code, common abstractions. +5. Similar concepts elsewhere in the repository. + +Prefer local precedent over distant precedent unless the distant precedent is clearly a global standard (e.g., a project-wide error class, a project-wide config schema, a project-wide registry). + +## Tests and Docs as Precedent Sources + +- **Tests** reveal canonical constructors, fixtures, expected errors, expected config shapes, and behavior boundaries. They often pin down conventions code alone hides. +- **Docs** reveal user-facing terminology that should govern naming even when internal code uses different terms. + +## Required vs. Optional Anchors + +Some matches are only valid with specific anchors. Require them when relevant: + +- A **settings match** may require the same subsystem, config parser, or user-facing option layer. +- A **table/column naming** comparison may require the same table family or adjacent schema. +- An **algorithm** comparison may require the same input/output semantics. +- An **API consistency** comparison may require the same public API layer. +- A **direct-reuse** recommendation requires compatible ownership and dependency direction. + +If the required anchor doesn't hold, demote the finding (precedent only, not reuse) or drop it. 
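The layered, local-first pass above can be sketched as a loop that widens the search one layer at a time and stops at the first layer that yields hits. The repository layout, file contents, and query below are hypothetical stand-ins.

```shell
# Sketch: local-first layered search — same file/dir, then subsystem tests,
# then shared utilities. Stop widening at the first layer with candidates.
set -eu
repo=$(mktemp -d)
mkdir -p "$repo/src/storage" "$repo/tests/storage" "$repo/src/common"
# A distant shared helper and a local precedent for the same concept.
echo 'def validate_user_input(value): ...' > "$repo/src/common/validation.py"
echo 'def check_user_input(value): ...' > "$repo/src/storage/checks.py"
query="user_input"
found=""
# Layers ordered local-first relative to a change in src/storage.
for layer in "$repo/src/storage" "$repo/tests/storage" "$repo/src/common"; do
    found=$(grep -rli "$query" "$layer" || true)
    if [ -n "$found" ]; then
        break  # prefer local precedent over distant precedent
    fi
done
printf '%s\n' "$found"
rm -rf "$repo"
```

Here the loop surfaces the sibling `check_user_input` before the distant `validate_user_input`; whether the local hit is real precedent, or the distant one is actually the project-wide standard, is still a judgment made by opening and reading both.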
+ +## Avoid Overfitting to New Code's Names + +If you only search the new identifiers, you'll miss the precedent that uses the codebase's actual vocabulary. Always include synonyms and domain terms scraped from nearby files. + +## Detecting Bypassed Standard Mechanisms + +Look explicitly for hand-rolled versions of mechanisms the codebase usually centralizes: + +- registries, factories, visitors +- configuration schemas, feature flags +- error classes, logging helpers, metrics helpers +- serialization frameworks, lifecycle hooks +- dependency injection, test utilities +- naming conventions, schema conventions + +If the codebase routinely registers things through mechanism X and the new code adds a custom path that does the same job, that's a finding even if the implementation is correct. diff --git a/.claude/skills/ubrella-clickhose-review/SKILL.md b/.claude/skills/ubrella-clickhose-review/SKILL.md new file mode 100644 index 000000000000..d34d777671f5 --- /dev/null +++ b/.claude/skills/ubrella-clickhose-review/SKILL.md @@ -0,0 +1,470 @@ +--- +name: ubrella-clickhose-review +description: Use when the user asks for a multi-perspective review of a ClickHouse / C++ diff, PR, branch, commit range, or commit hash. Triggers on requests like "review this PR", "umbrella review", "ubrella review", "do a full review", "deep review of branch X", or any ClickHouse code review where multiple independent angles (security, perf, concurrency, lifetime, compat, tests, etc.) should be considered before producing a consolidated report. +--- + +# Umbrella ClickHouse Review + +Multi-perspective code review for ClickHouse / C++ diffs. Prepare neutral shared context, dispatch specialized review subagents in parallel, validate findings, and produce one consolidated high-signal report. + +## When to Use + +- User provides a PR number/URL, branch name, commit hash, commit range, or explicit diff spec and wants a thorough review. 
+- Change touches non-trivial ClickHouse code (server, storage, parsers, network, Keeper, formats, settings, SQL surface).
+- User explicitly invokes "umbrella review" / "ubrella review".
+
+**Don't use for:** trivial doc-only changes, single-line fixes, or when user asks for a quick scan only.
+
+## Workflow
+
+### 1. Resolve the diff
+
+Determine input type and gather material:
+
+- **PR (number or URL):** `gh pr view <PR> --json title,body,baseRefName,headRefName,files,commits` and `gh pr diff <PR>`.
+- **Branch:** `git diff <base>...<branch>` and `git log <base>..<branch>`. Default base is `master` unless specified.
+- **Commit range / hash:** `git diff <range>`, `git log <range>`.
+- **Explicit diff spec:** use as given.
+
+Record: base ref, head ref, file count, commit messages, PR title/description.
+
+### 2. Build neutral shared context
+
+Before launching reviewers, prepare a single context block. **Facts only — no judgement.** Include:
+
+**File / subsystem map:**
+- changed files grouped by subsystem/component
+- touched public APIs / settings / config / formats / protocols
+- touched tests, docs, build files, scripts, generated files
+- likely hot paths
+- likely user-facing behavior
+- new or changed shared state
+- new or changed external inputs / SQL-visible parameters / file paths
+- removed or relaxed tests
+- large or binary files
+
+**Behavior map (when applicable):**
+- user/system entrypoints
+- validation and dispatch layers
+- state / storage / cache interactions
+- downstream integrations: filesystem, network, Keeper/ZooKeeper, background workers, services
+- exception and error-propagation paths
+- state transitions and side effects
+- important invariants the change appears to rely on
+
+Only include facts visible from the diff, commit messages, PR metadata, and nearby code.
+
+### 3. Dispatch subagents in parallel
+
+Use the `Agent` tool with `subagent_type: "general-purpose"` and `model: "sonnet"` (cheaper but capable). 
Send all applicable subagents in a **single message with multiple Agent tool calls** so they run concurrently. + +Each subagent gets: +- the neutral shared context (verbatim) +- the diff +- its assigned scope and prompt (below) +- the **General review rules** and **Required output format** (below) + +Skip subagents that clearly do not apply (e.g. concurrency review on a docs-only change). Always run subagents 1, 2, 3, 7, 11, 12. Run 15 (deep audit) only for complex / high-risk changes. + +### 4. Aggregate + +- Deduplicate findings across subagents. +- Verify file/line references and evidence against actual diff. +- Re-score severity when a subagent over- or under-states risk. +- Merge related findings sharing a root cause. +- Drop speculative / unsupported findings. +- Keep important-but-unproven concerns under **Needs verification**. +- Prefer actionable fixes over general advice. +- Surface cross-cutting themes only after concrete findings. +- If deep audit ran, preserve its coverage summary; drop its narrative. +- Classify failure behavior when relevant: success, handled failure, fail-open, fail-closed, unexpected exception, cancellation-safe, partial update. + +### 5. Produce the final report + +Use the **Final report format** below. Omit empty sections except Summary, Reviewed scope, and Final verdict. Include Coverage summary when deep audit ran or when coverage gaps materially affect confidence. + +--- + +## General review rules (give to every subagent) + +``` +Each subagent must: +- Review only from its assigned perspective. +- Inspect surrounding context when the diff alone is insufficient. +- Prefer concrete findings over generic advice. +- Avoid style nitpicks unless they create real ambiguity, risk, or maintenance cost. +- Avoid duplicate findings already obvious from its own previous points. +- Mark uncertainty explicitly. +- Do not assume behavior not visible from code or provided context. 
+- If a finding depends on missing context, say exactly what context is missing.
+
+False positives are costly. Do not report a finding unless there is a plausible
+concrete risk, maintenance problem, user impact, or reviewability problem.
+```
+
+## Required subagent output format
+
+```
+Subagent: <name>
+Scope: <one-line scope>
+
+Findings:
+1. Title: <short title>
+   Risk score: <0-100>
+   Confidence: high | medium | low
+   Severity: blocker | major | minor | nit | follow-up
+   Files/lines: <...>
+   Evidence: <...>
+   Risk: <...>
+   Proposed fix: <...>
+   Notes: <...>
+
+2. ...
+
+Needs verification:
+- <...>
+```
+
+**Risk score guidance:**
+- 90-100: likely blocker; correctness, data loss, security, serious compat, race/deadlock, severe hot-path regression.
+- 70-89: major issue likely worth fixing before merge.
+- 40-69: meaningful maintainability, operability, test, or perf concern.
+- 20-39: minor but concrete improvement.
+- 0-19: nit / optional cleanup; usually do not report.
+
+---
+
+## Subagents catalog
+
+For each subagent: `Scope` is the one-line scope to put in the output header; `Prompt` is the body to pass.
+
+### 1. UX / feature sanity / public contract
+
+**Scope:** UX, feature coherence, public API and behavioral contract.
+
+**Focus:** feature UX and logical consistency; whether behavior matches user expectations; public API and contract changes; exposed interfaces, defaults, error semantics, invariants; misuse-prone APIs; narrow implementations where a general idiomatic alternative exists; long-term maintainability of user-visible model; surprising behavior, misleading errors, confusing edges.
+
+**Prompt:**
+```
+Review the diff from a UX, feature-sanity, and public-contract perspective.
+Look for ad hoc behavior, confusing semantics, poor defaults, surprising error
+behavior, unclear invariants, misuse-prone APIs, and places where the
+implementation solves a narrow case while a more idiomatic general design
+exists. Report only concrete issues with user or caller impact.
+```
+
+### 2. Code architecture / design
+
+**Scope:** Architecture, design, component cooperation.
+
+**Focus:** design inconsistencies; accidental complexity; unclear or implicit behavior; SOLID / responsibility violations; leaky abstractions; tight coupling / low cohesion; non-idiomatic or hard-to-use internal APIs; encapsulation / layering; partial / asymmetric fixes across similar components.
+
+**Prompt:**
+```
+Review the diff for architecture and design issues: inconsistencies, accidental
+complexity, unclear or implicit behavior, responsibility/SOLID violations,
+leaky abstractions, tight coupling, low cohesion, non-idiomatic or hard-to-use
+APIs, improper encapsulation/layering, and partial/asymmetric fixes across
+sibling components or similar code paths.
+Explain how the moving parts interact and where the design creates future
+maintenance risk.
+```
+
+### 3. Ockham / YAGNI / unnecessary diff
+
+**Scope:** Avoidable, unrelated, no-op, or premature changes.
+
+**Focus:** unrelated cleanup mixed with core change; no-improvement edits; speculative generality; unnecessary abstractions; no-op rewrites; review-impeding churn; changes belonging in separate commits or follow-ups.
+
+**Prompt:**
+```
+Review the diff for unnecessary diff: avoidable, unrelated, speculative, or
+no-op changes that do not improve behavior, performance, clarity, safety, or
+reviewability. Identify which changes should be removed, split into separate
+commits, or postponed.
+```
+
+### 4. Security / SQL access control / trust boundary
+
+**Scope:** Security-sensitive behavior and trust boundaries. 
+ +**Focus:** SQL access control / privilege checks; auth, ACLs, row policies, quotas, resource limits; server-side file access / path traversal; user-controlled paths, URLs, identifiers, settings, formats, table functions, config; leaked file content / secrets in errors; trust-boundary expansion (internal code newly exposed via SQL/API/user input); unsafe wrappers around previously internal functions; malformed / empty / minimal / huge / adversarial inputs; injection in SQL construction, shell calls, paths, logs, diagnostics. + +**Prompt:** +``` +Review the diff for security issues and trust-boundary expansion. +Pay special attention to SQL access control, user-controlled inputs, +server-side file access/path traversal, resource-limit bypasses, leaked +secrets in errors/logs, and internal code newly exposed to SQL/API/user input. +For suspicious callee code, compare old callers vs new callers and test +boundary assumptions with concrete minimal inputs. +``` + +### 5. Performance / hot-path + +**Scope:** Runtime performance and scalability. + +**Focus:** hot-path regressions; unneeded copies/moves; avoidable allocations, missing reserve, allocation in loops; virtual dispatch, missed inlining, std::function / type-erasure overhead in hot paths; cache-unfriendly access, pointer chasing, locality; wrong containers, high constant factors; repeated computation, missed caching/hoisting; pathological complexity, accidental O(N²); unbounded growth/allocations; extra disk/network roundtrips, syscalls, fsyncs, sleeps, polling; logging in hot paths. + +**Prompt:** +``` +Review the diff from a performance perspective, especially hot paths. +Look for unnecessary copies, allocations, dispatch/inlining overhead, +cache-unfriendly access, poor container choices, repeated work, pathological +complexity, unbounded growth, and inefficient disk/network/syscall behavior. +Report only issues that plausibly affect realistic workloads or scalability. +``` + +### 6. 
Documentation / comments / changelog + +**Scope:** Understandability for users and maintainers. + +**Focus:** missing user-facing docs; misleading / incomplete docs; comments for non-obvious logic; outdated comments; unclear error / log / diagnostic messages; changelog / release-note quality; migration notes for incompatible behavior; typos in user-visible strings, logs, docs, comments, identifiers. + +**Prompt:** +``` +Review documentation and explanatory quality: user-facing docs, comments for +non-obvious logic, changelog/release-note quality, migration notes, +diagnostics, logs, error messages, and typos. Do not ask for comments on +obvious code; focus on places where missing or misleading explanation creates +user or maintenance risk. +``` + +### 7. Code quality / correctness / maintainability + +**Scope:** Local code quality and bug-proneness. + +**Focus:** likely bugs / edge-case errors; clarity / readability; naming; defensive coding; fragile assumptions; error-prone control flow; duplicated logic; confusing conditionals; magic constants; unchecked return values; integer overflow / truncation / lossy conversions; invalid assumptions about nullability, emptiness, ordering, sizes. + +**Prompt:** +``` +Review the diff for code quality, correctness, and maintainability: likely +bugs, unclear code, poor naming, fragile assumptions, missing defensive +checks, duplicated logic, magic constants, unchecked failures, and error-prone +edge cases. Prefer concrete bug risks over style preferences. +``` + +### 8. Operability / DevOps / observability + +**Scope:** Production introspection, debuggability, alerting. + +**Focus:** introspection of behavior; debuggability of failure modes; useful logs / errors; metrics and perf counters; alertability of exceptional situations; missing context in diagnostics; safe degradation / recovery; background task visibility; operational impact of config / default changes. 
+ +**Prompt:** +``` +Review the diff from an operability / DevOps perspective: introspection, +debugging, metrics, logging, alerting, production diagnostics, recovery, and +visibility into exceptional or background behavior. +Flag cases where production issues would be hard to detect, debug, or react to. +``` + +### 9. Header / include / compile-time impact + +**Scope:** Header hygiene and build-time impact. + +**Focus:** unnecessary includes; missing forward decls; excessive header deps; heavy transitive includes in high-fan-out headers; non-trivial function bodies in headers; template defs causing broad instantiation; large constexpr work in headers; binary size / compile-time regressions; deps that should move from header to .cpp. + +**Prompt:** +``` +Review the diff for header/include hygiene and compile-time impact: +unnecessary includes, missing forward declarations, excessive header +dependencies, heavy transitive includes, non-trivial code in headers, +unnecessary template instantiations, large constexpr work in headers, and +dependencies that should move to implementation files. +``` + +### 10. Concurrency / synchronization + +**Scope:** Multithreading, shared state, synchronization. + +**Focus:** shared mutable state; readers/writers and guards; data and lifetime races; missing / excessive / inconsistent locking; lock ordering, deadlocks; TOCTOU; atomics and memory ordering; condition variables; reentrancy; blocking I/O or callbacks under locks; async / task / thread-pool interactions; thread-safety of exposed APIs; undocumented thread-affinity or implicit ordering assumptions. + +**Prompt:** +``` +Review the diff for concurrency issues: shared mutable state, data races, +lifetime races, locking consistency, lock ordering/deadlocks, TOCTOU, +atomics/memory ordering, condition variables, reentrancy, blocking under +locks, and async/callback/thread-pool safety. 
+For each touched shared object, identify readers/writers, guards, relevant +call paths, and risky interleavings. +``` + +### 11. Tests / regression risk + +**Scope:** Validation adequacy and regression coverage. + +**Focus:** missing tests for changed behavior; weak assertions; deleted / relaxed / over-normalized tests; missing negative tests; missing boundary cases (empty, length 1, huge, malformed, null, invalid config); missing compatibility tests; missing security / concurrency / perf tests where relevant; brittle tests; gaps between implementation risk and coverage. + +**Prompt:** +``` +Review test coverage and regression risk: missing tests, weak assertions, +untested edge cases, deleted/relaxed tests, missing negative tests, missing +compatibility/security/concurrency/performance coverage, brittle tests, and +gaps between changed behavior and validation. +Suggest specific tests to add or strengthen. +``` + +### 12. Compatibility / rollout / migration + +**Scope:** Cross-version, cross-deployment, cross-config safety. + +**Focus:** backward / forward compatibility; config and settings changes; defaults and behavior changes; wire / storage / serialization / metadata / protocol formats; migrations and downgrade behavior; mixed-version clusters; feature flags / experimental gates; safe rollout in OSS and Cloud-like environments; compatibility settings preserving old behavior; existing data or objects that may violate new validation. + +**Prompt:** +``` +Review the diff for compatibility and rollout risks: backward/forward +compatibility, config/settings changes, defaults, wire/storage/serialization/ +protocol formats, migrations, downgrade behavior, mixed-version clusters, +feature flags, experimental gates, and safe rollout. +Flag new validation or behavior changes that may break existing data, +queries, configs, or deployments. +``` + +### 13. C++ lifetime / ownership / exception safety + +**Scope:** Object lifetime, ownership, exception safety. 
+ +**Focus:** dangling refs / pointers / views / spans / string_views; iterator/reference invalidation; move/copy semantics; moved-from object use; RAII correctness; cleanup on early returns / exceptions; exception safety / partial state updates; noexcept boundaries and new throw paths; destructor callbacks / scope guards / callback-triggered exceptions; shared_ptr / unique_ptr misuse; ownership ambiguity; cycles or double ownership. + +**Prompt:** +``` +Review the diff for C++ lifetime, ownership, and exception-safety issues: +dangling references/views, invalidation, move/copy mistakes, RAII +correctness, cleanup on early returns/exceptions, new throw paths crossing +noexcept/destructor boundaries, smart-pointer misuse, ownership ambiguity, +and partial-state updates. +Trace full caller chains when new exceptions or lifetime assumptions are +introduced. +``` + +### 14. Repository impact / generated / binary artifacts + +**Scope:** Repository hygiene; accidental artifacts. + +**Focus:** large files; binary files; accidentally committed generated files; vendored dependency blobs; archives / JARs / .so / executables / datasets / model artifacts; split / chunked binaries; build outputs; repo bloat; test data that should be generated or downloaded. + +**Prompt:** +``` +Review the diff for repository-impact issues: large files, binary artifacts, +generated files, vendored blobs, archives, compiled outputs, datasets, split +binaries, and unnecessary repository bloat. +Flag anything that should not be committed or should be generated/downloaded +at test time instead. +``` + +### 15. Deep audit / transition and fault-injection (high-risk only) + +**Scope:** Deep transition and fault analysis for complex / high-risk changes. + +**When to run:** state machines; config interactions; API/protocol flows; storage / metadata mutations; background tasks; replication / Keeper logic; security boundaries; concurrency-sensitive behavior. 
**Skip for trivial / local changes.** + +**Focus:** call graph and entrypoints; caller assumptions and trust-boundary changes; request/event flow through validation, dispatch, state changes, outputs, side effects; key invariants before/after each transition; logical branch coverage; fault categories derived from actual changed components; malformed / empty / minimal / huge input, missing config, timeout, failed IO/network, exception, cancellation, shutdown, concurrent update; fail-open vs fail-closed for security paths; rollback / cleanup / partial-update safety; parser/config/runtime/API parity; cross-partition / cross-component interactions. + +**Prompt:** +``` +Review the diff in deep audit mode. First build a lightweight call graph and +transition matrix for the changed behavior: entrypoints, validation/dispatch, +state/storage/cache interactions, downstream integrations, state changes, +outputs, side effects, and error/exception propagation. + +List the key invariants and check whether each transition preserves them. + +Define logical fault categories from the actual code under review, then test +them by reasoning through success, handled failure, fail-open, fail-closed, +exception, cancellation, timeout, malformed input, boundary input, shutdown, +and concurrency/timing paths as applicable. + +For mutation-heavy paths, analyze exception or cancellation after each +intermediate state change and verify rollback, cleanup, and invariants. + +For critical shared-state paths, write plausible interleavings and check for +race, deadlock, lifetime, and partial-update hazards. + +Report only confirmed defects. Keep speculative concerns under "Needs +verification". Include a short coverage summary: reviewed entrypoints, +transitions, fault categories, skipped areas, and assumptions. 
+```
+
+---
+
+## Final report format
+
+```
+# Review report: <target>
+
+## Summary
+<...>
+
+## Reviewed scope
+- Diff: <...>
+- Base: <...>
+- Files changed: <...>
+- Main moving parts:
+  - <part>: <role>
+
+## Missing context
+- <...>
+
+## Blockers
+1. <title>
+   Risk score: <0-100>
+   Sources: <...>
+   Files/lines: <...>
+   Evidence: <...>
+   Impact: <...>
+   Proposed fix: <...>
+
+## Major issues
+...
+
+## Minor issues / improvements
+...
+
+## Needs verification
+- <...>
+
+## Suggested commit / diff split
+- Core refactoring / behavior changes:
+  1. <...>
+  2. <...>
+- Separate follow-ups:
+  - <...>
+
+## Tests to add or strengthen
+- <...>
+
+## Coverage summary
+- Entry points reviewed: <...>
+- Transitions reviewed: <...>
+- Fault categories checked: <...>
+- Deferred / not covered: <...>
+- Main assumptions: <...>
+
+## Final verdict
+Status: approve | request changes | block
+Minimum required actions:
+- <...>
+```
+
+Omit empty sections except Summary, Reviewed scope, and Final verdict.
+
+---
+
+## Tone and quality bar
+
+- Strict but neutral.
+- High-signal findings only.
+- No generic checklists dumped on the user.
+- No praise for every checked area; no "looks good" sections.
+- Findings must be specific enough to act on.
+- Suggest small surgical fixes over broad rewrites.
+- Keep subagent output plain and structured so aggregation is reliable.
+
+## Common mistakes
+
+- Launching subagents serially instead of in one parallel batch.
+- Letting subagents write the final user report (they must not).
+- Forwarding low-confidence speculation as findings instead of "Needs verification".
+- Skipping the neutral context step — reviewers then duplicate exploration work.
+- Running deep audit (#15) on trivial diffs — wasteful.
+- Including "looks good" sections or padding the report.