From 098596170febf12c812f1c7a99b83b2f0d22969a Mon Sep 17 00:00:00 2001 From: Mikhail Filimonov Date: Mon, 4 May 2026 18:32:35 +0200 Subject: [PATCH] Add ubrella-clickhose-review and codebase-consistency-reviewer skills Co-Authored-By: Claude Opus 4.7 (1M context) --- .../codebase-consistency-reviewer/SKILL.md | 131 +++++ .../references/lookup-worker-instructions.md | 91 ++++ .../references/output-template.md | 118 +++++ .../references/review-rubric.md | 126 +++++ .../references/search-strategy.md | 118 +++++ .../skills/ubrella-clickhose-review/SKILL.md | 470 ++++++++++++++++++ 6 files changed, 1054 insertions(+) create mode 100644 .claude/skills/codebase-consistency-reviewer/SKILL.md create mode 100644 .claude/skills/codebase-consistency-reviewer/references/lookup-worker-instructions.md create mode 100644 .claude/skills/codebase-consistency-reviewer/references/output-template.md create mode 100644 .claude/skills/codebase-consistency-reviewer/references/review-rubric.md create mode 100644 .claude/skills/codebase-consistency-reviewer/references/search-strategy.md create mode 100644 .claude/skills/ubrella-clickhose-review/SKILL.md diff --git a/.claude/skills/codebase-consistency-reviewer/SKILL.md b/.claude/skills/codebase-consistency-reviewer/SKILL.md new file mode 100644 index 000000000000..2241ee6ef80b --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/SKILL.md @@ -0,0 +1,131 @@ +--- +name: codebase-consistency-reviewer +description: Use when asked to review a PR, commit, commit range, branch, patch, or pasted diff for duplicated functionality, reinvented wheels, not-invented-here patterns, parallel abstractions, inconsistent naming, inconsistent settings/APIs/schemas/metrics/errors/logs, redundant config keys, non-standard implementations, or places where new code should reuse, generalize, extend, or align with existing codebase patterns and user-facing conventions. 
Triggers on phrases like "review for duplication", "consistency review", "is this already implemented", "does this match our conventions", "reinventing the wheel". +--- + +# Codebase Consistency Reviewer + +## Purpose + +This is **not** a correctness or bug review. The lens is: + +- Is this functionality already implemented somewhere else, possibly under another name? +- Is there an established helper, abstraction, API, setting, schema, or naming pattern to reuse? +- Does the new code follow the standard mechanism this codebase uses for this kind of job? +- Do new user-facing names/settings/fields/metrics/errors/logs match existing conventions? +- Should the change generalize an existing implementation instead of paralleling it? +- If direct reuse isn't appropriate, should it borrow concepts from related code? + +Report **actionable consistency risks**. Skip weak similarities. + +## Inputs + +Accept any of: PR number, PR URL, commit SHA, commit range, branch name, local patch/diff text, pasted changed files, repo path or URL. Ask for missing input only when the review cannot proceed. + +## Workflow + +### 1. Intake & capability detection + +Detect what's available before planning: +- Local repo? (`git`, file reads, `rg`/`grep`) +- GitHub access? (`gh` CLI, GitHub MCP, web fetch) +- Subagents available for parallel lookup? +- Single language/framework or many? + +If repo access exists, retrieve: the diff, changed files with surrounding context, relevant imports, touched public/semi-public APIs, touched settings/config schemas, touched CLI flags / env vars / table schemas / protocol fields / user-facing names, related tests, related docs, **and the PR description, linked issues, and commit messages** (the author may have justified deliberate divergence — weight stated intent before flagging). + +If repo access is unavailable, work from the supplied material and state the limitation in Scope. 
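The capability probe in step 1 can be sketched as a small helper. This is a minimal illustration, not part of the skill: the function name and returned keys are invented here, and MCP or subagent availability has no CLI probe, so a real check would go further.

```python
import shutil

def detect_capabilities() -> dict:
    """Probe which review tools are on PATH before planning the workflow.

    A sketch only: GitHub MCP connections and subagent support cannot be
    detected this way and would need a separate check.
    """
    return {
        "git": shutil.which("git") is not None,    # local repo operations
        "gh": shutil.which("gh") is not None,      # GitHub PR/issue access
        "rg": shutil.which("rg") is not None,      # fast lexical search
        "grep": shutil.which("grep") is not None,  # fallback lexical search
    }

caps = detect_capabilities()
# Degrade gracefully: prefer rg, fall back to grep; prefer gh, else web fetch.
search_tool = "rg" if caps["rg"] else ("grep" if caps["grep"] else None)
```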
+ +The main reviewer infers language/framework/conventions from the changed files. Do not assume — let the codebase tell you what "standard mechanism" means here. + +### 2. Identify review targets + +Extract **concept-bearing and convention-bearing** changes — not every changed line. + +**Cluster** related changes into one target when they cooperate to do one job (a lock spread across methods + a thread + a state field is one target, not four). + +**Skip trivial changes**: getters/setters, one-line delegations, plumbing, parameter forwarding, mechanical refactors, formatting, trivial renames, bug fixes that introduce no new concept, local helpers too small to plausibly have a canonical version. + +**Prioritize** targets that: +- create or change a contract surface (API, setting, flag, schema field, table column, metric, log field, error, persisted/wire format, documented behavior, user-visible name) +- implement a recognizable responsibility (lifecycle, retry, timeout, caching, scheduling, locking, validation, parsing, formatting, lookup, conversion, matching, resource management) +- introduce an abstraction, helper, registry entry, extension point, or composition pattern others may copy +- look like they bypass or recreate a standard codebase mechanism +- overlap semantically with concepts that may already exist under another name +- establish a name/API/setting/test/operational convention that should match precedent + +Pay special attention to "small" changes that encode durable conventions: a new column, config key, enum value, metric name, log attribute, error code, or any name users/operators/developers/downstream systems will see. 
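A crude filter for such convention-bearing hunks might look like the sketch below. The patterns are assumptions for illustration only; every codebase defines its own convention surfaces, and real triage reads the diff rather than pattern-matching it.

```python
import re

# Illustrative patterns only: what counts as a convention-bearing addition
# varies by codebase; these regexes are assumptions, not a fixed rule set.
CONVENTION_PATTERNS = {
    "config key": re.compile(r"^\+\s*[\w.]+\s*[:=]"),            # new key = value
    "metric name": re.compile(r'"[a-z_]+_(count|total|seconds|bytes)"'),
    "column/field": re.compile(r"ADD COLUMN\s+(\w+)", re.IGNORECASE),
}

def convention_bearing(added_lines):
    """Flag added diff lines that likely encode a durable, visible name."""
    hits = []
    for line in added_lines:
        for kind, pat in CONVENTION_PATTERNS.items():
            if pat.search(line):
                hits.append((kind, line.lstrip("+").strip()))
    return hits

hits = convention_bearing([
    "+max_refresh_retries = 3",
    '+    counter.increment("refresh_failures_total")',
    "+ALTER TABLE jobs ADD COLUMN last_refresh_ts DateTime",
])
```

Each of the three sample lines trips a different pattern; a hit only marks the hunk as worth a search brief, it decides nothing on its own.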
+ +For each selected target, separate: +- the **semantic concept** introduced +- the **literal identifiers** in the new code +- **likely existing names/synonyms** elsewhere +- the **contract surface** involved, if any +- **ownership/layering** boundaries that may affect reuse +- **what kind of precedent** would help: direct reuse / reusable abstraction / naming-API precedent / setting-config precedent / schema precedent / standard-mechanism precedent / related design precedent + +Filter aggressively on triviality, then investigate every remaining target. Do **not** artificially cap target count. + +See `references/review-rubric.md` for target-selection principles, clustering rule, triviality filter, severity definitions, finding categories, and false-positive guidance. + +### 3. Build search briefs + +For each target, create a concise brief. The **target is the concept**, not the names; new code may use the wrong name. Search both the new identifiers and the underlying concept. + +Brief contents and concrete query-generation tactics (synonyms, lifecycle pairs, broader/narrower terms, domain vocabulary scraped from nearby files): see `references/search-strategy.md`. + +### 4. Delegate lookup investigations to subagents + +For each non-trivial target, dispatch a **separate** subagent. Run them **in parallel** when subagent tooling supports it; otherwise sequentially (note this in the Search Appendix). + +Each subagent receives only: the target's search brief, the relevant diff excerpt, and enough surrounding context. They search for evidence and **return references and evidence, not final recommendations**. + +Subagent instructions and the compact report format: see `references/lookup-worker-instructions.md`. + +### 5. Search strategy (no vector search assumed) + +"Semantic search" here means: drive **lexical** tools (`rg`/`grep`) with semantically-generated queries, then **read candidates and judge semantic match by reasoning over the code**. 
Names are weak signals — open the file before believing the match. + +Search local-first (same file → adjacent files → module → tests → shared utilities → global). Use tests and docs as precedent sources (they reveal canonical constructors, fixtures, expected errors, user-facing terminology). Look explicitly for **bypassed standard mechanisms** (registries, factories, config schemas, error classes, logging/metrics helpers, lifecycle hooks, DI, test utilities, naming/schema conventions). + +Full strategy: `references/search-strategy.md`. + +### 6. Validate findings (do not trust subagents blindly) + +For each candidate finding, the main reviewer compares new code and precedent on: semantic purpose, behavior, inputs/outputs, side effects, lifecycle, error semantics, performance/concurrency constraints, ownership, layering, dependency direction, public-API implications, compatibility, test coverage, and whether the precedent is current vs. legacy/deprecated/special-purpose. Check whether the PR description already justifies divergence. + +Promote a target to a finding only when there's a **concrete, actionable recommendation or a meaningful consistency risk**. Park inconclusive or non-actionable matches in "Checked but not flagged". + +Do not: +- recommend reuse that creates bad coupling or violates ownership/layering +- flag superficial name similarity as duplication +- ignore naming/setting/schema/API consistency just because implementations differ +- recommend generalization for its own sake +- penalize deliberate divergence the PR description already explains + +### 7. 
Produce the report + +Use the structure in `references/output-template.md`: + +- **Scope** — inputs, changed files, capabilities used, parallel vs sequential, language/framework, limitations +- **Executive Summary** — most important findings with counts by category +- **Findings** — each with Severity, Category, Changed code refs, Existing precedent refs, Analysis, Recommendation, optional patch sketch, Confidence +- **Checked But Not Flagged** — significant targets investigated without raising a finding +- **Search Appendix** — targets, subagents (parallel or sequential), key anchors/query expansions, files/tests/docs inspected, uncertainty, limitations + +Sections may be omitted when empty. Small reviews may collapse the Search Appendix to one paragraph. + +## Style + +- Practical, specific, codebase-aware. Concrete recommendations over abstract advice. +- Don't overclaim. "Conceptually similar" stays "conceptually similar." +- Local conventions govern local code; global conventions govern shared/public surfaces. +- User-facing knobs (settings, names, APIs, schema fields, metrics, logs, errors, persisted formats) get extra attention — those inconsistencies are expensive to fix later. +- Keep the final report focused on **actionable** risks. Don't dump every weak similarity from search. 
+ +## References + +- `references/review-rubric.md` — target selection, clustering, triviality, severity, categories, FP guidance +- `references/search-strategy.md` — briefs, query generation, lexical-driven semantic search, layering, anchors +- `references/lookup-worker-instructions.md` — subagent dispatch and report format +- `references/output-template.md` — final report structure and finding format diff --git a/.claude/skills/codebase-consistency-reviewer/references/lookup-worker-instructions.md b/.claude/skills/codebase-consistency-reviewer/references/lookup-worker-instructions.md new file mode 100644 index 000000000000..1150ddd675c4 --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/references/lookup-worker-instructions.md @@ -0,0 +1,91 @@ +# Lookup Worker (Subagent) Instructions + +The main reviewer dispatches one **lookup subagent per important review target** (or per cluster). Subagents are evidence gatherers, not deciders. + +## Dispatch Rules + +- **One subagent per target.** Keep investigations independent — do not give a subagent multiple unrelated targets. +- **Run in parallel** when the available subagent tooling supports concurrent agents. If parallelism is unavailable, run sequentially and note this in the Search Appendix. +- Use **cheaper / faster** subagents (e.g., a general-purpose or Explore-style search agent) for breadth. +- Reserve the main reviewer for target selection, clustering, evidence validation, and final recommendations. 
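Under these rules, dispatch can be sketched as follows. `run_lookup` is a stand-in for whatever subagent invocation the available tooling actually provides; the brief shapes are invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def run_lookup(brief):
    """Stand-in for dispatching one lookup subagent and collecting its report.

    A real implementation would invoke the subagent tooling with the brief,
    diff excerpt, and surrounding context. Here it just echoes the target.
    """
    return {"target": brief["target"], "candidates": []}

briefs = [
    {"target": "retry loop in worker"},
    {"target": "new config key spelling"},
]

# One subagent per target, run in parallel when the tooling allows it;
# otherwise fall back to a plain loop and note that in the Search Appendix.
with ThreadPoolExecutor(max_workers=len(briefs)) as pool:
    reports = list(pool.map(run_lookup, briefs))
```

`pool.map` preserves input order, so report N always corresponds to brief N, which keeps integration and deduplication by the main reviewer simple.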
+ +## What Each Subagent Receives + +Only what is needed for the investigation: + +- the **target search brief** +- the relevant **diff excerpt** for the target +- enough **surrounding context** (imports, related symbols, neighboring files) to understand the target +- explicit instructions to **search for existing precedent** +- explicit instructions to **return references and evidence, not final recommendations** +- a pointer to `references/search-strategy.md` for how to search + +Do **not** pass other targets, the full PR, the full report scaffolding, or unrelated guidance. + +## What Subagents Should Search For + +- Exact or near-duplicate implementations of the target's responsibility. +- Existing helpers, abstractions, classes, registries, factories, or framework mechanisms that already do this job. +- Similar user-facing APIs, settings, flags, fields, columns, metrics, log keys, errors, or schema elements. +- Naming precedents (sibling settings, sibling columns, sibling metrics, sibling errors). +- Tests and docs that pin down canonical shapes or terminology. +- Patterns used by neighboring subsystems for the same kind of job. +- Semantically similar implementations under different names (use synonym/lifecycle/broader-narrower expansion). +- Standard mechanisms the new code may be **bypassing** or **recreating**. + +The strategy (lexical-driven semantic search, layered local-first searching, query expansion, anchor handling, candidate reading) is described in `search-strategy.md`. Follow it. + +## What Subagents Should NOT Do + +- Do **not** make final review decisions. +- Do **not** decide severity. +- Do **not** classify a finding as a Blocker / Strong recommendation / Suggestion / Observation. +- Do **not** recommend reuse vs. generalization vs. naming alignment as a final answer — present these as possibilities for the main reviewer. +- Do **not** invent code; if patch direction is unclear, say so. 
+ +## Subagent Report Format + +Return a compact report with: + +- **Target summary** — one or two sentences in your own words. +- **Files, symbols, or contract surfaces changed** — what you understood the new code to be. +- **Search strategy used** — query set tried, layers covered, files inspected. +- **Important search anchors** — which queries were high-yield, which were noise. +- **Possible existing implementations or related patterns** — each with: + - file path, symbol/setting/field/API, line numbers when possible + - one-sentence description of what it does + - why it may be relevant + - what would have to be true for it to be a real match (anchors) +- **Similarity classification** for each candidate, picking from: + - exact duplicate + - near duplicate + - same abstraction exists + - same standard mechanism exists + - same concept implemented differently + - naming/API inconsistency + - setting/config inconsistency + - schema/data-model inconsistency + - related design precedent only + - no meaningful precedent found +- **Confidence** — high / medium / low. +- **Why the match may matter** — coupling cost, divergence cost, user-facing inconsistency cost. +- **Reuse / generalization / naming-only signals** — for the main reviewer's judgment, not as a recommendation: + - whether direct reuse may be possible + - whether generalization may be appropriate + - whether only naming/API/setting alignment may be relevant +- **Ownership / layering concerns.** +- **Uncertainties** — reasons the evidence may be weak, areas not searched, queries not tried. + +Keep the report compact. Reference file:line. Do not paste large code blocks unless essential. + +## Confidence and Uncertainty Guidance + +- **High confidence** — read both the new code and the candidate, semantic match is clear, ownership/layering compatible, anchors hold. +- **Medium confidence** — read both, semantic match is plausible but partial, or anchors are partial. 
+- **Low confidence** — name/shape resemblance only, candidate not fully read, anchors weak. + +State uncertainties explicitly. The main reviewer needs to know what was not searched as much as what was. + +## Independence + +Each subagent works without seeing other subagents' work. The main reviewer integrates and deduplicates. If two subagents independently surface the same precedent, that's a stronger signal — the main reviewer treats it as such during validation. diff --git a/.claude/skills/codebase-consistency-reviewer/references/output-template.md b/.claude/skills/codebase-consistency-reviewer/references/output-template.md new file mode 100644 index 000000000000..75822397575e --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/references/output-template.md @@ -0,0 +1,118 @@ +# Output Template + +Produce the final review using this structure. Sections may be omitted when empty. For a small review, the Search Appendix may collapse to a single paragraph. + +--- + +# Codebase Consistency Review + +## Scope + +- **Input**: PR / commit range / diff source / patch source. +- **Changed files**: list. +- **Repository search available**: yes / no. +- **Subagents**: parallel / sequential / none. +- **Detected language(s) / framework(s)**: list. +- **Limitations**: anything that constrained the review (e.g., no repo access, partial diff, paywalled docs, no PR description). 
+ +## Executive Summary + +A short paragraph naming the most important findings, followed by counts: + +- Exact reuse opportunities: N +- Likely duplication: N +- Abstraction / generalization opportunities: N +- Naming / API consistency issues: N +- Settings / config consistency issues: N +- Schema / data-model consistency issues: N +- Standard-mechanism consistency issues: N +- Checked but not flagged: N + +## Findings + +For each actionable finding: + +### Finding N: + +- **Severity**: Blocker / Strong recommendation / Suggestion / Observation +- **Category** (one of): + - Existing function/class/helper should be reused + - Existing abstraction should be extended + - Duplicate or near-duplicate implementation + - Inconsistent naming + - Inconsistent user-facing setting/API + - Inconsistent schema/data model + - Non-standard codebase mechanism + - Consider generalizing existing code + - Borrow existing design ideas + - Related precedent exists, but no direct reuse recommended +- **Changed code**: file path, symbol/setting/field/API, line references. +- **Existing precedent**: file path, symbol/setting/field/API/test/doc, line references. +- **Analysis**: what is similar, what is different, whether the precedent is direct reuse / a stronger abstraction / naming-API precedent / standard mechanism precedent / only related design; whether ownership or layering affects the recommendation. Be specific. Do not just say "similar code exists." +- **Recommendation**: a concrete action. +- **Suggested patch direction** (optional): short pseudo-diff or replacement sketch when the surrounding context is sufficient. Do not invent exact code otherwise. +- **Confidence**: High / Medium / Low. + +## Checked But Not Flagged + +Concise list of significant targets investigated without raising a finding. For each: + +- Target. +- What was checked. 
+- Why no finding was raised (e.g., precedent only superficially similar; PR description justifies divergence; existing code is legacy/unsuitable for reuse). + +## Search Appendix + +Compact appendix (single paragraph for small reviews): + +- Key targets investigated. +- Subagents used; parallel or sequential. +- Most relevant search anchors and query expansions (synonyms, lifecycle pairs, domain terms scraped from nearby files). +- Important files / tests / docs inspected. +- Important uncertainty (areas not fully searched, queries that returned noise, candidates not opened). +- Search limitations (no repo, no docs, no PR description, language unfamiliar to local tooling). + +--- + +## Severity Labels + +- **Blocker** — Strong duplication of working code, or a user-facing inconsistency expensive to undo once shipped. Reuse / alignment is clearly correct and feasible. +- **Strong recommendation** — Clear precedent exists; not aligning will create real maintenance cost or visible divergence. +- **Suggestion** — Precedent exists; alignment would be cleaner but divergence is defensible. +- **Observation** — Related precedent worth knowing about; no action required. + +## Category Labels + +Use the canonical names above so future readers can scan by category. + +## Examples of Good Recommendations + +- "Use `RetryPolicy::withExponentialBackoff` from `core/retry.rs:42` instead of the custom loop in `worker.rs:118-156`. Same semantics, already covered by tests in `core/retry_tests.rs`." +- "Move the path-normalization logic in `ingest/upload.go:88-104` into the existing `pathutil.Normalize` (`pathutil/normalize.go:12`) and call it from both ingest and export. Current divergence already produces different behavior on trailing slashes." +- "Rename setting `query_evaluation_timeout_seconds` to `max_query_evaluation_seconds` for consistency with `max_*_seconds` settings already in `Settings.h` (`max_execution_time`, `max_session_timeout_seconds`)." 
+- "Register the new background job through `BackgroundJobRegistry` (`bg/registry.cpp:55`) instead of starting a thread directly in `Refresher::start`. Every other background job in this codebase is registered through that mechanism." +- "This code cannot directly reuse `LRUCache` because it needs TTL eviction, but it should borrow the slot-eviction pattern from `LRUCache::evict` (`cache/lru.cpp:201`) rather than the linear scan in the new code." +- "The new column `last_refresh_ts` should be named `last_refresh_time` to match `last_modification_time` and `last_load_time` in sibling system tables." +- "No direct reuse is recommended; align the error message format with `formatRetryError` (`errors/format.cpp:30`) so operator-facing logs stay consistent." + +## Examples of Weak Findings That Should Stay Out + +These belong in *Checked But Not Flagged* (or be omitted entirely): + +- "There's another function called `validate` elsewhere in the codebase." — name match without semantic match. +- "This module also uses retries somewhere." — no concrete reuse, generalization, or naming alignment proposed. +- "Could possibly be combined with X" with no reason why combining would be correct or beneficial. +- "Style differs from another file" — unless tied to a documented convention or visible user-facing surface. +- "This pattern appears in five places" — without identifying the canonical one or proposing alignment. + +## Collapsing Sections for Small Reviews + +For a small change with one or two targets: + +- Keep **Scope**, **Findings**, and a one-paragraph **Search Appendix**. +- Drop **Executive Summary** counts; replace with a single sentence. +- Drop **Checked But Not Flagged** if empty. + +For a no-finding review: + +- Keep **Scope**, a one-paragraph **Executive Summary** stating no actionable findings, **Checked But Not Flagged**, and a one-paragraph **Search Appendix**. 
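The finding format above is mechanical enough to render from structured fields. A minimal sketch, reusing the `RetryPolicy` example from this page; the `render_finding` helper and its argument shape are illustrative, not part of the template.

```python
def render_finding(n, f):
    """Render one entry in the Findings format above (sketch only)."""
    lines = [
        f"### Finding {n}: {f['title']}",
        "",
        f"- **Severity**: {f['severity']}",
        f"- **Category**: {f['category']}",
        f"- **Changed code**: {f['changed']}",
        f"- **Existing precedent**: {f['precedent']}",
        f"- **Analysis**: {f['analysis']}",
        f"- **Recommendation**: {f['recommendation']}",
        f"- **Confidence**: {f['confidence']}",
    ]
    return "\n".join(lines)

md = render_finding(1, {
    "title": "Custom retry loop duplicates RetryPolicy",
    "severity": "Strong recommendation",
    "category": "Existing function/class/helper should be reused",
    "changed": "worker.rs:118-156",
    "precedent": "core/retry.rs:42",
    "analysis": "Same semantics; precedent already covered by tests.",
    "recommendation": "Call RetryPolicy::withExponentialBackoff instead.",
    "confidence": "High",
})
```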
diff --git a/.claude/skills/codebase-consistency-reviewer/references/review-rubric.md b/.claude/skills/codebase-consistency-reviewer/references/review-rubric.md new file mode 100644 index 000000000000..44f570ecd1ba --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/references/review-rubric.md @@ -0,0 +1,126 @@ +# Review Rubric + +## Target Selection Principles + +A **review target** is any non-trivial part (or cluster) of the change that may plausibly have: + +- an existing implementation elsewhere +- an established abstraction or helper to reuse +- a standard mechanism the codebase uses for this kind of job +- a conventional way of being named, configured, exposed, tested, logged, or documented +- a precedent in neighboring subsystems +- a risk of creating a parallel API, setting, schema field, lifecycle, error model, algorithm, or operational pattern +- a future maintenance cost if implemented inconsistently + +### Prioritize targets that + +- Create or change a **contract surface**: API, public/semi-public function, setting, flag, schema field, table column, metric, log field, error type, persisted format, wire format, documented behavior, user-visible name. +- Implement a **recognizable responsibility**: lifecycle, retry, timeout, caching, scheduling, locking/synchronization, validation, parsing, formatting, lookup, selection, matching, conversion, resource management. +- Introduce an **abstraction, helper, wrapper, adapter, registry entry, extension point, integration point, or composition pattern** future code may copy. +- Appear to **bypass or recreate a standard codebase mechanism**. +- **Overlap semantically** with a concept that may already exist under another name. +- Establish a **name, API shape, setting shape, test pattern, or operational convention** that should match nearby or global precedent. +- Make a **small but durable user/operator/developer/downstream-system-facing decision**. 
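A deliberately crude sketch of this prioritization follows. The keyword sets are invented for illustration; real triage reads the change, it does not keyword-match it.

```python
# A target earns priority if it touches a contract surface or implements a
# recognizable responsibility. These sets are assumptions, not canon.
CONTRACT_SURFACES = {"api", "setting", "flag", "schema", "column",
                     "metric", "log", "error", "format", "name"}
RESPONSIBILITIES = {"lifecycle", "retry", "timeout", "caching", "scheduling",
                    "locking", "validation", "parsing", "formatting",
                    "lookup", "conversion", "matching"}

def is_priority_target(tags):
    """Return True when any tag hits a contract surface or responsibility."""
    tags = {t.lower() for t in tags}
    return bool(tags & CONTRACT_SURFACES) or bool(tags & RESPONSIBILITIES)

p1 = is_priority_target({"setting", "parsing"})  # contract surface + responsibility
p2 = is_priority_target({"rename", "private"})   # trivial: falls to the filter below
```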
+ +### Pay special attention to "small" changes that encode conventions + +- Adding a column, field, config key, enum value, metric, log attribute, or error code. +- Introducing a name users, operators, developers, or downstream systems will see. +- Adding a helper whose behavior sounds generic. +- Adding local logic that looks like parsing, normalization, validation, formatting, lookup, conversion, matching, scheduling, or selection. +- Adding a second path to do something that already has a canonical path. +- Exposing existing behavior through a new layer, interface, or name. + +## Clustering Rule + +Cluster cooperating changes into a single target when reviewing them separately would fragment the analysis. Cluster when: + +- the pieces only make sense together +- they share one user-visible / operator-visible concept +- they implement one responsibility distributed across files + +**Examples of clustered targets:** +- A lock implemented across several methods + a background thread + a state field → one target. +- A new retry policy spread across a wrapper + a config key + an error type → one target. +- A new metric defined in one place, emitted in another, documented in a third → one target. + +## Triviality Filter (skip these) + +- Getters, setters, simple accessors. +- One-line delegations / pass-throughs. +- Pure plumbing, parameter forwarding, formatting, trivial renames. +- Mechanical refactors with no new contract surface. +- Bug fixes inside existing logic that introduce no new concept. +- Local helpers too small to plausibly have a canonical version elsewhere. + +If the change is too trivial to plausibly have a "right way" elsewhere, skip it. + +## Examples + +**Concept-bearing changes (targets):** +- New retry/backoff policy. +- New cache or memoization layer. +- New scheduler, queue, or executor. +- New configuration key or CLI flag. +- New error class or error code. +- New metric, log field, or trace attribute. +- New schema field or table column. 
+- New parsing/validation/normalization helper. +- New lifecycle hook, registry, or extension point. +- New user-visible name (function in public API, setting, command, doc term). + +**Convention-bearing changes (targets):** +- A new option spelling that other options in the same subsystem already follow. +- A new column whose name diverges from sibling columns. +- A new error message style that diverges from existing style. +- A new test fixture pattern. + +**Non-targets:** +- Renaming a private variable. +- Splitting a function for readability with no new contract. +- Adding logging at a single existing call site, using existing logger. +- Tightening a type without behavior change. +- Inlining a constant. + +## Severity Definitions + +- **Blocker** — Strong duplication of working code, or a user-facing inconsistency (setting/API/schema/metric/error/log) that will be expensive to undo once shipped. Reuse or alignment is clearly correct and feasible. +- **Strong recommendation** — Clear precedent exists; not aligning will create real maintenance cost or visible divergence. Reasonable to push back on. +- **Suggestion** — Precedent exists and alignment would be cleaner, but divergence is defensible. +- **Observation** — Related precedent worth knowing about; no action required. + +## Finding Categories + +- Existing function/class/helper should be reused +- Existing abstraction should be extended +- Duplicate or near-duplicate implementation +- Inconsistent naming +- Inconsistent user-facing setting/API +- Inconsistent schema/data model +- Non-standard codebase mechanism (bypassed standard mechanism) +- Consider generalizing existing code +- Borrow existing design ideas (no direct reuse) +- Related precedent exists, but no direct reuse recommended + +## Distinguishing Match Types + +Don't collapse all similarity into "duplication". Differentiate: + +- **Direct reuse opportunity** — call existing code instead. 
+- **Abstraction opportunity** — generalize existing code; both new and old call it. +- **Naming/API precedent** — implementations may differ; align surface. +- **Setting/config precedent** — align option name/shape/parser. +- **Schema/data-model precedent** — align field/column naming and shape. +- **Standard-mechanism precedent** — register through the existing mechanism (registry, factory, hook, config schema, error/logging/metrics helper, etc.). +- **Related design precedent** — borrow the *idea*; direct reuse not appropriate. +- **Weak similarity** — surface resemblance only; do not flag. + +## Avoiding False Positives + +- Don't flag superficial name similarity. Open both files and compare semantics. +- Don't recommend reuse across ownership boundaries unless the boundary already permits it. +- Don't recommend reuse of legacy/deprecated/special-purpose code. +- Don't ignore the PR description: stated intent for divergence weighs against flagging. +- Don't recommend generalization for its own sake — only when it prevents divergent behavior or reduces real maintenance cost. +- A name match without a behavior match is *naming* precedent, not duplication. Classify accordingly. +- A behavior match with different ownership/layer may still warrant a *standard-mechanism* finding even if direct reuse is wrong. diff --git a/.claude/skills/codebase-consistency-reviewer/references/search-strategy.md b/.claude/skills/codebase-consistency-reviewer/references/search-strategy.md new file mode 100644 index 000000000000..ab981230ccc3 --- /dev/null +++ b/.claude/skills/codebase-consistency-reviewer/references/search-strategy.md @@ -0,0 +1,118 @@ +# Search Strategy + +Subagents have **lexical** tools (`rg`/`grep`, file reads, repo navigation) and reasoning. There is **no vector or embedding search**. "Semantic search" in this skill means: drive lexical search with semantically generated queries, then **read candidates and judge semantic match by reasoning over the code**. 
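That pipeline (semantic expansion, lexical matching, then reading the candidates) can be sketched in miniature. The synonym table and sample files below are invented for the sketch; a real run drives `rg` over the repository instead of scanning in-memory strings.

```python
import re

# Expand the concept into a query set, then grep for leads. The synonym
# table is a sample; a real run scrapes domain vocabulary from nearby files.
SYNONYMS = {
    "retry": ["retry", "backoff", "reattempt"],
    "lock": ["lock", "mutex", "guard"],
}

def query_set(concept):
    return SYNONYMS.get(concept, [concept])

def lexical_leads(concept, files):
    """Return (path, line_no, line) leads. Each hit is a lead, not a verdict:
    the candidate still has to be read and judged semantically."""
    pattern = re.compile(r"\b(" + "|".join(query_set(concept)) + r")\b",
                         re.IGNORECASE)
    leads = []
    for path, text in files.items():
        for i, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                leads.append((path, i, line.strip()))
    return leads

leads = lexical_leads("retry", {
    "core/retry.rs": "/// Retry with exponential backoff.\npub struct RetryPolicy;",
    "worker.rs": "let mut guard = mutex.lock();",
})
```

Note the word-boundary pattern deliberately misses `RetryPolicy` on line 2: underscores and camelCase defeat `\b`, which is exactly why short or generic names need boundary-aware patterns plus human reading rather than blind matching.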
+ +## Building a Search Brief + +For each target, the brief should contain: + +- **Target summary** — one or two sentences describing the concept. +- **Changed files and symbols** (when available). +- **Concept / responsibility** introduced or changed. +- **Why this could overlap** with existing code/conventions. +- **Observable behavior** — inputs, outputs, side effects, lifecycle. +- **Contract surface involved** — user-facing names, settings, fields, flags, metrics, errors, API shapes. +- **Literal identifiers from the new code.** +- **Possible synonyms / alternative names** the codebase may use. +- **Nearby anchors** likely to reveal true precedent (sibling files, related modules, related tests/docs). +- **Required anchors vs. optional hints** — which must hold for a match to be valid, which are just clues. +- **Ownership / module / layering boundaries** relevant to reuse. +- **What would count as** an exact duplicate / near duplicate / reusable abstraction / standard mechanism / naming-API precedent / related design / weak match. + +Convert each target into one or more **consistency questions**: + +- Is there already a canonical way to implement this responsibility? +- Is there a helper, abstraction, class, policy object, lifecycle hook, or framework mechanism for this? +- Does this subsystem already expose the same concept under another name? +- Are equivalent fields/columns/settings/parameters named differently elsewhere? +- Is this error/log/metric/config/test shape consistent with similar behavior? +- Do neighboring modules solve this with a registry/factory/visitor/enum/trait/interface/hook? +- Is this code introducing a custom mechanism for something the codebase usually centralizes? +- Should this be direct reuse, generalization, naming alignment, or design alignment? + +## Generating a Wide Query Set + +The target is the **concept**, not the names. New code may use the wrong name. Expand semantically before searching. 
+ +Start from literal identifiers, then add: + +- **Synonyms / paraphrases**: + - retry → backoff, reattempt, redo, again + - lock → mutex, guard, exclusive, semaphore, latch + - cache → memoize, store, buffer, pool + - validate → check, verify, ensure, assert + - schedule → queue, dispatch, defer, plan +- **Action-noun / noun-verb forms**: schedule / scheduler / scheduling; convert / conversion / converter; refresh / refresher / refreshing. +- **Lifecycle and opposite pairs**: acquire/release, open/close, start/stop, register/unregister, subscribe/unsubscribe, init/shutdown, begin/end. +- **Broader and narrower terms**: "exponential backoff" → also try "backoff" and "retry"; "least-recently-used" → also try "eviction". +- **Domain vocabulary the codebase actually uses** — scan nearby files (sibling source, tests, docs) and add the words they use. This is the highest-yield expansion source. +- **Imports and library calls** the new code uses — they often anchor existing precedent. +- **Test names and doc strings** near the change. + +Generate the query set first, then search. Don't search one term, look at one result, and stop. + +## Driving Lexical Search + +- Use `rg` (or `grep -R`) with the query set. +- Search for symbols, type names, config keys, error text, log keys, metric names, test names. +- Search documentation, schema files, CLI definitions, settings tables. +- Use boundary-aware patterns when names are short or generic. +- Cast wide first; then narrow once a promising area is found. + +## Reading Candidates (Don't Match Names) + +A lexical hit is a **lead, not a verdict**. 
Open the candidate and judge semantic match by comparing: + +- purpose / responsibility +- shape (signature, fields, schema, flag spelling) +- behavior (inputs, outputs, side effects) +- lifecycle (when created, when destroyed, when called) +- error model and reporting +- concurrency / resource constraints +- ownership and layering + +A function called `validate_user` may or may not be the same concept as a new `check_user_input` — only reading both can tell. + +## Search in Layers, Local-First + +1. Same file and directly adjacent files. +2. Same module / package / subsystem. +3. Tests for the same subsystem. +4. Shared utilities, framework code, common abstractions. +5. Similar concepts elsewhere in the repository. + +Prefer local precedent over distant precedent unless the distant precedent is clearly a global standard (e.g., a project-wide error class, a project-wide config schema, a project-wide registry). + +## Tests and Docs as Precedent Sources + +- **Tests** reveal canonical constructors, fixtures, expected errors, expected config shapes, and behavior boundaries. They often pin down conventions code alone hides. +- **Docs** reveal user-facing terminology that should govern naming even when internal code uses different terms. + +## Required vs. Optional Anchors + +Some matches are only valid with specific anchors. Require them when relevant: + +- A **settings match** may require the same subsystem, config parser, or user-facing option layer. +- A **table/column naming** comparison may require the same table family or adjacent schema. +- An **algorithm** comparison may require the same input/output semantics. +- An **API consistency** comparison may require the same public API layer. +- A **direct-reuse** recommendation requires compatible ownership and dependency direction. + +If the required anchor doesn't hold, demote the finding (precedent only, not reuse) or drop it. 
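The layered, local-first pass above can be sketched as a loop that widens the search one layer at a time and stops at the first layer that yields hits. The repository layout, file contents, and query below are hypothetical stand-ins.

```shell
# Sketch: local-first layered search — same file/dir, then subsystem tests,
# then shared utilities. Stop widening at the first layer with candidates.
set -eu
repo=$(mktemp -d)
mkdir -p "$repo/src/storage" "$repo/tests/storage" "$repo/src/common"
# A distant shared helper and a local precedent for the same concept.
echo 'def validate_user_input(value): ...' > "$repo/src/common/validation.py"
echo 'def check_user_input(value): ...' > "$repo/src/storage/checks.py"
query="user_input"
found=""
# Layers ordered local-first relative to a change in src/storage.
for layer in "$repo/src/storage" "$repo/tests/storage" "$repo/src/common"; do
    found=$(grep -rli "$query" "$layer" || true)
    if [ -n "$found" ]; then
        break  # prefer local precedent over distant precedent
    fi
done
printf '%s\n' "$found"
rm -rf "$repo"
```

Here the loop surfaces the sibling `check_user_input` before the distant `validate_user_input`; whether the local hit is real precedent, or the distant one is actually the project-wide standard, is still a judgment made by opening and reading both.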
+ +## Avoid Overfitting to New Code's Names + +If you only search the new identifiers, you'll miss the precedent that uses the codebase's actual vocabulary. Always include synonyms and domain terms scraped from nearby files. + +## Detecting Bypassed Standard Mechanisms + +Look explicitly for hand-rolled versions of mechanisms the codebase usually centralizes: + +- registries, factories, visitors +- configuration schemas, feature flags +- error classes, logging helpers, metrics helpers +- serialization frameworks, lifecycle hooks +- dependency injection, test utilities +- naming conventions, schema conventions + +If the codebase routinely registers things through mechanism X and the new code adds a custom path that does the same job, that's a finding even if the implementation is correct. diff --git a/.claude/skills/ubrella-clickhose-review/SKILL.md b/.claude/skills/ubrella-clickhose-review/SKILL.md new file mode 100644 index 000000000000..d34d777671f5 --- /dev/null +++ b/.claude/skills/ubrella-clickhose-review/SKILL.md @@ -0,0 +1,470 @@ +--- +name: ubrella-clickhose-review +description: Use when the user asks for a multi-perspective review of a ClickHouse / C++ diff, PR, branch, commit range, or commit hash. Triggers on requests like "review this PR", "umbrella review", "ubrella review", "do a full review", "deep review of branch X", or any ClickHouse code review where multiple independent angles (security, perf, concurrency, lifetime, compat, tests, etc.) should be considered before producing a consolidated report. +--- + +# Umbrella ClickHouse Review + +Multi-perspective code review for ClickHouse / C++ diffs. Prepare neutral shared context, dispatch specialized review subagents in parallel, validate findings, and produce one consolidated high-signal report. + +## When to Use + +- User provides a PR number/URL, branch name, commit hash, commit range, or explicit diff spec and wants a thorough review. 
+- Change touches non-trivial ClickHouse code (server, storage, parsers, network, Keeper, formats, settings, SQL surface).
+- User explicitly invokes "umbrella review" / "ubrella review".
+
+**Don't use for:** trivial doc-only changes, single-line fixes, or when user asks for a quick scan only.
+
+## Workflow
+
+### 1. Resolve the diff
+
+Determine input type and gather material:
+
+- **PR (number or URL):** `gh pr view <PR> --json title,body,baseRefName,headRefName,files,commits` and `gh pr diff <PR>`.
+- **Branch:** `git diff <base>...<branch>` and `git log <base>..<branch>`. Default base is `master` unless specified.
+- **Commit range / hash:** `git diff <range>`, `git log <range>`.
+- **Explicit diff spec:** use as given.
+
+Record: base ref, head ref, file count, commit messages, PR title/description.
+
+### 2. Build neutral shared context
+
+Before launching reviewers, prepare a single context block. **Facts only — no judgement.** Include:
+
+**File / subsystem map:**
+- changed files grouped by subsystem/component
+- touched public APIs / settings / config / formats / protocols
+- touched tests, docs, build files, scripts, generated files
+- likely hot paths
+- likely user-facing behavior
+- new or changed shared state
+- new or changed external inputs / SQL-visible parameters / file paths
+- removed or relaxed tests
+- large or binary files
+
+**Behavior map (when applicable):**
+- user/system entrypoints
+- validation and dispatch layers
+- state / storage / cache interactions
+- downstream integrations: filesystem, network, Keeper/ZooKeeper, background workers, services
+- exception and error-propagation paths
+- state transitions and side effects
+- important invariants the change appears to rely on
+
+Only include facts visible from the diff, commit messages, PR metadata, and nearby code.
+
+### 3. Dispatch subagents in parallel
+
+Use the `Agent` tool with `subagent_type: "general-purpose"` and `model: "sonnet"` (cheaper but capable). 
Send all applicable subagents in a **single message with multiple Agent tool calls** so they run concurrently. + +Each subagent gets: +- the neutral shared context (verbatim) +- the diff +- its assigned scope and prompt (below) +- the **General review rules** and **Required output format** (below) + +Skip subagents that clearly do not apply (e.g. concurrency review on a docs-only change). Always run subagents 1, 2, 3, 7, 11, 12. Run 15 (deep audit) only for complex / high-risk changes. + +### 4. Aggregate + +- Deduplicate findings across subagents. +- Verify file/line references and evidence against actual diff. +- Re-score severity when a subagent over- or under-states risk. +- Merge related findings sharing a root cause. +- Drop speculative / unsupported findings. +- Keep important-but-unproven concerns under **Needs verification**. +- Prefer actionable fixes over general advice. +- Surface cross-cutting themes only after concrete findings. +- If deep audit ran, preserve its coverage summary; drop its narrative. +- Classify failure behavior when relevant: success, handled failure, fail-open, fail-closed, unexpected exception, cancellation-safe, partial update. + +### 5. Produce the final report + +Use the **Final report format** below. Omit empty sections except Summary, Reviewed scope, and Final verdict. Include Coverage summary when deep audit ran or when coverage gaps materially affect confidence. + +--- + +## General review rules (give to every subagent) + +``` +Each subagent must: +- Review only from its assigned perspective. +- Inspect surrounding context when the diff alone is insufficient. +- Prefer concrete findings over generic advice. +- Avoid style nitpicks unless they create real ambiguity, risk, or maintenance cost. +- Avoid duplicate findings already obvious from its own previous points. +- Mark uncertainty explicitly. +- Do not assume behavior not visible from code or provided context. 
+- If a finding depends on missing context, say exactly what context is missing.
+
+False positives are costly. Do not report a finding unless there is a plausible
+concrete risk, maintenance problem, user impact, or reviewability problem.
+```
+
+## Required subagent output format
+
+```
+Subagent: <name>
+Scope: <one-line scope>
+
+Findings:
+1. Title: <short title>
+   Risk score: <0-100>
+   Confidence: high | medium | low
+   Severity: blocker | major | minor | nit | follow-up
+   Files/lines: <...>
+   Evidence: <...>
+   Risk: <...>
+   Proposed fix: <...>
+   Notes: <...>
+
+2. ...
+
+Needs verification:
+- <...>
+```
+
+**Risk score guidance:**
+- 90-100: likely blocker; correctness, data loss, security, serious compat, race/deadlock, severe hot-path regression.
+- 70-89: major issue likely worth fixing before merge.
+- 40-69: meaningful maintainability, operability, test, or perf concern.
+- 20-39: minor but concrete improvement.
+- 0-19: nit / optional cleanup; usually do not report.
+
+---
+
+## Subagents catalog
+
+For each subagent: `Scope` is the one-line scope to put in the output header; `Prompt` is the body to pass.
+
+### 1. UX / feature sanity / public contract
+
+**Scope:** UX, feature coherence, public API and behavioral contract.
+
+**Focus:** feature UX and logical consistency; whether behavior matches user expectations; public API and contract changes; exposed interfaces, defaults, error semantics, invariants; misuse-prone APIs; narrow implementations where a general idiomatic alternative exists; long-term maintainability of user-visible model; surprising behavior, misleading errors, confusing edges.
+
+**Prompt:**
+```
+Review the diff from a UX, feature-sanity, and public-contract perspective.
+Look for ad hoc behavior, confusing semantics, poor defaults, surprising error
+behavior, unclear invariants, misuse-prone APIs, and places where the
+implementation solves a narrow case while a more idiomatic general design
+exists. Report only concrete issues with user or caller impact.
+```
+
+### 2. Code architecture / design
+
+**Scope:** Architecture, design, component cooperation.
+
+**Focus:** design inconsistencies; accidental complexity; unclear or implicit behavior; SOLID / responsibility violations; leaky abstractions; tight coupling / low cohesion; non-idiomatic or hard-to-use internal APIs; encapsulation / layering; partial / asymmetric fixes across similar components.
+
+**Prompt:**
+```
+Review the diff for architecture and design issues: inconsistencies, accidental
+complexity, unclear or implicit behavior, responsibility/SOLID violations,
+leaky abstractions, tight coupling, low cohesion, non-idiomatic or hard-to-use
+APIs, improper encapsulation/layering, and partial/asymmetric fixes across
+sibling components or similar code paths.
+Explain how the moving parts interact and where the design creates future
+maintenance risk.
+```
+
+### 3. Ockham / YAGNI / unnecessary diff
+
+**Scope:** Avoidable, unrelated, no-op, or premature changes.
+
+**Focus:** unrelated cleanup mixed with core change; no-improvement edits; speculative generality; unnecessary abstractions; no-op rewrites; review-impeding churn; changes belonging in separate commits or follow-ups.
+
+**Prompt:**
+```
+Review the diff for unnecessary diff: avoidable, unrelated, speculative, or
+no-op changes that do not improve behavior, performance, clarity, safety, or
+reviewability. Identify which changes should be removed, split into separate
+commits, or postponed.
+```
+
+### 4. Security / SQL access control / trust boundary
+
+**Scope:** Security-sensitive behavior and trust boundaries. 
+ +**Focus:** SQL access control / privilege checks; auth, ACLs, row policies, quotas, resource limits; server-side file access / path traversal; user-controlled paths, URLs, identifiers, settings, formats, table functions, config; leaked file content / secrets in errors; trust-boundary expansion (internal code newly exposed via SQL/API/user input); unsafe wrappers around previously internal functions; malformed / empty / minimal / huge / adversarial inputs; injection in SQL construction, shell calls, paths, logs, diagnostics. + +**Prompt:** +``` +Review the diff for security issues and trust-boundary expansion. +Pay special attention to SQL access control, user-controlled inputs, +server-side file access/path traversal, resource-limit bypasses, leaked +secrets in errors/logs, and internal code newly exposed to SQL/API/user input. +For suspicious callee code, compare old callers vs new callers and test +boundary assumptions with concrete minimal inputs. +``` + +### 5. Performance / hot-path + +**Scope:** Runtime performance and scalability. + +**Focus:** hot-path regressions; unneeded copies/moves; avoidable allocations, missing reserve, allocation in loops; virtual dispatch, missed inlining, std::function / type-erasure overhead in hot paths; cache-unfriendly access, pointer chasing, locality; wrong containers, high constant factors; repeated computation, missed caching/hoisting; pathological complexity, accidental O(N²); unbounded growth/allocations; extra disk/network roundtrips, syscalls, fsyncs, sleeps, polling; logging in hot paths. + +**Prompt:** +``` +Review the diff from a performance perspective, especially hot paths. +Look for unnecessary copies, allocations, dispatch/inlining overhead, +cache-unfriendly access, poor container choices, repeated work, pathological +complexity, unbounded growth, and inefficient disk/network/syscall behavior. +Report only issues that plausibly affect realistic workloads or scalability. +``` + +### 6. 
Documentation / comments / changelog + +**Scope:** Understandability for users and maintainers. + +**Focus:** missing user-facing docs; misleading / incomplete docs; comments for non-obvious logic; outdated comments; unclear error / log / diagnostic messages; changelog / release-note quality; migration notes for incompatible behavior; typos in user-visible strings, logs, docs, comments, identifiers. + +**Prompt:** +``` +Review documentation and explanatory quality: user-facing docs, comments for +non-obvious logic, changelog/release-note quality, migration notes, +diagnostics, logs, error messages, and typos. Do not ask for comments on +obvious code; focus on places where missing or misleading explanation creates +user or maintenance risk. +``` + +### 7. Code quality / correctness / maintainability + +**Scope:** Local code quality and bug-proneness. + +**Focus:** likely bugs / edge-case errors; clarity / readability; naming; defensive coding; fragile assumptions; error-prone control flow; duplicated logic; confusing conditionals; magic constants; unchecked return values; integer overflow / truncation / lossy conversions; invalid assumptions about nullability, emptiness, ordering, sizes. + +**Prompt:** +``` +Review the diff for code quality, correctness, and maintainability: likely +bugs, unclear code, poor naming, fragile assumptions, missing defensive +checks, duplicated logic, magic constants, unchecked failures, and error-prone +edge cases. Prefer concrete bug risks over style preferences. +``` + +### 8. Operability / DevOps / observability + +**Scope:** Production introspection, debuggability, alerting. + +**Focus:** introspection of behavior; debuggability of failure modes; useful logs / errors; metrics and perf counters; alertability of exceptional situations; missing context in diagnostics; safe degradation / recovery; background task visibility; operational impact of config / default changes. 
+ +**Prompt:** +``` +Review the diff from an operability / DevOps perspective: introspection, +debugging, metrics, logging, alerting, production diagnostics, recovery, and +visibility into exceptional or background behavior. +Flag cases where production issues would be hard to detect, debug, or react to. +``` + +### 9. Header / include / compile-time impact + +**Scope:** Header hygiene and build-time impact. + +**Focus:** unnecessary includes; missing forward decls; excessive header deps; heavy transitive includes in high-fan-out headers; non-trivial function bodies in headers; template defs causing broad instantiation; large constexpr work in headers; binary size / compile-time regressions; deps that should move from header to .cpp. + +**Prompt:** +``` +Review the diff for header/include hygiene and compile-time impact: +unnecessary includes, missing forward declarations, excessive header +dependencies, heavy transitive includes, non-trivial code in headers, +unnecessary template instantiations, large constexpr work in headers, and +dependencies that should move to implementation files. +``` + +### 10. Concurrency / synchronization + +**Scope:** Multithreading, shared state, synchronization. + +**Focus:** shared mutable state; readers/writers and guards; data and lifetime races; missing / excessive / inconsistent locking; lock ordering, deadlocks; TOCTOU; atomics and memory ordering; condition variables; reentrancy; blocking I/O or callbacks under locks; async / task / thread-pool interactions; thread-safety of exposed APIs; undocumented thread-affinity or implicit ordering assumptions. + +**Prompt:** +``` +Review the diff for concurrency issues: shared mutable state, data races, +lifetime races, locking consistency, lock ordering/deadlocks, TOCTOU, +atomics/memory ordering, condition variables, reentrancy, blocking under +locks, and async/callback/thread-pool safety. 
+For each touched shared object, identify readers/writers, guards, relevant +call paths, and risky interleavings. +``` + +### 11. Tests / regression risk + +**Scope:** Validation adequacy and regression coverage. + +**Focus:** missing tests for changed behavior; weak assertions; deleted / relaxed / over-normalized tests; missing negative tests; missing boundary cases (empty, length 1, huge, malformed, null, invalid config); missing compatibility tests; missing security / concurrency / perf tests where relevant; brittle tests; gaps between implementation risk and coverage. + +**Prompt:** +``` +Review test coverage and regression risk: missing tests, weak assertions, +untested edge cases, deleted/relaxed tests, missing negative tests, missing +compatibility/security/concurrency/performance coverage, brittle tests, and +gaps between changed behavior and validation. +Suggest specific tests to add or strengthen. +``` + +### 12. Compatibility / rollout / migration + +**Scope:** Cross-version, cross-deployment, cross-config safety. + +**Focus:** backward / forward compatibility; config and settings changes; defaults and behavior changes; wire / storage / serialization / metadata / protocol formats; migrations and downgrade behavior; mixed-version clusters; feature flags / experimental gates; safe rollout in OSS and Cloud-like environments; compatibility settings preserving old behavior; existing data or objects that may violate new validation. + +**Prompt:** +``` +Review the diff for compatibility and rollout risks: backward/forward +compatibility, config/settings changes, defaults, wire/storage/serialization/ +protocol formats, migrations, downgrade behavior, mixed-version clusters, +feature flags, experimental gates, and safe rollout. +Flag new validation or behavior changes that may break existing data, +queries, configs, or deployments. +``` + +### 13. C++ lifetime / ownership / exception safety + +**Scope:** Object lifetime, ownership, exception safety. 
+ +**Focus:** dangling refs / pointers / views / spans / string_views; iterator/reference invalidation; move/copy semantics; moved-from object use; RAII correctness; cleanup on early returns / exceptions; exception safety / partial state updates; noexcept boundaries and new throw paths; destructor callbacks / scope guards / callback-triggered exceptions; shared_ptr / unique_ptr misuse; ownership ambiguity; cycles or double ownership. + +**Prompt:** +``` +Review the diff for C++ lifetime, ownership, and exception-safety issues: +dangling references/views, invalidation, move/copy mistakes, RAII +correctness, cleanup on early returns/exceptions, new throw paths crossing +noexcept/destructor boundaries, smart-pointer misuse, ownership ambiguity, +and partial-state updates. +Trace full caller chains when new exceptions or lifetime assumptions are +introduced. +``` + +### 14. Repository impact / generated / binary artifacts + +**Scope:** Repository hygiene; accidental artifacts. + +**Focus:** large files; binary files; accidentally committed generated files; vendored dependency blobs; archives / JARs / .so / executables / datasets / model artifacts; split / chunked binaries; build outputs; repo bloat; test data that should be generated or downloaded. + +**Prompt:** +``` +Review the diff for repository-impact issues: large files, binary artifacts, +generated files, vendored blobs, archives, compiled outputs, datasets, split +binaries, and unnecessary repository bloat. +Flag anything that should not be committed or should be generated/downloaded +at test time instead. +``` + +### 15. Deep audit / transition and fault-injection (high-risk only) + +**Scope:** Deep transition and fault analysis for complex / high-risk changes. + +**When to run:** state machines; config interactions; API/protocol flows; storage / metadata mutations; background tasks; replication / Keeper logic; security boundaries; concurrency-sensitive behavior. 
**Skip for trivial / local changes.** + +**Focus:** call graph and entrypoints; caller assumptions and trust-boundary changes; request/event flow through validation, dispatch, state changes, outputs, side effects; key invariants before/after each transition; logical branch coverage; fault categories derived from actual changed components; malformed / empty / minimal / huge input, missing config, timeout, failed IO/network, exception, cancellation, shutdown, concurrent update; fail-open vs fail-closed for security paths; rollback / cleanup / partial-update safety; parser/config/runtime/API parity; cross-partition / cross-component interactions. + +**Prompt:** +``` +Review the diff in deep audit mode. First build a lightweight call graph and +transition matrix for the changed behavior: entrypoints, validation/dispatch, +state/storage/cache interactions, downstream integrations, state changes, +outputs, side effects, and error/exception propagation. + +List the key invariants and check whether each transition preserves them. + +Define logical fault categories from the actual code under review, then test +them by reasoning through success, handled failure, fail-open, fail-closed, +exception, cancellation, timeout, malformed input, boundary input, shutdown, +and concurrency/timing paths as applicable. + +For mutation-heavy paths, analyze exception or cancellation after each +intermediate state change and verify rollback, cleanup, and invariants. + +For critical shared-state paths, write plausible interleavings and check for +race, deadlock, lifetime, and partial-update hazards. + +Report only confirmed defects. Keep speculative concerns under "Needs +verification". Include a short coverage summary: reviewed entrypoints, +transitions, fault categories, skipped areas, and assumptions. 
+```
+
+---
+
+## Final report format
+
+```
+# Review report: <target>
+
+## Summary
+<...>
+
+## Reviewed scope
+- Diff: <...>
+- Base: <...>
+- Files changed: <...>
+- Main moving parts:
+  - <part>: <role>
+
+## Missing context
+- <...>
+
+## Blockers
+1. <title>
+   Risk score: <0-100>
+   Sources: <...>
+   Files/lines: <...>
+   Evidence: <...>
+   Impact: <...>
+   Proposed fix: <...>
+
+## Major issues
+...
+
+## Minor issues / improvements
+...
+
+## Needs verification
+- <...>
+
+## Suggested commit / diff split
+- Core refactoring / behavior changes:
+  1. <...>
+  2. <...>
+- Separate follow-ups:
+  - <...>
+
+## Tests to add or strengthen
+- <...>
+
+## Coverage summary
+- Entry points reviewed: <...>
+- Transitions reviewed: <...>
+- Fault categories checked: <...>
+- Deferred / not covered: <...>
+- Main assumptions: <...>
+
+## Final verdict
+Status: approve | request changes | block
+Minimum required actions:
+- <...>
+```
+
+Omit empty sections except Summary, Reviewed scope, and Final verdict.
+
+---
+
+## Tone and quality bar
+
+- Strict but neutral.
+- High-signal findings only.
+- No generic checklists dumped on the user.
+- No praise for every checked area; no "looks good" sections.
+- Findings must be specific enough to act on.
+- Suggest small surgical fixes over broad rewrites.
+- Keep subagent output plain and structured so aggregation is reliable.
+
+## Common mistakes
+
+- Launching subagents serially instead of in one parallel batch.
+- Letting subagents write the final user report (they must not).
+- Forwarding low-confidence speculation as findings instead of "Needs verification".
+- Skipping the neutral context step — reviewers then duplicate exploration work.
+- Running deep audit (#15) on trivial diffs — wasteful.
+- Including "looks good" sections or padding the report.