feat(rca): generic multi-client RCA agent plugin#1

Open
ruturaj-browserstack wants to merge 12 commits into main from feat/generic-rca-agent-plugin
@ruturaj-browserstack

What this is

A portable Claude-Code/Cursor plugin that drives BrowserStack's collaborative
RCA loop (tfaRcaTurn) over all failed tests of a build — generic across
product and infra. It wraps the stable bstack MCP tools (listTestIds +
tfaRcaTurn) and adds the harness that batches RCA over a whole build, clusters
failures by signature, routes evidence requests to whatever skills/tools the
client already has, and records a per-test RCA.

Generalizes the product-/infra-coupled obs-tfa-rca skill: the loop, routing,
digest, and report core are ported; the coupling points (BrowserStack discovery,
fixed kubectl/chitragupta/bifrost, /tmp, server name) become config +
runtime capability discovery.

Companion change (separate repo, not in this PR): @browserstack/mcp-server
listTestIds gains an opt-in includeFailureDetail flag returning a trimmed
per-test failure signature — the seed for clustering with no extra probe turns.

Architecture

Three roles over the stable MCP contract:

  • rca-build skill (build-level orchestrator) — mandatory pre-flight GitHub
    intake, discovery via listTestIds, the CSV/WAL state spine, failure-signature
    clustering, build-evidence pre-compute + capability manifest, and fan-out.
  • ai-tfa-coordinator agent (per-test) — drives the tfaRcaTurn loop
    (turn-cap, one-thread, soft-PENDING, digest-not-dump); routes each ask by
    capability (no hardcoded tools); runs suspect-PR falsification.
  • TFA (server-side, test-level) — owns the per-test logs, authors the RCA.

Two modes, one coordinator (only the injected gap-resolver differs):

  • auto → workflows/rca-batch.mjs dynamic workflow (5 concurrent, no user
    input; gap → "unavailable" → best-effort finalize).
  • interactive → subagents 5 at a time; on an evidence gap a subagent ends with
    a GAP_OUTPUT (resume handles) and the orchestrator asks the user, then
    re-dispatches with resume=.

Key decisions

  • Clustering seeds off discovery, not probe turns — signature =
    normalize(category | first-error-line | file_path); representative runs the
    full loop, siblings pre-seed a one-turn confirm against their own logs, with a
    fall-back-to-own-loop safeguard (never blindly inherit).
  • Evidence routing is config + capability manifest, never hardcoded tools.
  • No GitHub forensics harness: references/github-evidence.md specifies the
    exact evidence needed; the coordinator uses GitHub MCP → gh → degrade.
  • CSV is a write-ahead log (claim/heartbeat/flip + startup reaper) — in-session
    resumable; pending-resume rows stay re-claimable.
  • Coverage stamp — TFA confidence capped by evidence coverage, so a RESOLVED
    built with evidence unavailable reads as lower confidence because of the gap.

In scope: ideation #1–#5 + the v1 slice of #6 (coverage stamp), #7 (resume), #8
(conformance fixture). Deferred: #6 blast-radius digest, #8 git-forensics-MCP,
cross-session durability, Codex/Gemini orchestration parity.

Testing

  • 51 tests pass (npm test → node --test), dependency-free.
  • Unit coverage: routing registry + manifest, CSV/WAL codec + claim/heartbeat/flip/
    reaper + flip-guard + pending-resume resumability, signature normalization +
    clustering, evidence cache, coverage band, report renderer.
  • Conformance fixtures replay recorded tfaRcaTurn transcripts
    (resolved/blocked/pending/turn-cap) through the executable loop mirror
    (lib/loop.mjs, which doubles as the sequential thin-client harness) — proving
    rca capture, test_logs skip, soft-PENDING no-re-poll, turn-cap never submits a
    7th turn, and the degraded no-capability path still reaches a valid terminal RCA.
  • workflows/rca-batch.mjs follows the documented Workflow runtime shape (meta +
    pipeline/parallel/agent); it's validated via the conformance fixtures and
    the unit-tested libs it relies on (the runtime DSL globals can't be unit-loaded).

Install / usage

git clone … && cd browserstack-ai-tfa-demo
cp .env.example .env   # BROWSERSTACK_USERNAME / BROWSERSTACK_ACCESS_KEY
claude --plugin-dir ./
/rca-build <build-id>

Post-Deploy Monitoring & Validation

No production/runtime impact: this is a client-side plugin (skills/agents/workflow + a
bundled MCP server config), not a deployed service. Validation is manual:
run /rca-build against a known red build and confirm every failed test lands a
terminal CSV row + a per-test RCA. The two things to validate live: the sibling
one-turn-confirm cost win, and the "last green" baseline resolution (both have
safeguards so correctness doesn't depend on them).

🤖 Generated with Claude Code

ruturaj-browserstack and others added 12 commits June 23, 2026 22:26
…EADME

Identity-only .claude-plugin/plugin.json; root .mcp.json wires the bstack MCP
server (stdio); config/rca.config.json centralizes all formerly-hardcoded
product/infra values (no kubectl/chitragupta/bifrost literals); /rca-build
command parses build id + mode and hands off to the skill.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Port the obs-tfa-rca loop decoupled: ai-tfa-coordinator drives tfaRcaTurn to a
terminal RCA (turn-cap, one-thread, soft-PENDING, digest-not-dump) with the
gather mechanism routed by capability (no kubectl/chitragupta/bifrost literals).
lib/routing.mjs classifies each ask skip/gather/gap against the config registry
+ capability manifest; the gap action is the only mode fork (auto=unavailable,
interactive=ask-user). references/evidence-routing.md carries the digest format
and size caps verbatim. Adds sibling pre-seed one-turn-confirm hook.
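
A minimal sketch of the skip/gather/gap classification described above. The registry and manifest shapes, the ask and capability names, and the `classifyAsk`/`resolveGap` helpers are assumptions made for illustration; the actual lib/routing.mjs may be structured differently.

```javascript
// Hypothetical sketch, not the actual lib/routing.mjs.
// registry: ask type -> { skip?: boolean, capabilities?: string[] } (assumed shape)
// manifest: capability -> { available: boolean, via: string }       (assumed shape)
function classifyAsk(ask, registry, manifest) {
  const entry = registry[ask.type];
  // Asks for data TFA already owns (e.g. per-test logs) are skipped outright.
  if (entry?.skip) return { action: 'skip' };
  // Otherwise pick the first capability the client actually has.
  const capability = (entry?.capabilities ?? []).find((name) => manifest[name]?.available);
  if (capability) return { action: 'gather', capability, via: manifest[capability].via };
  // Nothing can serve this ask: it is an evidence gap.
  return { action: 'gap' };
}

// The gap action is the only place where the two modes differ.
function resolveGap(mode) {
  return mode === 'interactive' ? 'ask-user' : 'unavailable';
}

// Usage with made-up names:
const registry = {
  test_logs: { skip: true },
  infra_logs: { capabilities: ['cluster_logs', 'log_search'] },
  github: { capabilities: ['github_mcp', 'gh_cli'] },
};
const manifest = {
  log_search: { available: true, via: 'skill:log-search' },
  gh_cli: { available: false, via: 'gh' },
};
console.log(classifyAsk({ type: 'infra_logs' }, registry, manifest));
console.log(classifyAsk({ type: 'github' }, registry, manifest), resolveGap('auto'));
```

Keeping the mode fork in a single function means the classification itself stays identical in auto and interactive runs.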

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

SKILL.md orchestrator spec: mandatory GitHub intake ('I don't have one' → RCA-only;
headless missing-input fail-fast), discovery via listTestIds(failed,
includeFailureDetail), then cluster/pre-compute/fan-out/report steps.
lib/csv-state.mjs is the resumable WAL spine — seed (idempotent, terminal-
preserving), claim/heartbeat/flip, reaper, pendingRows — with timestamps injected
(workflow-sandbox-safe) and an RFC4180 codec for multiline RCA fields.
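
The claim/heartbeat/flip/reaper cycle can be sketched roughly as follows. This is an in-memory illustration only: the row fields, status names, and function signatures are assumptions, the CSV codec is omitted, and the real lib/csv-state.mjs may differ. Timestamps are passed in as arguments, in the spirit of the injected-timestamp design mentioned above.

```javascript
// Hypothetical in-memory sketch of the WAL state machine (no CSV I/O).
// Status names are assumptions; 'pending-resume' is deliberately NOT terminal.
const TERMINAL_STATES = new Set(['resolved', 'blocked', 'turn-cap']);

// Claim a row for a worker. `now` is injected rather than read from a clock.
function claim(row, workerId, now) {
  if (TERMINAL_STATES.has(row.status) || row.inFlight) return false;
  Object.assign(row, { inFlight: true, claimedBy: workerId, heartbeatAt: now });
  return true;
}

// Refresh the lease so the reaper leaves a live worker alone.
function heartbeat(row, now) {
  if (row.inFlight) row.heartbeatAt = now;
}

// Flip to a terminal state. Rejects before mutating anything, so a bad
// flip cannot clear the claim while leaving the row pending.
function flip(row, status, rca) {
  if (!TERMINAL_STATES.has(status) || !rca) return false;
  Object.assign(row, { status, rca, inFlight: false, claimedBy: null });
  return true;
}

// Startup reaper: release claims whose heartbeat is older than staleMs.
function reap(rows, now, staleMs) {
  let released = 0;
  for (const row of rows) {
    if (row.inFlight && now - row.heartbeatAt > staleMs) {
      Object.assign(row, { inFlight: false, claimedBy: null });
      released += 1;
    }
  }
  return released;
}

// Usage: a crashed worker's row becomes claimable again after the reaper runs.
const row = { id: 'test-1', status: 'pending', inFlight: false };
claim(row, 'worker-a', 1000);
reap([row], 1000 + 60001, 60000);
console.log(claim(row, 'worker-b', 62000)); // true
```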

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…otocol

lib/signature.mjs computes signature = normalize(category|error|file) off the U1
discovery payload (folds timestamps/uuids/hex/line:col/numbers), groups rows by
signature, picks a deterministic representative (non-flaky, then smallest id),
and leaves signal-less rows as their own singletons. references/clustering.md
documents the O(causes) protocol: representative runs the full loop; siblings
pre-seed a one-turn confirm against their own logs with a fall-back-to-own-loop
safeguard (never blindly inherit).
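
A rough sketch of the signature and clustering step, under stated assumptions: the row field names (`category`, `firstErrorLine`, `filePath`, `isFlaky`, numeric `id`), the placeholder tokens, and the exact regexes are illustrative and not taken from lib/signature.mjs.

```javascript
// Hypothetical sketch, not the actual lib/signature.mjs.
// Fold volatile tokens so two occurrences of the same failure compare equal.
// Order matters: the most specific patterns are folded first.
function normalize(text) {
  return String(text ?? '')
    .toLowerCase()
    .replace(/\d{4}-\d{2}-\d{2}[t ]\d{2}:\d{2}:\d{2}(\.\d+)?z?/g, '<ts>')
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, '<uuid>')
    .replace(/0x[0-9a-f]+/g, '<hex>')
    .replace(/:\d+:\d+/g, ':<line>:<col>')
    .replace(/\d+/g, '<n>')
    .replace(/\s+/g, ' ')
    .trim();
}

// signature = normalize(category | first-error-line | file_path);
// null means the row carries no signal at all.
function signatureOf(row) {
  const parts = [row.category, row.firstErrorLine, row.filePath].map(normalize);
  return parts.some(Boolean) ? parts.join('|') : null;
}

// Group rows by signature; signal-less rows stay singletons.
// Representative: non-flaky first, then smallest id (ids assumed numeric).
function clusterRows(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = signatureOf(row) ?? `singleton:${row.id}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return [...groups.values()].map((members) => {
    const sorted = [...members].sort(
      (a, b) => Number(Boolean(a.isFlaky)) - Number(Boolean(b.isFlaky)) || a.id - b.id,
    );
    return { representative: sorted[0], siblings: sorted.slice(1) };
  });
}

console.log(normalize('Timed out after 30000ms at login.js:42:7'));
// prints "timed out after <n>ms at login.js:<line>:<col>"
```

Sorting by flakiness before id keeps the choice deterministic, so a re-run over the same discovery payload picks the same representative.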

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

buildManifest enumerates the client's discovered capabilities once into
capability→{available,via}, declared to the user + TFA so no evidence is asked
for that the client provably can't get. lib/evidence-cache.mjs computes the
last-green→this-build delta once and caches by (repo,range,evidenceType) — fresh
per-run Map, no module globals (multi-tenant-safe) — with resolveBaseline for the
never-green fallback. Routes the same grounded window into every coordinator.
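
One way to picture the per-run cache is the sketch below. The factory name `createEvidenceCache`, the `getOrCompute` method, and the JSON key encoding are assumptions for illustration; only the (repo, range, evidenceType) key and the no-module-globals constraint come from the description above.

```javascript
// Hypothetical sketch, not the actual lib/evidence-cache.mjs.
// Each run builds its own cache, so nothing is shared through module state.
function createEvidenceCache() {
  const cache = new Map(); // fresh Map per run
  return {
    // compute() runs at most once per (repo, range, evidenceType) key.
    // If compute() returns a promise, the promise itself is cached, so
    // concurrent coordinators share one in-flight fetch.
    getOrCompute(repo, range, evidenceType, compute) {
      const key = JSON.stringify([repo, range, evidenceType]);
      if (!cache.has(key)) cache.set(key, compute());
      return cache.get(key);
    },
    size: () => cache.size,
  };
}

// Usage with made-up values: the second lookup reuses the first result.
const cache = createEvidenceCache();
let calls = 0;
const fetchDiff = () => { calls += 1; return { files: ['src/login.js'] }; };
cache.getOrCompute('org/repo', 'abc123..def456', 'diff', fetchDiff);
cache.getOrCompute('org/repo', 'abc123..def456', 'diff', fetchDiff);
console.log(calls); // 1
```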

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

workflows/rca-batch.mjs orchestrates the batch in auto mode: a pipeline over
clusters dispatches ai-tfa-coordinator agents — representative full loop →
siblings one-turn-confirm, no barrier between stages — with a structured RCA
schema. Sandbox-correct: does no state I/O itself (orchestrator passes the
clustered work-list + manifest + pre-computed build evidence via args; each
coordinator agent persists its own CSV row eagerly). Gap → 'unavailable' back to
TFA, no user prompt.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

references/interactive-mode.md specifies the orchestrator loop: spawn
ai-tfa-coordinator subagents 5 at a time; a subagent cannot pause to prompt the
user, so on an evidence gap it ends early with a GAP_OUTPUT carrying resume
handles (threadId+turnId); the orchestrator asks A1, then re-dispatches with
resume= and the answer. Same coordinator as auto — only the gap action differs.
Compact blocks not transcripts (lean main context); partial-first; auto-first/
escalate-the-residue noted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

references/github-evidence.md specifies exactly what each github ask needs
(diff-since-baseline, PRs-in-window touching the failing path, blame, deploy
timing) and the discovery order GitHub MCP → gh → degrade — no shipped forensics
harness. Adds the adversarial falsification protocol (path overlap / deploy-state
guard / direction) so only verdict:supported suspects enter related_prs; ruled-out
suspects stay as disconfirming evidence. Coordinator runs it for product_code/
deploy/ci asks, reusing the pre-computed build evidence.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… in U4)

lib/coverage.mjs derives a per-row evidence-coverage band — TFA confidence capped
by coverage (full keeps it, partial→medium, thin→low) so a RESOLVED built with
evidence unavailable reads as lower confidence BECAUSE of the gap. lib/report.mjs
renders the CSV to markdown: status counts + per-test table + coverage caveats,
degrading missing fields to 'not available' and never crashing on an empty/partial
batch. report-format.md documents the stamp, layout, and the startup reaper resume
path. Blast-radius digest explicitly deferred.
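
The capping rule (full keeps it, partial→medium, thin→low) can be sketched as below. The `low`/`medium`/`high` confidence scale and the `capConfidence` name are assumptions; how the band itself is derived from the gathered evidence is not shown here.

```javascript
// Hypothetical sketch, not the actual lib/coverage.mjs.
const RANK = { low: 0, medium: 1, high: 2 };                // assumed confidence scale
const CAP = { full: 'high', partial: 'medium', thin: 'low' }; // band -> ceiling

// Coverage can lower TFA's confidence but never raise it.
// An unknown band or confidence value falls back to the cap (conservative).
function capConfidence(tfaConfidence, coverageBand) {
  const cap = CAP[coverageBand] ?? 'low';
  return RANK[tfaConfidence] <= RANK[cap] ? tfaConfidence : cap;
}

console.log(capConfidence('high', 'full'));    // "high"
console.log(capConfidence('high', 'partial')); // "medium"
console.log(capConfidence('low', 'partial'));  // "low"
console.log(capConfidence('high', 'thin'));    // "low"
```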

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…harness

lib/loop.mjs (runRcaLoop) is an executable mirror of the coordinator loop —
status branching, ask routing, gap resolution, turn-cap, one-thread, soft-PENDING
— driven by an injected submit(). It doubles as the D5 sequential thin-client
harness. tests/conformance.test.mjs replays recorded tfaRcaTurn transcripts
(resolved/blocked/pending/turn-cap fixtures) and proves: rca capture, test_logs
skip, soft-PENDING no-re-poll, turn-cap never submits a 7th turn, and the degraded
(no-capability auto) path still reaches a valid terminal RCA — same loop, same
result. 48 tests green.
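
A simplified sketch of the loop shape follows. It is synchronous for brevity (a real submit() would presumably be async), the reply fields (`status`, `asks`, `rca`, `threadId`, `turnId`) are assumed, and the cap of 6 is inferred from "never submits a 7th turn". It also checks the turn-cap before gathering, matching the later code-review fix in this PR.

```javascript
// Hypothetical sketch, not the actual lib/loop.mjs.
const MAX_TURNS = 6; // inferred: a 7th turn is never submitted

// submit(request) -> reply and gather(ask) -> evidence digest are injected,
// so recorded transcripts can be replayed through the same loop.
function runRcaLoop(submit, gather) {
  let turn = 0;
  let threadId = null; // one thread for the whole loop
  let evidence = [];
  while (true) {
    turn += 1;
    const reply = submit({ threadId, turn, evidence });
    threadId = reply.threadId ?? threadId;
    if (reply.status === 'RESOLVED' || reply.status === 'BLOCKED') {
      return { outcome: reply.status.toLowerCase(), rca: reply.rca, turns: turn };
    }
    if (reply.status === 'PENDING') {
      // Soft-PENDING: keep the resume handles and stop, no re-poll.
      return { outcome: 'pending-resume', threadId, turnId: reply.turnId, turns: turn };
    }
    // Check the cap BEFORE gathering, so no evidence is collected for a
    // turn that would never be submitted.
    if (turn >= MAX_TURNS) return { outcome: 'turn-cap', turns: turn };
    evidence = (reply.asks ?? []).map((ask) => gather(ask));
  }
}

// Usage: a server that keeps asking for evidence hits the cap after 6 submits.
let submits = 0;
const result = runRcaLoop(
  () => { submits += 1; return { status: 'NEEDS_EVIDENCE', threadId: 't1', asks: [{ type: 'infra_logs' }] }; },
  (ask) => ({ type: ask.type, digest: 'unavailable' }),
);
console.log(result.outcome, submits); // turn-cap 6
```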

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, skip turn-cap gather

Code-review fixes (suggested, non-blocking):
- pending-resume removed from TERMINAL_STATES → soft-PENDING rows are now
  re-claimable, listed by pendingRows, and skipped by the reaper (they cleared
  in_flight), so the retained threadId/turnId actually drive an in-session resume
  instead of being stranded as a permanent non-terminal terminal.
- flip() now rejects a missing/non-terminal rca_done without mutating, so a partial
  flip can't clear the claim yet leave the row pending (duplicate-RCA clobber).
- loop checks the turn-cap BEFORE gathering, so evidence on the never-submitted
  final turn isn't gathered for nothing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ruturaj-browserstack ruturaj-browserstack requested a review from a team as a code owner June 23, 2026 16:59