feat(rca): generic multi-client RCA agent plugin by ruturaj-browserstack · Pull Request #1 · browserstack/browserstack-ai-tfa-demo

ruturaj-browserstack · 2026-06-23T16:59:44Z

What this is

A portable Claude-Code/Cursor plugin that drives BrowserStack's collaborative
RCA loop (tfaRcaTurn) over all failed tests of a build — generic across
product and infra. It wraps the stable bstack MCP tools (listTestIds +
tfaRcaTurn) and adds the harness that batches RCA over a whole build, clusters
failures by signature, routes evidence requests to whatever skills/tools the
client already has, and records a per-test RCA.

Generalizes the product-/infra-coupled obs-tfa-rca skill: the loop, routing,
digest, and report core are ported; the coupling points (BrowserStack discovery,
fixed kubectl/chitragupta/bifrost, /tmp, server name) become config +
runtime capability discovery.

Companion change (separate repo, not in this PR): @browserstack/mcp-server
listTestIds gains an opt-in includeFailureDetail flag returning a trimmed
per-test failure signature — the seed for clustering with no extra probe turns.

Architecture

Three roles over the stable MCP contract:

rca-build skill (build-level orchestrator) — mandatory pre-flight GitHub
intake, discovery via listTestIds, the CSV/WAL state spine, failure-signature
clustering, build-evidence pre-compute + capability manifest, and fan-out.
ai-tfa-coordinator agent (per-test) — drives the tfaRcaTurn loop
(turn-cap, one-thread, soft-PENDING, digest-not-dump); routes each ask by
capability (no hardcoded tools); runs suspect-PR falsification.
TFA (server-side, test-level) — owns the per-test logs, authors the RCA.

Two modes, one coordinator (only the injected gap-resolver differs):

auto → workflows/rca-batch.mjs dynamic workflow (5 concurrent, no user
input; gap → "unavailable" → best-effort finalize).
interactive → subagents 5 at a time; on an evidence gap a subagent ends with
a GAP_OUTPUT (resume handles) and the orchestrator asks the user, then
re-dispatches with resume=.
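
The "one coordinator, injected gap-resolver" split might look roughly like the sketch below. It is illustrative only: `makeGapResolver`, the ask and handle shapes, and the `GAP_OUTPUT` field names are assumptions, not code from this PR.

```javascript
// Hypothetical sketch: the coordinator is the same in both modes and only the
// injected gap resolver differs. All names and shapes here are assumptions.
function makeGapResolver(mode) {
  if (mode === "auto") {
    // auto: never prompt; tell TFA the evidence is unavailable and carry on.
    return (ask) => ({ action: "answer", ask: ask.id, evidence: "unavailable" });
  }
  if (mode === "interactive") {
    // interactive: a subagent cannot prompt the user, so it ends early and
    // hands resume handles back to the orchestrator.
    return (ask, handles) => ({
      action: "end",
      gapOutput: { ask: ask.id, threadId: handles.threadId, turnId: handles.turnId },
    });
  }
  throw new Error(`unknown mode: ${mode}`);
}
```

The orchestrator would then re-dispatch an interactive subagent with the returned handles plus the user's answer.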

Key decisions

Clustering seeds off discovery, not probe turns — signature =
normalize(category | first-error-line | file_path); representative runs the
full loop, siblings pre-seed a one-turn confirm against their own logs, with a
fall-back-to-own-loop safeguard (never blindly inherit).
Evidence routing is config + capability manifest, never hardcoded tools.
No GitHub forensics harness — references/github-evidence.md specifies the
exact evidence needed; the coordinator uses GitHub MCP → gh → degrade.
CSV is a write-ahead log (claim/heartbeat/flip + startup reaper) — in-session
resumable; pending-resume rows stay re-claimable.
Coverage stamp — TFA confidence capped by evidence coverage, so a RESOLVED
built with evidence unavailable reads as lower confidence because of the gap.

In scope: ideation #1–#5 + the v1 slice of #6 (coverage stamp), #7 (resume), #8
(conformance fixture). Deferred: #6 blast-radius digest, #8 git-forensics-MCP,
cross-session durability, Codex/Gemini orchestration parity.

Testing

51 tests pass (npm test → node --test), dependency-free.
Unit coverage: routing registry + manifest, CSV/WAL codec + claim/heartbeat/flip/
reaper + flip-guard + pending-resume resumability, signature normalization +
clustering, evidence cache, coverage band, report renderer.
Conformance fixtures replay recorded tfaRcaTurn transcripts
(resolved/blocked/pending/turn-cap) through the executable loop mirror
(lib/loop.mjs, which doubles as the sequential thin-client harness) — proving
rca capture, test_logs skip, soft-PENDING no-re-poll, turn-cap never submits a
7th turn, and the degraded no-capability path still reaches a valid terminal RCA.
workflows/rca-batch.mjs follows the documented Workflow runtime shape (meta +
pipeline/parallel/agent); it's validated via the conformance fixtures and
the unit-tested libs it relies on (the runtime DSL globals can't be unit-loaded).

Install / usage

git clone … && cd browserstack-ai-tfa-demo
cp .env.example .env   # BROWSERSTACK_USERNAME / BROWSERSTACK_ACCESS_KEY
claude --plugin-dir ./
/rca-build <build-id>

Post-Deploy Monitoring & Validation

No production/runtime impact — this is a client-side plugin (skills/agents/workflow
plus a bundled MCP server config), not a deployed service. Validation is manual:
run /rca-build against a known red build and confirm every failed test lands a
terminal CSV row + a per-test RCA. The two things to validate live: the sibling
one-turn-confirm cost win, and the "last green" baseline resolution (both have
safeguards so correctness doesn't depend on them).

🤖 Generated with Claude Code

…EADME Identity-only .claude-plugin/plugin.json; root .mcp.json wires the bstack MCP server (stdio); config/rca.config.json centralizes all formerly-hardcoded product/infra values (no kubectl/chitragupta/bifrost literals); /rca-build command parses build id + mode and hands off to the skill. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Port the obs-tfa-rca loop decoupled: ai-tfa-coordinator drives tfaRcaTurn to a terminal RCA (turn-cap, one-thread, soft-PENDING, digest-not-dump) with the gather mechanism routed by capability (no kubectl/chitragupta/bifrost literals). lib/routing.mjs classifies each ask skip/gather/gap against the config registry + capability manifest; the gap action is the only mode fork (auto=unavailable, interactive=ask-user). references/evidence-routing.md carries the digest format and size caps verbatim. Adds sibling pre-seed one-turn-confirm hook. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
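
A minimal sketch of skip/gather/gap classification, assuming a registry that maps each ask type to either a skip flag or a required capability. This is not the PR's lib/routing.mjs: `classifyAsk`, the registry shape, and the capability names in the usage note are hypothetical. Only the manifest shape (capability → {available, via}) comes from the PR text.

```javascript
// Hypothetical sketch of skip/gather/gap routing.
// registry: askType -> { skip: true } | { capability: "<name>" }   (assumed shape)
// manifest: capability -> { available, via }                      (shape from the PR text)
function classifyAsk(askType, registry, manifest) {
  const entry = Object.hasOwn(registry, askType) ? registry[askType] : undefined;
  // TFA already owns this evidence (e.g. per-test logs): nothing to gather.
  if (entry && entry.skip) return { action: "skip" };
  const cap = entry && Object.hasOwn(manifest, entry.capability)
    ? manifest[entry.capability]
    : undefined;
  // The client has a tool for it: gather through whatever it discovered.
  if (cap && cap.available) return { action: "gather", via: cap.via };
  // Unknown ask type or missing capability: the mode's gap resolver decides.
  return { action: "gap" };
}
```

With a registry of `{ test_logs: { skip: true }, product_code: { capability: "github" } }` and a manifest of `{ github: { available: true, via: "gh" } }`, `test_logs` is skipped, `product_code` is gathered via `gh`, and anything else falls through to a gap.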

SKILL.md orchestrator spec: mandatory GitHub intake ('I don't have one' → RCA-only; headless missing-input fail-fast), discovery via listTestIds(failed, includeFailureDetail), then cluster/pre-compute/fan-out/report steps. lib/csv-state.mjs is the resumable WAL spine — seed (idempotent, terminal-preserving), claim/heartbeat/flip, reaper, pendingRows — with timestamps injected (workflow-sandbox-safe) and an RFC4180 codec for multiline RCA fields. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
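
The claim/heartbeat/flip/reaper cycle can be sketched over in-memory rows as below. This is a simplified illustration, not lib/csv-state.mjs: the column names (`status`, `in_flight`, `heartbeat_at`, `finished_at`), the terminal state names, and the staleness rule are assumptions. Every timestamp is passed in, mirroring the "timestamps injected" point above.

```javascript
// Hypothetical in-memory sketch of a write-ahead-log row lifecycle.
// State and column names are assumptions; "pending" stays re-claimable.
const TERMINAL = new Set(["resolved", "blocked"]);

function claim(row, owner, now) {
  if (TERMINAL.has(row.status) || row.in_flight) return false; // done or already owned
  row.in_flight = owner;
  row.heartbeat_at = now; // injected timestamp; no clock is read here
  return true;
}

function heartbeat(row, owner, now) {
  if (row.in_flight !== owner) return false;
  row.heartbeat_at = now;
  return true;
}

function flip(row, owner, status, now) {
  // Guard: reject a non-terminal target or a foreign owner without mutating,
  // so a partial flip cannot clear the claim and leave the row pending.
  if (!TERMINAL.has(status) || row.in_flight !== owner) return false;
  row.status = status;
  row.in_flight = "";
  row.finished_at = now;
  return true;
}

function reap(rows, now, staleMs) {
  // Release claims whose heartbeat is older than staleMs so they can be re-claimed.
  let released = 0;
  for (const row of rows) {
    if (row.in_flight && now - row.heartbeat_at > staleMs) {
      row.in_flight = "";
      released += 1;
    }
  }
  return released;
}
```

Because the functions take `now` as an argument, the same code can run where reading a clock is not allowed, and tests can drive time deterministically.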

…otocol lib/signature.mjs computes signature = normalize(category|error|file) off the U1 discovery payload (folds timestamps/uuids/hex/line:col/numbers), groups rows by signature, picks a deterministic representative (non-flaky, then smallest id), and leaves signal-less rows as their own singletons. references/clustering.md documents the O(causes) protocol: representative runs the full loop; siblings pre-seed a one-turn confirm against their own logs with a fall-back-to-own-loop safeguard (never blindly inherit). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
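
A rough sketch of signature normalization and clustering follows. It is an illustration under assumptions, not lib/signature.mjs: the regexes, the row fields (`id`, `category`, `error`, `file`, `flaky`), and string comparison of ids are all hypothetical choices.

```javascript
// Hypothetical sketch: fold volatile tokens so equivalent failures share a signature.
function normalize(text) {
  return String(text || "")
    .toLowerCase()
    .replace(/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/g, "<uuid>")
    .replace(/\d{4}-\d{2}-\d{2}[t ]\d{2}:\d{2}:\d{2}(\.\d+)?z?/g, "<ts>")
    .replace(/\b0x[0-9a-f]+\b/g, "<hex>")
    .replace(/:\d+:\d+/g, ":<line>:<col>")
    .replace(/\d+/g, "<n>")
    .replace(/\s+/g, " ")
    .trim();
}

function signature(row) {
  const firstErrorLine = String(row.error || "").split("\n")[0];
  const parts = [row.category, firstErrorLine, row.file].map(normalize);
  // No usable signal at all: return null so the row stays its own singleton.
  return parts.some(Boolean) ? parts.join("|") : null;
}

function cluster(rows) {
  const groups = new Map();
  for (const row of rows) {
    const key = signature(row) ?? `singleton:${row.id}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return [...groups.values()].map((members) => {
    // Deterministic representative: non-flaky first, then smallest id.
    const sorted = [...members].sort(
      (a, b) =>
        Number(!!a.flaky) - Number(!!b.flaky) ||
        (a.id < b.id ? -1 : a.id > b.id ? 1 : 0)
    );
    return { representative: sorted[0], siblings: sorted.slice(1) };
  });
}
```

The representative would run the full loop. Each sibling still confirms against its own logs and falls back to its own loop if the confirm fails.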

buildManifest enumerates the client's discovered capabilities once into capability→{available,via}, declared to the user + TFA so no evidence is asked for that the client provably can't get. lib/evidence-cache.mjs computes the last-green→this-build delta once and caches by (repo,range,evidenceType) — fresh per-run Map, no module globals (multi-tenant-safe) — with resolveBaseline for the never-green fallback. Routes the same grounded window into every coordinator. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
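
The per-run cache keyed by (repo, range, evidenceType) could be sketched as follows. `createEvidenceCache`, `getOrCompute`, and the `compute` callback are hypothetical names; the PR's lib/evidence-cache.mjs may be structured differently.

```javascript
// Hypothetical sketch of a per-run evidence cache. The Map lives inside the
// factory closure, so nothing is shared through module-level state.
function createEvidenceCache() {
  const cache = new Map();
  return {
    // compute() runs at most once per (repo, range, evidenceType) key.
    // Whatever it returns is stored, so an async compute is shared as one promise.
    getOrCompute(repo, range, evidenceType, compute) {
      const key = JSON.stringify([repo, range, evidenceType]);
      if (!cache.has(key)) cache.set(key, compute());
      return cache.get(key);
    },
    size() {
      return cache.size;
    },
  };
}
```

Creating one cache per run and passing it to every coordinator gives each of them the same grounded window, while two runs for different tenants never see each other's entries.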

workflows/rca-batch.mjs orchestrates the batch in auto mode: a pipeline over clusters dispatches ai-tfa-coordinator agents — representative full loop → siblings one-turn-confirm, no barrier between stages — with a structured RCA schema. Sandbox-correct: does no state I/O itself (orchestrator passes the clustered work-list + manifest + pre-computed build evidence via args; each coordinator agent persists its own CSV row eagerly). Gap → 'unavailable' back to TFA, no user prompt. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

references/interactive-mode.md specifies the orchestrator loop: spawn ai-tfa-coordinator subagents 5 at a time; a subagent cannot pause to prompt the user, so on an evidence gap it ends early with a GAP_OUTPUT carrying resume handles (threadId+turnId); the orchestrator asks A1, then re-dispatches with resume= and the answer. Same coordinator as auto — only the gap action differs. Compact blocks not transcripts (lean main context); partial-first; auto-first/escalate-the-residue noted. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

references/github-evidence.md specifies exactly what each github ask needs (diff-since-baseline, PRs-in-window touching the failing path, blame, deploy timing) and the discovery order GitHub MCP → gh → degrade — no shipped forensics harness. Adds the adversarial falsification protocol (path overlap / deploy-state guard / direction) so only verdict:supported suspects enter related_prs; ruled-out suspects stay as disconfirming evidence. Coordinator runs it for product_code/deploy/ci asks, reusing the pre-computed build evidence. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… in U4) lib/coverage.mjs derives a per-row evidence-coverage band — TFA confidence capped by coverage (full keeps it, partial→medium, thin→low) so a RESOLVED built with evidence unavailable reads as lower confidence BECAUSE of the gap. lib/report.mjs renders the CSV to markdown: status counts + per-test table + coverage caveats, degrading missing fields to 'not available' and never crashing on an empty/partial batch. report-format.md documents the stamp, layout, and the startup reaper resume path. Blast-radius digest explicitly deferred. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
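
One way to read "TFA confidence capped by coverage" is as a minimum over an ordered scale, sketched below. The three-level confidence scale (low/medium/high) and the function name are assumptions. Only the band-to-cap mapping (full keeps it, partial→medium, thin→low) comes from the commit message.

```javascript
// Hypothetical sketch: cap TFA's reported confidence by the evidence-coverage band.
const RANK = { low: 0, medium: 1, high: 2 }; // assumed confidence scale
const CAP = { full: "high", partial: "medium", thin: "low" }; // mapping from the commit message

function capConfidence(tfaConfidence, band) {
  if (!Object.hasOwn(CAP, band) || !Object.hasOwn(RANK, tfaConfidence)) {
    throw new Error(`unknown confidence/band: ${tfaConfidence}/${band}`);
  }
  const cap = CAP[band];
  // Never raise confidence; only lower it when coverage is weaker than the claim.
  return RANK[tfaConfidence] <= RANK[cap] ? tfaConfidence : cap;
}
```

Under this reading a RESOLVED row reported as high confidence with partial coverage renders as medium, while a row that TFA already rated low is left unchanged.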

…harness lib/loop.mjs (runRcaLoop) is an executable mirror of the coordinator loop — status branching, ask routing, gap resolution, turn-cap, one-thread, soft-PENDING — driven by an injected submit(). It doubles as the D5 sequential thin-client harness. tests/conformance.test.mjs replays recorded tfaRcaTurn transcripts (resolved/blocked/pending/turn-cap fixtures) and proves: rca capture, test_logs skip, soft-PENDING no-re-poll, turn-cap never submits a 7th turn, and the degraded (no-capability auto) path still reaches a valid terminal RCA — same loop, same result. 48 tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
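
The turn-capped loop with an injected submit() might be sketched as below. This is not the PR's runRcaLoop: the response shape ({ status, rca, asks }), the status names, and the synchronous calls are simplifying assumptions, and the cap of 6 is inferred from "never submits a 7th turn".

```javascript
// Hypothetical sketch of a turn-capped RCA loop driven by injected functions.
const TURN_CAP = 6; // inferred from "never submits a 7th turn"

function runLoop(submit, gather, firstMessage) {
  let message = firstMessage;
  for (let turn = 1; turn <= TURN_CAP; turn += 1) {
    const res = submit(message, turn);
    if (res.status === "RESOLVED" || res.status === "BLOCKED") {
      return { outcome: res.status, rca: res.rca, turns: turn };
    }
    if (res.status === "PENDING") {
      // soft-PENDING: stop without re-polling; the row stays resumable.
      return { outcome: "pending-resume", turns: turn };
    }
    // Check the cap before gathering so no evidence is collected for a turn
    // that will never be submitted.
    if (turn === TURN_CAP) break;
    message = { evidence: (res.asks || []).map((ask) => gather(ask)) };
  }
  return { outcome: "turn-cap", turns: TURN_CAP };
}
```

Because `submit` is injected, a recorded transcript can be replayed by returning its responses in order, which is how a conformance fixture could exercise each branch without a live server.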

…, skip turn-cap gather

Code-review fixes (suggested, non-blocking):
- pending-resume removed from TERMINAL_STATES → soft-PENDING rows are now re-claimable, listed by pendingRows, and skipped by the reaper (they cleared in_flight), so the retained threadId/turnId actually drive an in-session resume instead of being stranded as a permanent non-terminal terminal.
- flip() now rejects a missing/non-terminal rca_done without mutating, so a partial flip can't clear the claim yet leave the row pending (duplicate-RCA clobber).
- loop checks the turn-cap BEFORE gathering, so evidence on the never-submitted final turn isn't gathered for nothing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ruturaj-browserstack and others added 12 commits June 23, 2026 22:26

chore(rca): gitignore local planning docs

3d22dfe

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ruturaj-browserstack requested a review from a team as a code owner June 23, 2026 16:59

ruturaj-browserstack wants to merge 12 commits into main from feat/generic-rca-agent-plugin
