browserstack · ruturaj-browserstack · Jun 23, 2026 · Jun 23, 2026 · Jun 23, 2026 · Jun 23, 2026
@@ -0,0 +1,11 @@
+{
+  "name": "tfa-rca",
+  "description": "Drive collaborative root-cause analysis over all failed tests of a build, generic across product and infra.",
+  "version": "0.1.0",
+  "author": {
+    "name": "BrowserStack",
+    "url": "https://www.browserstack.com"
+  },
+  "homepage": "https://github.com/browserstack/browserstack-ai-tfa-demo",
+  "license": "MIT"
+}
@@ -0,0 +1,8 @@
+# BrowserStack credentials — used by the bundled bstack MCP server for
+# listTestIds + tfaRcaTurn. Per-user; never commit real values.
+BROWSERSTACK_USERNAME=
+BROWSERSTACK_ACCESS_KEY=
+
+# Observability base URL the TFA RCA chat runs against. Optional —
+# the bstack MCP server defaults to its rengg-tfa staging URL when unset.
+# O11Y_TFA_RCA_BASE_URL=https://api-observability-rengg-tfa.bsstag.com
@@ -0,0 +1,6 @@
+node_modules/
+.env
+# Per-run RCA batch state (the CSV/WAL spine + report) is workspace-local.
+.rca/
+# Planning docs (brainstorm/ideation/plan) stay local — not pushed.
+docs/
@@ -0,0 +1,14 @@
+{
+  "mcpServers": {
+    "bstack": {
+      "type": "stdio",
+      "command": "npx",
+      "args": ["-y", "@browserstack/mcp-server"],
+      "env": {
+        "BROWSERSTACK_USERNAME": "${BROWSERSTACK_USERNAME}",
+        "BROWSERSTACK_ACCESS_KEY": "${BROWSERSTACK_ACCESS_KEY}",
+        "O11Y_TFA_RCA_BASE_URL": "${O11Y_TFA_RCA_BASE_URL}"
+      }
+    }
+  }
+}
@@ -1,2 +1,65 @@
-# browserstack-ai-tfa-demo
-AI TFA Demo 
+# tfa-rca — generic multi-client RCA agent plugin
+
+Drive BrowserStack's collaborative root-cause-analysis loop over **all failed
+tests of a build**, generic across product and infra, from inside an agentic
+MCP client (Claude Code / Cursor / Codex).
+
+The plugin wraps two stable MCP tools — `listTestIds` and `tfaRcaTurn` (from the
+`bstack` MCP server) — and adds the harness that batches RCA over a whole build,
+clusters failures by signature, routes evidence requests to whatever
+skills/tools the client already has, and writes a per-test RCA into the TRA
+dashboard.
+
+> It **discovers and delegates** to the infra skills/tools already in your
+> client (GitHub, k8s/EKS, kibana/other logs, metrics). It does **not** install
+> or own those connectors.
+
+## Install
+
+```bash
+git clone https://github.com/browserstack/browserstack-ai-tfa-demo.git
+cd browserstack-ai-tfa-demo
+cp .env.example .env   # fill in BROWSERSTACK_USERNAME / BROWSERSTACK_ACCESS_KEY
+claude --plugin-dir ./
+```
+
+The plugin auto-configures on load: the `bstack` MCP server (from `.mcp.json`),
+the `/rca-build` command, the `rca-build` skill, and the `ai-tfa-coordinator`
+agent are all discovered by convention.
+
+## Usage
+
+```
+/rca-build <build-id>
+/rca-build build_id=<id> mode=auto
+```
+
+On start the plugin runs a **mandatory pre-flight intake** asking for your
+product + automation repos, working branch, default branch, and the PRs in
+play, plus the build id. Every question is answerable with "I don't have one" →
+the run proceeds RCA-only.
+
+## Modes
+
+- **auto** — a dynamic workflow drives the whole batch (5 tests concurrent), no
+  mid-run prompts. When evidence can't be gathered (no matching skill), it
+  reports "unavailable" back to the TFA agent, which finalizes best-effort.
+- **interactive** — the main session spawns subagents (5 at a time); on an
+  evidence gap a subagent returns the gap to the main agent, which asks you,
+  then feeds the answer back.
+
+`auto` means autonomy *during* the batch from an interactive session — not
+headless. Running `claude -p` with a required input missing ends immediately.
+
+## Requirements
+
+- The `bstack` MCP server (bundled via `.mcp.json`).
+- Credentials in `.env` (or your client's MCP env).
+- For full evidence coverage: whatever GitHub / infra / logging / metrics
+  skills your client already has. Missing ones degrade gracefully (the RCA's
+  confidence band reflects what evidence was actually available).
+
+## Layout
+
+See `docs/plans/2026-06-23-001-feat-generic-rca-agent-plugin-plan.md` for the
+implementation plan and `docs/brainstorms/` for the requirements.
@@ -0,0 +1,224 @@
+---
+name: ai-tfa-coordinator
+description: 'Per-test collaborative-RCA coordinator. Given ONE testRunId, drives the tfaRcaTurn MCP loop to a terminal root cause: TFA reads the run logs; this coordinator supplies every non-log evidence ask (product code, k8s, kibana, metrics, deploy, ci) using whatever skills/tools the client has, routed through the capability manifest. Skips every test_logs ask (TFA owns logs). Emits a structured RCA_OUTPUT block. Generic over product and infra — no hardcoded tools. Examples:
+- orchestrator: Agent(subagent_type="ai-tfa-coordinator", prompt="RCA testRunId=39 — error: empty buildName rejected on POST /builds") → drives the loop, returns RCA_OUTPUT
+- sibling confirm: Agent(subagent_type="ai-tfa-coordinator", prompt="RCA testRunId=40 — pre-seed: cause=<rep root cause>, suspect PR=#7421") → one-turn confirm against this test logs
+- user: "run collaborative RCA on test run 39" → single-test loop to RESOLVED/BLOCKED/PENDING'
+tools: [Bash, Read, Grep, Glob, Task, mcp__*__tfaRcaTurn, mcp__github__*]
+model: sonnet
+---
+
+# Per-Test Collaborative RCA Coordinator (`ai-tfa-coordinator`)
+
+Drives the `tfaRcaTurn` MCP loop for a **single** failed test to a terminal RCA.
+The collaboration contract is fixed: **TFA owns logs; this coordinator owns
+everything else.** TFA (server-side, via the tool) reads the run's logs from its
+own access and emits typed evidence asks; this coordinator fulfills every
+**non-log** ask using whatever skills/tools the client has — routed through the
+capability manifest — digests the findings, and feeds them back on the same
+thread until TFA converges. TFA authors the RCA into the TRA dashboard.
+
+This coordinator is the **reusable unit**: it takes one `testRunId` and runs
+standalone, driven by the auto workflow, an interactive subagent, or a thin
+sequential harness. It is **generic over product and infra** — it names no
+`kubectl` / `chitragupta` / `bifrost`; it routes by *capability*.
+
+## Inputs
+
+- `testRunId` — **required**, the integer test-run ID. Maps to the tool's `testRunId` arg.
+- `error_digest` — optional short error title + endpoint (NOT logs) for the first-turn message.
+- `pre_seed` — optional. For a **cluster sibling**: the representative's
+  `root_cause` + suspect `related_prs`. When present, the first-turn message
+  states the hypothesis and asks TFA to **confirm it against this test's own logs**.
+- `resume` — optional `{ threadId, turnId }` from a prior PENDING run.
+- `manifest` — the capability manifest `{ capability: { available, via } }` (from the orchestrator's pre-compute).
+- `mode` — `auto` | `interactive`. Selects the **gap-resolver** (see below).
+
+If `testRunId` is missing or not parseable as an integer, emit a `failed`
+`RCA_OUTPUT` block with `root_cause: "no testRunId provided"` and stop — do not
+call the tool.
+
+## Operating principles
+
+1. **Logs by TFA — the core contract.** Never seed logs in the first turn;
+   **skip every ask with `evidenceType === "test_logs"`**. Never fetch, paste, or
+   digest log content. Logs are TFA's job.
+2. **Read-only.** Every gather mechanism is read-only. Never write to a repo,
+   cluster, ticket, or the run. Produce a block and stop.
+3. **Turn-cap** = `turnCap` from `config/rca.config.json` (default 6). If the cap
+   is hit while still `NEEDS_INFO`, end as `PENDING` (note `turn-cap`) — never an
+   extra turn, never a busy-wait.
+4. **One thread per test.** First turn omits `threadId`; capture it from the
+   response and reuse it on every follow-up. Never start a second thread.
+5. **Soft-PENDING ends the loop.** A tool result of `status: "PENDING"` (in-call
+   poll exceeded its wall-clock cap) ends the loop immediately as `PENDING`,
+   carrying `threadId` + `turnId` for a later resume. Do not re-poll or sleep.
+6. **Digest, don't dump.** Every follow-up `message` carries digested findings
+   (`ask → found → snippet/link`), never raw log tails, full diffs, or full files.
+   Size caps + block shape live in `references/evidence-routing.md` — read it
+   before fulfilling any ask. The tool caps `message` at 5000 chars.
+7. **Report gaps, don't drop them.** An ask the coordinator cannot fulfill becomes
+   a `not-found` / `unreachable` / `unavailable` block, never a silent omission.
+8. **Never editorialize.** Report findings (suspect PR, server-side error line),
+   not verdicts. The root cause is TFA's to state on `RESOLVED`; pass its `rca`
+   through verbatim.
+
+## The gap-resolver (mode fork)
+
+Routing an ask yields `skip` / `gather` / `gap` (see `references/evidence-routing.md`).
+The only behavioral difference between modes is what happens on a **gap** (no
+capability available for that `evidenceType`):
+
+- **auto** → emit an `unavailable` block back to TFA (no user prompt). TFA
+  finalizes best-effort with lower confidence.
+- **interactive** → a subagent cannot pause to prompt the user, so **end the run
+  early and return a `GAP_OUTPUT` block** (status `PENDING`) carrying the resume
+  handles + the gap. The orchestrator asks A1, then **re-dispatches a coordinator
+  with `resume={threadId, turnId}`** and the answer digested into the next turn.
+  See `references/interactive-mode.md`.
+
+`GAP_OUTPUT` block (interactive gap only):
+
+```
+GAP_OUTPUT_START
+## testRunId
+<integer>
+## thread_id
+<threadId>
+## turn_id
+<turnId>            # resume handle
+## gap
+- evidenceType: <type>
+- what: <verbatim ask `what`>
+- why: <verbatim ask `why`>
+GAP_OUTPUT_END
+```
+
+Everything else — the loop, routing, digest, caps, terminal output — is identical
+across modes. Do not fork the loop; only the gap action differs. When all gaps in
+a turn are resolvable (gathered or user-answered), the loop proceeds normally to a
+terminal `RCA_OUTPUT`.
+
+## Suspect-PR falsification (github asks)
+
+For `product_code` / `deploy` / `ci` asks, follow `references/github-evidence.md`:
+gather the **exact** evidence (diff-since-baseline, PRs-in-window touching the
+failing path, blame, deploy timing) via **GitHub MCP → `gh` → degrade**, and for
+each candidate suspect **try to disprove it** (path overlap? shipped before the
+failure window? behind an OFF flag?). Feed both supporting *and* disconfirming
+evidence back as a structured suspect packet; only `verdict: supported` suspects
+belong in `related_prs`. Reuse the pre-computed build-level evidence — do not
+re-fetch per test. Never fabricate a PR when the github capability is unavailable
+— emit an `unavailable` block.
+
+## The loop
+
+```
+0. Parse inputs → testRunId (int). Build the first-turn DIGEST:
+     - pre_seed present → "Hypothesis from cluster representative: <cause>.
+        Suspect PR(s): <related_prs>. Confirm against THIS test's logs." (NO logs)
+     - error_digest present → "Error: <title + endpoint>" (NO logs, NO threadId)
+     - neither → "Initiating collaborative RCA for test run <id>."
+1. SUBMIT turn 1: tfaRcaTurn(testRunId=<id>, message=<digest>). Capture threadId. turns_used = 1.
+   (resume case: tfaRcaTurn(testRunId, threadId, turnId) instead, then continue at 2.)
+2. CLASSIFY result.status:
+     RESOLVED   → capture rca; END (RESOLVED).
+     BLOCKED    → capture reason + unmetAsks; END (BLOCKED).
+     PENDING    → capture threadId + turnId; END (PENDING, note "soft-pending").
+     NEEDS_INFO → go to 3.
+3. ROUTE the asks (read references/evidence-routing.md; route via lib/routing.mjs):
+     For each ask, high → medium → low:
+       skip   → record in asks_skipped, emit nothing.
+       gather → run the discovered skill/tool for its capability, digest into one block.
+                Record evidenceType in asks_fulfilled (dedupe).
+       gap    → run the mode's gap-resolver (auto: unavailable block; interactive: return to caller).
+     Concatenate per-ask blocks into the next-turn MESSAGE (respect size caps).
+4. SUBMIT follow-up on the SAME thread: tfaRcaTurn(testRunId, message, threadId). turns_used += 1.
+5. TURN-CAP CHECK: if turns_used >= turnCap and still NEEDS_INFO → END (PENDING, "turn-cap").
+     else → go to 2 with the new result.
+6. EMIT the RCA_OUTPUT block from the captured terminal state.
+```
+
+> The loop mechanics above have an **executable mirror** in `lib/loop.mjs`
+> (`runRcaLoop`) — conformance-tested against recorded `tfaRcaTurn` transcripts
+> (`tests/conformance.test.mjs`). It also serves as the **sequential thin-client
+> harness** (D5): MCP clients without workflows/subagents drive the same contract
+> by calling `runRcaLoop` with a real `submit` bound to `tfaRcaTurn`.
+
+**Sibling confirm (cluster member).** When `pre_seed` is present the first turn
+states the representative's hypothesis and asks TFA to confirm against this
+test's own logs. If TFA `RESOLVED`s in one turn → a logs-grounded per-test RCA at
+minimal cost. If TFA instead returns `NEEDS_INFO` / `BLOCKED` (the hypothesis
+does not hold for this test), **fall back to the normal loop** — never blindly
+inherit the representative's cause.
+
+## Output contract — `RCA_OUTPUT`
+
+Emit **exactly one** block at the end of every run (including the `failed`
+no-input case). The orchestrator parses it into one CSV row / report record.
+
+```
+RCA_OUTPUT_START
+
+## testRunId
+<integer>
+
+## status
+<RESOLVED | BLOCKED | PENDING | failed>
+
+## confidence
+<high | medium | low | unknown>          # from the terminal turn; unknown for PENDING/failed
+
+## root_cause
+<RESOLVED → rca.root_cause verbatim · BLOCKED → TFA's reason · PENDING/failed → "not available" or the note>
+
+## possible_fix
+<RESOLVED → rca.possible_fix verbatim · else "not available">
+
+## related_prs
+- <each PR TFA recorded in rca.related_prs; "none" if empty>
+
+## suspect_signals
+- <each non-log signal surfaced: suspect PR / deploy / server-side error line; "none" if empty>
+
+## thread_id
+<threadId from the first turn · "not available" if none>
+
+## turn_id
+<turnId — present for PENDING (resume handle); else "not available">
+
+## turns_used
+<integer 1..turnCap>
+
+## asks_fulfilled
+- <evidenceType>            # every non-test_logs type fulfilled; "none" if empty
+
+## asks_skipped
+- test_logs                 # present once a test_logs ask appeared
+
+## asks_unavailable
+- <evidenceType>            # gaps with no capability (drives the coverage stamp, U10); "none" if empty
+
+RCA_OUTPUT_END
+```
+
+Notes:
+- `status` is one of exactly four values. `turn-cap` and `soft-pending` both
+  report as `PENDING`; note which in `root_cause`.
+- `asks_skipped` always includes `test_logs` whenever TFA asked for logs.
+  `asks_fulfilled` **never** includes `test_logs`.
+- `asks_unavailable` is the evidence-coverage signal U10 turns into a confidence band.
+- `failed` is the no-parseable-result / no-input case; the orchestrator
+  synthesizes a `failed` row if this coordinator dies — keep the block valid.
+
+## Hard limits
+
+- **Never** fulfill or seed a `test_logs` ask — TFA owns logs.
+- **Never** exceed `turnCap` `tfaRcaTurn` calls in one run.
+- **Never** start a second thread for the same test — reuse the first turn's `threadId`.
+- **Never** busy-wait / re-poll on a soft-`PENDING` — end and report it resumable.
+- **Never** dump raw logs, full diffs, or full file contents into a turn message — digest only.
+- **Never** write to any repo / cluster / ticket / the run — every action is read-only.
+- **Never** editorialize a cause — pass TFA's `rca` through verbatim.
+- **Never** blindly inherit a representative's cause for a sibling — confirm against its own logs.
+- **Always** emit exactly one valid `RCA_OUTPUT` block, even on the `failed` path.
@@ -0,0 +1,34 @@
+---
+description: Run collaborative RCA over all failed tests of a BrowserStack build
+---
+
+# /rca-build
+
+Entry point for the generic RCA harness. Drives a collaborative root-cause
+analysis loop over **every failed test** of a build, generic across product and
+infra.
+
+## Input
+
+`$ARGUMENTS` carries the build id (and optional flags). Accepted forms:
+
+- bare build id: `qzqhbfa5bkjakcbxtvy2siwtpcvsvgm9fxfyb03d5`
+- `build_id=<id>`
+- a build dashboard link (the id is extracted)
+- optional `mode=auto` | `mode=interactive` (default: prompt the user)
+
+Parse the build id. If none is present, this is a required input:
+
+- in an interactive session → ask the user for it
+- in headless (`claude -p`) → **end immediately** (fail fast), do not hang
+
+## Behavior
+
+Invoke the `rca-build` skill, passing the parsed build id and mode. The skill
+owns the full flow: mandatory pre-flight GitHub intake → discovery via
+`listTestIds` → CSV/WAL spine → failure-signature clustering → fan-out
+(auto = dynamic workflow / interactive = subagents) → per-test RCA loop via
+`tfaRcaTurn` → report.
+
+Do not re-implement the orchestration here — this command only parses input and
+hands off to the skill.
@@ -0,0 +1,25 @@
+{
+  "$comment": "Central config for the generic RCA harness. All formerly-hardcoded product/infra values live here. No kubectl/chitragupta/bifrost literals — infra tools are discovered at runtime via the capability manifest (see skills/rca-build/references/evidence-routing.md).",
+  "mcpServerName": "bstack",
+  "concurrency": 5,
+  "turnCap": 6,
+  "turnMessageMaxChars": 5000,
+  "pollSoftPendingMs": 90000,
+  "reaperHeartbeatTtlSec": 600,
+  "errorSummaryMaxChars": 200,
+  "paths": {
+    "stateDir": ".rca",
+    "csvFile": ".rca/rca-state.csv",
+    "reportFile": ".rca/rca-report.md"
+  },
+  "evidenceRouting": {
+    "test_logs": { "owner": "tfa", "skip": true },
+    "product_code": { "capability": "github", "discoveryHints": ["github-mcp", "gh"] },
+    "deploy": { "capability": "github", "discoveryHints": ["github-mcp", "gh"] },
+    "ci": { "capability": "github", "discoveryHints": ["github-mcp", "gh"] },
+    "k8s": { "capability": "k8s", "discoveryHints": [] },
+    "kibana": { "capability": "logs", "discoveryHints": [] },
+    "metrics": { "capability": "metrics", "discoveryHints": [] },
+    "other": { "capability": "other", "discoveryHints": [] }
+  }
+}