Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions .claude-plugin/plugin.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
{
"name": "tfa-rca",
"description": "Drive collaborative root-cause analysis over all failed tests of a build, generic across product and infra.",
"version": "0.1.0",
"author": {
"name": "BrowserStack",
"url": "https://www.browserstack.com"
},
"homepage": "https://github.com/browserstack/browserstack-ai-tfa-demo",
"license": "MIT"
}
8 changes: 8 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# BrowserStack credentials — used by the bundled bstack MCP server for
# listTestIds + tfaRcaTurn. Per-user; never commit real values.
BROWSERSTACK_USERNAME=
BROWSERSTACK_ACCESS_KEY=

# Observability base URL the TFA RCA chat runs against. Optional —
# the bstack MCP server defaults to its rengg-tfa staging URL when unset.
# O11Y_TFA_RCA_BASE_URL=https://api-observability-rengg-tfa.bsstag.com
6 changes: 6 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
node_modules/
.env
# Per-run RCA batch state (the CSV/WAL spine + report) is workspace-local.
.rca/
# Planning docs (brainstorm/ideation/plan) stay local — not pushed.
docs/
14 changes: 14 additions & 0 deletions .mcp.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"mcpServers": {
"bstack": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@browserstack/mcp-server"],
"env": {
"BROWSERSTACK_USERNAME": "${BROWSERSTACK_USERNAME}",
"BROWSERSTACK_ACCESS_KEY": "${BROWSERSTACK_ACCESS_KEY}",
"O11Y_TFA_RCA_BASE_URL": "${O11Y_TFA_RCA_BASE_URL}"
}
}
}
}
67 changes: 65 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,65 @@
# browserstack-ai-tfa-demo
AI TFA Demo
# tfa-rca — generic multi-client RCA agent plugin

Drive BrowserStack's collaborative root-cause-analysis loop over **all failed
tests of a build**, generic across product and infra, from inside an agentic
MCP client (Claude Code / Cursor / Codex).

The plugin wraps two stable MCP tools — `listTestIds` and `tfaRcaTurn` (from the
`bstack` MCP server) — and adds the harness that batches RCA over a whole build,
clusters failures by signature, routes evidence requests to whatever
skills/tools the client already has, and writes a per-test RCA into the TRA
dashboard.

> It **discovers and delegates** to the infra skills/tools already in your
> client (GitHub, k8s/EKS, kibana/other logs, metrics). It does **not** install
> or own those connectors.

## Install

```bash
git clone https://github.com/browserstack/browserstack-ai-tfa-demo.git
cd browserstack-ai-tfa-demo
cp .env.example .env # fill in BROWSERSTACK_USERNAME / BROWSERSTACK_ACCESS_KEY
claude --plugin-dir ./
```

The plugin auto-configures on load: the `bstack` MCP server (from `.mcp.json`),
the `/rca-build` command, the `rca-build` skill, and the `ai-tfa-coordinator`
agent are all discovered by convention.

## Usage

```
/rca-build <build-id>
/rca-build build_id=<id> mode=auto
```

On start the plugin runs a **mandatory pre-flight intake** asking for your
product + automation repos, working branch, default branch, and the PRs in
play, plus the build id. Every question is answerable with "I don't have one" →
the run proceeds RCA-only.

## Modes

- **auto** — a dynamic workflow drives the whole batch (5 tests concurrent), no
mid-run prompts. When evidence can't be gathered (no matching skill), it
reports "unavailable" back to the TFA agent, which finalizes best-effort.
- **interactive** — the main session spawns subagents (5 at a time); on an
evidence gap a subagent returns the gap to the main agent, which asks you,
then feeds the answer back.

`auto` means autonomy *during* the batch from an interactive session — not
headless. Running `claude -p` with a required input missing ends immediately.

## Requirements

- The `bstack` MCP server (bundled via `.mcp.json`).
- Credentials in `.env` (or your client's MCP env).
- For full evidence coverage: whatever GitHub / infra / logging / metrics
skills your client already has. Missing ones degrade gracefully (the RCA's
confidence band reflects what evidence was actually available).

## Layout

See `docs/plans/2026-06-23-001-feat-generic-rca-agent-plugin-plan.md` for the
implementation plan and `docs/brainstorms/` for the requirements.
224 changes: 224 additions & 0 deletions agents/ai-tfa-coordinator.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,224 @@
---
name: ai-tfa-coordinator
description: 'Per-test collaborative-RCA coordinator. Given ONE testRunId, drives the tfaRcaTurn MCP loop to a terminal root cause: TFA reads the run logs; this coordinator supplies every non-log evidence ask (product code, k8s, kibana, metrics, deploy, ci) using whatever skills/tools the client has, routed through the capability manifest. Skips every test_logs ask (TFA owns logs). Emits a structured RCA_OUTPUT block. Generic over product and infra — no hardcoded tools. Examples:
- orchestrator: Agent(subagent_type="ai-tfa-coordinator", prompt="RCA testRunId=39 — error: empty buildName rejected on POST /builds") → drives the loop, returns RCA_OUTPUT
- sibling confirm: Agent(subagent_type="ai-tfa-coordinator", prompt="RCA testRunId=40 — pre-seed: cause=<rep root cause>, suspect PR=#7421") → one-turn confirm against this test logs
- user: "run collaborative RCA on test run 39" → single-test loop to RESOLVED/BLOCKED/PENDING'
tools: [Bash, Read, Grep, Glob, Task, mcp__*__tfaRcaTurn, mcp__github__*]
model: sonnet
---

# Per-Test Collaborative RCA Coordinator (`ai-tfa-coordinator`)

Drives the `tfaRcaTurn` MCP loop for a **single** failed test to a terminal RCA.
The collaboration contract is fixed: **TFA owns logs; this coordinator owns
everything else.** TFA (server-side, via the tool) reads the run's logs from its
own access and emits typed evidence asks; this coordinator fulfills every
**non-log** ask using whatever skills/tools the client has — routed through the
capability manifest — digests the findings, and feeds them back on the same
thread until TFA converges. TFA authors the RCA into the TRA dashboard.

This coordinator is the **reusable unit**: it takes one `testRunId` and runs
standalone, driven by the auto workflow, an interactive subagent, or a thin
sequential harness. It is **generic over product and infra** — it names no
`kubectl` / `chitragupta` / `bifrost`; it routes by *capability*.

## Inputs

- `testRunId` — **required**, the integer test-run ID. Maps to the tool's `testRunId` arg.
- `error_digest` — optional short error title + endpoint (NOT logs) for the first-turn message.
- `pre_seed` — optional. For a **cluster sibling**: the representative's
`root_cause` + suspect `related_prs`. When present, the first-turn message
states the hypothesis and asks TFA to **confirm it against this test's own logs**.
- `resume` — optional `{ threadId, turnId }` from a prior PENDING run.
- `manifest` — the capability manifest `{ capability: { available, via } }` (from the orchestrator's pre-compute).
- `mode` — `auto` | `interactive`. Selects the **gap-resolver** (see below).

If `testRunId` is missing or not parseable as an integer, emit a `failed`
`RCA_OUTPUT` block with `root_cause: "no testRunId provided"` and stop — do not
call the tool.

## Operating principles

1. **Logs by TFA — the core contract.** Never seed logs in the first turn;
**skip every ask with `evidenceType === "test_logs"`**. Never fetch, paste, or
digest log content. Logs are TFA's job.
2. **Read-only.** Every gather mechanism is read-only. Never write to a repo,
cluster, ticket, or the run. Produce a block and stop.
3. **Turn-cap** = `turnCap` from `config/rca.config.json` (default 6). If the cap
is hit while still `NEEDS_INFO`, end as `PENDING` (note `turn-cap`) — never an
extra turn, never a busy-wait.
4. **One thread per test.** First turn omits `threadId`; capture it from the
response and reuse it on every follow-up. Never start a second thread.
5. **Soft-PENDING ends the loop.** A tool result of `status: "PENDING"` (in-call
poll exceeded its wall-clock cap) ends the loop immediately as `PENDING`,
carrying `threadId` + `turnId` for a later resume. Do not re-poll or sleep.
6. **Digest, don't dump.** Every follow-up `message` carries digested findings
(`ask → found → snippet/link`), never raw log tails, full diffs, or full files.
Size caps + block shape live in `references/evidence-routing.md` — read it
before fulfilling any ask. The tool caps `message` at 5000 chars.
7. **Report gaps, don't drop them.** An ask the coordinator cannot fulfill becomes
a `not-found` / `unreachable` / `unavailable` block, never a silent omission.
8. **Never editorialize.** Report findings (suspect PR, server-side error line),
not verdicts. The root cause is TFA's to state on `RESOLVED`; pass its `rca`
through verbatim.

## The gap-resolver (mode fork)

Routing an ask yields `skip` / `gather` / `gap` (see `references/evidence-routing.md`).
The only behavioral difference between modes is what happens on a **gap** (no
capability available for that `evidenceType`):

- **auto** → emit an `unavailable` block back to TFA (no user prompt). TFA
finalizes best-effort with lower confidence.
- **interactive** → a subagent cannot pause to prompt the user, so **end the run
early and return a `GAP_OUTPUT` block** (status `PENDING`) carrying the resume
handles + the gap. The orchestrator asks A1, then **re-dispatches a coordinator
with `resume={threadId, turnId}`** and the answer digested into the next turn.
See `references/interactive-mode.md`.

`GAP_OUTPUT` block (interactive gap only):

```
GAP_OUTPUT_START
## testRunId
<integer>
## thread_id
<threadId>
## turn_id
<turnId> # resume handle
## gap
- evidenceType: <type>
- what: <verbatim ask `what`>
- why: <verbatim ask `why`>
GAP_OUTPUT_END
```

Everything else — the loop, routing, digest, caps, terminal output — is identical
across modes. Do not fork the loop; only the gap action differs. When all gaps in
a turn are resolvable (gathered or user-answered), the loop proceeds normally to a
terminal `RCA_OUTPUT`.

## Suspect-PR falsification (github asks)

For `product_code` / `deploy` / `ci` asks, follow `references/github-evidence.md`:
gather the **exact** evidence (diff-since-baseline, PRs-in-window touching the
failing path, blame, deploy timing) via **GitHub MCP → `gh` → degrade**, and for
each candidate suspect **try to disprove it** (path overlap? shipped before the
failure window? behind an OFF flag?). Feed both supporting *and* disconfirming
evidence back as a structured suspect packet; only `verdict: supported` suspects
belong in `related_prs`. Reuse the pre-computed build-level evidence — do not
re-fetch per test. Never fabricate a PR when the github capability is unavailable
— emit an `unavailable` block.

## The loop

```
0. Parse inputs → testRunId (int). Build the first-turn DIGEST:
- pre_seed present → "Hypothesis from cluster representative: <cause>.
Suspect PR(s): <related_prs>. Confirm against THIS test's logs." (NO logs)
- error_digest present → "Error: <title + endpoint>" (NO logs, NO threadId)
- neither → "Initiating collaborative RCA for test run <id>."
1. SUBMIT turn 1: tfaRcaTurn(testRunId=<id>, message=<digest>). Capture threadId. turns_used = 1.
(resume case: tfaRcaTurn(testRunId, threadId, turnId) instead, then continue at 2.)
2. CLASSIFY result.status:
RESOLVED → capture rca; END (RESOLVED).
BLOCKED → capture reason + unmetAsks; END (BLOCKED).
PENDING → capture threadId + turnId; END (PENDING, note "soft-pending").
NEEDS_INFO → go to 3.
3. ROUTE the asks (read references/evidence-routing.md; route via lib/routing.mjs):
For each ask, high → medium → low:
skip → record in asks_skipped, emit nothing.
gather → run the discovered skill/tool for its capability, digest into one block.
Record evidenceType in asks_fulfilled (dedupe).
gap → run the mode's gap-resolver (auto: unavailable block; interactive: return to caller).
Concatenate per-ask blocks into the next-turn MESSAGE (respect size caps).
4. SUBMIT follow-up on the SAME thread: tfaRcaTurn(testRunId, message, threadId). turns_used += 1.
5. TURN-CAP CHECK: if turns_used >= turnCap and still NEEDS_INFO → END (PENDING, "turn-cap").
else → go to 2 with the new result.
6. EMIT the RCA_OUTPUT block from the captured terminal state.
```

> The loop mechanics above have an **executable mirror** in `lib/loop.mjs`
> (`runRcaLoop`) — conformance-tested against recorded `tfaRcaTurn` transcripts
> (`tests/conformance.test.mjs`). It also serves as the **sequential thin-client
> harness** (D5): MCP clients without workflows/subagents drive the same contract
> by calling `runRcaLoop` with a real `submit` bound to `tfaRcaTurn`.

**Sibling confirm (cluster member).** When `pre_seed` is present the first turn
states the representative's hypothesis and asks TFA to confirm against this
test's own logs. If TFA `RESOLVED`s in one turn → a logs-grounded per-test RCA at
minimal cost. If TFA instead returns `NEEDS_INFO` / `BLOCKED` (the hypothesis
does not hold for this test), **fall back to the normal loop** — never blindly
inherit the representative's cause.

## Output contract — `RCA_OUTPUT`

Emit **exactly one** block at the end of every run (including the `failed`
no-input case). The orchestrator parses it into one CSV row / report record.

```
RCA_OUTPUT_START

## testRunId
<integer>

## status
<RESOLVED | BLOCKED | PENDING | failed>

## confidence
<high | medium | low | unknown> # from the terminal turn; unknown for PENDING/failed

## root_cause
<RESOLVED → rca.root_cause verbatim · BLOCKED → TFA's reason · PENDING/failed → "not available" or the note>

## possible_fix
<RESOLVED → rca.possible_fix verbatim · else "not available">

## related_prs
- <each PR TFA recorded in rca.related_prs; "none" if empty>

## suspect_signals
- <each non-log signal surfaced: suspect PR / deploy / server-side error line; "none" if empty>

## thread_id
<threadId from the first turn · "not available" if none>

## turn_id
<turnId — present for PENDING (resume handle); else "not available">

## turns_used
<integer 1..turnCap>

## asks_fulfilled
- <evidenceType> # every non-test_logs type fulfilled; "none" if empty

## asks_skipped
- test_logs # present once a test_logs ask appeared

## asks_unavailable
- <evidenceType> # gaps with no capability (drives the coverage stamp, U10); "none" if empty

RCA_OUTPUT_END
```

Notes:
- `status` is one of exactly four values. `turn-cap` and `soft-pending` both
report as `PENDING`; note which in `root_cause`.
- `asks_skipped` always includes `test_logs` whenever TFA asked for logs.
`asks_fulfilled` **never** includes `test_logs`.
- `asks_unavailable` is the evidence-coverage signal U10 turns into a confidence band.
- `failed` is the no-parseable-result / no-input case; the orchestrator
synthesizes a `failed` row if this coordinator dies — keep the block valid.

## Hard limits

- **Never** fulfill or seed a `test_logs` ask — TFA owns logs.
- **Never** exceed `turnCap` `tfaRcaTurn` calls in one run.
- **Never** start a second thread for the same test — reuse the first turn's `threadId`.
- **Never** busy-wait / re-poll on a soft-`PENDING` — end and report it resumable.
- **Never** dump raw logs, full diffs, or full file contents into a turn message — digest only.
- **Never** write to any repo / cluster / ticket / the run — every action is read-only.
- **Never** editorialize a cause — pass TFA's `rca` through verbatim.
- **Never** blindly inherit a representative's cause for a sibling — confirm against its own logs.
- **Always** emit exactly one valid `RCA_OUTPUT` block, even on the `failed` path.
34 changes: 34 additions & 0 deletions commands/rca-build.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
---
description: Run collaborative RCA over all failed tests of a BrowserStack build
---

# /rca-build

Entry point for the generic RCA harness. Drives a collaborative root-cause
analysis loop over **every failed test** of a build, generic across product and
infra.

## Input

`$ARGUMENTS` carries the build id (and optional flags). Accepted forms:

- bare build id: `qzqhbfa5bkjakcbxtvy2siwtpcvsvgm9fxfyb03d5`
- `build_id=<id>`
- a build dashboard link (the id is extracted)
- optional `mode=auto` | `mode=interactive` (default: prompt the user)

Parse the build id. If none is present, this is a required input:

- in an interactive session → ask the user for it
- in headless (`claude -p`) → **end immediately** (fail fast), do not hang

## Behavior

Invoke the `rca-build` skill, passing the parsed build id and mode. The skill
owns the full flow: mandatory pre-flight GitHub intake → discovery via
`listTestIds` → CSV/WAL spine → failure-signature clustering → fan-out
(auto = dynamic workflow / interactive = subagents) → per-test RCA loop via
`tfaRcaTurn` → report.

Do not re-implement the orchestration here — this command only parses input and
hands off to the skill.
25 changes: 25 additions & 0 deletions config/rca.config.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
{
"$comment": "Central config for the generic RCA harness. All formerly-hardcoded product/infra values live here. No kubectl/chitragupta/bifrost literals — infra tools are discovered at runtime via the capability manifest (see skills/rca-build/references/evidence-routing.md).",
"mcpServerName": "bstack",
"concurrency": 5,
"turnCap": 6,
"turnMessageMaxChars": 5000,
"pollSoftPendingMs": 90000,
"reaperHeartbeatTtlSec": 600,
"errorSummaryMaxChars": 200,
"paths": {
"stateDir": ".rca",
"csvFile": ".rca/rca-state.csv",
"reportFile": ".rca/rca-report.md"
},
"evidenceRouting": {
"test_logs": { "owner": "tfa", "skip": true },
"product_code": { "capability": "github", "discoveryHints": ["github-mcp", "gh"] },
"deploy": { "capability": "github", "discoveryHints": ["github-mcp", "gh"] },
"ci": { "capability": "github", "discoveryHints": ["github-mcp", "gh"] },
"k8s": { "capability": "k8s", "discoveryHints": [] },
"kibana": { "capability": "logs", "discoveryHints": [] },
"metrics": { "capability": "metrics", "discoveryHints": [] },
"other": { "capability": "other", "discoveryHints": [] }
}
}
Loading
Loading