diff --git a/.agents/skills/kane-cli/SKILL.md b/.agents/skills/kane-cli/SKILL.md
index e5bf710..9190215 100644
--- a/.agents/skills/kane-cli/SKILL.md
+++ b/.agents/skills/kane-cli/SKILL.md
@@ -5,804 +5,255 @@ description: Browser automation via kane-cli — run objectives, parse NDJSON ou
 # Kane CLI — Browser Automation Skill

-Use `kane-cli` for **any task that requires a real browser**: navigating websites, clicking elements, filling forms, searching, testing web UI, taking screenshots, or verifying deployments.
-
-**Do NOT** use Playwright, Puppeteer, or Selenium directly. `kane-cli` manages Chrome, auth, and the AI automation agent.
-
-**Always run with `--agent` flag.** This gives structured NDJSON output that you parse and present to the user with rich formatting.
-
----
-
-## 1. Decision Tree
-
-When the user's request involves a browser, follow this flow:
-
-**Is kane-cli installed?**
-├─ Unknown → Check with `kane-cli --version`
-├─ No → `npm install -g @testmuai/kane-cli` then §2
-└─ Yes ↓
-
-**Is kane-cli set up?**
-├─ Unknown → Run `kane-cli whoami` to check auth status
-├─ No → Go to §2 (Pre-flight Setup)
-└─ Yes ↓
-
-**What does the user want?**
-├─ Single browser task → Build one `kane-cli run --agent` command (§3, §4)
-├─ Test/verify something → Same, but use assertion objectives (§4)
-├─ Extract data from a page → Same, but use "store as" extraction pattern (§4)
-├─ Save / re-run / commit the test → Use `kane-cli testmd` (§7)
-├─ Multiple independent tasks → Decompose into sub-objectives, run in parallel via Agent tool (§9)
-├─ Debug a failed run → Inspect logs (§8)
-└─ Configure kane-cli → Run config commands (§10)
-
-**After every run:**
-1. Parse the NDJSON output (§5)
-2. Present rich results with emojis (§6)
-3. If failed, inspect logs and diagnose (§8)
-
----
-
-## 2. Pre-flight Setup
-
-Before first use, verify installation and auth.
-
-### Install
-
-```bash
-npm install -g @testmuai/kane-cli
-```
-
-### Check Auth Status
-
-```bash
-kane-cli whoami
-```
-
-If this shows "not configured" or errors, run login:
-
-### Login (Basic Auth)
-
-```bash
-kane-cli login --username --access-key
-```
-
-This creates the default profile with basic auth, auto-selects the KaneAI project, and marks setup complete. Credentials come from the user's TestmuAI dashboard (Settings → Keys).
-
-Optional flag:
-- `--profile ` — profile name (default: last selected profile check using `config show`)
-
-### Login (OAuth)
-
-```bash
-kane-cli login --oauth
-```
-
-This opens the browser for OAuth consent and waits for the callback. Works in both TTY and non-TTY (agent) mode.
-
-### Login (Interactive — TTY only)
-
-In a terminal, run `kane-cli login` with no flags for the interactive wizard (auth method → project picker → folder picker). If the user needs this, ask them to run it directly:
-
-> Please run `! kane-cli login` and complete the sign-in.
-
-### Verify
-
-```bash
-kane-cli whoami # Auth status
-kane-cli config show # Current configuration
-```
+Use `kane-cli` for **any task that requires a real browser**: navigating websites, clicking elements, filling forms, searching, testing web UI, taking screenshots, or verifying deployments. Do NOT use Playwright, Puppeteer, or Selenium directly. Always run with `--agent` so output is structured NDJSON you can parse.

 ---

-## 3. Building the Command
+## 1. Live narration and results presentation — READ THIS FIRST

-Every run uses this pattern:
-
-```bash
-kane-cli run "" --agent [options]
-```
-
-`--agent` is **mandatory** — it outputs structured NDJSON that you parse and present to the user.
-
-### Flags
-
-| Flag | Purpose | Default |
-|------|---------|---------|
-| `--headless` | No visible browser window | Off (browser visible) |
-| `--max-steps ` | Limit agent reasoning steps | 30 |
-| `--timeout ` | Kill run after N seconds | No limit |
-| `--variables ` | Inline variables JSON | None |
-| `--variables-file ` | Load variables from a JSON file | None |
-| `--global-context ` | Override global agent context markdown | `~/.testmuai/kaneai/global-memory.md` |
-| `--local-context ` | Override local project context markdown | `.testmuai/context.md` |
-| `--ws-endpoint ` | Remote browser via WebSocket (e.g. LambdaTest grid) | Local Chrome |
-| `--cdp-endpoint ` | Connect to existing Chrome via CDP | Auto-launch Chrome |
-| `--code-export` | Generate code export after upload | Off |
-
-### Exit Codes
-
-| Code | Meaning |
-|------|---------|
-| 0 | ✅ Passed |
-| 1 | ❌ Failed |
-| 2 | ⚠️ Error (auth, setup, infra) |
-| 3 | ⏱️ Timeout or cancelled |
-
-### Variables
-
-Variables parameterize objectives with reusable values and secrets. Use `{{key}}` syntax in objectives.
-
-**Format:**
-```json
-{
-  "username": { "value": "alice", "secret": false },
-  "password": { "value": "s3cret!", "secret": true }
-}
-```
+The user is watching this happen in real time. Silence during a kane-cli run is a bug; a one-line "Test passed" instead of the results table is a bug. Both happen because this section used to be buried at line 353 of an 800-line file. It's first now. Follow it exactly.

-`secret: true` masks the value in logs and routes it to TestmuAI's secrets store instead of being synced as plain TMS variables.
+### 1.1 How to launch kane-cli — Monitor (Claude Code) or Bash (Codex / Gemini)

-**Loading order** (later wins):
-1. `~/.testmuai/kaneai/variables/*.json` (global, alphabetical)
-2. `{cwd}/.testmuai/variables/*.json` (local project overrides)
-3. `--variables-file `
-4. `--variables '{...}'` (inline JSON)
+**Bash is synchronous — it blocks until kane-cli exits, then hands you the whole stdout at once. That means you cannot narrate event-by-event from a Bash call.** To narrate live, the launch tool must stream stdout line-by-line.

-**Always parameterize:** credentials, API keys, tokens, environment-specific URLs.
-**OK to hardcode:** one-off URLs, static UI text, navigation paths.
-
-### Context Files
-
-Context files provide additional instructions to the agent:
-- **Global:** `~/.testmuai/kaneai/global-memory.md` — shared across all runs
-- **Local:** `.testmuai/context.md` in cwd — project-specific
-
-Override per-run with `--global-context` / `--local-context` flags.
-
-### Examples
-
-```bash
-# Simple browser task
-kane-cli run "Go to https://www.amazon.in and search for 'laptop'" --agent
-
-# Headless with timeout
-kane-cli run "Go to https://app.example.com and verify login page loads" --agent --headless --timeout 60
-
-# With variables
-kane-cli run "Go to https://app.example.com and login with {{username}} and {{password}}" --agent \
-  --variables '{"username": {"value": "alice"}, "password": {"value": "secret123", "secret": true}}'
-
-# Remote browser (LambdaTest grid)
-kane-cli run "Go to https://shop.example.com and add item to cart" --agent \
-  --ws-endpoint "wss://cdp.lambdatest.com/playwright?capabilities=..."
-
-# With variables file
-kane-cli run "Go to https://staging.myapp.com, login and verify dashboard" --agent \
-  --variables-file ./test-creds.json --headless --timeout 120
-```
-
----
-
-## 4. Writing Objectives
-
-The objective string is the most important input. How you phrase it determines what the agent does.
-
-### Three Patterns
-
-| Pattern | Trigger Phrases | Agent Behavior |
-|---------|----------------|----------------|
-| 🎯 **Action** | "go to", "click", "type", "search", "fill", "scroll" | Performs browser actions |
-| ✅ **Assertion** | "assert", "verify", "confirm", "check that" | Validates a condition (pass/fail) |
-| 📦 **Extraction** | "store X as 'name'" | Reads a value from the page and persists it in structured output |
-
-### Extraction: The "store as" Pattern
-
-**Critical.** Vague phrasing like "read", "report", or "tell me" does NOT reliably extract data. The agent may observe the value visually but won't persist it in structured output.
-
-❌ **Bad** — agent looks but doesn't capture:
-```
-"go to example.com and read the page title"
-"go to example.com and tell me the price"
-```
-
-✅ **Good** — agent extracts and persists in `final_state`:
-```
-"go to example.com, store the page title as 'page_title'"
-"go to example.com, store the price of the first item as 'price'"
-```
-
-Stored values appear in the `run_end` event's `final_state` and `context.memory` fields.
-
-### Combining Patterns
+| Agent | Launch tool | Live narration possible? |
+|---|---|---|
+| **Claude Code** | `Monitor` — streams each stdout line as its own notification | ✅ Yes — narrate per event as it arrives |
+| **Codex CLI** | `Bash` (or shell equivalent) | ❌ No — narrate post-run from captured stdout |
+| **Gemini CLI** | `Bash` (or shell equivalent) | ❌ No — narrate post-run from captured stdout |

-Chain action → extraction → assertion in a single objective:
+**In Claude Code, you MUST use `Monitor` (not Bash) to launch `kane-cli run` / `kane-cli testmd run`.** Pattern:

+```yaml
+description: "kane-cli: "
+command: kane-cli run "" --agent
+timeout_ms: 600000
+persistent: false
 ```
-"go to {{app_url}}/dashboard,
-  store the welcome message as 'welcome_text',
-  store the user role in the sidebar as 'role',
-  assert the role is 'Admin'"
-```
-
-### Assertion Specificity
-
-| Type | Example |
-|------|---------|
-| **Exact match** | `"assert the cart total shows '$29.99'"` |
-| **Flexible match** | `"assert a price is displayed for each product"` |
-| **State** | `"assert the Submit button is disabled until all fields are filled"` |
-| **Conditional** | `"if a cookie banner appears, dismiss it, then assert the homepage loads"` |
-| **Negative** | `"assert no error message or red banner is visible"` |
-| **Positional** | `"assert 'Settings' appears in the left sidebar navigation"` |
-
-### Dos and Don'ts
-
-| ✅ Do | ❌ Don't |
-|-------|---------|
-| Use imperative verbs: "go to", "click", "store as" | Use vague verbs: "check out", "look at", "explore" |
-| Be specific: "click the 'Add to Cart' button" | Be vague: "add the item" |
-| Name extractions: "store X as 'price'" | Hope for values: "tell me the price" |
-| Use `{{variables}}` for credentials/URLs | Hardcode secrets in the objective |
-| Include starting URL in the objective: "Go to https://..." | Assume the agent knows where to start |
-| Split mega-objectives (>15 steps) into multiple runs | Cram everything into one massive objective |

----
-
-## 5. Parsing Output (--agent mode)
+Every NDJSON line from kane-cli arrives as a notification. The watch ends when kane-cli exits (you'll see the exit code in the final notification). Do NOT also call Bash for the same run — that double-launches kane-cli.

-> **Internal reference only.** Everything in this section (field names, event types, JSON structure) is for you to parse programmatically. **Never expose these internal terms to the user.** The user should see plain-language summaries, not `run_end`, `final_state`, `bifurcation`, `NDJSON`, `session_dir`, or any raw JSON fields.
+In Codex/Gemini, use Bash with the same `kane-cli ... --agent` command. After it returns, parse the captured stdout as if you had received the events in sequence.

-With `--agent`, kane-cli outputs one JSON object per line to **stdout**. Progress UI renders to **stderr**.
+### 1.2 Before you launch — emit start line and create todos

-### Event Types
+**Before** invoking Monitor (or Bash), emit:

-**Progress events** (bulk of the output — one per step):
-
-```json
-{"step": 1, "status": "passed", "remark": "Navigated to amazon.in"}
-{"step": 2, "status": "passed", "remark": "Typed 'laptop' in search box"}
-{"step": 3, "status": "failed", "remark": "Could not find Add to Cart button"}
+```text
+Starting browser task: .
 ```

-| Field | Type | Description |
-|-------|------|-------------|
-| `step` | number | Step index (1-based) |
-| `status` | string | `"passed"` or `"failed"` |
-| `remark` | string | What the agent did or why it failed |
-
-These are **untyped** — they have no `type` field. Do **not** key on `event.type === 'step_start'` or `'step_end'`; those event types are not emitted.
-
-**Flow events:**
-
-| Event (`type` field) | Key Fields | Purpose |
-|-------|-----------|---------|
-| `bifurcation` | `flows[]`, `count` | Agent split objective into sub-flows |
-| `child_agent_start` | `child_id`, `objective`, `parent_step` | Child agent spawned |
-| `child_agent_end` | `child_id`, `success`, `steps_taken`, `summary` | Child agent finished |
-| `ask_user` | `question`, `step_index`, `options?` | Agent needs user input |
-| `error` | `message` | Error occurred |
-
-**Note:** There is no `run_start` event — the first line is either a `bifurcation` or a progress object.
+Then create these TodoWrite items (skip on Gemini CLI where TodoWrite is unavailable):

-**Note:** `ask_user` is auto-disabled when stdin is not a TTY. Since agents typically run kane-cli as a subprocess, ask_user events will not be emitted. Write objectives that don't require interactive input.
+1. `Narrate start of ` — mark `in_progress` immediately
+2. `Narrate each step as NDJSON arrives`
+3. `Present results table after run_end`

-### Parsing Strategy
+The todos exist so that after Monitor/Bash returns control, the in-context reminder pulls you back into narration mode rather than a generic "parse stdout" mode.

-Since progress events lack a `type` field, distinguish them from typed events like this:
+### 1.3 During the run — narrate every event

-```
-for each line of NDJSON:
-  if obj.type === "run_end" → terminal event, stop parsing
-  if obj.type === "bifurcation" → flow split
-  if obj.type exists → other typed event
-  if obj.step exists → progress event (step/status/remark)
-```
+Progress events have `step`/`status`/`remark` fields and **no `type` field**. Each one gets ONE narration line.

-**Build automation on `run_end`** — it is the only event guaranteed to have a stable schema across versions. Use progress events for live status display only.
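The parsing strategy described above checks `type` first, falls back to `step`, and stops at `run_end`. A minimal Python sketch of that logic follows. It is an illustration, not kane-cli's own parser. The name `iter_events` and the `"progress"` / `"unknown"` labels are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Classify kane-cli --agent stdout lines (hypothetical helper).

    Yields (kind, event) pairs: the event's "type" for typed events,
    "progress" for untyped step objects, "unknown" for anything else.
    Stops after the terminal run_end event.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between events
        event = json.loads(raw)
        if "type" in event:
            # Typed events: bifurcation, child_agent_*, ask_user, error, run_end
            yield event["type"], event
            if event["type"] == "run_end":
                return  # terminal event: stop parsing
        elif "step" in event:
            # Progress events carry step/status/remark and no "type" field
            yield "progress", event
        else:
            yield "unknown", event


sample = [
    '{"step": 1, "status": "passed", "remark": "Navigated to amazon.in"}',
    '{"step": 2, "status": "failed", "remark": "Could not find Add to Cart button"}',
    '{"type": "run_end", "status": "failed", "summary": "Stopped at checkout"}',
]
for kind, event in iter_events(sample):
    print(kind, event.get("step", event.get("status")))
```

In practice, `lines` would be the captured stdout of a `kane-cli run ... --agent` call, for example `completed.stdout.splitlines()`. Stopping at `run_end` follows the text's advice to build automation on that event. Progress events serve only as live status.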
-
-**Terminal event** (always the last line):
-
-```json
-{
-  "type": "run_end",
-  "status": "passed",
-  "summary": "Searched for laptop and added first result to cart",
-  "one_liner": "Searched for laptop on Amazon and added to cart",
-  "reason": "Objective completed",
-  "duration": 45.2,
-  "credits": 12,
-  "final_state": {
-    "price": "$29.99",
-    "product_name": "Wireless Headphones"
-  },
-  "context": {
-    "memory": {},
-    "variables": {},
-    "pointer": "(passed) Searched for laptop and added first result to cart"
-  },
-  "session_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890",
-  "run_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890/runs/0",
-  "test_url": "https://test-manager.lambdatest.com/projects/123/test-cases/456"
-}
-```
+**Claude Code (Monitor):** Each Monitor notification IS one event. Narrate it the moment the notification arrives. Do not batch. Do not wait for more events. One notification → one narration line.

-Key `run_end` fields:
-- `status` — `"passed"` or `"failed"`
-- `summary` — what the agent did
-- `one_liner` — short summary for display
-- `reason` — why it stopped
-- `credits` — credits consumed by the run (when reported)
-- `final_state` — extracted values from "store as" objectives
-- `test_url` — link to KaneAI dashboard (if upload succeeded)
-- `session_dir` / `run_dir` — paths to log files
+**Codex / Gemini (Bash post-run):** Iterate the captured stdout line-by-line in order. Emit one narration per progress event in sequence before moving on to the results table.

-### Responding to `ask_user` (if stdin is a TTY)
+Template (both cases):

-```json
-{"type": "user_response", "answer": "Medium size"}
+```text
+Step :
 ```

-To cancel a run:
+If `status` is `"failed"`, flag it immediately:

-```json
-{"type": "cancel"}
+```text
+Step failed: — the agent is retrying.
 ```

----
-
-## 6. Presenting Results to the User
+Never expose internal field names (`step`, `status`, `remark`, `run_end`, `final_state`, `bifurcation`, `session_dir`, etc.) to the user. Translate to plain language.

-> **Golden rule:** The user should feel like they're watching a browser task happen, not reading a log file. Use plain language, never expose internal field names, JSON keys, file paths, or technical jargon. Translate everything into what the user cares about.
-
-### 📢 Live Progress (During the Run)
+### 1.4 After run_end — present the results table

-**Do not stay silent while kane-cli runs.** As the command executes, keep the user informed:
+The terminal event has `type: "run_end"` and stable fields: `status`, `summary`, `one_liner`, `duration`, `credits`, `final_state`, `test_url`, `session_dir`, `run_dir`.

-1. **Before starting** — Tell the user what you're about to do:
-   > Starting browser task: searching for 'laptop' on Amazon...
+**For a passing run, always emit this exact table** (substituting the field values):

-2. **As steps complete** — Relay each step's outcome in plain language as it happens. Parse the progress events from stdout and narrate them:
-   > Step 1: Opened Amazon homepage
-   > Step 2: Typed 'laptop' in the search bar
-   > Step 3: Clicked the search button
-   > Step 4: Search results loaded — found product listings
-
-3. **If something goes wrong mid-run** — Flag it immediately, don't wait for the final result:
-   > Step 5: Could not find the 'Add to Cart' button — the agent is retrying...
-
-This keeps the user engaged and lets them intervene early if the task is going in the wrong direction.
-
-### 📋 Results Summary (After the Run)
-
-**After every run, present a clear summary.** Never just say "it passed" — show the full picture in a user-friendly format.
- -**Successful run:** +**For a passing run, always emit this exact table** (substituting the field values): +```markdown | | | |-------|-------| | 🟢 **Result** | Passed | -| 🎯 **Task** | Search for 'laptop' on Amazon | -| ⏱️ **Duration** | 45.2s | -| 👣 **Steps taken** | 7 | -| 📝 **What happened** | Opened Amazon, typed 'laptop' in search, clicked search, results loaded with 48 products | -| 🔗 **View details** | [Open in KaneAI Dashboard](https://test-manager.lambdatest.com/...) | - -**If data was extracted** (from "store as" objectives), show it as a clean results table: - -| 📦 What was found | Value | -|-------------|----------------| -| Top repository | freeCodeCamp/freeCodeCamp | -| Star count | 413k | -| Price | $29.99 | +| 🎯 **Task** | | +| ⏱️ **Duration** | s | +| 👣 **Steps taken** | | +| 📝 **What happened** | | +| 🔗 **View details** | [Open in KaneAI Dashboard]() | +``` -**If assertions were checked**, show pass/fail for each: +**If `final_state` has values** (the user used "store as X" — see §4), append a second table: -| ✅ Check | Result | -|-------------|--------| -| Dashboard shows welcome message | 🟢 Passed | -| User role is Admin | 🔴 Failed | -### ❌ When Things Go Wrong -For failed runs, explain **what went wrong in plain language**: +```markdown +| 📦 What was found | Value | +|-------------|----------------| +| | | +``` -- 🔍 **What failed** — describe the step that failed and why, in the user's terms (not "step_003.json shows dom_action error") -- 📸 **Screenshot** — if a screenshot exists, read and show it so the user can see what the browser looked like at the point of failure -- 💡 **Why it likely failed** — your diagnosis: was the element missing? Did the page not load? Was the objective ambiguous? -- 🔧 **Suggested fix** — a concrete next step: rephrase the objective, increase timeout, check auth, etc. 
+**If the objective used assertions** ("assert …", "verify …"), append a pass/fail table per assertion derived from the run summary and step remarks.

-**Example of a good failure report:**
+### 1.5 On failure

-> 🔴 **Failed** at step 5 of 9 (after 25s)
->
-> **What happened:** The agent clicked "Proceed to Checkout" but the payment form never appeared. The page showed a loading spinner for 15 seconds before the agent timed out.
->
-> **Likely cause:** The checkout page may require authentication, or the site's payment service was slow/down.
->
-> **Suggested fix:** Try adding an explicit login step before checkout, or increase the timeout to 120s.
+For exit code 1 (or `status: "failed"` in `run_end`), present a plain-language failure report — never raw paths or NDJSON. Template:

-### 🐛 Suggesting a Bug Report
+```markdown
+🔴 **Failed** at step of (after s)

-If the failure looks like a **kane-cli bug** (not auth, timeout, or a vague objective), offer to file a report:
+**What happened:** .

-> This looks like it might be a bug in kane-cli. Want me to file a report?
+**Likely cause:**

-File at: **https://github.com/LambdaTest/kane-cli/issues**. Gather the details automatically — don't ask the user to dig through log files.
+**Suggested fix:** .
+```

-**Do NOT suggest bug reports for:** auth issues, low timeouts, vague objectives, or website errors (500s, CAPTCHAs).
+If a screenshot exists at `/run-test/screenshots/step_.png`, Read it and show it inline before the suggested fix. For deeper diagnosis, see `references/debug.md`.

 ---

-## 7. Saving & Replaying Tests (`testmd`)
+## 2. Decision tree

-The §3 `run` command is the **primary** mode — one-shot, ephemeral. `testmd` is the **secondary** mode: tests live as `_test.md` files on disk, each step is cached on the first run, and every later run **replays from cache** with no LLM cost.
+When the user's request involves a browser:

-Use `testmd` whenever the user wants the test to persist. The decision is binary — once a test exists as a file, every later invocation is `testmd run`, never `run`.
+**Is kane-cli installed and authenticated?**
+- Unknown → `kane-cli whoami`
+- No / errors → Read `references/setup-and-config.md`
+- Yes ↓

-### When to switch from `run` to `testmd`
-
-| User says | Use |
-|---|---|
-| "save this test", "commit this", "keep this", "add this to the suite" | `testmd` |
-| "regression test", "smoke test", "make this replayable" | `testmd` |
-| "this is a test", "test the X flow end-to-end" (suite-shaped) | `testmd` |
-| "run this once", "check if X works right now", "try X" | `run` (§3) |
-| "search for", "click", "fill", "verify" (one-shot) | `run` (§3) |
-
-If unclear, ask: "Do you want me to save this test so you can re-run it later?"
-
-### Quick start
+**What does the user want?**
+- A single one-shot browser task → build a `kane-cli run --agent` command (§3 + §4)
+- A test they want to save / re-run / commit → Read `references/testmd.md` first, then use `kane-cli testmd`
+- Multiple independent browser tasks → Read `references/parallel.md` first
+- Debug a failed run → Read `references/debug.md`
+- Configure kane-cli or check directory layout → Read `references/setup-and-config.md`
+- You need the full NDJSON event schema (rare — §5's summary covers 90% of cases) → Read `references/parsing.md`

-Write the file (any path; filename must end in `_test.md`):
+**Every run, always:** follow §1 above.

-```markdown
----
-mode: testing
-max_steps: 30
 ---
-# Amazon search
-
-## Open Amazon
-Open https://www.amazon.com.
-
-## Search for headphones
-Type "wireless headphones" into the search box and submit.
-Verify at least one product result is visible.
-```
-
-Run it:
+## 3. Building a `run` command

 ```bash
-kane-cli testmd run amazon_test.md --agent
-```
-
-### File format
-
-Four parts in order:
-
-1. **YAML frontmatter** — between `--- ... ---` at the very top.
-2. **`# Title`** — decorative; everything before the first `## ` is ignored.
-3. **`## H2` step headings** — one per step. The agent reads the step body, not the heading.
-4. **Step body** — either prose **or** a single `@import ` line. Never both.
-
-Per-step `yaml` overrides go immediately under the heading, in a fenced block:
-
-````markdown
-## Submit the form
-```yaml
-timeout: 90
-optional: true
+kane-cli run "" --agent [options]
 ```
-Click submit and verify the confirmation banner.
-````

-**Frontmatter keys to use:**
+`--agent` is mandatory — it switches stdout to NDJSON. Most-used flags:

-| Key | Scope | Description |
-|---|---|---|
-| `mode` | root | `action` (halts on auth walls) or `testing` (default — pushes through so negative-test assertions can fire) |
-| `max_steps` | root + step | Max agent reasoning steps. Default `30`. |
-| `timeout` | root + step | Hard kill per step in seconds. |
-| `headless` | root | No browser window. |
-| `variables` | root + step | `{{name}}` params, same shape as §3, with `secret: true` for credentials |
-| `global_context` / `local_context` | root + step | Inline Markdown or path |
-| `code_export` / `code_language` | root + step | Generate Playwright after the run; language `python` or `javascript` |
-
-Files ending in `_test.md` are tests (valid entry points). Any other `.md` is a helper — reachable only via `@import`.
-
-### The replay & cascade rule (CRITICAL)
-
-On the **first** run of a test, the agent authors each step and saves a recording. On **every later run**, each step replays from its recording — no agent, no LLM cost, much faster.
+| Flag | Purpose | Default | +|------|---------|---------| +| `--headless` | No visible browser window | Off | +| `--max-steps ` | Cap agent reasoning steps | 30 | +| `--timeout ` | Hard kill after N seconds | No limit | +| `--variables ` | Inline variables JSON (for `{{key}}` in objective) | None | +| `--variables-file ` | Load variables from a JSON file | None | +| `--ws-endpoint ` | Remote browser (LambdaTest grid) | Local Chrome | +| `--code-export` | Generate code export after upload | Off | -A step replays only if **all** of these hold: -- A recording for that step exists, -- Its prose is unchanged since the recording, -- Its `yaml` block is unchanged, -- No earlier step in the file invalidated it. +Other flags (`--global-context`, `--local-context`, `--cdp-endpoint`) and the full variables precedence chain live in `references/setup-and-config.md`. -**Editing step N re-authors step N AND every step after it in the same file.** Each step starts where the previous step left off (URL, login, tabs). When step 3 changes, step 4 cannot safely replay against state that no longer exists. +**Exit codes:** `0` passed · `1` failed · `2` auth/infra error · `3` timeout/cancelled. -Consequences when editing tests: -- A one-line tweak at the top of a 20-step test re-authors all 20 steps on the next run. -- To re-record only one step, edit only that step (or steps after it). -- `--author` forces full authoring for one run (debugging only). -- `rm -rf output-/` wipes the cache entirely. 
+### Examples -### `@import` for reusing flows +```bash +# One-shot +kane-cli run "Go to https://www.amazon.in and search for 'laptop'" --agent -Extract a repeating flow (login, setup, cookie banner dismissal) into a helper file: +# Headless with timeout +kane-cli run "Go to https://app.example.com and verify login page loads" --agent --headless --timeout 60 -```markdown -## Sign in -@import ./helpers/login.md +# With inline credentials +kane-cli run "Go to https://app.example.com and login with {{username}} and {{password}}" --agent \ + --variables '{"username":{"value":"alice"},"password":{"value":"s3cret","secret":true}}' ``` -Rules: -- Helper filename **must not** end in `_test.md`. -- Path resolves relative to the **importing file**, not the shell's cwd. -- The step body must be exactly `@import ` — no mixed prose, no extra lines. -- The step's `yaml` block may contain **only** `optional`. Other keys are rejected. -- `optional: true` on `@import` is allowed only at the root file, not on a nested import. - -Variables and context propagate into helpers. Chrome / `mode` / auth do not (root-only). - -Editing a helper re-authors that step in **every test that imports it**, plus everything after the import in those tests. Same cascade rule. +--- -### Commands +## 4. Writing objectives -| Command | Use | -|---|---| -| `kane-cli testmd run --agent [flags]` | Run a test | -| `kane-cli testmd list` | List `*_test.md` files under cwd (NDJSON when non-TTY) | -| `kane-cli testmd status ` | Test Manager identity + local-sync state | -| `kane-cli testmd export [--code-language python\|javascript]` | Regenerate code export from existing recordings (no browser launch) | -| `kane-cli testmd delete ` | Local-only delete: removes source + `output-/`. Does NOT delete from Test Manager. | +How you phrase the objective string determines what the agent does. 
Three patterns: -**Flags on `testmd run` that don't exist on §3 `run`:** +> For the full catalog — every action verb, every assertion analyze method (Visual / Textual-DOM / URL / Title / DevTools→Network/Console/Performance/Cookies/localStorage), operators, chaining, conditional/negative patterns, and worked examples — Read `references/objectives-cookbook.md`. Same grammar applies to one-shot `kane-cli run` objectives and `_test.md` step bodies. -| Flag | Default | Description | +| Pattern | Trigger words | Behavior | |---|---|---| -| `--name ` | none | Persist the run under this name. Regex `[a-zA-Z0-9_-]+`. | -| `--on-lock-conflict ` | none | Behavior when another user holds the test's edit lock. `readonly` = replay-only / no upload, `fail` = exit 2, `wait` = block until released | -| `--retry` | off | On replay failure, restart with a shrinking replay window | -| `--retry-count ` | `3` | Max retry restarts before falling back to full re-author | -| `--author` | off | Force authoring every step (skip replay decision) | +| 🎯 **Action** | "go to", "click", "type", "search", "fill" | Performs browser actions | +| ✅ **Assertion** | "assert", "verify", "confirm", "check that" | Pass/fail check on a condition | +| 📦 **Extraction** | "store X as 'name'" | Persists a value into `run_end.final_state` | -All §3 `run` flags also apply (`--agent`, `--headless`, `--max-steps`, `--timeout`, `--variables`, etc.). +### The "store as" rule (critical for extraction) -Flag wins over frontmatter for everything **except** `variables` — the file owns variables; you can add new keys via flags but cannot override file-defined ones. +Vague phrasing like "read", "tell me", "report" does NOT reliably extract data — the agent may see the value but won't capture it. Use "store as". 
-### Output: `output-/` and `Result.md`
+❌ `"go to example.com and read the page title"`
+✅ `"go to example.com, store the page title as 'page_title'"`

-After a run:
-
-```
-amazon_test.md
-output-amazon/
-  Result.md # human-readable run report
-  .internal/ # cached recordings — do not edit
-  playwright-python-code/ # only if code_export enabled
-```
+Stored values appear in `run_end.final_state` and become the second results table per §1.4.

-**`output-/` is commit-safe and should be committed to git.** That's how teammates and CI replay the same recordings.
+### Chaining

-For tests using `@import`, helper recordings land next to the helper file in `helper-output---/` directories. Also commit-safe.
+Action → extraction → assertion in one objective:

-**`Result.md`** opens in any Markdown viewer. It contains:
-- Frontmatter — `status`, `started`, `duration_s`, `session_id`
-- One entry per root step with one of `✓ passed`, `✗ failed`, `⏭ skipped`, optionally suffixed `(optional)` when a soft-failing step failed but the run continued
-- For `@import` steps that failed, a path to the failing sub-step inside the helper
-
-When the user asks "did the test pass?" or "where did it fail?" for a previously-run test, read `Result.md` rather than re-running the test.
-
-### Recording a `_test.md` from a live session
-
-If the user runs an ad-hoc objective with §3 `run` and decides to keep it:
-
-```bash
-kane-cli run "Search for noise-cancelling headphones on amazon.com" --name amazon-search
-```
-
-On exit, kane-cli writes `/.testmuai/tests/amazon-search_test.md`. Move that file into the user's repo and re-run it with `testmd run`. Without `--name`, an ad-hoc `run` is ephemeral and nothing is written.
-
-### CI invocation
-
-```bash
-kane-cli testmd run ./tests/checkout_test.md \
-  --agent \
-  --headless \
-  --on-lock-conflict wait \
-  --retry
+```text
+"go to {{app_url}}/dashboard,
+  store the welcome message as 'welcome_text',
+  assert the user role in the sidebar is 'Admin'"
 ```

-- `--agent` — NDJSON to stdout (auto-enabled when stdin is not a TTY; pass explicitly anyway).
-- `--headless` — no window.
-- `--on-lock-conflict wait` — block instead of failing if a teammate is editing the same test.
-- `--retry` — automatically recover transient replay failures.
-
-Exit codes follow §3 with new semantics:
-- `2` now includes parse errors and `--on-lock-conflict fail`
-- `3` now includes `--on-lock-conflict wait` timeout
-
-### Parse errors (when writing a `_test.md`)
-
-Parse errors abort **before** any browser launch with exit `2`. Common ones and the fix:
+### Dos and don'ts

-| Message | Fix |
+| ✅ Do | ❌ Don't |
 |---|---|
-| `frontmatter is missing closing '---'` | Add the trailing `---` |
-| `invalid YAML in frontmatter` | Re-validate the YAML block |
-| `step body must be exactly one of prose / @import` | Split into two steps |
-| `step config on @import may only contain 'optional'` | Remove other keys from the yaml block |
-| `cannot @import a test file` | Imports may only reference helpers (not ending in `_test.md`) |
-| `cyclic reference` | Restructure helpers to break the loop |
-| `chrome config is global-only` | Move Chrome key to root frontmatter |
-| `'' is run-level and cannot be set per-step` | Move `mode` / `on_lock_conflict` to root frontmatter |
-| `unknown config key` | Remove or fix the key |
-| `auth/identity keys are CLI-only` | Pass `username` / `access_key` as CLI flags, not in frontmatter |
-
-When the user reports a parse error, fix the file before retrying — don't loop on the same error.
-
----
-
-## 8. Failure Handling & Log Inspection
-
-When a run fails, diagnose before suggesting fixes.
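The exit codes documented for `run`, plus the `testmd` additions for codes 2 and 3, can drive a first triage step before any logs are read. Below is a minimal Python sketch, assuming the caller already has the process return code. `triage` and `run_objective` are hypothetical helpers. The advice strings paraphrase the failure-pattern table rather than quoting kane-cli output.

```python
from __future__ import annotations

import subprocess

# Exit-code meanings as documented for `kane-cli run` and `kane-cli testmd run`.
EXIT_CODES: dict[int, tuple[str, str]] = {
    0: ("passed", "Present the results table."),
    1: ("failed", "Inspect the failing step and its screenshot, then write a failure report."),
    2: ("error", "Check `kane-cli whoami` and that Chrome is available; for testmd, "
                 "also look for a parse error or a lock conflict."),
    3: ("timeout", "Increase --timeout or --max-steps, or split the objective."),
}


def triage(exit_code: int) -> tuple[str, str]:
    """Map a kane-cli exit code to (outcome, suggested next step)."""
    return EXIT_CODES.get(exit_code, ("unknown", "Unrecognized exit code; read the session log."))


def run_objective(objective: str, timeout_s: int = 120) -> subprocess.CompletedProcess[str]:
    """Run one objective headless with a timeout (requires kane-cli on PATH)."""
    return subprocess.run(
        ["kane-cli", "run", objective, "--agent", "--headless", "--timeout", str(timeout_s)],
        capture_output=True,
        text=True,
        check=False,  # non-zero exit codes carry meaning, so don't raise
    )
```

A typical call would be `outcome, advice = triage(run_objective("Go to https://app.example.com and verify login page loads").returncode)`. One reasonable design choice is to parse stdout for `run_end` only when the outcome is `passed` or `failed`. The failure table notes that exit code 2 can arrive with no steps at all.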
- -### Log Locations - -The `run_end` event provides `session_dir` and `run_dir` paths. Use those directly. - -``` -{session_dir}/ -├── session.json # Session metadata, run list, upload status -├── tui.log # Timeline: session start, run start/end, errors -└── runs/{n}/ - └── run-test/ - └── actions.ndjson # Step-by-step record of agent actions -``` - -### Debugging Flow - -1. **Parse the `run_end` event** from stdout — it has `status`, `reason`, and `summary` plus the `session_dir` / `run_dir` paths. -2. **Read `actions.ndjson`** in `{run_dir}/run-test/` — each line is one agent action with its intent and outcome. -3. **Check `tui.log`** in `{session_dir}/` — for session-level issues (Chrome launch, auth, upload). - -### Common Failure Patterns - -| Symptom | Likely Cause | Fix | -|---------|-------------|-----| -| 🔄 Agent repeats same action | Stuck in a loop / page didn't change | Rephrase objective, add explicit wait or assertion | -| 🎯 Agent clicks wrong element | Ambiguous UI, multiple similar elements | Be more specific: "click the **blue** 'Submit' button in the **checkout form**" | -| 👁️ Agent says done but didn't finish | Objective too vague | Add explicit assertions: "assert the confirmation page shows order number" | -| 💀 Exit code 2, no steps | Auth or Chrome failure | Check `kane-cli whoami`, verify Chrome is available | -| ⏱️ Exit code 3 | Timeout or cancelled | Increase `--timeout` or `--max-steps`, or split into smaller objectives | -| 🚫 "CDP endpoint not reachable" | Chrome not running | Let kane-cli manage Chrome (remove `--cdp-endpoint`) | +| Imperative verbs: "go to", "click", "store as" | Vague verbs: "check out", "look at", "explore" | +| Specific: "click the 'Add to Cart' button" | Vague: "add the item" | +| Name extractions: "store X as 'price'" | Hope for values: "tell me the price" | +| `{{variables}}` for credentials/URLs | Hardcode secrets in the objective | +| Always include starting URL | Assume the agent knows where to start | +| 
Split mega-objectives (>15 steps) into multiple runs | Cram everything into one | --- -## 9. Parallel Execution - -For multiple independent browser tasks, decompose and run in parallel using the Agent tool. - -### When to Split - -- **>15 steps** — long runs drift and get stuck -- **Independent flows** — login test and search test don't depend on each other -- **Different pages/features** — settings vs checkout vs admin -- **Different user roles** — admin flow vs regular user flow - -### How to Split - -Each sub-objective must be **self-contained**: navigates to its own URL, authenticates independently, asserts its own outcomes. No sub-objective depends on another having run first. - -### Execution Pattern +## 5. Parsing `--agent` output — essentials -1. Decompose the user's request into N independent sub-objectives -2. Spawn N Agent tool calls in a **single message** — each runs: - ```bash - kane-cli run "Go to and " --agent --headless --timeout 120 - ``` -3. Each agent parses the NDJSON output, waits for `run_end`, returns: status, steps, duration, summary, session path -4. After ALL agents complete, format the batch summary +> Internal reference only. Never expose these field names to the user — translate them per §1. -### Agent Prompt Template +Stdout is NDJSON, one event per line. There are two shapes: -``` -Run this kane-cli browser test and report results: +- **Progress events** (most events) have `step` (1-based), `status` (`passed`/`failed`), `remark` — and **no `type` field**. +- **Typed events** have a `type` field: `bifurcation`, `child_agent_start`, `child_agent_end`, `ask_user`, `error`, and finally `run_end`. - kane-cli run "Go to and " --agent --headless --timeout 120 +Parsing strategy: -After the command completes: -1. Capture the exit code -2. Parse the run_end NDJSON event from stdout -3. If failed, read the failing step's screenshot from run_dir -4. 
Return: {status, steps, duration, summary, session_dir, failure_step, screenshot_path} +```text +for each line: + if obj.type === "run_end" → terminal, stop parsing + else if obj.type exists → typed flow event (rare) + else if obj.step exists → progress event → narrate per §1.3 ``` -### Batch Summary Format - -```markdown -## 🧪 Test Suite: - -| # | Test | Status | Steps | Time | What happened | -|---|------|--------|-------|------|---------| -| 1 | Login + dashboard | ✅ | 5 | 12s | Welcome banner visible | -| 2 | Product search | ✅ | 7 | 18s | 3 results for 'shoes' | -| 3 | Checkout flow | ❌ | 9 | 25s | Payment form did not load | -| 4 | Admin CSV export | ✅ | 6 | 15s | CSV downloaded (42 rows) | - -### 📊 Overall -- **Pass rate:** 3/4 (75%) -- **Total steps:** 27 · **Total time:** 1m10s - -### ❌ Failures -**#3 Checkout flow** — Payment form did not load after clicking "Credit Card". -📸 [screenshot of the failure shown inline] -``` +`run_end` is the only event with a stable cross-version schema — build all post-run logic on it. -Status icons: ✅ passed · ❌ failed · ⚠️ stuck/timeout - -**Do not** show raw file paths (like `~/.testmuai/kaneai/sessions/...`) in the summary. Instead, read the screenshot and show it inline, or offer to inspect logs only if the user asks. +For full event schemas (`bifurcation` flow fields, `child_agent_*`, `ask_user` semantics, `cancel`/`user_response` outbound events, complete `run_end` field list), Read `references/parsing.md`. --- -## 10. Configuration & Reference - -### Config Commands - -```bash -kane-cli config show # Show all current settings -kane-cli config set-window x # Browser window size (e.g. 
1920x1080) -kane-cli config chrome-profile # Chrome profile path (or interactive picker in TTY) -kane-cli config project # TMS project ID (or interactive picker in TTY) -kane-cli config folder # TMS folder ID (or interactive picker in TTY) -``` - -### Feedback - -Submit feedback on a completed test run: -```bash -kane-cli feedback --test-id --feedback-type --details "..." -``` - -### Directory Structure +## 6. When to read which reference -``` -~/.testmuai/kaneai/ -├── tui-config.json # Persistent CLI settings -├── config.json # Shared auth configuration -├── global-memory.md # Global agent context -├── chrome-profile/ # Default Chrome user profile -├── profiles/ # Stored credentials -│ └── {profile}/{env}/ -│ └── credentials -├── sessions/ # Session history -│ └── {session-id}/ -│ ├── session.json # Metadata, run list, upload status -│ ├── tui.log # Session event log -│ ├── runs/{n}/ -│ │ └── run-test/ -│ │ └── actions.ndjson # Step-by-step record of agent actions -│ └── code-export/ # (when --code-export) generated code files -└── variables/ # Global variable files - └── *.json - -# Project-local overrides (in cwd): -.testmuai/ -├── context.md # Project-specific agent context -└── variables/ - └── *.json # Project-specific variables -``` - -### Chrome Management - -kane-cli auto-launches Chrome with CDP (DevTools Protocol) on ports 9222–9230. Chrome runs as a detached process and outlives the CLI. - -- `--headless` — runs Chrome in headless mode (no visible window) -- `--cdp-endpoint ` — connect to an already-running Chrome instance -- `--ws-endpoint ` — connect to a remote browser (LambdaTest grid) - -If Chrome fails to launch, ensure Google Chrome is installed and no other process is using CDP ports 9222–9230. 
+| Situation | Read | +|---|---| +| User wants to save/persist/re-run a test | `references/testmd.md` | +| Run failed, need to diagnose | `references/debug.md` | +| Multiple independent browser tasks | `references/parallel.md` | +| Need full NDJSON event schema | `references/parsing.md` | +| First-time install, auth, or full config | `references/setup-and-config.md` | diff --git a/.agents/skills/kane-cli/references/debug.md b/.agents/skills/kane-cli/references/debug.md new file mode 100644 index 0000000..d599f31 --- /dev/null +++ b/.agents/skills/kane-cli/references/debug.md @@ -0,0 +1,45 @@ + + +# Failure Handling & Log Inspection + +When a run fails, diagnose before suggesting fixes. + +## Log Locations + +The `run_end` event provides `session_dir` and `run_dir` paths. Use those directly. + +```text +{session_dir}/ +├── session.json # Session metadata, run list, upload status +├── tui.log # Timeline: session start, run start/end, errors +└── runs/{n}/ + └── run-test/ + └── actions.ndjson # Step-by-step record of agent actions +``` + +## Debugging Flow + +1. **Parse the `run_end` event** from stdout — it has `status`, `reason`, and `summary` plus the `session_dir` / `run_dir` paths. +2. **Read `actions.ndjson`** in `{run_dir}/run-test/` — each line is one agent action with its intent and outcome. +3. **Check `tui.log`** in `{session_dir}/` — for session-level issues (Chrome launch, auth, upload). 
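The first two steps of the debugging flow can be sketched in Python, assuming the `run_end` line has already been parsed from stdout into a dict. The helper names `action_log_path`, `parse_actions`, and `load_actions` are hypothetical. Because the per-line schema of `actions.ndjson` is only described as "intent and outcome" here, each line is returned as a plain dict without assuming any field names.

```python
import json
import os
from pathlib import Path
from typing import Iterable


def action_log_path(run_end: dict) -> Path:
    """Build {run_dir}/run-test/actions.ndjson from a parsed run_end event."""
    # run_dir is shown with a leading "~" in examples, so expand it first.
    run_dir = os.path.expanduser(run_end["run_dir"])
    return Path(run_dir) / "run-test" / "actions.ndjson"


def parse_actions(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-empty line; no field names are assumed."""
    return [json.loads(line) for line in lines if line.strip()]


def load_actions(path: Path) -> list[dict]:
    """Read and parse the whole actions.ndjson file."""
    with path.open(encoding="utf-8") as fh:
        return parse_actions(fh)
```

After a failed run, `load_actions(action_log_path(run_end))` returns the step-by-step record in order; starting from the last few entries is a reasonable way to find where the agent went wrong before falling back to `tui.log`.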
+ +## Common Failure Patterns + +| Symptom | Likely Cause | Fix | +|---------|-------------|-----| +| 🔄 Agent repeats same action | Stuck in a loop / page didn't change | Rephrase objective, add explicit wait or assertion | +| 🎯 Agent clicks wrong element | Ambiguous UI, multiple similar elements | Be more specific: "click the **blue** 'Submit' button in the **checkout form**" | +| 👁️ Agent says done but didn't finish | Objective too vague | Add explicit assertions: "assert the confirmation page shows order number" | +| 💀 Exit code 2, no steps | Auth or Chrome failure | Check `kane-cli whoami`, verify Chrome is available | +| ⏱️ Exit code 3 | Timeout or cancelled | Increase `--timeout` or `--max-steps`, or split into smaller objectives | +| 🚫 "CDP endpoint not reachable" | Chrome not running | Let kane-cli manage Chrome (remove `--cdp-endpoint`) | + +## Filing a bug report + +If the failure looks like a **kane-cli bug** (not auth, timeout, or a vague objective), offer to file a report: + +> This looks like it might be a bug in kane-cli. Want me to file a report? + +File at: **https://github.com/LambdaTest/kane-cli/issues**. Gather the details automatically — don't ask the user to dig through log files. + +**Do NOT suggest bug reports for:** auth issues, low timeouts, vague objectives, or website errors (500s, CAPTCHAs). diff --git a/.agents/skills/kane-cli/references/objectives-cookbook.md b/.agents/skills/kane-cli/references/objectives-cookbook.md new file mode 100644 index 0000000..c4ee42f --- /dev/null +++ b/.agents/skills/kane-cli/references/objectives-cookbook.md @@ -0,0 +1,372 @@ + + +# Writing Kane-CLI Objectives — Pattern Cookbook + +Read this whenever you're constructing the prose objective for `kane-cli run ""` or the body of a `## Step` in a `_test.md` file. Both surfaces feed the same agent and accept the same grammar. + +--- + +## 1. 
Anatomy of a good objective + +Three properties make an objective reliable: + +- **Specific** — name the site, the action, and the field values where they matter. +- **Action-oriented** — lead with a verb (`go to`, `search`, `open`, `fill`, `click`, `verify`). +- **Has a success criterion** — state what "done" looks like so the agent knows when to stop. + +Bad → better: + +| | Objective | +|---|---| +| ❌ | Test the login page. | +| ✅ | Open `https://app.example.com/login`, log in as `{{tester}}`, and verify the dashboard URL contains `/home`. | + +The bad version leaves "test" undefined and gives the agent no end state. The better version names the URL, the credentials, and the assertion that closes the loop. + +--- + +## 2. Action verbs — quick catalog + +Reference list. Use these in your prose; the agent recognizes them all. + +| Category | Verbs | +|---|---| +| **Navigation** | go to, open, navigate to, visit, reload, go back, switch to tab/window | +| **Input** | type, fill, enter, paste, clear, select (dropdown), check (checkbox), uncheck, toggle | +| **Click/hover** | click, double-click, right-click, hover, long-press | +| **Scroll/drag** | scroll to, scroll down/up, drag to, drop on | +| **Wait** | wait for, wait until, pause for | +| **File** | upload, attach, download | +| **Misc** | dismiss, accept dialog, switch frame, take screenshot | + +Always put the **starting URL** in the first action (for example, `go to https://example.com/login`) if the agent needs to navigate. Never assume the agent knows where to start. + +--- + +## 3. Assertions, extractions, and if/else — using checkpoints + +Checkpoints are the agent's verification primitives. There are three kinds, and each one works with every analyze method below: + +| Kind | Phrasing | What happens | +|---|---|---| +| **Assertion** | "Assert: …", "Verify …", "Confirm …" | Fails the run if the condition is false. | +| **Extraction** | "Store …", "Extract …", "Get …" | Saves a value into `run_end.final_state` for later use.
| +| **If/Else** | "If … then … else …" | Branches the run based on a condition. | + +### 3.1 Analyze methods — where the agent looks + +The agent automatically picks the right method based on phrasing. To steer it toward a specific method, use the wording from the "Phrasing the agent recognizes" column. + +| Method | Use it for | Phrasing the agent recognizes | +|---|---|---| +| **Visual** (default) | Visible text, prices, labels, counts, color names, visibility | "the price …", "the heading …", "is visible", "displays", "is shown" | +| **Textual (DOM)** | Element states, CSS properties, HTML attributes, exact CSS color values | "is disabled / enabled / checked / readonly", "the placeholder of …", "the aria-label of …", "the font-size of …", "rgb(…)" / "#hex" | +| **URL** | Address bar — path, query, fragment, redirects | "URL contains …", "URL path is …", "URL has param …", "redirected to …" | +| **Title** | Browser tab `document.title` | "page title contains …", "title is …" | +| **DevTools** | Things not visible on screen — network, console, performance, cookies, localStorage | see §3.2 below | + +### 3.2 DevTools analyze methods + +Five subdomains. Each one is the right choice when the data you care about lives in the browser's internals rather than on the page. + +#### Network (HTTP traffic) + +The agent captures every HTTP request/response per step. **Resets each step** — assert on traffic in the same step it happens (or extract and carry forward). + +Queryable fields: `method`, `url`, `domain`, `path`, `query_params`, `resource_type`, `request_headers`, `request_body`, `response_status`, `response_headers`, `response_body`, `timing.duration_ms`, `timing.ttfb_ms`, `failed`, `failure_reason`.
+ +```text +Assert: no API calls returned 5xx status codes +Assert: the POST /api/login returned HTTP status 200 +Assert: all API responses completed in under 2 seconds +Assert: no network requests failed with connection errors +Assert: the /posts endpoint returned at least 10 items in the response body + +Store the response body of the POST /api/login request +Extract the status code of the last API call to /api/users +Store all API request URLs + +If the /api/auth returned 200 then proceed to dashboard, else show error message +``` + +Limits: up to 5,000 requests per step, response bodies capped at 64KB, binary content (images/fonts/videos) skipped. + +#### Console (browser console output) + +Captures every `console.log/warn/error/info/debug` and every uncaught JS exception. **Resets each step**. Top frame only — iframes (payment widgets, third-party embeds) are not captured. + +Levels normalize to: `log`, `warning`, `error`, `info`, `debug`. `errors` includes both `console.error()` and uncaught exceptions; `exceptions` is just the uncaught-exception subset (where `is_exception: true`). + +```text +Assert: no console errors on the page +Assert: no uncaught JavaScript exceptions +Assert: no JS errors after clicking Submit +Assert: console contains "Amplitude SDK triggered" +Assert: no console warnings + +Store all console error messages +Extract the first console error text + +If console contains "feature_flag_enabled" then use new flow, else use legacy flow +``` + +#### Performance (Core Web Vitals) + +Point-in-time read of the **last full page navigation's** metrics. Place the assertion after the page has loaded; use a wait step if the page needs time to settle. 
+ +Available metrics and their "good" thresholds: + +| Metric | Measures | Good | +|---|---|---| +| **LCP** | Largest Contentful Paint | < 2,500ms | +| **CLS** | Cumulative Layout Shift | < 0.1 | +| **INP** | Interaction to Next Paint (requires user interaction) | < 200ms | +| **FCP** | First Contentful Paint | < 1,800ms | +| **TTFB** | Time to First Byte | < 800ms | + +```text +Assert: page LCP is under 2500ms +Assert: CLS is below 0.1 +Assert: TTFB is under 800ms +Assert: page performance meets Core Web Vitals thresholds + +Store the page LCP value +Extract all web vitals metrics +``` + +#### Cookies + +Snapshot at assertion time. Sees `httpOnly` cookies too (unlike `document.cookie`). Cookies persist across steps; asserting on a different domain may show different cookies. + +Fields: `name`, `value`, `domain`, `path`, `expires`, `http_only`, `secure`, `same_site` (`Strict`/`Lax`/`None`). + +```text +Assert: a cookie named "session_id" exists +Assert: the session cookie is httpOnly +Assert: no cookies are set without the Secure flag +Assert: the auth cookie has sameSite set to "Strict" + +Store all cookies +Extract the value of the "session_id" cookie + +If a cookie named "auth_token" exists then go to dashboard, else go to login +``` + +#### localStorage + +Snapshot at assertion time. Per-origin (protocol + domain + port). Persists across steps as long as you stay on the same origin. Values are always strings — if the app stores JSON, the value is the raw JSON string, which the agent parses to drill into fields.
+ +```text +Assert: auth_token exists in localStorage +Assert: the theme preference in localStorage is "dark" +Assert: localStorage has fewer than 10 items +Assert: the "theme" field in the user_prefs localStorage item is "dark" + +Store all localStorage items +Extract the auth_token from localStorage +Get all localStorage keys + +If localStorage has "onboarding_complete" then show dashboard, else start onboarding +``` + +### 3.3 Operators + +Assertions support these comparisons. Phrase them naturally — the agent maps your wording to the right operator. + +| Operator | Meaning | Example | +|---|---|---| +| `equals` | Exact match | "price equals $29.99", "title is 'Home'" | +| `contains` | Substring match | "URL contains /checkout" | +| `not_contains` | Does not contain | "title not contains 'Error'" | +| `gt` / `gte` | Greater than / greater than or equal | "items greater than 5" | +| `lt` / `lte` | Less than / less than or equal | "LCP less than 2500" | +| `not_equals` | Not equal | "status not equals 'failed'" | + +### 3.4 Picking the right method when in doubt + +- "Is the price $29.99?" — **Visual** (it's on screen). +- "Is the submit button disabled?" — **Textual/DOM** (state, not visible text). +- "Does this red background match exactly `rgb(220, 38, 38)`?" — **Textual/DOM** (exact CSS). +- "Are we on the checkout page?" — **URL** (address bar). +- "Did the page send any failed API calls?" — **DevTools/Network**. +- "Are there console errors?" — **DevTools/Console**. +- "Is the page fast?" — **DevTools/Performance** (LCP/FCP/TTFB). +- "Did the login set a session cookie?" — **DevTools/Cookies**. +- "Did the app store the auth token?" — **DevTools/localStorage**. + +If you're not sure which method to use, default to **Visual** — that's what the agent does too. + +--- + +## 4. Extraction — the "store as" rule + +Vague phrasing like "read", "tell me", "report" does NOT reliably persist data. The agent may *observe* the value but won't *capture* it into `run_end.final_state`.
+ +```text +❌ "go to example.com and read the page title" +❌ "go to example.com and tell me the price" + +✅ "go to example.com, store the page title as 'page_title'" +✅ "go to example.com, store the price of the first item as 'price'" +``` + +For DevTools extractions, the same rule applies — use "store" or "extract": + +```text +✅ "store the response body of the POST /api/login as 'login_response'" +✅ "extract the value of the session_id cookie as 'session'" +``` + +Stored values land in `run_end.final_state` and feed the second results table per `SKILL.md §1.4`. + +--- + +## 5. Chaining — action → extraction → assertion + +Multi-clause objectives are fine — and often preferable to splitting into multiple steps when the operations are tightly coupled. + +```text +"go to {{app_url}}/dashboard, + store the welcome message as 'welcome_text', + store the user role in the sidebar as 'role', + assert the role is 'Admin'" +``` + +```text +"open https://shop.example.com, + add the first 'Wireless Headphones' result to the cart, + navigate to the cart, + store the cart total as 'total', + assert the cart contains exactly one item" +``` + +```text +"go to {{app_url}}/api-health, + store the API response body as 'health', + assert no console errors, + assert no API calls returned 5xx" +``` + +When chaining, keep each clause as a complete instruction. The agent processes them in order. + +### Splitting vs. chaining — when to break into multiple steps + +| Chain in one objective | Split into separate steps | +|---|---| +| ≤ 15 clauses, related state | > 15 reasoning steps expected | +| All happen on one page or flow | Different flows / different user roles | +| Extraction needed for the assertion in the same objective | Each step is independently testable | + +For `_test.md` step bodies, each step is its own objective — split aggressively. For one-shot `kane-cli run`, chain when the operations share state. + +--- + +## 6. 
Variables and context + +Use `{{name}}` syntax for values that should be parameterized: + +```text +"Log in as {{username}} with password {{password}}, then verify the dashboard loads" +``` + +**Always parameterize:** credentials, API keys, tokens, environment-specific URLs. +**OK to hardcode:** one-off URLs, static UI text, navigation paths. + +Mark credentials with `secret: true` in the variables JSON so they're masked in logs and routed to the secrets store: + +```json +{ + "username": { "value": "alice", "secret": false }, + "password": { "value": "s3cret!", "secret": true } +} +``` + +For the full variables-loading precedence and context-file behavior, Read `references/setup-and-config.md`. + +--- + +## 7. Conditional and negative patterns + +Conditional objectives let the agent handle optional UI states without failing: + +```text +"go to {{app_url}}, if a cookie banner appears then dismiss it, then assert the homepage loads" + +"open the dashboard, if a 'What's new' modal is visible then close it, then click Settings" +``` + +Negative assertions verify the *absence* of something: + +```text +"after submitting, assert no error message or red banner is visible" +"assert no console errors after clicking Save" +"assert no API calls failed during the checkout flow" +``` + +Positional assertions check where something is on the page: + +```text +"assert 'Settings' appears in the left sidebar navigation" +"assert the 'Cancel' button is on the right side of the modal footer" +``` + +--- + +## 8. Common pitfalls + +| ❌ Don't | ✅ Do | Why | +|---|---|---| +| "Test the checkout flow" | "Go to /cart, click Checkout, fill the address form with {{tester}}, click Pay, assert the order confirmation page loads" | "Test" has no end state — the agent doesn't know when to stop. | +| "Add the item" | "Click the 'Add to Cart' button on the first product card" | Vague target — agent may click the wrong element. 
| +| "Tell me the price" | "Store the cart total as 'total'" | Vague verbs don't extract — use "store" / "extract" / "get". | +| Hardcode credentials in the objective | Use `{{username}}` / `{{password}}` from `--variables-file` | Credentials in plain text leak into logs and TMS. | +| Omit the URL | "Go to https://example.com/login first, then …" | Agent doesn't know where to start. | +| Cram 25 operations into one objective | Split at logical boundaries (login, navigate, action, verify) | Long runs drift and stall. | +| "Check the page is fast" | "Assert LCP is under 2500ms and CLS is below 0.1" | Use the explicit web-vital metric, not a vague "fast." | +| "Make sure no errors" | "Assert no console errors and no API calls returned 5xx" | Be explicit about which kind of error you're checking. | + +--- + +## 9. Worked end-to-end examples + +### Example A — Single-page assertion suite + +```text +"go to https://shop.example.com/products/42, + assert the product title is 'Wireless Headphones', + assert the price is $129.99, + store the SKU as 'sku', + assert URL contains /products/42, + assert page LCP is under 2500ms, + assert no console errors" +``` + +This exercises Visual (title, price), Extraction (SKU), URL, Performance, and Console — all in one objective. 
+ +### Example B — Login + dashboard verification + +```text +"open https://app.example.com/login, + log in with email {{tester.email}} and password {{tester.password}}, + assert the URL redirected to /dashboard, + assert a cookie named 'session_id' exists and is httpOnly, + assert no API calls returned 5xx during login, + store the user role from the sidebar as 'role', + assert the role is 'Admin'" +``` + +### Example C — testmd step body (same grammar) + +In a `_test.md` file: + +```markdown +## Verify checkout flow happy path +Open https://shop.example.com, log in as {{tester}}, add the first +'Wireless Headphones' result to the cart, navigate to checkout, +fill the shipping address with {{tester.address}}, click Pay. +Assert the order confirmation page loads. +Assert no console errors and no API calls returned 5xx. +Store the order number as 'order_id'. +``` + +The step body uses exactly the same grammar as `kane-cli run`. Everything in this cookbook applies. diff --git a/.agents/skills/kane-cli/references/parallel.md b/.agents/skills/kane-cli/references/parallel.md new file mode 100644 index 0000000..e9a6cbc --- /dev/null +++ b/.agents/skills/kane-cli/references/parallel.md @@ -0,0 +1,65 @@ + + +# Parallel Execution + +For multiple independent browser tasks, decompose and run in parallel using the Agent tool. + +## When to Split + +- **>15 steps** — long runs drift and get stuck +- **Independent flows** — login test and search test don't depend on each other +- **Different pages/features** — settings vs checkout vs admin +- **Different user roles** — admin flow vs regular user flow + +## How to Split + +Each sub-objective must be **self-contained**: it navigates to its own URL, authenticates independently, and asserts its own outcomes. No sub-objective depends on another having run first. + +## Execution Pattern + +1. Decompose the user's request into N independent sub-objectives +2.
Spawn N Agent tool calls in a **single message** — each runs: + ```bash + kane-cli run "Go to and " --agent --headless --timeout 120 + ``` +3. Each agent parses the NDJSON output, waits for `run_end`, returns: status, steps, duration, summary, session path +4. After ALL agents complete, format the batch summary + +## Agent Prompt Template + +```text +Run this kane-cli browser test and report results: + + kane-cli run "Go to and " --agent --headless --timeout 120 + +After the command completes: +1. Capture the exit code +2. Parse the run_end NDJSON event from stdout +3. If failed, read the failing step's screenshot from run_dir +4. Return: {status, steps, duration, summary, session_dir, failure_step, screenshot_path} +``` + +## Batch Summary Format + +```markdown +## 🧪 Test Suite: + +| # | Test | Status | Steps | Time | What happened | +|---|------|--------|-------|------|---------| +| 1 | Login + dashboard | ✅ | 5 | 12s | Welcome banner visible | +| 2 | Product search | ✅ | 7 | 18s | 3 results for 'shoes' | +| 3 | Checkout flow | ❌ | 9 | 25s | Payment form did not load | +| 4 | Admin CSV export | ✅ | 6 | 15s | CSV downloaded (42 rows) | + +### 📊 Overall +- **Pass rate:** 3/4 (75%) +- **Total steps:** 27 · **Total time:** 1m10s + +### ❌ Failures +**#3 Checkout flow** — Payment form did not load after clicking "Credit Card". +📸 [screenshot of the failure shown inline] +``` + +Status icons: ✅ passed · ❌ failed · ⚠️ stuck/timeout + +**Do not** show raw file paths (like `~/.testmuai/kaneai/sessions/...`) in the summary. Instead, read the screenshot and show it inline, or offer to inspect logs only if the user asks. 
diff --git a/.agents/skills/kane-cli/references/parsing.md b/.agents/skills/kane-cli/references/parsing.md new file mode 100644 index 0000000..517b817 --- /dev/null +++ b/.agents/skills/kane-cli/references/parsing.md @@ -0,0 +1,101 @@ + + +# Parsing --agent Output + +> **Internal reference only.** Everything in this section (field names, event types, JSON structure) is for you to parse programmatically. **Never expose these internal terms to the user.** The user should see plain-language summaries, not `run_end`, `final_state`, `bifurcation`, `NDJSON`, `session_dir`, or any raw JSON fields. + +With `--agent`, kane-cli outputs one JSON object per line to **stdout**. Progress UI renders to **stderr**. + +## Event Types + +**Progress events** (bulk of the output — one per step): + +```json +{"step": 1, "status": "passed", "remark": "Navigated to amazon.in"} +{"step": 2, "status": "passed", "remark": "Typed 'laptop' in search box"} +{"step": 3, "status": "failed", "remark": "Could not find Add to Cart button"} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `step` | number | Step index (1-based) | +| `status` | string | `"passed"` or `"failed"` | +| `remark` | string | What the agent did or why it failed | + +These are **untyped** — they have no `type` field. Do **not** key on `event.type === 'step_start'` or `'step_end'`; those event types are not emitted. + +**Flow events:** + +| Event (`type` field) | Key Fields | Purpose | +|-------|-----------|---------| +| `bifurcation` | `flows[]`, `count` | Agent split objective into sub-flows | +| `child_agent_start` | `child_id`, `objective`, `parent_step` | Child agent spawned | +| `child_agent_end` | `child_id`, `success`, `steps_taken`, `summary` | Child agent finished | +| `ask_user` | `question`, `step_index`, `options?` | Agent needs user input | +| `error` | `message` | Error occurred | + +**Note:** There is no `run_start` event — the first line is either a `bifurcation` or a progress object. 
+ +**Note:** `ask_user` is auto-disabled when stdin is not a TTY. Since agents typically run kane-cli as a subprocess, `ask_user` events will not be emitted. Write objectives that don't require interactive input. + +## Parsing Strategy + +Since progress events lack a `type` field, distinguish them from typed events like this: + +```text +for each line of NDJSON: + if obj.type === "run_end" → terminal event, stop parsing + else if obj.type === "bifurcation" → flow split + else if obj.type exists → other typed event + else if obj.step exists → progress event (step/status/remark) +``` + +**Build automation on `run_end`** — it is the only event guaranteed to have a stable schema across versions. Use progress events for live status display only. + +**Terminal event** (always the last line): + +```json +{ + "type": "run_end", + "status": "passed", + "summary": "Searched for laptop and added first result to cart", + "one_liner": "Searched for laptop on Amazon and added to cart", + "reason": "Objective completed", + "duration": 45.2, + "credits": 12, + "final_state": { + "price": "$29.99", + "product_name": "Wireless Headphones" + }, + "context": { + "memory": {}, + "variables": {}, + "pointer": "(passed) Searched for laptop and added first result to cart" + }, + "session_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "run_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890/runs/0", + "test_url": "https://test-manager.lambdatest.com/projects/123/test-cases/456" +} +``` + +Key `run_end` fields: +- `status` — `"passed"` or `"failed"` +- `summary` — what the agent did +- `one_liner` — short summary for display +- `reason` — why it stopped +- `credits` — credits consumed by the run (when reported) +- `final_state` — extracted values from "store as" objectives +- `test_url` — link to KaneAI dashboard (if upload succeeded) +- `session_dir` / `run_dir` — paths to log files + +## Responding to `ask_user` (if stdin is a TTY) + +```json +{"type":
"user_response", "answer": "Medium size"} +``` + +To cancel a run: + +```json +{"type": "cancel"} +``` diff --git a/.agents/skills/kane-cli/references/setup-and-config.md b/.agents/skills/kane-cli/references/setup-and-config.md new file mode 100644 index 0000000..81a4fe6 --- /dev/null +++ b/.agents/skills/kane-cli/references/setup-and-config.md @@ -0,0 +1,140 @@ + + +# kane-cli Setup, Variables, and Config Reference + +## Install and auth + +Before first use, verify installation and auth. + +### Install + +```bash +npm install -g @testmuai/kane-cli +``` + +### Check Auth Status + +```bash +kane-cli whoami +``` + +If this shows "not configured" or errors, run login: + +### Login (Basic Auth) + +```bash +kane-cli login --username --access-key +``` + +This creates the default profile with basic auth, auto-selects the KaneAI project, and marks setup complete. Credentials come from the user's TestmuAI dashboard (Settings → Keys). + +Optional flag: +- `--profile ` — profile name (default: the last selected profile; run `config show` to see which one) + +### Login (OAuth) + +```bash +kane-cli login --oauth +``` + +This opens the browser for OAuth consent and waits for the callback. Works in both TTY and non-TTY (agent) mode. + +### Login (Interactive — TTY only) + +In a terminal, run `kane-cli login` with no flags for the interactive wizard (auth method → project picker → folder picker). If the user needs this, ask them to run it directly: + +> Please run `! kane-cli login` and complete the sign-in. + +### Verify + +```bash +kane-cli whoami # Auth status +kane-cli config show # Current configuration +``` + +## Variables — full precedence chain + +Variables parameterize objectives with reusable values and secrets. Use `{{key}}` syntax in objectives.
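Because the variables payload is JSON, a script can pass it to `--variables` as a single argument instead of fighting shell quoting. The Python sketch below assumes the `{"name": {"value": ..., "secret": ...}}` shape this reference uses and combines only flags documented for `kane-cli run` (`--agent`, `--headless`, `--timeout`, `--variables`). The helper name `build_run_argv` is hypothetical.

```python
from __future__ import annotations

import json


def build_run_argv(
    objective: str,
    variables: dict[str, dict] | None = None,
    headless: bool = False,
    timeout: int | None = None,
) -> list[str]:
    """Build an argv list for `kane-cli run ... --agent` (hypothetical helper)."""
    argv = ["kane-cli", "run", objective, "--agent"]
    if headless:
        argv.append("--headless")
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    if variables:
        # One JSON argument; passing argv as a list avoids shell quoting entirely.
        argv += ["--variables", json.dumps(variables)]
    return argv


argv = build_run_argv(
    "Go to https://app.example.com and login with {{username}} and {{password}}",
    variables={
        "username": {"value": "alice", "secret": False},
        "password": {"value": "s3cret!", "secret": True},
    },
    headless=True,
    timeout=60,
)
# subprocess.run(argv, capture_output=True, text=True) would launch the run.
```

The `{{username}}` and `{{password}}` placeholders stay literal in the objective; kane-cli resolves them from the variables payload.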
+ +**Format:** +```json +{ + "username": { "value": "alice", "secret": false }, + "password": { "value": "s3cret!", "secret": true } +} +``` + +`secret: true` masks the value in logs and routes it to TestmuAI's secrets store instead of being synced as plain TMS variables. + +**Loading order** (later wins): +1. `~/.testmuai/kaneai/variables/*.json` (global, alphabetical) +2. `{cwd}/.testmuai/variables/*.json` (local project overrides) +3. `--variables-file ` +4. `--variables '{...}'` (inline JSON) + +**Always parameterize:** credentials, API keys, tokens, environment-specific URLs. +**OK to hardcode:** one-off URLs, static UI text, navigation paths. + +## Context files + +Context files provide additional instructions to the agent: +- **Global:** `~/.testmuai/kaneai/global-memory.md` — shared across all runs +- **Local:** `.testmuai/context.md` in cwd — project-specific + +Override per-run with `--global-context` / `--local-context` flags. + +## Config commands + +```bash +kane-cli config show # Show all current settings +kane-cli config set-window x # Browser window size (e.g. 1920x1080) +kane-cli config chrome-profile # Chrome profile path (or interactive picker in TTY) +kane-cli config project # TMS project ID (or interactive picker in TTY) +kane-cli config folder # TMS folder ID (or interactive picker in TTY) +``` + +### Feedback + +Submit feedback on a completed test run: +```bash +kane-cli feedback --test-id --feedback-type --details "..." 
+``` + +## Directory structure + +```text +~/.testmuai/kaneai/ +├── tui-config.json # Persistent CLI settings +├── config.json # Shared auth configuration +├── global-memory.md # Global agent context +├── chrome-profile/ # Default Chrome user profile +├── profiles/ # Stored credentials +│ └── {profile}/{env}/ +│ └── credentials +├── sessions/ # Session history +│ └── {session-id}/ +│ ├── session.json # Metadata, run list, upload status +│ ├── tui.log # Session event log +│ ├── runs/{n}/ +│ │ └── run-test/ +│ │ └── actions.ndjson # Step-by-step record of agent actions +│ └── code-export/ # (when --code-export) generated code files +└── variables/ # Global variable files + └── *.json + +# Project-local overrides (in cwd): +.testmuai/ +├── context.md # Project-specific agent context +└── variables/ + └── *.json # Project-specific variables +``` + +## Chrome management + +kane-cli auto-launches Chrome with CDP (DevTools Protocol) on ports 9222–9230. Chrome runs as a detached process and outlives the CLI. + +- `--headless` — runs Chrome in headless mode (no visible window) +- `--cdp-endpoint ` — connect to an already-running Chrome instance +- `--ws-endpoint ` — connect to a remote browser (LambdaTest grid) + +If Chrome fails to launch, ensure Google Chrome is installed and no other process is using CDP ports 9222–9230. diff --git a/.agents/skills/kane-cli/references/testmd.md b/.agents/skills/kane-cli/references/testmd.md new file mode 100644 index 0000000..5ee1fbb --- /dev/null +++ b/.agents/skills/kane-cli/references/testmd.md @@ -0,0 +1,217 @@ + + +# Saving & Replaying Tests with testmd + +The §3 `run` command is the **primary** mode — one-shot, ephemeral. `testmd` is the **secondary** mode: tests live as `_test.md` files on disk, each step is cached on the first run, and every later run **replays from cache** with no LLM cost. + +Use `testmd` whenever the user wants the test to persist. 
The decision is binary — once a test exists as a file, every later invocation is `testmd run`, never `run`. + +## When to switch from `run` to `testmd` + +| User says | Use | +|---|---| +| "save this test", "commit this", "keep this", "add this to the suite" | `testmd` | +| "regression test", "smoke test", "make this replayable" | `testmd` | +| "this is a test", "test the X flow end-to-end" (suite-shaped) | `testmd` | +| "run this once", "check if X works right now", "try X" | `run` (§3) | +| "search for", "click", "fill", "verify" (one-shot) | `run` (§3) | + +If unclear, ask: "Do you want me to save this test so you can re-run it later?" + +## Quick start + +Write the file (any path; filename must end in `_test.md`): + +```markdown +--- +mode: testing +max_steps: 30 +--- + +# Amazon search + +## Open Amazon +Open https://www.amazon.com. + +## Search for headphones +Type "wireless headphones" into the search box and submit. +Verify at least one product result is visible. +``` + +Run it: + +```bash +kane-cli testmd run amazon_test.md --agent +``` + +## File format + +Four parts in order: + +1. **YAML frontmatter** — between `--- ... ---` at the very top. +2. **`# Title`** — decorative; everything before the first `## ` is ignored. +3. **`## H2` step headings** — one per step. The agent reads the step body, not the heading. +4. **Step body** — either prose **or** a single `@import ` line. Never both. Prose bodies are objectives with the same grammar as `kane-cli run` — for the full pattern catalog (action verbs, assertion analyze methods, checkpoint types, chaining, worked examples), Read `references/objectives-cookbook.md`. + +Per-step `yaml` overrides go immediately under the heading, in a fenced block: + +````markdown +## Submit the form +```yaml +timeout: 90 +optional: true +``` +Click submit and verify the confirmation banner. 
+```` + +**Frontmatter keys to use:** + +| Key | Scope | Description | +|---|---|---| +| `mode` | root | `action` (halts on auth walls) or `testing` (default — pushes through so negative-test assertions can fire) | +| `max_steps` | root + step | Max agent reasoning steps. Default `30`. | +| `timeout` | root + step | Hard kill per step in seconds. | +| `headless` | root | No browser window. | +| `variables` | root + step | `{{name}}` params, same shape as §3, with `secret: true` for credentials | +| `global_context` / `local_context` | root + step | Inline Markdown or path | +| `code_export` / `code_language` | root + step | Generate Playwright after the run; language `python` or `javascript` | + +Files ending in `_test.md` are tests (valid entry points). Any other `.md` is a helper — reachable only via `@import`. + +## The replay & cascade rule (CRITICAL) + +On the **first** run of a test, the agent authors each step and saves a recording. On **every later run**, each step replays from its recording — no agent, no LLM cost, much faster. + +A step replays only if **all** of these hold: +- A recording for that step exists, +- Its prose is unchanged since the recording, +- Its `yaml` block is unchanged, +- No earlier step in the file invalidated it. + +**Editing step N re-authors step N AND every step after it in the same file.** Each step starts where the previous step left off (URL, login, tabs). When step 3 changes, step 4 cannot safely replay against state that no longer exists. + +Consequences when editing tests: +- A one-line tweak at the top of a 20-step test re-authors all 20 steps on the next run. +- To re-record only one step, edit only that step (or steps after it). +- `--author` forces full authoring for one run (debugging only). +- `rm -rf output-/` wipes the cache entirely. 
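The replay decision can be modeled as one pass over the steps: the first step that cannot replay, and every step after it, is authored. This is a hedged Python model of the documented rule, not kane-cli's implementation. The `Step` fields are hypothetical. Treating a step with no recording as a cascade trigger is an assumption that follows from each step starting where the previous one left off.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    heading: str
    has_recording: bool
    prose_changed: bool = False
    yaml_changed: bool = False


def plan_steps(steps: list[Step], force_author: bool = False) -> list[str]:
    """Return "replay" or "author" for each step, applying the cascade rule."""
    plan: list[str] = []
    cascade = force_author  # --author forces authoring of every step
    for step in steps:
        replayable = (
            step.has_recording and not step.prose_changed and not step.yaml_changed
        )
        if cascade or not replayable:
            plan.append("author")
            cascade = True  # later steps depend on this step's end state
        else:
            plan.append("replay")
    return plan


steps = [
    Step("Open Amazon", has_recording=True),
    Step("Search for headphones", has_recording=True, prose_changed=True),
    Step("Add to cart", has_recording=True),
]
print(plan_steps(steps))
```

The model shows why a one-line edit near the top of a long test is expensive: everything below the edit flips to authoring.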
+ +## `@import` for reusing flows + +Extract a repeating flow (login, setup, cookie banner dismissal) into a helper file: + +```markdown +## Sign in +@import ./helpers/login.md +``` + +Rules: +- Helper filename **must not** end in `_test.md`. +- Path resolves relative to the **importing file**, not the shell's cwd. +- The step body must be exactly `@import ` — no mixed prose, no extra lines. +- The step's `yaml` block may contain **only** `optional`. Other keys are rejected. +- `optional: true` on `@import` is allowed only at the root file, not on a nested import. + +Variables and context propagate into helpers. Chrome / `mode` / auth do not (root-only). + +Editing a helper re-authors that step in **every test that imports it**, plus everything after the import in those tests. Same cascade rule. + +## Commands + +| Command | Use | +|---|---| +| `kane-cli testmd run --agent [flags]` | Run a test | +| `kane-cli testmd list` | List `*_test.md` files under cwd (NDJSON when non-TTY) | +| `kane-cli testmd status ` | Test Manager identity + local-sync state | +| `kane-cli testmd export [--code-language python\|javascript]` | Regenerate code export from existing recordings (no browser launch) | +| `kane-cli testmd delete ` | Local-only delete: removes source + `output-/`. Does NOT delete from Test Manager. | + +**Flags on `testmd run` that don't exist on §3 `run`:** + +| Flag | Default | Description | +|---|---|---| +| `--name ` | none | Persist the run under this name. Regex `[a-zA-Z0-9_-]+`. | +| `--on-lock-conflict ` | none | Behavior when another user holds the test's edit lock. 
`readonly` = replay-only / no upload, `fail` = exit 2, `wait` = block until released | +| `--retry` | off | On replay failure, restart with a shrinking replay window | +| `--retry-count ` | `3` | Max retry restarts before falling back to full re-author | +| `--author` | off | Force authoring every step (skip replay decision) | + +All §3 `run` flags also apply (`--agent`, `--headless`, `--max-steps`, `--timeout`, `--variables`, etc.). + +Flag wins over frontmatter for everything **except** `variables` — the file owns variables; you can add new keys via flags but cannot override file-defined ones. + +## Output: `output-/` and `Result.md` + +After a run: + +```text +amazon_test.md +output-amazon/ + Result.md # human-readable run report + .internal/ # cached recordings — do not edit + playwright-python-code/ # only if code_export enabled +``` + +**`output-/` is commit-safe and should be committed to git.** That's how teammates and CI replay the same recordings. + +For tests using `@import`, helper recordings land next to the helper file in `helper-output---/` directories. Also commit-safe. + +**`Result.md`** opens in any Markdown viewer. It contains: +- Frontmatter — `status`, `started`, `duration_s`, `session_id` +- One entry per root step with one of `✓ passed`, `✗ failed`, `⏭ skipped`, optionally suffixed `(optional)` when a soft-failing step failed but the run continued +- For `@import` steps that failed, a path to the failing sub-step inside the helper + +When the user asks "did the test pass?" or "where did it fail?" for a previously-run test, read `Result.md` rather than re-running the test. + +## Recording a `_test.md` from a live session + +If the user runs an ad-hoc objective with §3 `run` and decides to keep it: + +```bash +kane-cli run "Search for noise-cancelling headphones on amazon.com" --name amazon-search +``` + +On exit, kane-cli writes `/.testmuai/tests/amazon-search_test.md`. Move that file into the user's repo and re-run it with `testmd run`. 
Without `--name`, an ad-hoc `run` is ephemeral and nothing is written. + +## CI invocation + +```bash +kane-cli testmd run ./tests/checkout_test.md \ + --agent \ + --headless \ + --on-lock-conflict wait \ + --retry +``` + +- `--agent` — NDJSON to stdout (auto-enabled when stdin is not a TTY; pass explicitly anyway). +- `--headless` — no window. +- `--on-lock-conflict wait` — block instead of failing if a teammate is editing the same test. +- `--retry` — automatically recover transient replay failures. + +Exit codes: + +| Code | Meaning | +|------|---------| +| 0 | ✅ Passed | +| 1 | ❌ Failed | +| 2 | ⚠️ Error (auth, setup, infra) — for `testmd`, also includes parse errors and `--on-lock-conflict fail` | +| 3 | ⏱️ Timeout or cancelled — for `testmd`, also includes `--on-lock-conflict wait` timeout | + +## Parse errors (when writing a `_test.md`) + +Parse errors abort **before** any browser launch with exit `2`. Common ones and the fix: + +| Message | Fix | +|---|---| +| `frontmatter is missing closing '---'` | Add the trailing `---` | +| `invalid YAML in frontmatter` | Re-validate the YAML block | +| `step body must be exactly one of prose / @import` | Split into two steps | +| `step config on @import may only contain 'optional'` | Remove other keys from the yaml block | +| `cannot @import a test file` | Imports may only reference helpers (not ending in `_test.md`) | +| `cyclic reference` | Restructure helpers to break the loop | +| `chrome config is global-only` | Move Chrome key to root frontmatter | +| `'' is run-level and cannot be set per-step` | Move `mode` / `on_lock_conflict` to root frontmatter | +| `unknown config key` | Remove or fix the key | +| `auth/identity keys are CLI-only` | Pass `username` / `access_key` as CLI flags, not in frontmatter | + +When the user reports a parse error, fix the file before retrying — don't loop on the same error. 
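In CI the exit code is often the only signal the pipeline reads. A small Python wrapper can launch the CI invocation shown above and map its exit code to the outcomes in the exit-code table. The function names are hypothetical, and the wrapper assumes `kane-cli` is on `PATH`.

```python
from __future__ import annotations

import subprocess

# Exit codes as documented for kane-cli run / testmd run.
OUTCOMES = {
    0: "passed",
    1: "failed",
    2: "error",  # auth, setup, infra; for testmd also parse errors and --on-lock-conflict fail
    3: "timeout or cancelled",  # for testmd also an --on-lock-conflict wait timeout
}


def describe_exit(code: int) -> str:
    """Translate a kane-cli exit code into a short outcome label."""
    return OUTCOMES.get(code, f"unexpected exit code {code}")


def run_testmd_ci(test_path: str) -> tuple[int, str]:
    """Run one _test.md file with the CI flags and return (exit code, outcome)."""
    argv = [
        "kane-cli", "testmd", "run", test_path,
        "--agent", "--headless", "--on-lock-conflict", "wait", "--retry",
    ]
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, describe_exit(proc.returncode)
```

Keeping `proc.stdout` lets the same job parse the NDJSON stream afterwards instead of re-running the test.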
diff --git a/.claude/skills/kane-cli/SKILL.md b/.claude/skills/kane-cli/SKILL.md index e5bf710..9190215 100644 --- a/.claude/skills/kane-cli/SKILL.md +++ b/.claude/skills/kane-cli/SKILL.md @@ -5,804 +5,255 @@ description: Browser automation via kane-cli — run objectives, parse NDJSON ou # Kane CLI — Browser Automation Skill -Use `kane-cli` for **any task that requires a real browser**: navigating websites, clicking elements, filling forms, searching, testing web UI, taking screenshots, or verifying deployments. - -**Do NOT** use Playwright, Puppeteer, or Selenium directly. `kane-cli` manages Chrome, auth, and the AI automation agent. - -**Always run with `--agent` flag.** This gives structured NDJSON output that you parse and present to the user with rich formatting. - ---- - -## 1. Decision Tree - -When the user's request involves a browser, follow this flow: - -**Is kane-cli installed?** -├─ Unknown → Check with `kane-cli --version` -├─ No → `npm install -g @testmuai/kane-cli` then §2 -└─ Yes ↓ - -**Is kane-cli set up?** -├─ Unknown → Run `kane-cli whoami` to check auth status -├─ No → Go to §2 (Pre-flight Setup) -└─ Yes ↓ - -**What does the user want?** -├─ Single browser task → Build one `kane-cli run --agent` command (§3, §4) -├─ Test/verify something → Same, but use assertion objectives (§4) -├─ Extract data from a page → Same, but use "store as" extraction pattern (§4) -├─ Save / re-run / commit the test → Use `kane-cli testmd` (§7) -├─ Multiple independent tasks → Decompose into sub-objectives, run in parallel via Agent tool (§9) -├─ Debug a failed run → Inspect logs (§8) -└─ Configure kane-cli → Run config commands (§10) - -**After every run:** -1. Parse the NDJSON output (§5) -2. Present rich results with emojis (§6) -3. If failed, inspect logs and diagnose (§8) - ---- - -## 2. Pre-flight Setup - -Before first use, verify installation and auth. 
- -### Install - -```bash -npm install -g @testmuai/kane-cli -``` - -### Check Auth Status - -```bash -kane-cli whoami -``` - -If this shows "not configured" or errors, run login: - -### Login (Basic Auth) - -```bash -kane-cli login --username --access-key -``` - -This creates the default profile with basic auth, auto-selects the KaneAI project, and marks setup complete. Credentials come from the user's TestmuAI dashboard (Settings → Keys). - -Optional flag: -- `--profile ` — profile name (default: last selected profile check using `config show`) - -### Login (OAuth) - -```bash -kane-cli login --oauth -``` - -This opens the browser for OAuth consent and waits for the callback. Works in both TTY and non-TTY (agent) mode. - -### Login (Interactive — TTY only) - -In a terminal, run `kane-cli login` with no flags for the interactive wizard (auth method → project picker → folder picker). If the user needs this, ask them to run it directly: - -> Please run `! kane-cli login` and complete the sign-in. - -### Verify - -```bash -kane-cli whoami # Auth status -kane-cli config show # Current configuration -``` +Use `kane-cli` for **any task that requires a real browser**: navigating websites, clicking elements, filling forms, searching, testing web UI, taking screenshots, or verifying deployments. Do NOT use Playwright, Puppeteer, or Selenium directly. Always run with `--agent` so output is structured NDJSON you can parse. --- -## 3. Building the Command +## 1. Live narration and results presentation — READ THIS FIRST -Every run uses this pattern: - -```bash -kane-cli run "" --agent [options] -``` - -`--agent` is **mandatory** — it outputs structured NDJSON that you parse and present to the user. 
- -### Flags - -| Flag | Purpose | Default | -|------|---------|---------| -| `--headless` | No visible browser window | Off (browser visible) | -| `--max-steps ` | Limit agent reasoning steps | 30 | -| `--timeout ` | Kill run after N seconds | No limit | -| `--variables ` | Inline variables JSON | None | -| `--variables-file ` | Load variables from a JSON file | None | -| `--global-context ` | Override global agent context markdown | `~/.testmuai/kaneai/global-memory.md` | -| `--local-context ` | Override local project context markdown | `.testmuai/context.md` | -| `--ws-endpoint ` | Remote browser via WebSocket (e.g. LambdaTest grid) | Local Chrome | -| `--cdp-endpoint ` | Connect to existing Chrome via CDP | Auto-launch Chrome | -| `--code-export` | Generate code export after upload | Off | - -### Exit Codes - -| Code | Meaning | -|------|---------| -| 0 | ✅ Passed | -| 1 | ❌ Failed | -| 2 | ⚠️ Error (auth, setup, infra) | -| 3 | ⏱️ Timeout or cancelled | - -### Variables - -Variables parameterize objectives with reusable values and secrets. Use `{{key}}` syntax in objectives. - -**Format:** -```json -{ - "username": { "value": "alice", "secret": false }, - "password": { "value": "s3cret!", "secret": true } -} -``` +The user is watching this happen in real time. Silence during a kane-cli run is a bug; a one-line "Test passed" instead of the results table is a bug. Both happen because this section used to be buried at line 353 of an 800-line file. It's first now. Follow it exactly. -`secret: true` masks the value in logs and routes it to TestmuAI's secrets store instead of being synced as plain TMS variables. +### 1.1 How to launch kane-cli — Monitor (Claude Code) or Bash (Codex / Gemini) -**Loading order** (later wins): -1. `~/.testmuai/kaneai/variables/*.json` (global, alphabetical) -2. `{cwd}/.testmuai/variables/*.json` (local project overrides) -3. `--variables-file ` -4. 
`--variables '{...}'` (inline JSON) +**Bash is synchronous — it blocks until kane-cli exits, then hands you the whole stdout at once. That means you cannot narrate event-by-event from a Bash call.** To narrate live, the launch tool must stream stdout line-by-line. -**Always parameterize:** credentials, API keys, tokens, environment-specific URLs. -**OK to hardcode:** one-off URLs, static UI text, navigation paths. - -### Context Files - -Context files provide additional instructions to the agent: -- **Global:** `~/.testmuai/kaneai/global-memory.md` — shared across all runs -- **Local:** `.testmuai/context.md` in cwd — project-specific - -Override per-run with `--global-context` / `--local-context` flags. - -### Examples - -```bash -# Simple browser task -kane-cli run "Go to https://www.amazon.in and search for 'laptop'" --agent - -# Headless with timeout -kane-cli run "Go to https://app.example.com and verify login page loads" --agent --headless --timeout 60 - -# With variables -kane-cli run "Go to https://app.example.com and login with {{username}} and {{password}}" --agent \ - --variables '{"username": {"value": "alice"}, "password": {"value": "secret123", "secret": true}}' - -# Remote browser (LambdaTest grid) -kane-cli run "Go to https://shop.example.com and add item to cart" --agent \ - --ws-endpoint "wss://cdp.lambdatest.com/playwright?capabilities=..." - -# With variables file -kane-cli run "Go to https://staging.myapp.com, login and verify dashboard" --agent \ - --variables-file ./test-creds.json --headless --timeout 120 -``` - ---- - -## 4. Writing Objectives - -The objective string is the most important input. How you phrase it determines what the agent does. 
- -### Three Patterns - -| Pattern | Trigger Phrases | Agent Behavior | -|---------|----------------|----------------| -| 🎯 **Action** | "go to", "click", "type", "search", "fill", "scroll" | Performs browser actions | -| ✅ **Assertion** | "assert", "verify", "confirm", "check that" | Validates a condition (pass/fail) | -| 📦 **Extraction** | "store X as 'name'" | Reads a value from the page and persists it in structured output | - -### Extraction: The "store as" Pattern - -**Critical.** Vague phrasing like "read", "report", or "tell me" does NOT reliably extract data. The agent may observe the value visually but won't persist it in structured output. - -❌ **Bad** — agent looks but doesn't capture: -``` -"go to example.com and read the page title" -"go to example.com and tell me the price" -``` - -✅ **Good** — agent extracts and persists in `final_state`: -``` -"go to example.com, store the page title as 'page_title'" -"go to example.com, store the price of the first item as 'price'" -``` - -Stored values appear in the `run_end` event's `final_state` and `context.memory` fields. - -### Combining Patterns +| Agent | Launch tool | Live narration possible? 
| +|---|---|---| +| **Claude Code** | `Monitor` — streams each stdout line as its own notification | ✅ Yes — narrate per event as it arrives | +| **Codex CLI** | `Bash` (or shell equivalent) | ❌ No — narrate post-run from captured stdout | +| **Gemini CLI** | `Bash` (or shell equivalent) | ❌ No — narrate post-run from captured stdout | -Chain action → extraction → assertion in a single objective: +**In Claude Code, you MUST use `Monitor` (not Bash) to launch `kane-cli run` / `kane-cli testmd run`.** Pattern: +```yaml +description: "kane-cli: " +command: kane-cli run "" --agent +timeout_ms: 600000 +persistent: false ``` -"go to {{app_url}}/dashboard, - store the welcome message as 'welcome_text', - store the user role in the sidebar as 'role', - assert the role is 'Admin'" -``` - -### Assertion Specificity - -| Type | Example | -|------|---------| -| **Exact match** | `"assert the cart total shows '$29.99'"` | -| **Flexible match** | `"assert a price is displayed for each product"` | -| **State** | `"assert the Submit button is disabled until all fields are filled"` | -| **Conditional** | `"if a cookie banner appears, dismiss it, then assert the homepage loads"` | -| **Negative** | `"assert no error message or red banner is visible"` | -| **Positional** | `"assert 'Settings' appears in the left sidebar navigation"` | - -### Dos and Don'ts - -| ✅ Do | ❌ Don't | -|-------|---------| -| Use imperative verbs: "go to", "click", "store as" | Use vague verbs: "check out", "look at", "explore" | -| Be specific: "click the 'Add to Cart' button" | Be vague: "add the item" | -| Name extractions: "store X as 'price'" | Hope for values: "tell me the price" | -| Use `{{variables}}` for credentials/URLs | Hardcode secrets in the objective | -| Include starting URL in the objective: "Go to https://..." | Assume the agent knows where to start | -| Split mega-objectives (>15 steps) into multiple runs | Cram everything into one massive objective | ---- - -## 5. 
Parsing Output (--agent mode) +Every NDJSON line from kane-cli arrives as a notification. The watch ends when kane-cli exits (you'll see the exit code in the final notification). Do NOT also call Bash for the same run — that double-launches kane-cli. -> **Internal reference only.** Everything in this section (field names, event types, JSON structure) is for you to parse programmatically. **Never expose these internal terms to the user.** The user should see plain-language summaries, not `run_end`, `final_state`, `bifurcation`, `NDJSON`, `session_dir`, or any raw JSON fields. +In Codex/Gemini, use Bash with the same `kane-cli ... --agent` command. After it returns, parse the captured stdout as if you had received the events in sequence. -With `--agent`, kane-cli outputs one JSON object per line to **stdout**. Progress UI renders to **stderr**. +### 1.2 Before you launch — emit start line and create todos -### Event Types +**Before** invoking Monitor (or Bash), emit: -**Progress events** (bulk of the output — one per step): - -```json -{"step": 1, "status": "passed", "remark": "Navigated to amazon.in"} -{"step": 2, "status": "passed", "remark": "Typed 'laptop' in search box"} -{"step": 3, "status": "failed", "remark": "Could not find Add to Cart button"} +```text +Starting browser task: . ``` -| Field | Type | Description | -|-------|------|-------------| -| `step` | number | Step index (1-based) | -| `status` | string | `"passed"` or `"failed"` | -| `remark` | string | What the agent did or why it failed | - -These are **untyped** — they have no `type` field. Do **not** key on `event.type === 'step_start'` or `'step_end'`; those event types are not emitted. 
- -**Flow events:** - -| Event (`type` field) | Key Fields | Purpose | -|-------|-----------|---------| -| `bifurcation` | `flows[]`, `count` | Agent split objective into sub-flows | -| `child_agent_start` | `child_id`, `objective`, `parent_step` | Child agent spawned | -| `child_agent_end` | `child_id`, `success`, `steps_taken`, `summary` | Child agent finished | -| `ask_user` | `question`, `step_index`, `options?` | Agent needs user input | -| `error` | `message` | Error occurred | - -**Note:** There is no `run_start` event — the first line is either a `bifurcation` or a progress object. +Then create these TodoWrite items (skip on Gemini CLI where TodoWrite is unavailable): -**Note:** `ask_user` is auto-disabled when stdin is not a TTY. Since agents typically run kane-cli as a subprocess, ask_user events will not be emitted. Write objectives that don't require interactive input. +1. `Narrate start of ` — mark `in_progress` immediately +2. `Narrate each step as NDJSON arrives` +3. `Present results table after run_end` -### Parsing Strategy +The todos exist so that after Monitor/Bash returns control, the in-context reminder pulls you back into narration mode rather than a generic "parse stdout" mode. -Since progress events lack a `type` field, distinguish them from typed events like this: +### 1.3 During the run — narrate every event -``` -for each line of NDJSON: - if obj.type === "run_end" → terminal event, stop parsing - if obj.type === "bifurcation" → flow split - if obj.type exists → other typed event - if obj.step exists → progress event (step/status/remark) -``` +Progress events have `step`/`status`/`remark` fields and **no `type` field**. Each one gets ONE narration line. -**Build automation on `run_end`** — it is the only event guaranteed to have a stable schema across versions. Use progress events for live status display only. 
- -**Terminal event** (always the last line): - -```json -{ - "type": "run_end", - "status": "passed", - "summary": "Searched for laptop and added first result to cart", - "one_liner": "Searched for laptop on Amazon and added to cart", - "reason": "Objective completed", - "duration": 45.2, - "credits": 12, - "final_state": { - "price": "$29.99", - "product_name": "Wireless Headphones" - }, - "context": { - "memory": {}, - "variables": {}, - "pointer": "(passed) Searched for laptop and added first result to cart" - }, - "session_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890", - "run_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890/runs/0", - "test_url": "https://test-manager.lambdatest.com/projects/123/test-cases/456" -} -``` +**Claude Code (Monitor):** Each Monitor notification IS one event. Narrate it the moment the notification arrives. Do not batch. Do not wait for more events. One notification → one narration line. -Key `run_end` fields: -- `status` — `"passed"` or `"failed"` -- `summary` — what the agent did -- `one_liner` — short summary for display -- `reason` — why it stopped -- `credits` — credits consumed by the run (when reported) -- `final_state` — extracted values from "store as" objectives -- `test_url` — link to KaneAI dashboard (if upload succeeded) -- `session_dir` / `run_dir` — paths to log files +**Codex / Gemini (Bash post-run):** Iterate the captured stdout line-by-line in order. Emit one narration per progress event in sequence before moving on to the results table. -### Responding to `ask_user` (if stdin is a TTY) +Template (both cases): -```json -{"type": "user_response", "answer": "Medium size"} +```text +Step : ``` -To cancel a run: +If `status` is `"failed"`, flag it immediately: -```json -{"type": "cancel"} +```text +Step failed: — the agent is retrying. ``` ---- - -## 6. 
Presenting Results to the User - -> **Golden rule:** The user should feel like they're watching a browser task happen, not reading a log file. Use plain language, never expose internal field names, JSON keys, file paths, or technical jargon. Translate everything into what the user cares about. - -### 📢 Live Progress (During the Run) +Never expose internal field names (`step`, `status`, `remark`, `run_end`, `final_state`, `bifurcation`, `session_dir`, etc.) to the user. Translate to plain language. -**Do not stay silent while kane-cli runs.** As the command executes, keep the user informed: +### 1.4 After run_end — present the results table -1. **Before starting** — Tell the user what you're about to do: - > Starting browser task: searching for 'laptop' on Amazon... +The terminal event has `type: "run_end"` and stable fields: `status`, `summary`, `one_liner`, `duration`, `credits`, `final_state`, `test_url`, `session_dir`, `run_dir`. -2. **As steps complete** — Relay each step's outcome in plain language as it happens. Parse the progress events from stdout and narrate them: - > Step 1: Opened Amazon homepage - > Step 2: Typed 'laptop' in the search bar - > Step 3: Clicked the search button - > Step 4: Search results loaded — found product listings - -3. **If something goes wrong mid-run** — Flag it immediately, don't wait for the final result: - > Step 5: Could not find the 'Add to Cart' button — the agent is retrying... - -This keeps the user engaged and lets them intervene early if the task is going in the wrong direction. - -### 📋 Results Summary (After the Run) - -**After every run, present a clear summary.** Never just say "it passed" — show the full picture in a user-friendly format. 
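The narration rule (one plain-language line per progress event, with failures flagged as they happen) can be sketched in Python. The line shapes follow the `Step 1: Opened Amazon homepage` and "the agent is retrying" examples in this section. The function name `narrate` is hypothetical.

```python
from __future__ import annotations


def narrate(event: dict) -> str | None:
    """Return one narration line for a progress event, or None for anything else."""
    if "type" in event or "step" not in event:
        return None  # typed events (bifurcation, run_end, ...) are not narrated per step
    step = event["step"]
    remark = str(event.get("remark", "")).strip()
    if event.get("status") == "failed":
        # Flag failures immediately instead of waiting for the final result.
        return f"Step {step} failed: {remark} \u2014 the agent is retrying."
    return f"Step {step}: {remark}"


for event in (
    {"step": 1, "status": "passed", "remark": "Navigated to amazon.in"},
    {"step": 3, "status": "failed", "remark": "Could not find Add to Cart button"},
):
    print(narrate(event))
```

Only plain language reaches the user; the field names stay inside the function.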
- -**Successful run:** +**For a passing run, always emit this exact table** (substituting the field values): +```markdown | | | |-------|-------| | 🟢 **Result** | Passed | -| 🎯 **Task** | Search for 'laptop' on Amazon | -| ⏱️ **Duration** | 45.2s | -| 👣 **Steps taken** | 7 | -| 📝 **What happened** | Opened Amazon, typed 'laptop' in search, clicked search, results loaded with 48 products | -| 🔗 **View details** | [Open in KaneAI Dashboard](https://test-manager.lambdatest.com/...) | - -**If data was extracted** (from "store as" objectives), show it as a clean results table: - -| 📦 What was found | Value | -|-------------|----------------| -| Top repository | freeCodeCamp/freeCodeCamp | -| Star count | 413k | -| Price | $29.99 | +| 🎯 **Task** | | +| ⏱️ **Duration** | s | +| 👣 **Steps taken** | | +| 📝 **What happened** | | +| 🔗 **View details** | [Open in KaneAI Dashboard]() | +``` -**If assertions were checked**, show pass/fail for each: +**If `final_state` has values** (the user used "store as X" — see §4), append a second table: -| ✅ Check | Result | -|-------------|--------| -| Dashboard shows welcome message | 🟢 Passed | -| User role is Admin | 🔴 Failed | -### ❌ When Things Go Wrong -For failed runs, explain **what went wrong in plain language**: +```markdown +| 📦 What was found | Value | +|-------------|----------------| +| | | +``` -- 🔍 **What failed** — describe the step that failed and why, in the user's terms (not "step_003.json shows dom_action error") -- 📸 **Screenshot** — if a screenshot exists, read and show it so the user can see what the browser looked like at the point of failure -- 💡 **Why it likely failed** — your diagnosis: was the element missing? Did the page not load? Was the objective ambiguous? -- 🔧 **Suggested fix** — a concrete next step: rephrase the objective, increase timeout, check auth, etc. 
+**If the objective used assertions** ("assert …", "verify …"), append a pass/fail table per assertion derived from the run summary and step remarks. -**Example of a good failure report:** +### 1.5 On failure -> 🔴 **Failed** at step 5 of 9 (after 25s) -> -> **What happened:** The agent clicked "Proceed to Checkout" but the payment form never appeared. The page showed a loading spinner for 15 seconds before the agent timed out. -> -> **Likely cause:** The checkout page may require authentication, or the site's payment service was slow/down. -> -> **Suggested fix:** Try adding an explicit login step before checkout, or increase the timeout to 120s. +For exit code 1 (or `status: "failed"` in `run_end`), present a plain-language failure report — never raw paths or NDJSON. Template: -### 🐛 Suggesting a Bug Report +```markdown +🔴 **Failed** at step of (after s) -If the failure looks like a **kane-cli bug** (not auth, timeout, or a vague objective), offer to file a report: +**What happened:** . -> This looks like it might be a bug in kane-cli. Want me to file a report? +**Likely cause:** -File at: **https://github.com/LambdaTest/kane-cli/issues**. Gather the details automatically — don't ask the user to dig through log files. +**Suggested fix:** . +``` -**Do NOT suggest bug reports for:** auth issues, low timeouts, vague objectives, or website errors (500s, CAPTCHAs). +If a screenshot exists at `/run-test/screenshots/step_.png`, Read it and show it inline before the suggested fix. For deeper diagnosis, see `references/debug.md`. --- -## 7. Saving & Replaying Tests (`testmd`) +## 2. Decision tree -The §3 `run` command is the **primary** mode — one-shot, ephemeral. `testmd` is the **secondary** mode: tests live as `_test.md` files on disk, each step is cached on the first run, and every later run **replays from cache** with no LLM cost. +When the user's request involves a browser: -Use `testmd` whenever the user wants the test to persist. 
The decision is binary — once a test exists as a file, every later invocation is `testmd run`, never `run`. +**Is kane-cli installed and authenticated?** +- Unknown → `kane-cli whoami` +- No / errors → Read `references/setup-and-config.md` +- Yes ↓ -### When to switch from `run` to `testmd` - -| User says | Use | -|---|---| -| "save this test", "commit this", "keep this", "add this to the suite" | `testmd` | -| "regression test", "smoke test", "make this replayable" | `testmd` | -| "this is a test", "test the X flow end-to-end" (suite-shaped) | `testmd` | -| "run this once", "check if X works right now", "try X" | `run` (§3) | -| "search for", "click", "fill", "verify" (one-shot) | `run` (§3) | - -If unclear, ask: "Do you want me to save this test so you can re-run it later?" - -### Quick start +**What does the user want?** +- A single one-shot browser task → build a `kane-cli run --agent` command (§3 + §4) +- A test they want to save / re-run / commit → Read `references/testmd.md` first, then use `kane-cli testmd` +- Multiple independent browser tasks → Read `references/parallel.md` first +- Debug a failed run → Read `references/debug.md` +- Configure kane-cli or check directory layout → Read `references/setup-and-config.md` +- You need the full NDJSON event schema (rare — §5's summary covers 90% of cases) → Read `references/parsing.md` -Write the file (any path; filename must end in `_test.md`): +**Every run, always:** follow §1 above. -```markdown ---- -mode: testing -max_steps: 30 --- -# Amazon search - -## Open Amazon -Open https://www.amazon.com. - -## Search for headphones -Type "wireless headphones" into the search box and submit. -Verify at least one product result is visible. -``` - -Run it: +## 3. Building a `run` command ```bash -kane-cli testmd run amazon_test.md --agent -``` - -### File format - -Four parts in order: - -1. **YAML frontmatter** — between `--- ... ---` at the very top. -2. 
**`# Title`** — decorative; everything before the first `## ` is ignored. -3. **`## H2` step headings** — one per step. The agent reads the step body, not the heading. -4. **Step body** — either prose **or** a single `@import ` line. Never both. - -Per-step `yaml` overrides go immediately under the heading, in a fenced block: - -````markdown -## Submit the form -```yaml -timeout: 90 -optional: true +kane-cli run "" --agent [options] ``` -Click submit and verify the confirmation banner. -```` -**Frontmatter keys to use:** +`--agent` is mandatory — it switches stdout to NDJSON. Most-used flags: -| Key | Scope | Description | -|---|---|---| -| `mode` | root | `action` (halts on auth walls) or `testing` (default — pushes through so negative-test assertions can fire) | -| `max_steps` | root + step | Max agent reasoning steps. Default `30`. | -| `timeout` | root + step | Hard kill per step in seconds. | -| `headless` | root | No browser window. | -| `variables` | root + step | `{{name}}` params, same shape as §3, with `secret: true` for credentials | -| `global_context` / `local_context` | root + step | Inline Markdown or path | -| `code_export` / `code_language` | root + step | Generate Playwright after the run; language `python` or `javascript` | - -Files ending in `_test.md` are tests (valid entry points). Any other `.md` is a helper — reachable only via `@import`. - -### The replay & cascade rule (CRITICAL) - -On the **first** run of a test, the agent authors each step and saves a recording. On **every later run**, each step replays from its recording — no agent, no LLM cost, much faster. 
+| Flag | Purpose | Default | +|------|---------|---------| +| `--headless` | No visible browser window | Off | +| `--max-steps ` | Cap agent reasoning steps | 30 | +| `--timeout ` | Hard kill after N seconds | No limit | +| `--variables ` | Inline variables JSON (for `{{key}}` in objective) | None | +| `--variables-file ` | Load variables from a JSON file | None | +| `--ws-endpoint ` | Remote browser (LambdaTest grid) | Local Chrome | +| `--code-export` | Generate code export after upload | Off | -A step replays only if **all** of these hold: -- A recording for that step exists, -- Its prose is unchanged since the recording, -- Its `yaml` block is unchanged, -- No earlier step in the file invalidated it. +Other flags (`--global-context`, `--local-context`, `--cdp-endpoint`) and the full variables precedence chain live in `references/setup-and-config.md`. -**Editing step N re-authors step N AND every step after it in the same file.** Each step starts where the previous step left off (URL, login, tabs). When step 3 changes, step 4 cannot safely replay against state that no longer exists. +**Exit codes:** `0` passed · `1` failed · `2` auth/infra error · `3` timeout/cancelled. -Consequences when editing tests: -- A one-line tweak at the top of a 20-step test re-authors all 20 steps on the next run. -- To re-record only one step, edit only that step (or steps after it). -- `--author` forces full authoring for one run (debugging only). -- `rm -rf output-/` wipes the cache entirely. 
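The replay conditions and the cascade rule fit in a few lines of Python. This is a hypothetical model of the rule as described, not kane-cli's implementation. Each step is fingerprinted from its prose plus its `yaml` block. The recording cache is assumed to be a step index → fingerprint mapping.

```python
import hashlib


def step_digest(prose: str, yaml_block: str = "") -> str:
    """Fingerprint everything that invalidates a recording: the prose and the yaml block."""
    return hashlib.sha256(f"{prose}\0{yaml_block}".encode("utf-8")).hexdigest()


def plan_run(steps: list[tuple[str, str]], recordings: dict[int, str]) -> list[str]:
    """Return 'replay' or 'author' for each (prose, yaml_block) step, in file order.

    recordings: step index -> digest the recording was made from (hypothetical cache shape).
    """
    plan = []
    invalidated = False  # once one step is re-authored, every later step is too
    for index, (prose, yaml_block) in enumerate(steps):
        if invalidated or recordings.get(index) != step_digest(prose, yaml_block):
            invalidated = True
            plan.append("author")
        else:
            plan.append("replay")
    return plan
```

Consider a three-step test whose second step is edited. The plan is `["replay", "author", "author"]`: the first step still replays, and the edit cascades to every later step.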
+### Examples -### `@import` for reusing flows +```bash +# One-shot +kane-cli run "Go to https://www.amazon.in and search for 'laptop'" --agent -Extract a repeating flow (login, setup, cookie banner dismissal) into a helper file: +# Headless with timeout +kane-cli run "Go to https://app.example.com and verify login page loads" --agent --headless --timeout 60 -```markdown -## Sign in -@import ./helpers/login.md +# With inline credentials +kane-cli run "Go to https://app.example.com and login with {{username}} and {{password}}" --agent \ + --variables '{"username":{"value":"alice"},"password":{"value":"s3cret","secret":true}}' ``` -Rules: -- Helper filename **must not** end in `_test.md`. -- Path resolves relative to the **importing file**, not the shell's cwd. -- The step body must be exactly `@import ` — no mixed prose, no extra lines. -- The step's `yaml` block may contain **only** `optional`. Other keys are rejected. -- `optional: true` on `@import` is allowed only at the root file, not on a nested import. - -Variables and context propagate into helpers. Chrome / `mode` / auth do not (root-only). - -Editing a helper re-authors that step in **every test that imports it**, plus everything after the import in those tests. Same cascade rule. +--- -### Commands +## 4. Writing objectives -| Command | Use | -|---|---| -| `kane-cli testmd run --agent [flags]` | Run a test | -| `kane-cli testmd list` | List `*_test.md` files under cwd (NDJSON when non-TTY) | -| `kane-cli testmd status ` | Test Manager identity + local-sync state | -| `kane-cli testmd export [--code-language python\|javascript]` | Regenerate code export from existing recordings (no browser launch) | -| `kane-cli testmd delete ` | Local-only delete: removes source + `output-/`. Does NOT delete from Test Manager. | +How you phrase the objective string determines what the agent does. 
Three patterns: -**Flags on `testmd run` that don't exist on §3 `run`:** +> For the full catalog — every action verb, every assertion analyze method (Visual / Textual-DOM / URL / Title / DevTools→Network/Console/Performance/Cookies/localStorage), operators, chaining, conditional/negative patterns, and worked examples — Read `references/objectives-cookbook.md`. Same grammar applies to one-shot `kane-cli run` objectives and `_test.md` step bodies. -| Flag | Default | Description | +| Pattern | Trigger words | Behavior | |---|---|---| -| `--name ` | none | Persist the run under this name. Regex `[a-zA-Z0-9_-]+`. | -| `--on-lock-conflict ` | none | Behavior when another user holds the test's edit lock. `readonly` = replay-only / no upload, `fail` = exit 2, `wait` = block until released | -| `--retry` | off | On replay failure, restart with a shrinking replay window | -| `--retry-count ` | `3` | Max retry restarts before falling back to full re-author | -| `--author` | off | Force authoring every step (skip replay decision) | +| 🎯 **Action** | "go to", "click", "type", "search", "fill" | Performs browser actions | +| ✅ **Assertion** | "assert", "verify", "confirm", "check that" | Pass/fail check on a condition | +| 📦 **Extraction** | "store X as 'name'" | Persists a value into `run_end.final_state` | -All §3 `run` flags also apply (`--agent`, `--headless`, `--max-steps`, `--timeout`, `--variables`, etc.). +### The "store as" rule (critical for extraction) -Flag wins over frontmatter for everything **except** `variables` — the file owns variables; you can add new keys via flags but cannot override file-defined ones. +Vague phrasing like "read", "tell me", "report" does NOT reliably extract data — the agent may see the value but won't capture it. Use "store as". 
-### Output: `output-/` and `Result.md` +❌ `"go to example.com and read the page title"` +✅ `"go to example.com, store the page title as 'page_title'"` -After a run: - -``` -amazon_test.md -output-amazon/ - Result.md # human-readable run report - .internal/ # cached recordings — do not edit - playwright-python-code/ # only if code_export enabled -``` +Stored values appear in `run_end.final_state` and become the second results table per §1.4. -**`output-/` is commit-safe and should be committed to git.** That's how teammates and CI replay the same recordings. +### Chaining -For tests using `@import`, helper recordings land next to the helper file in `helper-output---/` directories. Also commit-safe. +Action → extraction → assertion in one objective: -**`Result.md`** opens in any Markdown viewer. It contains: -- Frontmatter — `status`, `started`, `duration_s`, `session_id` -- One entry per root step with one of `✓ passed`, `✗ failed`, `⏭ skipped`, optionally suffixed `(optional)` when a soft-failing step failed but the run continued -- For `@import` steps that failed, a path to the failing sub-step inside the helper - -When the user asks "did the test pass?" or "where did it fail?" for a previously-run test, read `Result.md` rather than re-running the test. - -### Recording a `_test.md` from a live session - -If the user runs an ad-hoc objective with §3 `run` and decides to keep it: - -```bash -kane-cli run "Search for noise-cancelling headphones on amazon.com" --name amazon-search -``` - -On exit, kane-cli writes `/.testmuai/tests/amazon-search_test.md`. Move that file into the user's repo and re-run it with `testmd run`. Without `--name`, an ad-hoc `run` is ephemeral and nothing is written. 
- -### CI invocation - -```bash -kane-cli testmd run ./tests/checkout_test.md \ - --agent \ - --headless \ - --on-lock-conflict wait \ - --retry +```text +"go to {{app_url}}/dashboard, + store the welcome message as 'welcome_text', + assert the user role in the sidebar is 'Admin'" ``` -- `--agent` — NDJSON to stdout (auto-enabled when stdin is not a TTY; pass explicitly anyway). -- `--headless` — no window. -- `--on-lock-conflict wait` — block instead of failing if a teammate is editing the same test. -- `--retry` — automatically recover transient replay failures. - -Exit codes follow §3 with new semantics: -- `2` now includes parse errors and `--on-lock-conflict fail` -- `3` now includes `--on-lock-conflict wait` timeout - -### Parse errors (when writing a `_test.md`) - -Parse errors abort **before** any browser launch with exit `2`. Common ones and the fix: +### Dos and don'ts -| Message | Fix | +| ✅ Do | ❌ Don't | |---|---| -| `frontmatter is missing closing '---'` | Add the trailing `---` | -| `invalid YAML in frontmatter` | Re-validate the YAML block | -| `step body must be exactly one of prose / @import` | Split into two steps | -| `step config on @import may only contain 'optional'` | Remove other keys from the yaml block | -| `cannot @import a test file` | Imports may only reference helpers (not ending in `_test.md`) | -| `cyclic reference` | Restructure helpers to break the loop | -| `chrome config is global-only` | Move Chrome key to root frontmatter | -| `'' is run-level and cannot be set per-step` | Move `mode` / `on_lock_conflict` to root frontmatter | -| `unknown config key` | Remove or fix the key | -| `auth/identity keys are CLI-only` | Pass `username` / `access_key` as CLI flags, not in frontmatter | - -When the user reports a parse error, fix the file before retrying — don't loop on the same error. - ---- - -## 8. Failure Handling & Log Inspection - -When a run fails, diagnose before suggesting fixes. 
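The first diagnostic step depends on the exit code: `0` passed, `1` failed, `2` auth/infra error, `3` timeout/cancelled. The minimal Python sketch below maps each code to a starting point taken from the failure-pattern table. The hint wording is an assumption. `run_objective` passes the objective as a single argv element, so the shell never has to quote it.

```python
import subprocess

# First diagnostic step per documented kane-cli exit code (hint wording is illustrative).
EXIT_CODE_TRIAGE = {
    0: ("passed", "Present the results table."),
    1: ("failed", "Read run-test/actions.ndjson under run_dir and find the failing step."),
    2: ("auth/infra error", "Run `kane-cli whoami` and check that Chrome is available."),
    3: ("timeout/cancelled", "Raise --timeout or --max-steps, or split the objective."),
}


def triage(exit_code: int) -> tuple[str, str]:
    """Map a kane-cli exit code to (outcome, first diagnostic step)."""
    return EXIT_CODE_TRIAGE.get(
        exit_code,
        ("unknown", f"Unexpected exit code {exit_code}; inspect tui.log under session_dir."),
    )


def run_objective(objective: str, *extra: str) -> tuple[int, str]:
    """Run one objective in agent mode; return (exit code, raw NDJSON stdout)."""
    proc = subprocess.run(
        ["kane-cli", "run", objective, "--agent", *extra],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout
```

Typical use: `code, stdout = run_objective("Go to https://app.example.com and verify login page loads", "--headless", "--timeout", "60")`, then `triage(code)`.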
- -### Log Locations - -The `run_end` event provides `session_dir` and `run_dir` paths. Use those directly. - -``` -{session_dir}/ -├── session.json # Session metadata, run list, upload status -├── tui.log # Timeline: session start, run start/end, errors -└── runs/{n}/ - └── run-test/ - └── actions.ndjson # Step-by-step record of agent actions -``` - -### Debugging Flow - -1. **Parse the `run_end` event** from stdout — it has `status`, `reason`, and `summary` plus the `session_dir` / `run_dir` paths. -2. **Read `actions.ndjson`** in `{run_dir}/run-test/` — each line is one agent action with its intent and outcome. -3. **Check `tui.log`** in `{session_dir}/` — for session-level issues (Chrome launch, auth, upload). - -### Common Failure Patterns - -| Symptom | Likely Cause | Fix | -|---------|-------------|-----| -| 🔄 Agent repeats same action | Stuck in a loop / page didn't change | Rephrase objective, add explicit wait or assertion | -| 🎯 Agent clicks wrong element | Ambiguous UI, multiple similar elements | Be more specific: "click the **blue** 'Submit' button in the **checkout form**" | -| 👁️ Agent says done but didn't finish | Objective too vague | Add explicit assertions: "assert the confirmation page shows order number" | -| 💀 Exit code 2, no steps | Auth or Chrome failure | Check `kane-cli whoami`, verify Chrome is available | -| ⏱️ Exit code 3 | Timeout or cancelled | Increase `--timeout` or `--max-steps`, or split into smaller objectives | -| 🚫 "CDP endpoint not reachable" | Chrome not running | Let kane-cli manage Chrome (remove `--cdp-endpoint`) | +| Imperative verbs: "go to", "click", "store as" | Vague verbs: "check out", "look at", "explore" | +| Specific: "click the 'Add to Cart' button" | Vague: "add the item" | +| Name extractions: "store X as 'price'" | Hope for values: "tell me the price" | +| `{{variables}}` for credentials/URLs | Hardcode secrets in the objective | +| Always include starting URL | Assume the agent knows where to start | +| 
Split mega-objectives (>15 steps) into multiple runs | Cram everything into one | --- -## 9. Parallel Execution - -For multiple independent browser tasks, decompose and run in parallel using the Agent tool. - -### When to Split - -- **>15 steps** — long runs drift and get stuck -- **Independent flows** — login test and search test don't depend on each other -- **Different pages/features** — settings vs checkout vs admin -- **Different user roles** — admin flow vs regular user flow - -### How to Split - -Each sub-objective must be **self-contained**: navigates to its own URL, authenticates independently, asserts its own outcomes. No sub-objective depends on another having run first. - -### Execution Pattern +## 5. Parsing `--agent` output — essentials -1. Decompose the user's request into N independent sub-objectives -2. Spawn N Agent tool calls in a **single message** — each runs: - ```bash - kane-cli run "Go to and " --agent --headless --timeout 120 - ``` -3. Each agent parses the NDJSON output, waits for `run_end`, returns: status, steps, duration, summary, session path -4. After ALL agents complete, format the batch summary +> Internal reference only. Never expose these field names to the user — translate them per §1. -### Agent Prompt Template +Stdout is NDJSON, one event per line. There are two shapes: -``` -Run this kane-cli browser test and report results: +- **Progress events** (most events) have `step` (1-based), `status` (`passed`/`failed`), `remark` — and **no `type` field**. +- **Typed events** have a `type` field: `bifurcation`, `child_agent_start`, `child_agent_end`, `ask_user`, `error`, and finally `run_end`. - kane-cli run "Go to and " --agent --headless --timeout 120 +Parsing strategy: -After the command completes: -1. Capture the exit code -2. Parse the run_end NDJSON event from stdout -3. If failed, read the failing step's screenshot from run_dir -4. 
Return: {status, steps, duration, summary, session_dir, failure_step, screenshot_path} +```text +for each line: + if obj.type === "run_end" → terminal, stop parsing + else if obj.type exists → typed flow event (rare) + else if obj.step exists → progress event → narrate per §1.3 ``` -### Batch Summary Format - -```markdown -## 🧪 Test Suite: - -| # | Test | Status | Steps | Time | What happened | -|---|------|--------|-------|------|---------| -| 1 | Login + dashboard | ✅ | 5 | 12s | Welcome banner visible | -| 2 | Product search | ✅ | 7 | 18s | 3 results for 'shoes' | -| 3 | Checkout flow | ❌ | 9 | 25s | Payment form did not load | -| 4 | Admin CSV export | ✅ | 6 | 15s | CSV downloaded (42 rows) | - -### 📊 Overall -- **Pass rate:** 3/4 (75%) -- **Total steps:** 27 · **Total time:** 1m10s - -### ❌ Failures -**#3 Checkout flow** — Payment form did not load after clicking "Credit Card". -📸 [screenshot of the failure shown inline] -``` +`run_end` is the only event with a stable cross-version schema — build all post-run logic on it. -Status icons: ✅ passed · ❌ failed · ⚠️ stuck/timeout - -**Do not** show raw file paths (like `~/.testmuai/kaneai/sessions/...`) in the summary. Instead, read the screenshot and show it inline, or offer to inspect logs only if the user asks. +For full event schemas (`bifurcation` flow fields, `child_agent_*`, `ask_user` semantics, `cancel`/`user_response` outbound events, complete `run_end` field list), Read `references/parsing.md`. --- -## 10. Configuration & Reference - -### Config Commands - -```bash -kane-cli config show # Show all current settings -kane-cli config set-window x # Browser window size (e.g. 
1920x1080) -kane-cli config chrome-profile # Chrome profile path (or interactive picker in TTY) -kane-cli config project # TMS project ID (or interactive picker in TTY) -kane-cli config folder # TMS folder ID (or interactive picker in TTY) -``` - -### Feedback - -Submit feedback on a completed test run: -```bash -kane-cli feedback --test-id --feedback-type --details "..." -``` - -### Directory Structure +## 6. When to read which reference -``` -~/.testmuai/kaneai/ -├── tui-config.json # Persistent CLI settings -├── config.json # Shared auth configuration -├── global-memory.md # Global agent context -├── chrome-profile/ # Default Chrome user profile -├── profiles/ # Stored credentials -│ └── {profile}/{env}/ -│ └── credentials -├── sessions/ # Session history -│ └── {session-id}/ -│ ├── session.json # Metadata, run list, upload status -│ ├── tui.log # Session event log -│ ├── runs/{n}/ -│ │ └── run-test/ -│ │ └── actions.ndjson # Step-by-step record of agent actions -│ └── code-export/ # (when --code-export) generated code files -└── variables/ # Global variable files - └── *.json - -# Project-local overrides (in cwd): -.testmuai/ -├── context.md # Project-specific agent context -└── variables/ - └── *.json # Project-specific variables -``` - -### Chrome Management - -kane-cli auto-launches Chrome with CDP (DevTools Protocol) on ports 9222–9230. Chrome runs as a detached process and outlives the CLI. - -- `--headless` — runs Chrome in headless mode (no visible window) -- `--cdp-endpoint ` — connect to an already-running Chrome instance -- `--ws-endpoint ` — connect to a remote browser (LambdaTest grid) - -If Chrome fails to launch, ensure Google Chrome is installed and no other process is using CDP ports 9222–9230. 
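Before retrying a failed Chrome launch, you can check whether anything is already listening on ports 9222–9230 on localhost. The short Python sketch below does that. It only detects a listener. It cannot tell whether that listener is a Chrome instance kane-cli launched earlier, which is expected because Chrome outlives the CLI, or an unrelated process.

```python
import socket

CDP_PORTS = range(9222, 9231)  # 9222-9230 inclusive


def busy_cdp_ports(host: str = "127.0.0.1", timeout: float = 0.2) -> list[int]:
    """Return the CDP ports that currently accept a TCP connection."""
    busy = []
    for port in CDP_PORTS:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 when something accepted the connection.
            if sock.connect_ex((host, port)) == 0:
                busy.append(port)
    return busy
```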
+| Situation | Read | +|---|---| +| User wants to save/persist/re-run a test | `references/testmd.md` | +| Run failed, need to diagnose | `references/debug.md` | +| Multiple independent browser tasks | `references/parallel.md` | +| Need full NDJSON event schema | `references/parsing.md` | +| First-time install, auth, or full config | `references/setup-and-config.md` | diff --git a/.claude/skills/kane-cli/references/debug.md b/.claude/skills/kane-cli/references/debug.md new file mode 100644 index 0000000..d599f31 --- /dev/null +++ b/.claude/skills/kane-cli/references/debug.md @@ -0,0 +1,45 @@ + + +# Failure Handling & Log Inspection + +When a run fails, diagnose before suggesting fixes. + +## Log Locations + +The `run_end` event provides `session_dir` and `run_dir` paths. Use those directly. + +```text +{session_dir}/ +├── session.json # Session metadata, run list, upload status +├── tui.log # Timeline: session start, run start/end, errors +└── runs/{n}/ + └── run-test/ + └── actions.ndjson # Step-by-step record of agent actions +``` + +## Debugging Flow + +1. **Parse the `run_end` event** from stdout — it has `status`, `reason`, and `summary` plus the `session_dir` / `run_dir` paths. +2. **Read `actions.ndjson`** in `{run_dir}/run-test/` — each line is one agent action with its intent and outcome. +3. **Check `tui.log`** in `{session_dir}/` — for session-level issues (Chrome launch, auth, upload). 
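Step 2 of the debugging flow can be scripted. This minimal Python sketch assumes only what is stated here: `run_end` carries `run_dir`, and `{run_dir}/run-test/actions.ndjson` holds one JSON object per line. The per-action field names are not documented, so the sketch returns raw dicts instead of guessing a schema.

```python
import json
from pathlib import Path


def load_actions(run_dir: str) -> list[dict]:
    """Load every agent action recorded for one run, oldest first."""
    path = Path(run_dir) / "run-test" / "actions.ndjson"
    actions = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                actions.append(json.loads(line))
            except json.JSONDecodeError:
                # A run killed mid-write can leave a partial last line; skip it.
                continue
    return actions
```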
+ +## Common Failure Patterns + +| Symptom | Likely Cause | Fix | +|---------|-------------|-----| +| 🔄 Agent repeats same action | Stuck in a loop / page didn't change | Rephrase objective, add explicit wait or assertion | +| 🎯 Agent clicks wrong element | Ambiguous UI, multiple similar elements | Be more specific: "click the **blue** 'Submit' button in the **checkout form**" | +| 👁️ Agent says done but didn't finish | Objective too vague | Add explicit assertions: "assert the confirmation page shows order number" | +| 💀 Exit code 2, no steps | Auth or Chrome failure | Check `kane-cli whoami`, verify Chrome is available | +| ⏱️ Exit code 3 | Timeout or cancelled | Increase `--timeout` or `--max-steps`, or split into smaller objectives | +| 🚫 "CDP endpoint not reachable" | Chrome not running | Let kane-cli manage Chrome (remove `--cdp-endpoint`) | + +## Filing a bug report + +If the failure looks like a **kane-cli bug** (not auth, timeout, or a vague objective), offer to file a report: + +> This looks like it might be a bug in kane-cli. Want me to file a report? + +File at: **https://github.com/LambdaTest/kane-cli/issues**. Gather the details automatically — don't ask the user to dig through log files. + +**Do NOT suggest bug reports for:** auth issues, low timeouts, vague objectives, or website errors (500s, CAPTCHAs). diff --git a/.claude/skills/kane-cli/references/objectives-cookbook.md b/.claude/skills/kane-cli/references/objectives-cookbook.md new file mode 100644 index 0000000..c4ee42f --- /dev/null +++ b/.claude/skills/kane-cli/references/objectives-cookbook.md @@ -0,0 +1,372 @@ + + +# Writing Kane-CLI Objectives — Pattern Cookbook + +Read this whenever you're constructing the prose objective for `kane-cli run ""` or the body of a `## Step` in a `_test.md` file. Both surfaces feed the same agent and accept the same grammar. + +--- + +## 1. 
Anatomy of a good objective + +Three properties make an objective reliable: + +- **Specific** — name the site, the action, and the field values where they matter. +- **Action-oriented** — lead with a verb (`go to`, `search`, `open`, `fill`, `click`, `verify`). +- **Has a success criterion** — state what "done" looks like so the agent knows when to stop. + +Bad → better: + +| | Objective | +|---|---| +| ❌ | Test the login page. | +| ✅ | Open `https://app.example.com/login`, log in as `{{tester}}`, and verify the dashboard URL contains `/home`. | + +The bad version leaves "test" undefined and gives the agent no end state. The better version names the URL, the credentials, and the assertion that closes the loop. + +--- + +## 2. Action verbs — quick catalog + +Reference list. Use these in your prose; the agent recognizes them all. + +| Category | Verbs | +|---|---| +| **Navigation** | go to, open, navigate to, visit, reload, go back, switch to tab/window | +| **Input** | type, fill, enter, paste, clear, select (dropdown), check (checkbox), uncheck, toggle | +| **Click/hover** | click, double-click, right-click, hover, long-press | +| **Scroll/drag** | scroll to, scroll down/up, drag to, drop on | +| **Wait** | wait for, wait until, pause for | +| **File** | upload, attach, download | +| **Misc** | dismiss, accept dialog, switch frame, take screenshot | + +Whenever the agent needs to navigate, include a **starting URL** in the first navigation action (for example, "go to https://app.example.com"). Never assume the agent knows where to start. + +--- + +## 3. Assertions, extractions, and if/else — using checkpoints + +Checkpoints are the agent's verification primitives. There are three kinds, and each one works with every analyze method below: + +| Kind | Phrasing | What happens | +|---|---|---| +| **Assertion** | "Assert: …", "Verify …", "Confirm …" | Fails the run if the condition is false. 
| +| **Extraction** | "Store …", "Extract …", "Get …" | Saves a value into `run_end.final_state` for later use.
| +| **If/Else** | "If … then … else …" | Branches the run based on a condition. | + +### 3.1 Analyze methods — where the agent looks + +The agent automatically picks the right method based on phrasing. To steer it toward a specific method, phrase the check using the wording in the "Phrasing the agent recognizes" column. + +| Method | Use it for | Phrasing the agent recognizes | +|---|---|---| +| **Visual** (default) | Visible text, prices, labels, counts, color names, visibility | "the price …", "the heading …", "is visible", "displays", "is shown" | +| **Textual (DOM)** | Element states, CSS properties, HTML attributes, exact CSS color values | "is disabled / enabled / checked / readonly", "the placeholder of …", "the aria-label of …", "the font-size of …", "rgb(…)" / "#hex" | +| **URL** | Address bar — path, query, fragment, redirects | "URL contains …", "URL path is …", "URL has param …", "redirected to …" | +| **Title** | Browser tab `document.title` | "page title contains …", "title is …" | +| **DevTools** | Things not visible on screen — network, console, performance, cookies, localStorage | see §3.2 below | + +### 3.2 DevTools analyze methods + +Five subdomains. Each one is the right choice when the data you care about lives in the browser's internals rather than on the page. + +#### Network (HTTP traffic) + +The agent captures every HTTP request/response per step. **Resets each step** — assert on traffic in the same step it happens (or extract and carry forward). + +Queryable fields: `method`, `url`, `domain`, `path`, `query_params`, `resource_type`, `request_headers`, `request_body`, `response_status`, `response_headers`, `response_body`, `timing.duration_ms`, `timing.ttfb_ms`, `failed`, `failure_reason`.
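The Python sketch below makes the network assertions concrete. It evaluates three of the example assertions over hypothetical records shaped like the queryable fields. The agent evaluates assertions itself, so this only mirrors their documented meaning. Representing `timing` as a nested dict is an assumption.

```python
def no_5xx(requests: list[dict]) -> bool:
    """Mirror of 'Assert: no API calls returned 5xx status codes'."""
    return not any(500 <= (req.get("response_status") or 0) <= 599 for req in requests)


def all_under(requests: list[dict], limit_ms: float) -> bool:
    """Mirror of 'Assert: all API responses completed in under N ms'.

    Requests marked failed have no meaningful duration, so they are skipped here.
    """
    return all(
        req.get("timing", {}).get("duration_ms", 0) < limit_ms
        for req in requests
        if not req.get("failed")
    )


def connection_failures(requests: list[dict]) -> list[tuple[str, str]]:
    """(url, failure_reason) for each failed request; empty means 'no requests failed'."""
    return [(req.get("url", ""), req.get("failure_reason", "")) for req in requests if req.get("failed")]
```

Because network capture resets each step, these checks only make sense over the records from a single step.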
+ +```text +Assert: no API calls returned 5xx status codes +Assert: the POST /api/login returned HTTP status 200 +Assert: all API responses completed in under 2 seconds +Assert: no network requests failed with connection errors +Assert: the /posts endpoint returned at least 10 items in the response body + +Store the response body of the POST /api/login request +Extract the status code of the last API call to /api/users +Store all API request URLs + +If the /api/auth returned 200 then proceed to dashboard, else show error message +``` + +Limits: up to 5,000 requests per step, response bodies capped at 64KB, binary content (images/fonts/videos) skipped. + +#### Console (browser console output) + +Captures every `console.log/warn/error/info/debug` and every uncaught JS exception. **Resets each step**. Top frame only — iframes (payment widgets, third-party embeds) are not captured. + +Levels normalize to: `log`, `warning`, `error`, `info`, `debug`. `errors` includes both `console.error()` and uncaught exceptions; `exceptions` is just the uncaught-exception subset (where `is_exception: true`). + +```text +Assert: no console errors on the page +Assert: no uncaught JavaScript exceptions +Assert: no JS errors after clicking Submit +Assert: console contains "Amplitude SDK triggered" +Assert: no console warnings + +Store all console error messages +Extract the first console error text + +If console contains "feature_flag_enabled" then use new flow, else use legacy flow +``` + +#### Performance (Core Web Vitals) + +Point-in-time read of the **last full page navigation's** metrics. Place the assertion after the page has loaded; use a wait step if the page needs time to settle. 
+ +Available metrics with good thresholds: + +| Metric | Measures | Good | +|---|---|---| +| **LCP** | Largest Contentful Paint | < 2,500ms | +| **CLS** | Cumulative Layout Shift | < 0.1 | +| **INP** | Interaction to Next Paint (requires user interaction) | < 200ms | +| **FCP** | First Contentful Paint | < 1,800ms | +| **TTFB** | Time to First Byte | < 800ms | + +```text +Assert: page LCP is under 2500ms +Assert: CLS is below 0.1 +Assert: TTFB is under 800ms +Assert: page performance meets Core Web Vitals thresholds + +Store the page LCP value +Extract all web vitals metrics +``` + +#### Cookies + +Snapshot at assertion time. Sees `httpOnly` cookies too (unlike `document.cookie`). Cookies persist across steps; asserting on a different domain may show different cookies. + +Fields: `name`, `value`, `domain`, `path`, `expires`, `http_only`, `secure`, `same_site` (`Strict`/`Lax`/`None`). + +```text +Assert: a cookie named "session_id" exists +Assert: the session cookie is httpOnly +Assert: no cookies are set without the Secure flag +Assert: the auth cookie has sameSite set to "Strict" + +Store all cookies +Extract the value of the "session_id" cookie + +If a cookie named "auth_token" exists then go to dashboard, else go to login +``` + +#### localStorage + +Snapshot at assertion time. Per-origin (protocol + domain + port). Persists across steps as long as you stay on the same origin. Values are always strings — if the app stores JSON, the value is the raw JSON string but the agent will parse it to drill into fields. 
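localStorage values are always strings. An assertion on a field inside a JSON value therefore needs one decoding step first. The hypothetical Python sketch below shows what a check like "the "theme" field in the user_prefs localStorage item is "dark"" amounts to. It assumes the snapshot is a plain key → string mapping.

```python
import json


def local_storage_field(snapshot: dict[str, str], key: str, field: str):
    """Return snapshot[key][field] when the stored string is a JSON object, else None."""
    raw = snapshot.get(key)
    if raw is None:
        return None
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError:
        return None  # a plain string value, not JSON
    return decoded.get(field) if isinstance(decoded, dict) else None
```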
+ +```text +Assert: auth_token exists in localStorage +Assert: the theme preference in localStorage is "dark" +Assert: localStorage has fewer than 10 items +Assert: the "theme" field in the user_prefs localStorage item is "dark" + +Store all localStorage items +Extract the auth_token from localStorage +Get all localStorage keys + +If localStorage has "onboarding_complete" then show dashboard, else start onboarding +``` + +### 3.3 Operators + +Assertions support these comparisons. Phrase them naturally — the agent maps to the right operator. + +| Operator | Meaning | Example | +|---|---|---| +| `equals` | Exact match | "price equals $29.99", "title is 'Home'" | +| `contains` | Substring match | "URL contains /checkout" | +| `not_contains` | Does not contain | "title does not contain 'Error'" | +| `gt` / `gte` | Greater than / greater than or equal | "items greater than 5" | +| `lt` / `lte` | Less than / less than or equal | "LCP less than 2500" | +| `not_equals` | Not equal | "status is not 'failed'" | + +### 3.4 Picking the right method when in doubt + +- "Is the price $29.99?" — **Visual** (it's on screen). +- "Is the submit button disabled?" — **Textual/DOM** (state, not visible text). +- "Does this red background match exactly `rgb(220, 38, 38)`?" — **Textual/DOM** (exact CSS). +- "Are we on the checkout page?" — **URL** (address bar). +- "Did the page send any failed API calls?" — **DevTools/Network**. +- "Are there console errors?" — **DevTools/Console**. +- "Is the page fast?" — **DevTools/Performance** (LCP/FCP/TTFB). +- "Did the login set a session cookie?" — **DevTools/Cookies**. +- "Did the app store the auth token?" — **DevTools/localStorage**. + +If you're not sure which method, default to **Visual** — that's what the agent does too. + +--- + +## 4. Extraction — the "store as" rule + +Vague phrasing like "read", "tell me", "report" does NOT reliably persist data. The agent may *observe* the value but won't *capture* it into `run_end.final_state`.
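When a script post-processes `--agent` output, it can read stored values straight from the terminal event. This minimal Python sketch follows the documented parsing strategy:

- A progress event has `step` and no `type`.
- Parsing stops at `type: "run_end"`.
- Other typed events are skipped.

Beyond `final_state` holding the stored names, no further schema is assumed.

```python
import json
from typing import Iterable, Optional


def parse_agent_output(lines: Iterable[str]) -> tuple[list[dict], Optional[dict]]:
    """Split kane-cli --agent stdout into progress events and the run_end event."""
    progress: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore any non-JSON noise on stdout
        if not isinstance(event, dict):
            continue
        if event.get("type") == "run_end":
            return progress, event  # terminal event: stop parsing
        if "type" not in event and "step" in event:
            progress.append(event)  # progress event: step / status / remark
        # Other typed events (bifurcation, ask_user, error, ...) are skipped here.
    return progress, None  # stream ended without run_end, e.g. the process was killed
```

Stored values are then `(run_end or {}).get("final_state") or {}`. An empty result after a "store as" objective usually means the phrasing did not trigger an extraction.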
+ +```text +❌ "go to example.com and read the page title" +❌ "go to example.com and tell me the price" + +✅ "go to example.com, store the page title as 'page_title'" +✅ "go to example.com, store the price of the first item as 'price'" +``` + +For DevTools extractions, the same rule applies — use "store" or "extract": + +```text +✅ "store the response body of the POST /api/login as 'login_response'" +✅ "extract the value of the session_id cookie as 'session'" +``` + +Stored values land in `run_end.final_state` and feed the second results table per `SKILL.md §1.4`. + +--- + +## 5. Chaining — action → extraction → assertion + +Multi-clause objectives are fine — and often preferable to splitting into multiple steps when the operations are tightly coupled. + +```text +"go to {{app_url}}/dashboard, + store the welcome message as 'welcome_text', + store the user role in the sidebar as 'role', + assert the role is 'Admin'" +``` + +```text +"open https://shop.example.com, + add the first 'Wireless Headphones' result to the cart, + navigate to the cart, + store the cart total as 'total', + assert the cart contains exactly one item" +``` + +```text +"go to {{app_url}}/api-health, + store the API response body as 'health', + assert no console errors, + assert no API calls returned 5xx" +``` + +When chaining, keep each clause as a complete instruction. The agent processes them in order. + +### Splitting vs. chaining — when to break into multiple steps + +| Chain in one objective | Split into separate steps | +|---|---| +| ≤ 15 clauses, related state | > 15 reasoning steps expected | +| All happen on one page or flow | Different flows / different user roles | +| Extraction needed for the assertion in the same objective | Each step is independently testable | + +For `_test.md` step bodies, each step is its own objective — split aggressively. For one-shot `kane-cli run`, chain when the operations share state. + +--- + +## 6. 
Variables and context + +Use `{{name}}` syntax for values that should be parameterized: + +```text +"Log in as {{username}} with password {{password}}, then verify the dashboard loads" +``` + +**Always parameterize:** credentials, API keys, tokens, environment-specific URLs. +**OK to hardcode:** one-off URLs, static UI text, navigation paths. + +Mark credentials with `secret: true` in the variables JSON so they're masked in logs and routed to the secrets store: + +```json +{ + "username": { "value": "alice", "secret": false }, + "password": { "value": "s3cret!", "secret": true } +} +``` + +For the full variables-loading precedence and context-file behavior, Read `references/setup-and-config.md`. + +--- + +## 7. Conditional and negative patterns + +Conditional objectives let the agent handle optional UI states without failing: + +```text +"go to {{app_url}}, if a cookie banner appears then dismiss it, then assert the homepage loads" + +"open the dashboard, if a 'What's new' modal is visible then close it, then click Settings" +``` + +Negative assertions verify the *absence* of something: + +```text +"after submitting, assert no error message or red banner is visible" +"assert no console errors after clicking Save" +"assert no API calls failed during the checkout flow" +``` + +Positional assertions check where something is on the page: + +```text +"assert 'Settings' appears in the left sidebar navigation" +"assert the 'Cancel' button is on the right side of the modal footer" +``` + +--- + +## 8. Common pitfalls + +| ❌ Don't | ✅ Do | Why | +|---|---|---| +| "Test the checkout flow" | "Go to /cart, click Checkout, fill the address form with {{tester}}, click Pay, assert the order confirmation page loads" | "Test" has no end state — the agent doesn't know when to stop. | +| "Add the item" | "Click the 'Add to Cart' button on the first product card" | Vague target — agent may click the wrong element. 
| +| "Tell me the price" | "Store the cart total as 'total'" | Vague verbs don't extract — use "store" / "extract" / "get". | +| Hardcode credentials in the objective | Use `{{username}}` / `{{password}}` from `--variables-file` | Credentials in plain text leak into logs and TMS. | +| Omit the URL | "Go to https://example.com/login first, then …" | Agent doesn't know where to start. | +| Cram 25 operations into one objective | Split at logical boundaries (login, navigate, action, verify) | Long runs drift and stall. | +| "Check the page is fast" | "Assert LCP is under 2500ms and CLS is below 0.1" | Use the explicit web-vital metric, not a vague "fast." | +| "Make sure no errors" | "Assert no console errors and no API calls returned 5xx" | Be explicit about which kind of error you're checking. | + +--- + +## 9. Worked end-to-end examples + +### Example A — Single-page assertion suite + +```text +"go to https://shop.example.com/products/42, + assert the product title is 'Wireless Headphones', + assert the price is $129.99, + store the SKU as 'sku', + assert URL contains /products/42, + assert page LCP is under 2500ms, + assert no console errors" +``` + +This exercises Visual (title, price), Extraction (SKU), URL, Performance, and Console — all in one objective. 
+ +### Example B — Login + dashboard verification + +```text +"open https://app.example.com/login, + log in with email {{tester.email}} and password {{tester.password}}, + assert the URL redirected to /dashboard, + assert a cookie named 'session_id' exists and is httpOnly, + assert no API calls returned 5xx during login, + store the user role from the sidebar as 'role', + assert the role is 'Admin'" +``` + +### Example C — testmd step body (same grammar) + +In a `_test.md` file: + +```markdown +## Verify checkout flow happy path +Open https://shop.example.com, log in as {{tester}}, add the first +'Wireless Headphones' result to the cart, navigate to checkout, +fill the shipping address with {{tester.address}}, click Pay. +Assert the order confirmation page loads. +Assert no console errors and no API calls returned 5xx. +Store the order number as 'order_id'. +``` + +The step body is exactly the same grammar as `kane-cli run`. Everything in this cookbook applies. diff --git a/.claude/skills/kane-cli/references/parallel.md b/.claude/skills/kane-cli/references/parallel.md new file mode 100644 index 0000000..e9a6cbc --- /dev/null +++ b/.claude/skills/kane-cli/references/parallel.md @@ -0,0 +1,65 @@ + + +# Parallel Execution + +For multiple independent browser tasks, decompose and run in parallel using the Agent tool. + +## When to Split + +- **>15 steps** — long runs drift and get stuck +- **Independent flows** — login test and search test don't depend on each other +- **Different pages/features** — settings vs checkout vs admin +- **Different user roles** — admin flow vs regular user flow + +## How to Split + +Each sub-objective must be **self-contained**: navigates to its own URL, authenticates independently, asserts its own outcomes. No sub-objective depends on another having run first. + +## Execution Pattern + +1. Decompose the user's request into N independent sub-objectives +2. 
Spawn N Agent tool calls in a **single message** — each runs: + ```bash + kane-cli run "Go to and " --agent --headless --timeout 120 + ``` +3. Each agent parses the NDJSON output, waits for `run_end`, returns: status, steps, duration, summary, session path +4. After ALL agents complete, format the batch summary + +## Agent Prompt Template + +```text +Run this kane-cli browser test and report results: + + kane-cli run "Go to and " --agent --headless --timeout 120 + +After the command completes: +1. Capture the exit code +2. Parse the run_end NDJSON event from stdout +3. If failed, read the failing step's screenshot from run_dir +4. Return: {status, steps, duration, summary, session_dir, failure_step, screenshot_path} +``` + +## Batch Summary Format + +```markdown +## 🧪 Test Suite: + +| # | Test | Status | Steps | Time | What happened | +|---|------|--------|-------|------|---------| +| 1 | Login + dashboard | ✅ | 5 | 12s | Welcome banner visible | +| 2 | Product search | ✅ | 7 | 18s | 3 results for 'shoes' | +| 3 | Checkout flow | ❌ | 9 | 25s | Payment form did not load | +| 4 | Admin CSV export | ✅ | 6 | 15s | CSV downloaded (42 rows) | + +### 📊 Overall +- **Pass rate:** 3/4 (75%) +- **Total steps:** 27 · **Total time:** 1m10s + +### ❌ Failures +**#3 Checkout flow** — Payment form did not load after clicking "Credit Card". +📸 [screenshot of the failure shown inline] +``` + +Status icons: ✅ passed · ❌ failed · ⚠️ stuck/timeout + +**Do not** show raw file paths (like `~/.testmuai/kaneai/sessions/...`) in the summary. Instead, read the screenshot and show it inline, or offer to inspect logs only if the user asks. 
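The "Overall" block of the batch summary is simple arithmetic over the per-test results. The sketch below is minimal. It assumes each sub-agent returns an object with `status`, `steps`, and `duration` in seconds, as the agent prompt template asks. The helpers `summarizeBatch` and `formatDuration` are hypothetical.

```javascript
// Format seconds as "1m10s" or "45s", matching the summary table style.
function formatDuration(totalSeconds) {
  const s = Math.round(totalSeconds);
  const minutes = Math.floor(s / 60);
  const seconds = s % 60;
  return minutes > 0 ? `${minutes}m${seconds}s` : `${seconds}s`;
}

// Aggregate sub-agent results into the "Overall" lines of the batch summary.
function summarizeBatch(results) {
  const passed = results.filter((r) => r.status === "passed").length;
  const totalSteps = results.reduce((sum, r) => sum + r.steps, 0);
  const totalTime = results.reduce((sum, r) => sum + r.duration, 0);
  const pct = results.length ? Math.round((passed / results.length) * 100) : 0;
  return [
    `- **Pass rate:** ${passed}/${results.length} (${pct}%)`,
    `- **Total steps:** ${totalSteps} · **Total time:** ${formatDuration(totalTime)}`,
  ].join("\n");
}

// The four rows of the example table above.
const batch = [
  { status: "passed", steps: 5, duration: 12 },
  { status: "passed", steps: 7, duration: 18 },
  { status: "failed", steps: 9, duration: 25 },
  { status: "passed", steps: 6, duration: 15 },
];
console.log(summarizeBatch(batch));
// - **Pass rate:** 3/4 (75%)
// - **Total steps:** 27 · **Total time:** 1m10s
```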
diff --git a/.claude/skills/kane-cli/references/parsing.md b/.claude/skills/kane-cli/references/parsing.md new file mode 100644 index 0000000..517b817 --- /dev/null +++ b/.claude/skills/kane-cli/references/parsing.md @@ -0,0 +1,101 @@ + + +# Parsing --agent Output + +> **Internal reference only.** Everything in this section (field names, event types, JSON structure) is for you to parse programmatically. **Never expose these internal terms to the user.** The user should see plain-language summaries, not `run_end`, `final_state`, `bifurcation`, `NDJSON`, `session_dir`, or any raw JSON fields. + +With `--agent`, kane-cli outputs one JSON object per line to **stdout**. Progress UI renders to **stderr**. + +## Event Types + +**Progress events** (bulk of the output — one per step): + +```json +{"step": 1, "status": "passed", "remark": "Navigated to amazon.in"} +{"step": 2, "status": "passed", "remark": "Typed 'laptop' in search box"} +{"step": 3, "status": "failed", "remark": "Could not find Add to Cart button"} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `step` | number | Step index (1-based) | +| `status` | string | `"passed"` or `"failed"` | +| `remark` | string | What the agent did or why it failed | + +These are **untyped** — they have no `type` field. Do **not** key on `event.type === 'step_start'` or `'step_end'`; those event types are not emitted. + +**Flow events:** + +| Event (`type` field) | Key Fields | Purpose | +|-------|-----------|---------| +| `bifurcation` | `flows[]`, `count` | Agent split objective into sub-flows | +| `child_agent_start` | `child_id`, `objective`, `parent_step` | Child agent spawned | +| `child_agent_end` | `child_id`, `success`, `steps_taken`, `summary` | Child agent finished | +| `ask_user` | `question`, `step_index`, `options?` | Agent needs user input | +| `error` | `message` | Error occurred | + +**Note:** There is no `run_start` event — the first line is either a `bifurcation` or a progress object. 
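Progress events carry no `type` field, so a consumer has to dispatch on which keys are present, checking typed events first. The sketch below is minimal and covers the events in the tables above. The function name `classifyEvent` and its return labels are hypothetical. Only the field and type names come from this reference.

```javascript
// Map one parsed NDJSON object to a coarse category.
function classifyEvent(obj) {
  if (typeof obj.type === "string") {
    switch (obj.type) {
      case "run_end":
        return "terminal"; // last line: stop reading
      case "bifurcation":
        return "flow_split";
      case "child_agent_start":
      case "child_agent_end":
        return "child_agent";
      case "ask_user":
        return "question";
      case "error":
        return "error";
      default:
        return "other_typed"; // unknown future types: ignore rather than crash
    }
  }
  if (obj.step !== undefined) return "progress"; // untyped step/status/remark object
  return "unknown";
}

const sample = [
  '{"step": 1, "status": "passed", "remark": "Navigated to amazon.in"}',
  '{"type": "bifurcation", "flows": [], "count": 0}',
  '{"type": "run_end", "status": "passed"}',
];
for (const line of sample) console.log(classifyEvent(JSON.parse(line)));
// prints progress, flow_split, terminal (one per line)
```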
+ +**Note:** `ask_user` is auto-disabled when stdin is not a TTY. Since agents typically run kane-cli as a subprocess, ask_user events will not be emitted. Write objectives that don't require interactive input. + +## Parsing Strategy + +Since progress events lack a `type` field, distinguish them from typed events like this: + +``` +for each line of NDJSON: + if obj.type === "run_end" → terminal event, stop parsing + if obj.type === "bifurcation" → flow split + if obj.type exists → other typed event + if obj.step exists → progress event (step/status/remark) +``` + +**Build automation on `run_end`** — it is the only event guaranteed to have a stable schema across versions. Use progress events for live status display only. + +**Terminal event** (always the last line): + +```json +{ + "type": "run_end", + "status": "passed", + "summary": "Searched for laptop and added first result to cart", + "one_liner": "Searched for laptop on Amazon and added to cart", + "reason": "Objective completed", + "duration": 45.2, + "credits": 12, + "final_state": { + "price": "$29.99", + "product_name": "Wireless Headphones" + }, + "context": { + "memory": {}, + "variables": {}, + "pointer": "(passed) Searched for laptop and added first result to cart" + }, + "session_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "run_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890/runs/0", + "test_url": "https://test-manager.lambdatest.com/projects/123/test-cases/456" +} +``` + +Key `run_end` fields: +- `status` — `"passed"` or `"failed"` +- `summary` — what the agent did +- `one_liner` — short summary for display +- `reason` — why it stopped +- `credits` — credits consumed by the run (when reported) +- `final_state` — extracted values from "store as" objectives +- `test_url` — link to KaneAI dashboard (if upload succeeded) +- `session_dir` / `run_dir` — paths to log files + +## Responding to `ask_user` (if stdin is a TTY) + +```json +{"type": 
"user_response", "answer": "Medium size"} +``` + +To cancel a run: + +```json +{"type": "cancel"} +``` diff --git a/.claude/skills/kane-cli/references/setup-and-config.md b/.claude/skills/kane-cli/references/setup-and-config.md new file mode 100644 index 0000000..81a4fe6 --- /dev/null +++ b/.claude/skills/kane-cli/references/setup-and-config.md @@ -0,0 +1,140 @@ + + +# kane-cli Setup, Variables, and Config Reference + +## Install and auth + +Before first use, verify installation and auth. + +### Install + +```bash +npm install -g @testmuai/kane-cli +``` + +### Check Auth Status + +```bash +kane-cli whoami +``` + +If this shows "not configured" or errors, run login: + +### Login (Basic Auth) + +```bash +kane-cli login --username --access-key +``` + +This creates the default profile with basic auth, auto-selects the KaneAI project, and marks setup complete. Credentials come from the user's TestmuAI dashboard (Settings → Keys). + +Optional flag: +- `--profile ` — profile name (default: last selected profile check using `config show`) + +### Login (OAuth) + +```bash +kane-cli login --oauth +``` + +This opens the browser for OAuth consent and waits for the callback. Works in both TTY and non-TTY (agent) mode. + +### Login (Interactive — TTY only) + +In a terminal, run `kane-cli login` with no flags for the interactive wizard (auth method → project picker → folder picker). If the user needs this, ask them to run it directly: + +> Please run `! kane-cli login` and complete the sign-in. + +### Verify + +```bash +kane-cli whoami # Auth status +kane-cli config show # Current configuration +``` + +## Variables — full precedence chain + +Variables parameterize objectives with reusable values and secrets. Use `{{key}}` syntax in objectives. 
+ +**Format:** +```json +{ + "username": { "value": "alice", "secret": false }, + "password": { "value": "s3cret!", "secret": true } +} +``` + +`secret: true` masks the value in logs and routes it to TestmuAI's secrets store instead of being synced as plain TMS variables. + +**Loading order** (later wins): +1. `~/.testmuai/kaneai/variables/*.json` (global, alphabetical) +2. `{cwd}/.testmuai/variables/*.json` (local project overrides) +3. `--variables-file ` +4. `--variables '{...}'` (inline JSON) + +**Always parameterize:** credentials, API keys, tokens, environment-specific URLs. +**OK to hardcode:** one-off URLs, static UI text, navigation paths. + +## Context files + +Context files provide additional instructions to the agent: +- **Global:** `~/.testmuai/kaneai/global-memory.md` — shared across all runs +- **Local:** `.testmuai/context.md` in cwd — project-specific + +Override per-run with `--global-context` / `--local-context` flags. + +## Config commands + +```bash +kane-cli config show # Show all current settings +kane-cli config set-window x # Browser window size (e.g. 1920x1080) +kane-cli config chrome-profile # Chrome profile path (or interactive picker in TTY) +kane-cli config project # TMS project ID (or interactive picker in TTY) +kane-cli config folder # TMS folder ID (or interactive picker in TTY) +``` + +### Feedback + +Submit feedback on a completed test run: +```bash +kane-cli feedback --test-id --feedback-type --details "..." 
+``` + +## Directory structure + +```text +~/.testmuai/kaneai/ +├── tui-config.json # Persistent CLI settings +├── config.json # Shared auth configuration +├── global-memory.md # Global agent context +├── chrome-profile/ # Default Chrome user profile +├── profiles/ # Stored credentials +│ └── {profile}/{env}/ +│ └── credentials +├── sessions/ # Session history +│ └── {session-id}/ +│ ├── session.json # Metadata, run list, upload status +│ ├── tui.log # Session event log +│ ├── runs/{n}/ +│ │ └── run-test/ +│ │ └── actions.ndjson # Step-by-step record of agent actions +│ └── code-export/ # (when --code-export) generated code files +└── variables/ # Global variable files + └── *.json + +# Project-local overrides (in cwd): +.testmuai/ +├── context.md # Project-specific agent context +└── variables/ + └── *.json # Project-specific variables +``` + +## Chrome management + +kane-cli auto-launches Chrome with CDP (DevTools Protocol) on ports 9222–9230. Chrome runs as a detached process and outlives the CLI. + +- `--headless` — runs Chrome in headless mode (no visible window) +- `--cdp-endpoint ` — connect to an already-running Chrome instance +- `--ws-endpoint ` — connect to a remote browser (LambdaTest grid) + +If Chrome fails to launch, ensure Google Chrome is installed and no other process is using CDP ports 9222–9230. diff --git a/.claude/skills/kane-cli/references/testmd.md b/.claude/skills/kane-cli/references/testmd.md new file mode 100644 index 0000000..5ee1fbb --- /dev/null +++ b/.claude/skills/kane-cli/references/testmd.md @@ -0,0 +1,217 @@ + + +# Saving & Replaying Tests with testmd + +The §3 `run` command is the **primary** mode — one-shot, ephemeral. `testmd` is the **secondary** mode: tests live as `_test.md` files on disk, each step is cached on the first run, and every later run **replays from cache** with no LLM cost. + +Use `testmd` whenever the user wants the test to persist. 
The decision is binary — once a test exists as a file, every later invocation is `testmd run`, never `run`. + +## When to switch from `run` to `testmd` + +| User says | Use | +|---|---| +| "save this test", "commit this", "keep this", "add this to the suite" | `testmd` | +| "regression test", "smoke test", "make this replayable" | `testmd` | +| "this is a test", "test the X flow end-to-end" (suite-shaped) | `testmd` | +| "run this once", "check if X works right now", "try X" | `run` (§3) | +| "search for", "click", "fill", "verify" (one-shot) | `run` (§3) | + +If unclear, ask: "Do you want me to save this test so you can re-run it later?" + +## Quick start + +Write the file (any path; filename must end in `_test.md`): + +```markdown +--- +mode: testing +max_steps: 30 +--- + +# Amazon search + +## Open Amazon +Open https://www.amazon.com. + +## Search for headphones +Type "wireless headphones" into the search box and submit. +Verify at least one product result is visible. +``` + +Run it: + +```bash +kane-cli testmd run amazon_test.md --agent +``` + +## File format + +Four parts in order: + +1. **YAML frontmatter** — between `--- ... ---` at the very top. +2. **`# Title`** — decorative; everything before the first `## ` is ignored. +3. **`## H2` step headings** — one per step. The agent reads the step body, not the heading. +4. **Step body** — either prose **or** a single `@import ` line. Never both. Prose bodies are objectives with the same grammar as `kane-cli run` — for the full pattern catalog (action verbs, assertion analyze methods, checkpoint types, chaining, worked examples), Read `references/objectives-cookbook.md`. + +Per-step `yaml` overrides go immediately under the heading, in a fenced block: + +````markdown +## Submit the form +```yaml +timeout: 90 +optional: true +``` +Click submit and verify the confirmation banner. 
+```` + +**Frontmatter keys to use:** + +| Key | Scope | Description | +|---|---|---| +| `mode` | root | `action` (halts on auth walls) or `testing` (default — pushes through so negative-test assertions can fire) | +| `max_steps` | root + step | Max agent reasoning steps. Default `30`. | +| `timeout` | root + step | Hard kill per step in seconds. | +| `headless` | root | No browser window. | +| `variables` | root + step | `{{name}}` params, same shape as §3, with `secret: true` for credentials | +| `global_context` / `local_context` | root + step | Inline Markdown or path | +| `code_export` / `code_language` | root + step | Generate Playwright after the run; language `python` or `javascript` | + +Files ending in `_test.md` are tests (valid entry points). Any other `.md` is a helper — reachable only via `@import`. + +## The replay & cascade rule (CRITICAL) + +On the **first** run of a test, the agent authors each step and saves a recording. On **every later run**, each step replays from its recording — no agent, no LLM cost, much faster. + +A step replays only if **all** of these hold: +- A recording for that step exists, +- Its prose is unchanged since the recording, +- Its `yaml` block is unchanged, +- No earlier step in the file invalidated it. + +**Editing step N re-authors step N AND every step after it in the same file.** Each step starts where the previous step left off (URL, login, tabs). When step 3 changes, step 4 cannot safely replay against state that no longer exists. + +Consequences when editing tests: +- A one-line tweak at the top of a 20-step test re-authors all 20 steps on the next run. +- To re-record only one step, edit only that step (or steps after it). +- `--author` forces full authoring for one run (debugging only). +- `rm -rf output-/` wipes the cache entirely. 
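The cascade rule reduces to one question: which is the first step that cannot replay? That step and every later one are re-authored. The sketch below is minimal and rests on two assumptions. First, a step counts as "unchanged" when its prose and `yaml` text exactly match what was saved with its recording. Second, `forceAuthor` mirrors `--author`. The function `planRun` and this fingerprint scheme are hypothetical. They do not describe kane-cli's cache format in `.internal/`.

```javascript
// steps:      [{ prose, yaml }] as written in the file today
// recordings: [{ prose, yaml } | null] saved when each step was last authored
function planRun(steps, recordings, { forceAuthor = false } = {}) {
  // A step's fingerprint is its exact prose plus its yaml block text.
  const fingerprint = (s) => `${s.prose}\u0000${s.yaml ?? ""}`;
  const plan = [];
  let cascade = forceAuthor; // once true, this step and all later ones are authored
  steps.forEach((step, i) => {
    const rec = recordings[i];
    const canReplay = !cascade && rec != null && fingerprint(rec) === fingerprint(step);
    if (!canReplay) cascade = true;
    plan.push(canReplay ? "replay" : "author");
  });
  return plan;
}

const recorded = [
  { prose: "Open https://www.amazon.com.", yaml: "" },
  { prose: "Search for headphones.", yaml: "" },
  { prose: "Open the first result.", yaml: "" },
];
// Only step 2's prose was edited, yet step 3 is re-authored too.
const current = [recorded[0], { prose: "Search for wireless headphones.", yaml: "" }, recorded[2]];
console.log(planRun(current, recorded)); // → [ 'replay', 'author', 'author' ]
```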
+ +## `@import` for reusing flows + +Extract a repeating flow (login, setup, cookie banner dismissal) into a helper file: + +```markdown +## Sign in +@import ./helpers/login.md +``` + +Rules: +- Helper filename **must not** end in `_test.md`. +- Path resolves relative to the **importing file**, not the shell's cwd. +- The step body must be exactly `@import ` — no mixed prose, no extra lines. +- The step's `yaml` block may contain **only** `optional`. Other keys are rejected. +- `optional: true` on `@import` is allowed only at the root file, not on a nested import. + +Variables and context propagate into helpers. Chrome / `mode` / auth do not (root-only). + +Editing a helper re-authors that step in **every test that imports it**, plus everything after the import in those tests. Same cascade rule. + +## Commands + +| Command | Use | +|---|---| +| `kane-cli testmd run --agent [flags]` | Run a test | +| `kane-cli testmd list` | List `*_test.md` files under cwd (NDJSON when non-TTY) | +| `kane-cli testmd status ` | Test Manager identity + local-sync state | +| `kane-cli testmd export [--code-language python\|javascript]` | Regenerate code export from existing recordings (no browser launch) | +| `kane-cli testmd delete ` | Local-only delete: removes source + `output-/`. Does NOT delete from Test Manager. | + +**Flags on `testmd run` that don't exist on §3 `run`:** + +| Flag | Default | Description | +|---|---|---| +| `--name ` | none | Persist the run under this name. Regex `[a-zA-Z0-9_-]+`. | +| `--on-lock-conflict ` | none | Behavior when another user holds the test's edit lock. 
`readonly` = replay-only / no upload, `fail` = exit 2, `wait` = block until released | +| `--retry` | off | On replay failure, restart with a shrinking replay window | +| `--retry-count ` | `3` | Max retry restarts before falling back to full re-author | +| `--author` | off | Force authoring every step (skip replay decision) | + +All §3 `run` flags also apply (`--agent`, `--headless`, `--max-steps`, `--timeout`, `--variables`, etc.). + +Flag wins over frontmatter for everything **except** `variables` — the file owns variables; you can add new keys via flags but cannot override file-defined ones. + +## Output: `output-/` and `Result.md` + +After a run: + +```text +amazon_test.md +output-amazon/ + Result.md # human-readable run report + .internal/ # cached recordings — do not edit + playwright-python-code/ # only if code_export enabled +``` + +**`output-/` is commit-safe and should be committed to git.** That's how teammates and CI replay the same recordings. + +For tests using `@import`, helper recordings land next to the helper file in `helper-output---/` directories. Also commit-safe. + +**`Result.md`** opens in any Markdown viewer. It contains: +- Frontmatter — `status`, `started`, `duration_s`, `session_id` +- One entry per root step with one of `✓ passed`, `✗ failed`, `⏭ skipped`, optionally suffixed `(optional)` when a soft-failing step failed but the run continued +- For `@import` steps that failed, a path to the failing sub-step inside the helper + +When the user asks "did the test pass?" or "where did it fail?" for a previously-run test, read `Result.md` rather than re-running the test. + +## Recording a `_test.md` from a live session + +If the user runs an ad-hoc objective with §3 `run` and decides to keep it: + +```bash +kane-cli run "Search for noise-cancelling headphones on amazon.com" --name amazon-search +``` + +On exit, kane-cli writes `/.testmuai/tests/amazon-search_test.md`. Move that file into the user's repo and re-run it with `testmd run`. 
Without `--name`, an ad-hoc `run` is ephemeral and nothing is written. + +## CI invocation + +```bash +kane-cli testmd run ./tests/checkout_test.md \ + --agent \ + --headless \ + --on-lock-conflict wait \ + --retry +``` + +- `--agent` — NDJSON to stdout (auto-enabled when stdin is not a TTY; pass explicitly anyway). +- `--headless` — no window. +- `--on-lock-conflict wait` — block instead of failing if a teammate is editing the same test. +- `--retry` — automatically recover transient replay failures. + +Exit codes: + +| Code | Meaning | +|------|---------| +| 0 | ✅ Passed | +| 1 | ❌ Failed | +| 2 | ⚠️ Error (auth, setup, infra) — for `testmd`, also includes parse errors and `--on-lock-conflict fail` | +| 3 | ⏱️ Timeout or cancelled — for `testmd`, also includes `--on-lock-conflict wait` timeout | + +## Parse errors (when writing a `_test.md`) + +Parse errors abort **before** any browser launch with exit `2`. Common ones and the fix: + +| Message | Fix | +|---|---| +| `frontmatter is missing closing '---'` | Add the trailing `---` | +| `invalid YAML in frontmatter` | Re-validate the YAML block | +| `step body must be exactly one of prose / @import` | Split into two steps | +| `step config on @import may only contain 'optional'` | Remove other keys from the yaml block | +| `cannot @import a test file` | Imports may only reference helpers (not ending in `_test.md`) | +| `cyclic reference` | Restructure helpers to break the loop | +| `chrome config is global-only` | Move Chrome key to root frontmatter | +| `'' is run-level and cannot be set per-step` | Move `mode` / `on_lock_conflict` to root frontmatter | +| `unknown config key` | Remove or fix the key | +| `auth/identity keys are CLI-only` | Pass `username` / `access_key` as CLI flags, not in frontmatter | + +When the user reports a parse error, fix the file before retrying — don't loop on the same error. 
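Parse errors abort with exit `2` before any browser launches. A quick local pre-check can therefore catch the most common ones while you write a file. The sketch below is minimal and covers only three messages from the table: the missing closing `---`, mixed prose and `@import`, and importing a `_test.md` file. The function `precheckTestMd` is hypothetical and simplified. It does not handle `## ` lines inside code fences, for example, and it is no substitute for kane-cli's own parser.

```javascript
// Return a list of problems found in the text of a _test.md file.
function precheckTestMd(text) {
  const lines = text.split(/\r?\n/);
  let i = 0;
  if (lines[0] === "---") {
    const close = lines.indexOf("---", 1);
    if (close === -1) return ["frontmatter is missing closing '---'"];
    i = close + 1;
  }
  // Split into steps at each "## " heading; anything before the first one is ignored.
  const steps = [];
  for (; i < lines.length; i++) {
    if (lines[i].startsWith("## ")) steps.push({ heading: lines[i].slice(3).trim(), body: [] });
    else if (steps.length > 0) steps[steps.length - 1].body.push(lines[i]);
  }
  const problems = [];
  for (const step of steps) {
    // Drop the per-step yaml config block and blank lines; keep the real body.
    const body = [];
    let inYaml = false;
    for (const raw of step.body) {
      const line = raw.trim();
      if (!inYaml && line === "```yaml") { inYaml = true; continue; }
      if (inYaml) { if (line === "```") inYaml = false; continue; }
      if (line !== "") body.push(line);
    }
    const imports = body.filter((l) => l.startsWith("@import "));
    if (imports.length > 0 && body.length > 1) {
      problems.push(`${step.heading}: step body must be exactly one of prose / @import`);
    }
    for (const l of imports) {
      const target = l.slice("@import ".length).trim();
      if (target.endsWith("_test.md")) problems.push(`${step.heading}: cannot @import a test file`);
    }
  }
  return problems;
}

const bad = [
  "---", "mode: testing", "---", "# Demo",
  "## Sign in", "@import ./helpers/login.md", "Then click Settings.",
  "## Reuse", "@import ./other_test.md",
].join("\n");
// Reports the mixed body in "Sign in" and the test-file import in "Reuse".
console.log(precheckTestMd(bad));
```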
diff --git a/docs/user-guide/features/checkpoints/devtools/console.md b/docs/user-guide/features/checkpoints/devtools/console.md new file mode 100644 index 0000000..1539783 --- /dev/null +++ b/docs/user-guide/features/checkpoints/devtools/console.md @@ -0,0 +1,75 @@ +# Console Assertions + +Console assertions let you verify browser console output — error messages, warnings, log messages, and uncaught JavaScript exceptions. + +## How Capture Works + +KaneAI captures all browser console output automatically during each test step: + +- **Continuous capture**: Every `console.log()`, `console.warn()`, `console.error()`, and uncaught JS exception is recorded +- **Per-step scope**: Each test step starts with a fresh capture. Console messages from previous steps are not carried over +- **Limits**: Up to 50,000 messages are stored per step. When the limit is reached, the oldest 10% are dropped +- **No truncation**: Message text is stored in full — unlike network response bodies, console messages are never truncated +- **Object resolution**: When JavaScript logs an object (`console.log({status: "ok"})`), KaneAI resolves it to the actual value instead of storing "JSHandle@object" +- **Multi-tab support**: Console messages from new tabs and popups are also captured +- **Top-frame only**: Messages from embedded iframes (payment widgets, third-party components) are not captured + +### Planning for Multi-Step Tests + +Because console data resets each step, plan accordingly: + +- If you need to verify a console message **later**, extract and store it in the same step it appears +- Console output from step 1 won't be visible in step 3's console log + +### Level Normalization + +Console message levels are normalized to 5 values: + +| Level | What triggers it | +|-------|-----------------| +| `log` | `console.log()`, `console.dir()`, `console.table()`, and other info-level calls | +| `warning` | `console.warn()` | +| `error` | `console.error()` and uncaught exceptions | +| `info` | 
`console.info()` | +| `debug` | `console.debug()` | + +## What You Can Query + +| Field | Type | Description | +|-------|------|-------------| +| `level` | string | Message level: "log", "warning", "error", "info", "debug" | +| `text` | string | Full message text (not truncated) | +| `url` | string | Source file URL | +| `line_number` | int | Source line number | +| `is_exception` | bool | True for uncaught JS exceptions (pageerror events) | +| `stack_trace` | string | Stack trace (exceptions only) | + +### Errors vs Exceptions + +- **`errors`** includes ALL error-level messages: both `console.error()` calls AND uncaught exceptions +- **`exceptions`** is a subset of errors — only uncaught JavaScript exceptions +- To check for app-level errors without exceptions: query errors where `is_exception` is false + +## Example Assertions + +``` +Assert: no console errors on the page +Assert: no uncaught JavaScript exceptions +Assert: console contains "Amplitude SDK triggered" +Assert: no console warnings +Assert: no JS errors after clicking Submit +``` + +## Example Extractions + +``` +Store all console error messages +Extract the first console error text +Store all console log output +``` + +## Example If/Else + +``` +If console contains "feature_flag_enabled" then use new flow, else use legacy flow +``` diff --git a/docs/user-guide/features/checkpoints/devtools/cookies.md b/docs/user-guide/features/checkpoints/devtools/cookies.md new file mode 100644 index 0000000..fe07933 --- /dev/null +++ b/docs/user-guide/features/checkpoints/devtools/cookies.md @@ -0,0 +1,58 @@ +# Cookie Assertions + +Cookie assertions let you verify browser cookies — check existence, values, and security attributes like httpOnly, secure, and sameSite. 
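Most security-attribute checks reduce to filtering the cookie snapshot. The sketch below is minimal and assumes records with the `name`, `value`, `domain`, `path`, `expires`, `http_only`, `secure`, and `same_site` fields used for cookie queries. The `auditCookies` helper and the sample cookies are hypothetical.

```javascript
// Flag cookies that would fail a Secure-flag assertion, plus named
// cookies (e.g. session cookies) that should carry HttpOnly but do not.
function auditCookies(cookies, mustBeHttpOnly = []) {
  const issues = [];
  for (const c of cookies) {
    if (!c.secure) issues.push(`${c.name}: missing Secure flag`);
    if (mustBeHttpOnly.includes(c.name) && !c.http_only) {
      issues.push(`${c.name}: expected HttpOnly`);
    }
  }
  return issues;
}

// Hypothetical snapshot; expires of -1 marks a session cookie.
const cookies = [
  { name: "session_id", value: "abc", domain: ".example.com", path: "/", expires: -1,
    http_only: true, secure: true, same_site: "Strict" },
  { name: "theme", value: "dark", domain: ".example.com", path: "/", expires: 1767225600,
    http_only: false, secure: false, same_site: "Lax" },
];
console.log(auditCookies(cookies, ["session_id"])); // → [ 'theme: missing Secure flag' ]
```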
+ +## How Capture Works + +Cookies are captured as a **point-in-time snapshot** when the checkpoint triggers: + +- **On-demand capture**: Cookies are read from the browser context at the moment the assertion runs — not accumulated over time +- **Current state only**: You see exactly what cookies exist right now, including any set by the page's JavaScript or server responses +- **All cookies visible**: Unlike `document.cookie` in JavaScript, KaneAI can see httpOnly cookies too +- **Domain-scoped**: Cookies are captured for all domains the browser has visited in this session + +### Planning for Multi-Step Tests + +Because cookies are captured at assertion time: + +- If you need to check cookies set by a specific page, assert on the **same page** or **after** you've visited it +- If cookies are needed in a later step (e.g., after navigating away), extract and store them first +- Cookies persist in the browser across steps (unlike network/console which reset) — but asserting on a different domain may show different cookies + +## What You Can Query + +| Field | Type | Description | +|-------|------|-------------| +| `name` | string | Cookie name (e.g., "session_id") | +| `value` | string | Cookie value | +| `domain` | string | Domain (e.g., ".example.com") | +| `path` | string | Cookie path (e.g., "/") | +| `expires` | float | Expiry as epoch seconds (-1 for session cookies) | +| `http_only` | bool | True if HttpOnly flag is set | +| `secure` | bool | True if Secure flag is set | +| `same_site` | string | "Strict", "Lax", or "None" | + +## Example Assertions + +``` +Assert: a cookie named "session_id" exists +Assert: the session cookie is httpOnly +Assert: no cookies are set without the Secure flag +Assert: at least 3 cookies are set on the page +Assert: the auth cookie has sameSite set to "Strict" +``` + +## Example Extractions + +``` +Store all cookies +Extract the value of the "session_id" cookie +Store all cookie names +Get all cookies for the example.com domain 
+``` + +## Example If/Else + +``` +If a cookie named "auth_token" exists then go to dashboard, else go to login +``` diff --git a/docs/user-guide/features/checkpoints/devtools/local-storage.md b/docs/user-guide/features/checkpoints/devtools/local-storage.md new file mode 100644 index 0000000..ed92235 --- /dev/null +++ b/docs/user-guide/features/checkpoints/devtools/local-storage.md @@ -0,0 +1,70 @@ +# localStorage Assertions + +localStorage assertions let you verify data stored in the browser's `window.localStorage` — check key existence, values, and item counts. + +## How Capture Works + +localStorage is captured as a **point-in-time snapshot** when the checkpoint triggers: + +- **On-demand capture**: localStorage is read from the current page at the moment the assertion runs +- **Current state only**: You see exactly what's in localStorage right now +- **Domain-scoped**: localStorage is per-origin (protocol + domain + port). You only see data for the current page's origin +- **String values**: All localStorage values are strings. If the application stores JSON objects, they're stored as JSON strings + +### Planning for Multi-Step Tests + +Because localStorage is captured at assertion time: + +- Assert on localStorage while you're **on the page** that set the values — navigating to a different domain means a different localStorage +- If values are needed later, extract and store them before navigating away +- localStorage persists across steps (unlike network/console) as long as you stay on the same origin + +### JSON Values + +Applications often store structured data in localStorage as JSON strings: + +```javascript +// Application code +localStorage.setItem("user_prefs", JSON.stringify({theme: "dark", lang: "en"})); +``` + +In assertions, the value is the raw JSON string. 
You can parse it to check individual fields: + +``` +Assert: the "theme" field in the user_prefs localStorage item is "dark" +``` + +KaneAI will parse the JSON and drill into the value automatically. + +## What You Can Query + +| Method | Returns | Description | +|--------|---------|-------------| +| `storage.all()` | dict | All key-value pairs | +| `storage.get(key)` | string or None | Value for a specific key | +| `storage.keys()` | list of strings | All key names | +| `storage.has(key)` | bool | Whether a key exists | + +## Example Assertions + +``` +Assert: auth_token exists in localStorage +Assert: the theme preference in localStorage is "dark" +Assert: localStorage has fewer than 10 items +Assert: the user_id value in localStorage is not empty +``` + +## Example Extractions + +``` +Store all localStorage items +Extract the auth_token from localStorage +Store the user preferences from localStorage +Get all localStorage keys +``` + +## Example If/Else + +``` +If localStorage has "onboarding_complete" then show dashboard, else start onboarding +``` diff --git a/docs/user-guide/features/checkpoints/devtools/network.md b/docs/user-guide/features/checkpoints/devtools/network.md new file mode 100644 index 0000000..560533c --- /dev/null +++ b/docs/user-guide/features/checkpoints/devtools/network.md @@ -0,0 +1,65 @@ +# Network Assertions + +Network assertions let you verify HTTP traffic — API responses, status codes, headers, response bodies, and request timing. + +## How Capture Works + +KaneAI captures all HTTP network traffic automatically during each test step: + +- **Continuous capture**: Every HTTP request and response is recorded as it happens +- **Per-step scope**: Each test step starts with a fresh capture. Traffic from previous steps is not carried over +- **Limits**: Up to 5,000 requests are stored per step. When the limit is reached, the oldest 10% of entries are dropped to make room. 
Response bodies are capped at 64KB per entry +- **Text bodies only**: Response bodies are captured for text-based content types (JSON, HTML, XML, CSS, JavaScript). Binary content (images, fonts, videos) is skipped +- **Multi-tab support**: Traffic from new tabs and popups is also captured + +### Planning for Multi-Step Tests + +Because network data resets each step, plan accordingly: + +- If you need to assert on an API response **later**, extract and store it in the same step the request happens +- Navigation and API calls in step 1 won't be visible in step 3's network log +- Use extraction checkpoints to save values across steps + +## What You Can Query + +| Field | Type | Description | +|-------|------|-------------| +| `method` | string | HTTP method (GET, POST, PUT, DELETE, ...) | +| `url` | string | Full request URL | +| `domain` | string | Domain (e.g., "api.example.com") | +| `path` | string | URL path without query string | +| `query_params` | dict | Query parameters | +| `resource_type` | string | xhr, fetch, document, script, image, ... | +| `request_headers` | dict | Request headers | +| `request_body` | string | Request body (may be truncated) | +| `response_status` | int | HTTP status code (200, 404, 500, ...) 
| +| `response_headers` | dict | Response headers | +| `response_body` | string | Response body (text types only, may be truncated) | +| `timing.duration_ms` | float | Total request duration in milliseconds | +| `timing.ttfb_ms` | float | Time to first byte in milliseconds | +| `failed` | bool | True if request failed at network level | +| `failure_reason` | string | Error reason (e.g., "net::ERR_CONNECTION_REFUSED") | + +## Example Assertions + +``` +Assert: no API calls returned 5xx status codes +Assert: the POST /api/login returned HTTP status 200 +Assert: all API responses completed in under 2 seconds +Assert: no network requests failed with connection errors +Assert: the /posts endpoint returned at least 10 items in the response body +``` + +## Example Extractions + +``` +Store the response body of the POST /api/login request +Extract the status code of the last API call to /api/users +Store all API request URLs +``` + +## Example If/Else + +``` +If the /api/auth returned 200 then proceed to dashboard, else show error message +``` diff --git a/docs/user-guide/features/checkpoints/devtools/overview.md b/docs/user-guide/features/checkpoints/devtools/overview.md new file mode 100644 index 0000000..1d12c2d --- /dev/null +++ b/docs/user-guide/features/checkpoints/devtools/overview.md @@ -0,0 +1,48 @@ +# DevTools Assertions + +DevTools assertions let you verify data that isn't visible on the page — HTTP network traffic, browser console output, performance metrics, cookies, and localStorage. KaneAI captures this data automatically in the background; you just write what to check. 
+ +## Available Domains + +| Domain | What It Captures | Documentation | +|--------|-----------------|---------------| +| [Network](./network.md) | HTTP requests and responses | Status codes, headers, response bodies, timing | +| [Console](./console.md) | Browser console messages | Errors, warnings, log messages, JS exceptions | +| [Performance](./performance.md) | Core Web Vitals | LCP, CLS, INP, FCP, TTFB | +| [Cookies](./cookies.md) | Browser cookies | Names, values, flags (httpOnly, secure, sameSite) | +| [localStorage](./local-storage.md) | Browser localStorage | Key-value pairs stored in the browser | + +## How It Works + +Each DevTools domain follows the same pattern: + +1. **Capture** — KaneAI captures the data automatically during your test run +2. **Generate** — When a checkpoint triggers, the AI generates code to query the captured data +3. **Execute** — The code runs in an isolated sandbox and returns a result +4. **Assert** — The result is compared against your expected value + +You don't write code — you write natural language objectives, and KaneAI handles the rest. 
+ +## Examples + +``` +Assert: no API calls returned 5xx +Assert: no console errors on the page +Assert: page LCP is under 2500ms +Assert: session cookie exists and is httpOnly +Assert: auth_token is stored in localStorage +``` + +## All Checkpoint Types Work + +DevTools assertions support all three checkpoint types: + +- **Assert**: "Assert: no console errors" — fails the test if there are errors +- **Extract**: "Store all cookies" — saves the data for later steps +- **If/Else**: "If the API returned 200 then proceed, else retry" — branch on the result + +## Important Notes + +- DevTools data is **not visible in a screenshot** — KaneAI will never try to open the browser DevTools panel +- Each domain captures data differently — see individual pages for details on timing and scope +- All assertions run in an isolated sandbox — generated code cannot access the file system or network diff --git a/docs/user-guide/features/checkpoints/devtools/performance.md b/docs/user-guide/features/checkpoints/devtools/performance.md new file mode 100644 index 0000000..086d5fe --- /dev/null +++ b/docs/user-guide/features/checkpoints/devtools/performance.md @@ -0,0 +1,59 @@ +# Performance Assertions (Web Vitals) + +Performance assertions let you verify [Core Web Vitals](https://web.dev/articles/vitals) and other key performance metrics for the current page. + +## How Capture Works + +Performance data is **navigation-based** — metrics are measured for the most recent page navigation: + +- **Automatic measurement**: KaneAI uses the [web-vitals](https://github.com/GoogleChrome/web-vitals) library to capture metrics +- **Per-navigation scope**: Metrics reflect the last full page load. 
If you navigate to a new page, the metrics reset for that navigation +- **Point-in-time snapshot**: When a performance checkpoint triggers, KaneAI captures the current metrics at that moment + +### What This Means for Your Tests + +- Performance metrics describe the **last navigation** — if you navigate to Page A then Page B, the metrics reflect Page B +- Place performance assertions **after** the page you want to measure has fully loaded +- Use a wait step if the page needs time to settle before measuring + +## Available Metrics + +| Metric | What It Measures | Good Threshold | Learn More | +|--------|-----------------|----------------|------------| +| **LCP** | Largest Contentful Paint — when the largest visible element finishes rendering | < 2,500ms | [web.dev/lcp](https://web.dev/articles/lcp) | +| **CLS** | Cumulative Layout Shift — visual stability, how much the page layout shifts | < 0.1 | [web.dev/cls](https://web.dev/articles/cls) | +| **INP** | Interaction to Next Paint — responsiveness to user input | < 200ms | [web.dev/inp](https://web.dev/articles/inp) | +| **FCP** | First Contentful Paint — when the first content appears on screen | < 1,800ms | [web.dev/fcp](https://web.dev/articles/fcp) | +| **TTFB** | Time to First Byte — server response time | < 800ms | [web.dev/ttfb](https://web.dev/articles/ttfb) | + +> **Note**: Not all metrics are available for every page. INP requires user interaction to trigger. Some metrics may be `null` if the browser hasn't measured them yet. 
+ +## Example Assertions + +``` +Assert: page LCP is under 2500ms +Assert: CLS is below 0.1 +Assert: TTFB is under 800ms +Assert: FCP is less than 1800ms +Assert: page performance meets Core Web Vitals thresholds +``` + +## Example Extractions + +``` +Store the page LCP value +Extract all web vitals metrics +Store the TTFB for this page +``` + +## Example If/Else + +``` +If LCP is under 2500ms then continue, else report performance issue +``` + +## Tips + +- **Wait for load**: Place a wait step before performance assertions to ensure the page has fully loaded and metrics are available +- **Navigate first**: Metrics are per-navigation — make sure you've navigated to the target page before asserting +- **Not all metrics are instant**: CLS accumulates over time, INP requires interaction. LCP and FCP are typically available after the page visually completes loading diff --git a/docs/user-guide/features/checkpoints/overview.md b/docs/user-guide/features/checkpoints/overview.md new file mode 100644 index 0000000..d1f209c --- /dev/null +++ b/docs/user-guide/features/checkpoints/overview.md @@ -0,0 +1,71 @@ +# Checkpoints + +Checkpoints are verification points that KaneAI evaluates during test execution. They let you assert conditions, branch on results, or extract values for later use. + +## Checkpoint Types + +| Type | What it does | +|------|-------------| +| **Assertion** | Verify a condition is true — fails the test if not | +| **If/Else** | Branch execution based on a condition | +| **Extraction** | Store a value for use in later steps | + +All three types work with every analyze method below. 
+ +## Analyze Methods + +Each checkpoint uses an analyze method to determine *where* to look for the data: + +| Method | Data Source | When to Use | +|--------|-----------|-------------| +| [Visual](./visual.md) | Screenshot (what you see on screen) | Text, labels, prices, counts, colors, visibility checks | +| [Textual (DOM)](./textual.md) | Page DOM elements | Element states (disabled, checked), CSS properties, HTML attributes | +| [URL](./url.md) | Browser URL bar | URL path, query params, redirects | +| [Title](./title.md) | Page title | Document title verification | +| [DevTools](./devtools/) | Browser internals | Network traffic, console logs, performance, cookies, localStorage | + +## How to Use + +Write your assertions naturally in the objective. KaneAI automatically picks the right analyze method: + +``` +Assert: the price is $29.99 → Visual +Assert: the submit button is disabled → Textual (DOM) +Assert: URL contains /checkout → URL +Assert: page title contains "Dashboard" → Title +Assert: no API calls returned 5xx → DevTools (Network) +Assert: no console errors → DevTools (Console) +Assert: page LCP is under 2500ms → DevTools (Performance) +Assert: session cookie exists → DevTools (Cookies) +Assert: auth_token exists in localStorage → DevTools (localStorage) +``` + +Extractions work the same way: + +``` +Store the product price → Visual +Store the current URL → URL +Store all cookies → DevTools (Cookies) +Store the API response body → DevTools (Network) +``` + +## Operators + +Assertions support these comparison operators: + +| Operator | Meaning | Example | +|----------|---------|---------| +| `equals` | Exact match | price equals "29.99" | +| `contains` | Substring match | URL contains "/checkout" | +| `not_contains` | Does not contain | title not contains "Error" | +| `gt` / `gte` | Greater than / or equal | items greater than 5 | +| `lt` / `lte` | Less than / or equal | LCP less than 2500 | +| `not_equals` | Not equal | status not equals "failed" | + 
+## Learn More + +- [Visual Assertions](./visual.md) — screenshot-based text and visibility checks +- [Textual (DOM) Assertions](./textual.md) — element states and attributes +- [URL Assertions](./url.md) — URL-based checks +- [Title Assertions](./title.md) — page title checks +- [DevTools Assertions](./devtools/) — network, console, performance, cookies, localStorage diff --git a/docs/user-guide/features/checkpoints/textual.md b/docs/user-guide/features/checkpoints/textual.md new file mode 100644 index 0000000..2ac95fb --- /dev/null +++ b/docs/user-guide/features/checkpoints/textual.md @@ -0,0 +1,43 @@ +# Textual (DOM) Assertions + +Textual assertions extract data from the page's DOM — element states, attributes, and computed styles that aren't always visible in a screenshot. + +## When It's Used + +- Element states: disabled, enabled, checked, readonly, expanded +- CSS properties with exact values: `font-size: 16px`, `opacity: 0.5`, `display: none` +- HTML attributes: `placeholder`, `aria-*`, `data-*`, `class`, `id`, `href`, `src`, `type`, `value` +- Attribute existence: "has placeholder", "has aria-label" +- Exact CSS color values: `rgb(255,0,0)`, `#ff0000` + +## Examples + +### Assertions + +``` +Assert: the submit button is disabled +Assert: the checkbox is checked +Assert: the input field has placeholder "Enter email" +Assert: the element has aria-label "Close dialog" +Assert: the font-size of the heading is 24px +``` + +### Extractions + +``` +Extract the href of the first link +Store the value attribute of the email input +Get the class of the error message element +``` + +## When NOT to Use + +- For visible text content (prices, labels) → use [Visual](./visual.md) +- For color names like "red background" → use [Visual](./visual.md) (DOM may return `transparent` or inherited values) +- For network/console/cookie data → use [DevTools](./devtools/) + +## How It Works + +1. KaneAI captures the DOM snapshot of the page +2. 
The AI model identifies the target element and extracts the requested property +3. The value is compared against the expected value diff --git a/docs/user-guide/features/checkpoints/title.md b/docs/user-guide/features/checkpoints/title.md new file mode 100644 index 0000000..e806d20 --- /dev/null +++ b/docs/user-guide/features/checkpoints/title.md @@ -0,0 +1,29 @@ +# Title Assertions + +Title assertions check the browser tab's document title (`document.title`). + +## When It's Used + +- Page title verification: "title contains Dashboard" +- Navigation confirmation: "title is Home Page" + +## Examples + +### Assertions + +``` +Assert: page title contains "Dashboard" +Assert: title is "My Account - Settings" +``` + +### Extractions + +``` +Store the page title +``` + +## How It Works + +1. KaneAI reads `page.title()` directly +2. The title string is compared against the expected value +3. No screenshot or DOM analysis needed — this is a direct read diff --git a/docs/user-guide/features/checkpoints/url.md b/docs/user-guide/features/checkpoints/url.md new file mode 100644 index 0000000..9d41cab --- /dev/null +++ b/docs/user-guide/features/checkpoints/url.md @@ -0,0 +1,39 @@ +# URL Assertions + +URL assertions check values in the browser's address bar — the current URL path, query parameters, fragments, and redirect targets. + +## When It's Used + +- URL path: "URL contains /checkout" +- Query parameters: "URL has param `sort=price`" +- Redirect verification: "redirected to /login" +- Fragment/hash: "URL hash is #section-2" + +## Examples + +### Assertions + +``` +Assert: URL contains /checkout +Assert: the page redirected to /dashboard +Assert: URL path is /products/42 +``` + +### Extractions + +``` +Store the current URL +Extract the URL path +``` + +### If/Else + +``` +If URL contains /login then enter credentials, else go to profile +``` + +## How It Works + +1. KaneAI reads the current `page.url` value directly +2. 
The URL string is compared against the expected value using the specified operator +3. No screenshot or DOM analysis needed — this is a direct read diff --git a/docs/user-guide/features/checkpoints/visual.md b/docs/user-guide/features/checkpoints/visual.md new file mode 100644 index 0000000..5ca9a70 --- /dev/null +++ b/docs/user-guide/features/checkpoints/visual.md @@ -0,0 +1,48 @@ +# Visual Assertions + +Visual assertions verify what's visible on screen by analyzing the current screenshot. This is the default method — when in doubt, KaneAI uses visual analysis. + +## When It's Used + +- Text content: prices, labels, headings, counts, messages +- Visibility: "is the login button visible", "are search results displayed" +- Color checks using color names: "has red background", "blue text" +- Any content that appears visually on the page + +## Examples + +### Assertions + +``` +Assert: the product price is $29.99 +Assert: the search results show at least 5 items +Assert: the error message is visible +Assert: the hero section displays "Welcome back" +``` + +### Extractions + +``` +Store the product price +Extract the heading text +Get the number of items in the cart +``` + +### If/Else + +``` +If the login button is visible then click it, else click Sign Up +``` + +## How It Works + +1. KaneAI takes a screenshot of the current page +2. The AI model analyzes the screenshot to find the requested information +3. 
The extracted value is compared against the expected value using the specified operator + +## Best Practices + +- Use visual assertions for any text or content you can see on screen +- Be specific about what to look for: "the price in the cart summary" not just "the price" +- For exact CSS values (like `rgb(255,0,0)` or `#ff0000`), use [Textual (DOM)](./textual.md) instead +- For element states (disabled, checked), use [Textual (DOM)](./textual.md) instead diff --git a/integrations/docs/kiro-powers.md b/integrations/docs/kiro-powers.md index 798ae9c..4f63a95 100644 --- a/integrations/docs/kiro-powers.md +++ b/integrations/docs/kiro-powers.md @@ -14,28 +14,33 @@ The `integrations/kiro-powers/` folder is a [Kiro power](https://kiro.dev/docs/p | Artifact | Path | Purpose | |---|---|---| -| **Canonical skill** | `skill-installer/skills/SKILL.md` | Source of every CLI fact: command shapes, flags, exit codes, NDJSON schema, log layout, testmd file format, parse errors. **Edit this first.** Every other integration mirrors from here. | +| **Canonical skill** | `skill-installer/skills/SKILL.md` | Source of every CLI fact: command shapes, flags, exit codes, NDJSON essentials, decision tree, results presentation. **Edit this first.** Every other integration mirrors from here. | +| **Canonical references** | `skill-installer/skills/references/*.md` | On-demand reference content: `objectives-cookbook.md` (pattern catalog + checkpoint analyze methods), `testmd.md` (file format, replay), `parsing.md` (full NDJSON schema), `debug.md` (log layout), `parallel.md`, `setup-and-config.md`. Equally authoritative — facts in any of these must mirror through. | | Kiro power root | `integrations/kiro-powers/POWER.md` | Frontmatter (name/displayName/keywords/author), onboarding, condensed command reference, steering-file mapping. 
| -| `kane-cli run` steering | `integrations/kiro-powers/steering/kane-cli-run.md` | Full reference for one-shot `kane-cli run`: objective patterns, full flag table, NDJSON parsing, results presentation, failure diagnosis, parallel execution. | +| `kane-cli run` steering | `integrations/kiro-powers/steering/kane-cli-run.md` | Full reference for one-shot `kane-cli run`: objective patterns + checkpoint analyze methods (Visual / Textual-DOM / URL / Title / DevTools→Network/Console/Performance/Cookies/localStorage), full flag table, NDJSON parsing, results presentation, failure diagnosis, parallel execution. | | `kane-cli testmd` steering | `integrations/kiro-powers/steering/kane-cli-testmd.md` | Full reference for `kane-cli testmd`: file format, frontmatter, `@import`, replay/author cache, `Result.md`, CI patterns, parse errors. | | Hook template | `integrations/kiro-powers/hooks/kane-verify.kiro.hook` | Sample agent hook the user copies to their workspace `.kiro/hooks/`. | -If a fact appears in this integration that is **not** in `SKILL.md`, that's a bug — either backfill `SKILL.md` first, or delete the fact from the integration. +If a fact appears in this integration that is **not** in `SKILL.md` or one of the `references/*.md`, that's a bug — either backfill the canonical source first, or delete the fact from the integration. -## SKILL.md → Kiro Powers mapping +## SKILL.md (+ references) → Kiro Powers mapping -| `SKILL.md` section | Where it lives in the Kiro power | +The canonical skill restructured in May 2026 into a thin `SKILL.md` (6 sections, ~260 lines) plus six on-demand `references/*.md`. The Kiro power's two steering files absorb the equivalent depth — Kiro reads the right steering file per workflow, the way Claude Code reads the right reference file on demand. 
+ +| Canonical source | Where it lives in the Kiro power | |---|---| -| §1 Decision Tree | `steering/kane-cli-run.md` → Decision tree, and `steering/kane-cli-testmd.md` → Decision tree | -| §2 Pre-flight Setup (install / login / verify) | `POWER.md` → Onboarding (Steps 1–3) | -| §3 Building the Command — flag table, exit codes, variables, context files | `POWER.md` → Command reference (condensed) **and** `steering/kane-cli-run.md` → Full flag reference + Variables and secrets + Context files | -| §4 Writing Objectives — patterns, "store as", do/don't | `steering/kane-cli-run.md` → Writing objectives — three patterns | -| §5 Parsing Output — events, parsing strategy, run_end | `steering/kane-cli-run.md` → Parsing the NDJSON output | -| §6 Presenting Results — live narration, results card, failure | `steering/kane-cli-run.md` → Presenting results | -| §7 Saving & Replaying Tests (`testmd`) | All of `steering/kane-cli-testmd.md` | -| §8 Failure Handling & Log Inspection | `steering/kane-cli-run.md` → Failure handling & log inspection | -| §9 Parallel Execution | `steering/kane-cli-run.md` → Parallel execution | -| §10 Configuration & Reference | `POWER.md` → Configuration, and `steering/kane-cli-run.md` → Configuration surface | +| `SKILL.md` §1 Live narration & results presentation (Monitor/Bash launch decision is Claude-Code-specific — Kiro keeps its own narration model) | `steering/kane-cli-run.md` → Presenting results | +| `SKILL.md` §2 Decision tree | `steering/kane-cli-run.md` → Decision tree, and `steering/kane-cli-testmd.md` → Decision tree | +| `SKILL.md` §3 Building a `run` command — flags, exit codes, examples | `POWER.md` → Command reference (condensed) **and** `steering/kane-cli-run.md` → Full flag reference | +| `SKILL.md` §4 Writing objectives — three patterns, "store as", do/don't | `steering/kane-cli-run.md` → Writing objectives — three patterns | +| `SKILL.md` §5 Parsing `--agent` output — essentials | `steering/kane-cli-run.md` → Parsing the NDJSON 
output (Event types + Parsing strategy summary) | +| `SKILL.md` §6 When to read which reference | Kiro analogue: `POWER.md`'s steering-file mapping (POWER.md tells Kiro when to load each steering file) | +| `references/objectives-cookbook.md` — analyze methods (Visual / Textual-DOM / URL / Title / DevTools→Network/Console/Performance/Cookies/localStorage), operators, chaining, pitfalls, worked examples | `steering/kane-cli-run.md` → Analyze methods — picking the right checkpoint (plus the existing Combining patterns, Assertion specificity, and Do / Don't sections) | +| `references/testmd.md` — testmd file format, replay & cascade, `@import`, commands, parse errors | All of `steering/kane-cli-testmd.md` | +| `references/parsing.md` — full NDJSON event schemas (`bifurcation`, `child_agent_*`, `ask_user`, complete `run_end` fields) | `steering/kane-cli-run.md` → Parsing the NDJSON output (full event-type list + Terminal `run_end` event) | +| `references/debug.md` — log layout, debugging flow, common failure patterns, bug-report heuristic | `steering/kane-cli-run.md` → Failure handling & log inspection + Bug-report heuristic | +| `references/parallel.md` — when to split, agent prompt template, batch summary | `steering/kane-cli-run.md` → Parallel execution | +| `references/setup-and-config.md` — install / auth / variables precedence / context files / config commands / Chrome management / directory layout | `POWER.md` → Onboarding (Steps 1–3) + `steering/kane-cli-run.md` → Variables and secrets + Context files + Configuration surface | ## Kiro-specific framing (don't lose these on edit) diff --git a/integrations/kiro-powers/steering/kane-cli-run.md b/integrations/kiro-powers/steering/kane-cli-run.md index 1a0638b..debd0b5 100644 --- a/integrations/kiro-powers/steering/kane-cli-run.md +++ b/integrations/kiro-powers/steering/kane-cli-run.md @@ -84,6 +84,59 @@ Chain action → extraction → assertion in a single objective: | Negative | `"assert no error message or red banner 
is visible"` | | Positional | `"assert 'Settings' appears in the left sidebar navigation"` | +## Analyze methods — picking the right checkpoint + +Assertions, extractions, and if/else checkpoints each work with five **analyze methods** — *where* the agent looks for the data. The method is selected from the phrasing of the objective. Pick the method that matches the data source, not the one that's easiest to type. + +| Method | Use it for | Phrasing the agent recognizes | +|---|---|---| +| **Visual** (default) | Visible text, prices, labels, counts, colors by name, visibility | "the price …", "is visible", "displays", "is shown" | +| **Textual (DOM)** | Element states, CSS properties, HTML attributes, exact CSS color values | "is disabled / enabled / checked", "the placeholder of …", "the aria-label of …", "the font-size of …", "rgb(…)" / "#hex" | +| **URL** | Address bar — path, query, fragment, redirects | "URL contains …", "URL path is …", "URL has param …", "redirected to …" | +| **Title** | Browser tab `document.title` | "page title contains …", "title is …" | +| **DevTools** | Things not visible on the page — network, console, performance, cookies, localStorage | see DevTools subdomains below | + +### DevTools subdomains + +Five domains. Each captures data the user cannot see on screen. The agent picks the subdomain from phrasing. + +| Subdomain | Captures | Scope | Common phrasing | +|---|---|---|---| +| **Network** | HTTP requests/responses — status codes, headers, response bodies, timing | Resets each step | "no API calls returned 5xx", "the POST /api/login returned 200", "all API responses completed under 2 seconds" | +| **Console** | `console.log/warn/error/info/debug` + uncaught JS exceptions | Resets each step. 
Top frame only | "no console errors", "no uncaught JS exceptions", "console contains '…'" | +| **Performance** | Core Web Vitals — LCP, CLS, INP, FCP, TTFB | Per-navigation, point-in-time | "page LCP is under 2500ms", "CLS is below 0.1", "TTFB under 800ms" | +| **Cookies** | All cookies including `httpOnly` — name, value, flags | Point-in-time, persists across steps | "a cookie named 'session_id' exists", "the session cookie is httpOnly", "no cookies without the Secure flag" | +| **localStorage** | Browser `localStorage` for the current origin | Point-in-time, persists across steps on same origin | "auth_token exists in localStorage", "the theme preference is 'dark'" | + +> Network and Console **reset between steps** — if a later step asserts on traffic or logs from an earlier step, extract and carry the value forward. Cookies and localStorage **persist** across steps on the same origin. + +### Operators + +Assertions support these comparisons. Phrase naturally — the agent maps to the right one. + +| Operator | Meaning | Example | +|---|---|---| +| `equals` | Exact match | "price equals $29.99", "title is 'Home'" | +| `contains` | Substring match | "URL contains /checkout" | +| `not_contains` | Does not contain | "title not contains 'Error'" | +| `gt` / `gte` | Greater than / or equal | "items greater than 5" | +| `lt` / `lte` | Less than / or equal | "LCP less than 2500" | +| `not_equals` | Not equal | "status not equals 'failed'" | + +### Picking the right method when in doubt + +- "Is the price $29.99?" → **Visual** (on screen). +- "Is the submit button disabled?" → **Textual/DOM** (state, not visible text). +- "Does this red background match exactly `rgb(220,38,38)`?" → **Textual/DOM** (exact CSS). +- "Are we on the checkout page?" → **URL**. +- "Did the page send failed API calls?" → **DevTools/Network**. +- "Are there console errors?" → **DevTools/Console**. +- "Is the page fast?" → **DevTools/Performance** (LCP/FCP/TTFB). +- "Did login set a session cookie?" 
→ **DevTools/Cookies**. +- "Did the app store the auth token?" → **DevTools/localStorage**. + +Default when uncertain: **Visual** — that's what the agent does too. + ## Do / Don't | ✅ Do | ❌ Don't | @@ -91,6 +144,7 @@ Chain action → extraction → assertion in a single objective: | Imperative verbs: "go to", "click", "store as" | Vague verbs: "check out", "look at", "explore" | | Be specific: "click the 'Add to Cart' button" | Be vague: "add the item" | | Name extractions: "store X as 'price'" | Hope for values: "tell me the price" | +| Use the right analyze method for the data | Default to a generic "check this" when DevTools fits | | Use `{{variables}}` for credentials and URLs | Hardcode secrets in the objective string | | Include the starting URL in the objective | Assume the agent knows where to start | | Split mega-objectives (>15 steps) into runs | Cram everything into one giant objective | diff --git a/skill-installer/cli.js b/skill-installer/cli.js index 3fbc053..5040a9a 100644 --- a/skill-installer/cli.js +++ b/skill-installer/cli.js @@ -1,6 +1,6 @@ #!/usr/bin/env node -import { cpSync, mkdirSync, rmSync, existsSync } from "node:fs"; +import { cpSync, mkdirSync, rmSync, existsSync, readFileSync, writeFileSync } from "node:fs"; import { join, dirname } from "node:path"; import { homedir } from "node:os"; import { fileURLToPath } from "node:url"; @@ -9,6 +9,7 @@ const __dirname = dirname(fileURLToPath(import.meta.url)); const SKILL_NAME = "kane-cli"; const SOURCE_DIR = join(__dirname, "skills"); +const VERSION = JSON.parse(readFileSync(join(__dirname, "package.json"), "utf8")).version; const TARGETS = [ { dir: join(homedir(), ".claude", "skills", SKILL_NAME), agent: "Claude Code" }, @@ -22,13 +23,14 @@ function install() { process.exit(1); } - console.log("Installing kane-cli skill...\n"); + console.log(`Installing kane-cli skill v${VERSION}...\n`); let installed = 0; for (const { dir, agent } of TARGETS) { try { mkdirSync(dir, { recursive: true }); 
cpSync(SOURCE_DIR, dir, { recursive: true, force: true }); + writeFileSync(join(dir, "VERSION"), VERSION + "\n"); console.log(` ✓ ${agent} → ${dir}`); installed++; } catch (err) { @@ -38,7 +40,7 @@ function install() { console.log(); if (installed > 0) { - console.log(`Installed to ${installed}/3 agents.`); + console.log(`Installed v${VERSION} to ${installed}/3 agents.`); console.log(); console.log("Usage:"); console.log(" Claude Code → /kane-cli or ask any browser task"); diff --git a/skill-installer/package.json b/skill-installer/package.json index be671db..00b2077 100644 --- a/skill-installer/package.json +++ b/skill-installer/package.json @@ -1,6 +1,6 @@ { "name": "@testmuai/kane-cli-skill", - "version": "1.0.0", + "version": "1.1.0", "description": "Install kane-cli browser automation skill for AI coding agents (Claude Code, Codex CLI, Gemini CLI)", "type": "module", "bin": "./cli.js", diff --git a/skill-installer/skills/SKILL.md b/skill-installer/skills/SKILL.md index e5bf710..9190215 100644 --- a/skill-installer/skills/SKILL.md +++ b/skill-installer/skills/SKILL.md @@ -5,804 +5,255 @@ description: Browser automation via kane-cli — run objectives, parse NDJSON ou # Kane CLI — Browser Automation Skill -Use `kane-cli` for **any task that requires a real browser**: navigating websites, clicking elements, filling forms, searching, testing web UI, taking screenshots, or verifying deployments. - -**Do NOT** use Playwright, Puppeteer, or Selenium directly. `kane-cli` manages Chrome, auth, and the AI automation agent. - -**Always run with `--agent` flag.** This gives structured NDJSON output that you parse and present to the user with rich formatting. - ---- - -## 1. 
Decision Tree - -When the user's request involves a browser, follow this flow: - -**Is kane-cli installed?** -├─ Unknown → Check with `kane-cli --version` -├─ No → `npm install -g @testmuai/kane-cli` then §2 -└─ Yes ↓ - -**Is kane-cli set up?** -├─ Unknown → Run `kane-cli whoami` to check auth status -├─ No → Go to §2 (Pre-flight Setup) -└─ Yes ↓ - -**What does the user want?** -├─ Single browser task → Build one `kane-cli run --agent` command (§3, §4) -├─ Test/verify something → Same, but use assertion objectives (§4) -├─ Extract data from a page → Same, but use "store as" extraction pattern (§4) -├─ Save / re-run / commit the test → Use `kane-cli testmd` (§7) -├─ Multiple independent tasks → Decompose into sub-objectives, run in parallel via Agent tool (§9) -├─ Debug a failed run → Inspect logs (§8) -└─ Configure kane-cli → Run config commands (§10) - -**After every run:** -1. Parse the NDJSON output (§5) -2. Present rich results with emojis (§6) -3. If failed, inspect logs and diagnose (§8) - ---- - -## 2. Pre-flight Setup - -Before first use, verify installation and auth. - -### Install - -```bash -npm install -g @testmuai/kane-cli -``` - -### Check Auth Status - -```bash -kane-cli whoami -``` - -If this shows "not configured" or errors, run login: - -### Login (Basic Auth) - -```bash -kane-cli login --username --access-key -``` - -This creates the default profile with basic auth, auto-selects the KaneAI project, and marks setup complete. Credentials come from the user's TestmuAI dashboard (Settings → Keys). - -Optional flag: -- `--profile ` — profile name (default: last selected profile check using `config show`) - -### Login (OAuth) - -```bash -kane-cli login --oauth -``` - -This opens the browser for OAuth consent and waits for the callback. Works in both TTY and non-TTY (agent) mode. - -### Login (Interactive — TTY only) - -In a terminal, run `kane-cli login` with no flags for the interactive wizard (auth method → project picker → folder picker). 
If the user needs this, ask them to run it directly: - -> Please run `! kane-cli login` and complete the sign-in. - -### Verify - -```bash -kane-cli whoami # Auth status -kane-cli config show # Current configuration -``` +Use `kane-cli` for **any task that requires a real browser**: navigating websites, clicking elements, filling forms, searching, testing web UI, taking screenshots, or verifying deployments. Do NOT use Playwright, Puppeteer, or Selenium directly. Always run with `--agent` so output is structured NDJSON you can parse. --- -## 3. Building the Command +## 1. Live narration and results presentation — READ THIS FIRST -Every run uses this pattern: - -```bash -kane-cli run "" --agent [options] -``` - -`--agent` is **mandatory** — it outputs structured NDJSON that you parse and present to the user. - -### Flags - -| Flag | Purpose | Default | -|------|---------|---------| -| `--headless` | No visible browser window | Off (browser visible) | -| `--max-steps ` | Limit agent reasoning steps | 30 | -| `--timeout ` | Kill run after N seconds | No limit | -| `--variables ` | Inline variables JSON | None | -| `--variables-file ` | Load variables from a JSON file | None | -| `--global-context ` | Override global agent context markdown | `~/.testmuai/kaneai/global-memory.md` | -| `--local-context ` | Override local project context markdown | `.testmuai/context.md` | -| `--ws-endpoint ` | Remote browser via WebSocket (e.g. LambdaTest grid) | Local Chrome | -| `--cdp-endpoint ` | Connect to existing Chrome via CDP | Auto-launch Chrome | -| `--code-export` | Generate code export after upload | Off | - -### Exit Codes - -| Code | Meaning | -|------|---------| -| 0 | ✅ Passed | -| 1 | ❌ Failed | -| 2 | ⚠️ Error (auth, setup, infra) | -| 3 | ⏱️ Timeout or cancelled | - -### Variables - -Variables parameterize objectives with reusable values and secrets. Use `{{key}}` syntax in objectives. 
- -**Format:** -```json -{ - "username": { "value": "alice", "secret": false }, - "password": { "value": "s3cret!", "secret": true } -} -``` +The user is watching this happen in real time. Silence during a kane-cli run is a bug; a one-line "Test passed" instead of the results table is a bug. Both happen because this section used to be buried at line 353 of an 800-line file. It's first now. Follow it exactly. -`secret: true` masks the value in logs and routes it to TestmuAI's secrets store instead of being synced as plain TMS variables. +### 1.1 How to launch kane-cli — Monitor (Claude Code) or Bash (Codex / Gemini) -**Loading order** (later wins): -1. `~/.testmuai/kaneai/variables/*.json` (global, alphabetical) -2. `{cwd}/.testmuai/variables/*.json` (local project overrides) -3. `--variables-file ` -4. `--variables '{...}'` (inline JSON) +**Bash is synchronous — it blocks until kane-cli exits, then hands you the whole stdout at once. That means you cannot narrate event-by-event from a Bash call.** To narrate live, the launch tool must stream stdout line-by-line. -**Always parameterize:** credentials, API keys, tokens, environment-specific URLs. -**OK to hardcode:** one-off URLs, static UI text, navigation paths. - -### Context Files - -Context files provide additional instructions to the agent: -- **Global:** `~/.testmuai/kaneai/global-memory.md` — shared across all runs -- **Local:** `.testmuai/context.md` in cwd — project-specific - -Override per-run with `--global-context` / `--local-context` flags. 
- -### Examples - -```bash -# Simple browser task -kane-cli run "Go to https://www.amazon.in and search for 'laptop'" --agent - -# Headless with timeout -kane-cli run "Go to https://app.example.com and verify login page loads" --agent --headless --timeout 60 - -# With variables -kane-cli run "Go to https://app.example.com and login with {{username}} and {{password}}" --agent \ - --variables '{"username": {"value": "alice"}, "password": {"value": "secret123", "secret": true}}' - -# Remote browser (LambdaTest grid) -kane-cli run "Go to https://shop.example.com and add item to cart" --agent \ - --ws-endpoint "wss://cdp.lambdatest.com/playwright?capabilities=..." - -# With variables file -kane-cli run "Go to https://staging.myapp.com, login and verify dashboard" --agent \ - --variables-file ./test-creds.json --headless --timeout 120 -``` - ---- - -## 4. Writing Objectives - -The objective string is the most important input. How you phrase it determines what the agent does. - -### Three Patterns - -| Pattern | Trigger Phrases | Agent Behavior | -|---------|----------------|----------------| -| 🎯 **Action** | "go to", "click", "type", "search", "fill", "scroll" | Performs browser actions | -| ✅ **Assertion** | "assert", "verify", "confirm", "check that" | Validates a condition (pass/fail) | -| 📦 **Extraction** | "store X as 'name'" | Reads a value from the page and persists it in structured output | - -### Extraction: The "store as" Pattern - -**Critical.** Vague phrasing like "read", "report", or "tell me" does NOT reliably extract data. The agent may observe the value visually but won't persist it in structured output. 
- -❌ **Bad** — agent looks but doesn't capture: -``` -"go to example.com and read the page title" -"go to example.com and tell me the price" -``` - -✅ **Good** — agent extracts and persists in `final_state`: -``` -"go to example.com, store the page title as 'page_title'" -"go to example.com, store the price of the first item as 'price'" -``` - -Stored values appear in the `run_end` event's `final_state` and `context.memory` fields. - -### Combining Patterns +| Agent | Launch tool | Live narration possible? | +|---|---|---| +| **Claude Code** | `Monitor` — streams each stdout line as its own notification | ✅ Yes — narrate per event as it arrives | +| **Codex CLI** | `Bash` (or shell equivalent) | ❌ No — narrate post-run from captured stdout | +| **Gemini CLI** | `Bash` (or shell equivalent) | ❌ No — narrate post-run from captured stdout | -Chain action → extraction → assertion in a single objective: +**In Claude Code, you MUST use `Monitor` (not Bash) to launch `kane-cli run` / `kane-cli testmd run`.** Pattern: +```yaml +description: "kane-cli: " +command: kane-cli run "" --agent +timeout_ms: 600000 +persistent: false ``` -"go to {{app_url}}/dashboard, - store the welcome message as 'welcome_text', - store the user role in the sidebar as 'role', - assert the role is 'Admin'" -``` - -### Assertion Specificity - -| Type | Example | -|------|---------| -| **Exact match** | `"assert the cart total shows '$29.99'"` | -| **Flexible match** | `"assert a price is displayed for each product"` | -| **State** | `"assert the Submit button is disabled until all fields are filled"` | -| **Conditional** | `"if a cookie banner appears, dismiss it, then assert the homepage loads"` | -| **Negative** | `"assert no error message or red banner is visible"` | -| **Positional** | `"assert 'Settings' appears in the left sidebar navigation"` | - -### Dos and Don'ts - -| ✅ Do | ❌ Don't | -|-------|---------| -| Use imperative verbs: "go to", "click", "store as" | Use vague verbs: "check 
out", "look at", "explore" | -| Be specific: "click the 'Add to Cart' button" | Be vague: "add the item" | -| Name extractions: "store X as 'price'" | Hope for values: "tell me the price" | -| Use `{{variables}}` for credentials/URLs | Hardcode secrets in the objective | -| Include starting URL in the objective: "Go to https://..." | Assume the agent knows where to start | -| Split mega-objectives (>15 steps) into multiple runs | Cram everything into one massive objective | ---- - -## 5. Parsing Output (--agent mode) +Every NDJSON line from kane-cli arrives as a notification. The watch ends when kane-cli exits (you'll see the exit code in the final notification). Do NOT also call Bash for the same run — that double-launches kane-cli. -> **Internal reference only.** Everything in this section (field names, event types, JSON structure) is for you to parse programmatically. **Never expose these internal terms to the user.** The user should see plain-language summaries, not `run_end`, `final_state`, `bifurcation`, `NDJSON`, `session_dir`, or any raw JSON fields. +In Codex/Gemini, use Bash with the same `kane-cli ... --agent` command. After it returns, parse the captured stdout as if you had received the events in sequence. -With `--agent`, kane-cli outputs one JSON object per line to **stdout**. Progress UI renders to **stderr**. +### 1.2 Before you launch — emit start line and create todos -### Event Types +**Before** invoking Monitor (or Bash), emit: -**Progress events** (bulk of the output — one per step): - -```json -{"step": 1, "status": "passed", "remark": "Navigated to amazon.in"} -{"step": 2, "status": "passed", "remark": "Typed 'laptop' in search box"} -{"step": 3, "status": "failed", "remark": "Could not find Add to Cart button"} +```text +Starting browser task: . 
``` -| Field | Type | Description | -|-------|------|-------------| -| `step` | number | Step index (1-based) | -| `status` | string | `"passed"` or `"failed"` | -| `remark` | string | What the agent did or why it failed | - -These are **untyped** — they have no `type` field. Do **not** key on `event.type === 'step_start'` or `'step_end'`; those event types are not emitted. - -**Flow events:** - -| Event (`type` field) | Key Fields | Purpose | -|-------|-----------|---------| -| `bifurcation` | `flows[]`, `count` | Agent split objective into sub-flows | -| `child_agent_start` | `child_id`, `objective`, `parent_step` | Child agent spawned | -| `child_agent_end` | `child_id`, `success`, `steps_taken`, `summary` | Child agent finished | -| `ask_user` | `question`, `step_index`, `options?` | Agent needs user input | -| `error` | `message` | Error occurred | - -**Note:** There is no `run_start` event — the first line is either a `bifurcation` or a progress object. +Then create these TodoWrite items (skip on Gemini CLI where TodoWrite is unavailable): -**Note:** `ask_user` is auto-disabled when stdin is not a TTY. Since agents typically run kane-cli as a subprocess, ask_user events will not be emitted. Write objectives that don't require interactive input. +1. `Narrate start of ` — mark `in_progress` immediately +2. `Narrate each step as NDJSON arrives` +3. `Present results table after run_end` -### Parsing Strategy +The todos exist so that after Monitor/Bash returns control, the in-context reminder pulls you back into narration mode rather than a generic "parse stdout" mode. 
-Since progress events lack a `type` field, distinguish them from typed events like this: +### 1.3 During the run — narrate every event -``` -for each line of NDJSON: - if obj.type === "run_end" → terminal event, stop parsing - if obj.type === "bifurcation" → flow split - if obj.type exists → other typed event - if obj.step exists → progress event (step/status/remark) -``` +Progress events have `step`/`status`/`remark` fields and **no `type` field**. Each one gets ONE narration line. -**Build automation on `run_end`** — it is the only event guaranteed to have a stable schema across versions. Use progress events for live status display only. - -**Terminal event** (always the last line): - -```json -{ - "type": "run_end", - "status": "passed", - "summary": "Searched for laptop and added first result to cart", - "one_liner": "Searched for laptop on Amazon and added to cart", - "reason": "Objective completed", - "duration": 45.2, - "credits": 12, - "final_state": { - "price": "$29.99", - "product_name": "Wireless Headphones" - }, - "context": { - "memory": {}, - "variables": {}, - "pointer": "(passed) Searched for laptop and added first result to cart" - }, - "session_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890", - "run_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890/runs/0", - "test_url": "https://test-manager.lambdatest.com/projects/123/test-cases/456" -} -``` +**Claude Code (Monitor):** Each Monitor notification IS one event. Narrate it the moment the notification arrives. Do not batch. Do not wait for more events. One notification → one narration line. 
-Key `run_end` fields: -- `status` — `"passed"` or `"failed"` -- `summary` — what the agent did -- `one_liner` — short summary for display -- `reason` — why it stopped -- `credits` — credits consumed by the run (when reported) -- `final_state` — extracted values from "store as" objectives -- `test_url` — link to KaneAI dashboard (if upload succeeded) -- `session_dir` / `run_dir` — paths to log files +**Codex / Gemini (Bash post-run):** Iterate the captured stdout line-by-line in order. Emit one narration per progress event in sequence before moving on to the results table. -### Responding to `ask_user` (if stdin is a TTY) +Template (both cases): -```json -{"type": "user_response", "answer": "Medium size"} +```text +Step : ``` -To cancel a run: +If `status` is `"failed"`, flag it immediately: -```json -{"type": "cancel"} +```text +Step failed: — the agent is retrying. ``` ---- - -## 6. Presenting Results to the User - -> **Golden rule:** The user should feel like they're watching a browser task happen, not reading a log file. Use plain language, never expose internal field names, JSON keys, file paths, or technical jargon. Translate everything into what the user cares about. - -### 📢 Live Progress (During the Run) +Never expose internal field names (`step`, `status`, `remark`, `run_end`, `final_state`, `bifurcation`, `session_dir`, etc.) to the user. Translate to plain language. -**Do not stay silent while kane-cli runs.** As the command executes, keep the user informed: +### 1.4 After run_end — present the results table -1. **Before starting** — Tell the user what you're about to do: - > Starting browser task: searching for 'laptop' on Amazon... +The terminal event has `type: "run_end"` and stable fields: `status`, `summary`, `one_liner`, `duration`, `credits`, `final_state`, `test_url`, `session_dir`, `run_dir`. -2. **As steps complete** — Relay each step's outcome in plain language as it happens. 
Parse the progress events from stdout and narrate them: - > Step 1: Opened Amazon homepage - > Step 2: Typed 'laptop' in the search bar - > Step 3: Clicked the search button - > Step 4: Search results loaded — found product listings - -3. **If something goes wrong mid-run** — Flag it immediately, don't wait for the final result: - > Step 5: Could not find the 'Add to Cart' button — the agent is retrying... - -This keeps the user engaged and lets them intervene early if the task is going in the wrong direction. - -### 📋 Results Summary (After the Run) - -**After every run, present a clear summary.** Never just say "it passed" — show the full picture in a user-friendly format. - -**Successful run:** +**For a passing run, always emit this exact table** (substituting the field values): +```markdown | | | |-------|-------| | 🟢 **Result** | Passed | -| 🎯 **Task** | Search for 'laptop' on Amazon | -| ⏱️ **Duration** | 45.2s | -| 👣 **Steps taken** | 7 | -| 📝 **What happened** | Opened Amazon, typed 'laptop' in search, clicked search, results loaded with 48 products | -| 🔗 **View details** | [Open in KaneAI Dashboard](https://test-manager.lambdatest.com/...) 
| - -**If data was extracted** (from "store as" objectives), show it as a clean results table: - -| 📦 What was found | Value | -|-------------|----------------| -| Top repository | freeCodeCamp/freeCodeCamp | -| Star count | 413k | -| Price | $29.99 | +| 🎯 **Task** | | +| ⏱️ **Duration** | s | +| 👣 **Steps taken** | | +| 📝 **What happened** | | +| 🔗 **View details** | [Open in KaneAI Dashboard]() | +``` -**If assertions were checked**, show pass/fail for each: +**If `final_state` has values** (the user used "store as X" — see §4), append a second table: -| ✅ Check | Result | -|-------------|--------| -| Dashboard shows welcome message | 🟢 Passed | -| User role is Admin | 🔴 Failed | -### ❌ When Things Go Wrong -For failed runs, explain **what went wrong in plain language**: +```markdown +| 📦 What was found | Value | +|-------------|----------------| +| | | +``` -- 🔍 **What failed** — describe the step that failed and why, in the user's terms (not "step_003.json shows dom_action error") -- 📸 **Screenshot** — if a screenshot exists, read and show it so the user can see what the browser looked like at the point of failure -- 💡 **Why it likely failed** — your diagnosis: was the element missing? Did the page not load? Was the objective ambiguous? -- 🔧 **Suggested fix** — a concrete next step: rephrase the objective, increase timeout, check auth, etc. +**If the objective used assertions** ("assert …", "verify …"), append a pass/fail table per assertion derived from the run summary and step remarks. -**Example of a good failure report:** +### 1.5 On failure -> 🔴 **Failed** at step 5 of 9 (after 25s) -> -> **What happened:** The agent clicked "Proceed to Checkout" but the payment form never appeared. The page showed a loading spinner for 15 seconds before the agent timed out. -> -> **Likely cause:** The checkout page may require authentication, or the site's payment service was slow/down. 
-> -> **Suggested fix:** Try adding an explicit login step before checkout, or increase the timeout to 120s. +For exit code 1 (or `status: "failed"` in `run_end`), present a plain-language failure report — never raw paths or NDJSON. Template: -### 🐛 Suggesting a Bug Report +```markdown +🔴 **Failed** at step of (after s) -If the failure looks like a **kane-cli bug** (not auth, timeout, or a vague objective), offer to file a report: +**What happened:** . -> This looks like it might be a bug in kane-cli. Want me to file a report? +**Likely cause:** -File at: **https://github.com/LambdaTest/kane-cli/issues**. Gather the details automatically — don't ask the user to dig through log files. +**Suggested fix:** . +``` -**Do NOT suggest bug reports for:** auth issues, low timeouts, vague objectives, or website errors (500s, CAPTCHAs). +If a screenshot exists at `/run-test/screenshots/step_.png`, Read it and show it inline before the suggested fix. For deeper diagnosis, see `references/debug.md`. --- -## 7. Saving & Replaying Tests (`testmd`) +## 2. Decision tree -The §3 `run` command is the **primary** mode — one-shot, ephemeral. `testmd` is the **secondary** mode: tests live as `_test.md` files on disk, each step is cached on the first run, and every later run **replays from cache** with no LLM cost. +When the user's request involves a browser: -Use `testmd` whenever the user wants the test to persist. The decision is binary — once a test exists as a file, every later invocation is `testmd run`, never `run`. 
+**Is kane-cli installed and authenticated?** +- Unknown → `kane-cli whoami` +- No / errors → Read `references/setup-and-config.md` +- Yes ↓ -### When to switch from `run` to `testmd` - -| User says | Use | -|---|---| -| "save this test", "commit this", "keep this", "add this to the suite" | `testmd` | -| "regression test", "smoke test", "make this replayable" | `testmd` | -| "this is a test", "test the X flow end-to-end" (suite-shaped) | `testmd` | -| "run this once", "check if X works right now", "try X" | `run` (§3) | -| "search for", "click", "fill", "verify" (one-shot) | `run` (§3) | - -If unclear, ask: "Do you want me to save this test so you can re-run it later?" - -### Quick start +**What does the user want?** +- A single one-shot browser task → build a `kane-cli run --agent` command (§3 + §4) +- A test they want to save / re-run / commit → Read `references/testmd.md` first, then use `kane-cli testmd` +- Multiple independent browser tasks → Read `references/parallel.md` first +- Debug a failed run → Read `references/debug.md` +- Configure kane-cli or check directory layout → Read `references/setup-and-config.md` +- You need the full NDJSON event schema (rare — §5's summary covers 90% of cases) → Read `references/parsing.md` -Write the file (any path; filename must end in `_test.md`): +**Every run, always:** follow §1 above. -```markdown ---- -mode: testing -max_steps: 30 --- -# Amazon search - -## Open Amazon -Open https://www.amazon.com. - -## Search for headphones -Type "wireless headphones" into the search box and submit. -Verify at least one product result is visible. -``` - -Run it: +## 3. Building a `run` command ```bash -kane-cli testmd run amazon_test.md --agent -``` - -### File format - -Four parts in order: - -1. **YAML frontmatter** — between `--- ... ---` at the very top. -2. **`# Title`** — decorative; everything before the first `## ` is ignored. -3. **`## H2` step headings** — one per step. The agent reads the step body, not the heading. -4. 
**Step body** — either prose **or** a single `@import ` line. Never both. - -Per-step `yaml` overrides go immediately under the heading, in a fenced block: - -````markdown -## Submit the form -```yaml -timeout: 90 -optional: true +kane-cli run "" --agent [options] ``` -Click submit and verify the confirmation banner. -```` -**Frontmatter keys to use:** +`--agent` is mandatory — it switches stdout to NDJSON. Most-used flags: -| Key | Scope | Description | -|---|---|---| -| `mode` | root | `action` (halts on auth walls) or `testing` (default — pushes through so negative-test assertions can fire) | -| `max_steps` | root + step | Max agent reasoning steps. Default `30`. | -| `timeout` | root + step | Hard kill per step in seconds. | -| `headless` | root | No browser window. | -| `variables` | root + step | `{{name}}` params, same shape as §3, with `secret: true` for credentials | -| `global_context` / `local_context` | root + step | Inline Markdown or path | -| `code_export` / `code_language` | root + step | Generate Playwright after the run; language `python` or `javascript` | - -Files ending in `_test.md` are tests (valid entry points). Any other `.md` is a helper — reachable only via `@import`. - -### The replay & cascade rule (CRITICAL) - -On the **first** run of a test, the agent authors each step and saves a recording. On **every later run**, each step replays from its recording — no agent, no LLM cost, much faster. 
+| Flag | Purpose | Default | +|------|---------|---------| +| `--headless` | No visible browser window | Off | +| `--max-steps ` | Cap agent reasoning steps | 30 | +| `--timeout ` | Hard kill after N seconds | No limit | +| `--variables ` | Inline variables JSON (for `{{key}}` in objective) | None | +| `--variables-file ` | Load variables from a JSON file | None | +| `--ws-endpoint ` | Remote browser (LambdaTest grid) | Local Chrome | +| `--code-export` | Generate code export after upload | Off | -A step replays only if **all** of these hold: -- A recording for that step exists, -- Its prose is unchanged since the recording, -- Its `yaml` block is unchanged, -- No earlier step in the file invalidated it. +Other flags (`--global-context`, `--local-context`, `--cdp-endpoint`) and the full variables precedence chain live in `references/setup-and-config.md`. -**Editing step N re-authors step N AND every step after it in the same file.** Each step starts where the previous step left off (URL, login, tabs). When step 3 changes, step 4 cannot safely replay against state that no longer exists. +**Exit codes:** `0` passed · `1` failed · `2` auth/infra error · `3` timeout/cancelled. -Consequences when editing tests: -- A one-line tweak at the top of a 20-step test re-authors all 20 steps on the next run. -- To re-record only one step, edit only that step (or steps after it). -- `--author` forces full authoring for one run (debugging only). -- `rm -rf output-/` wipes the cache entirely. 
+### Examples -### `@import` for reusing flows +```bash +# One-shot +kane-cli run "Go to https://www.amazon.in and search for 'laptop'" --agent -Extract a repeating flow (login, setup, cookie banner dismissal) into a helper file: +# Headless with timeout +kane-cli run "Go to https://app.example.com and verify login page loads" --agent --headless --timeout 60 -```markdown -## Sign in -@import ./helpers/login.md +# With inline credentials +kane-cli run "Go to https://app.example.com and login with {{username}} and {{password}}" --agent \ + --variables '{"username":{"value":"alice"},"password":{"value":"s3cret","secret":true}}' ``` -Rules: -- Helper filename **must not** end in `_test.md`. -- Path resolves relative to the **importing file**, not the shell's cwd. -- The step body must be exactly `@import ` — no mixed prose, no extra lines. -- The step's `yaml` block may contain **only** `optional`. Other keys are rejected. -- `optional: true` on `@import` is allowed only at the root file, not on a nested import. - -Variables and context propagate into helpers. Chrome / `mode` / auth do not (root-only). - -Editing a helper re-authors that step in **every test that imports it**, plus everything after the import in those tests. Same cascade rule. +--- -### Commands +## 4. Writing objectives -| Command | Use | -|---|---| -| `kane-cli testmd run --agent [flags]` | Run a test | -| `kane-cli testmd list` | List `*_test.md` files under cwd (NDJSON when non-TTY) | -| `kane-cli testmd status ` | Test Manager identity + local-sync state | -| `kane-cli testmd export [--code-language python\|javascript]` | Regenerate code export from existing recordings (no browser launch) | -| `kane-cli testmd delete ` | Local-only delete: removes source + `output-/`. Does NOT delete from Test Manager. | +How you phrase the objective string determines what the agent does. 
Three patterns: -**Flags on `testmd run` that don't exist on §3 `run`:** +> For the full catalog — every action verb, every assertion analyze method (Visual / Textual-DOM / URL / Title / DevTools→Network/Console/Performance/Cookies/localStorage), operators, chaining, conditional/negative patterns, and worked examples — Read `references/objectives-cookbook.md`. Same grammar applies to one-shot `kane-cli run` objectives and `_test.md` step bodies. -| Flag | Default | Description | +| Pattern | Trigger words | Behavior | |---|---|---| -| `--name ` | none | Persist the run under this name. Regex `[a-zA-Z0-9_-]+`. | -| `--on-lock-conflict ` | none | Behavior when another user holds the test's edit lock. `readonly` = replay-only / no upload, `fail` = exit 2, `wait` = block until released | -| `--retry` | off | On replay failure, restart with a shrinking replay window | -| `--retry-count ` | `3` | Max retry restarts before falling back to full re-author | -| `--author` | off | Force authoring every step (skip replay decision) | +| 🎯 **Action** | "go to", "click", "type", "search", "fill" | Performs browser actions | +| ✅ **Assertion** | "assert", "verify", "confirm", "check that" | Pass/fail check on a condition | +| 📦 **Extraction** | "store X as 'name'" | Persists a value into `run_end.final_state` | -All §3 `run` flags also apply (`--agent`, `--headless`, `--max-steps`, `--timeout`, `--variables`, etc.). +### The "store as" rule (critical for extraction) -Flag wins over frontmatter for everything **except** `variables` — the file owns variables; you can add new keys via flags but cannot override file-defined ones. +Vague phrasing like "read", "tell me", "report" does NOT reliably extract data — the agent may see the value but won't capture it. Use "store as". 
-### Output: `output-/` and `Result.md` +❌ `"go to example.com and read the page title"` +✅ `"go to example.com, store the page title as 'page_title'"` -After a run: - -``` -amazon_test.md -output-amazon/ - Result.md # human-readable run report - .internal/ # cached recordings — do not edit - playwright-python-code/ # only if code_export enabled -``` +Stored values appear in `run_end.final_state` and become the second results table per §1.4. -**`output-/` is commit-safe and should be committed to git.** That's how teammates and CI replay the same recordings. +### Chaining -For tests using `@import`, helper recordings land next to the helper file in `helper-output---/` directories. Also commit-safe. +Action → extraction → assertion in one objective: -**`Result.md`** opens in any Markdown viewer. It contains: -- Frontmatter — `status`, `started`, `duration_s`, `session_id` -- One entry per root step with one of `✓ passed`, `✗ failed`, `⏭ skipped`, optionally suffixed `(optional)` when a soft-failing step failed but the run continued -- For `@import` steps that failed, a path to the failing sub-step inside the helper - -When the user asks "did the test pass?" or "where did it fail?" for a previously-run test, read `Result.md` rather than re-running the test. - -### Recording a `_test.md` from a live session - -If the user runs an ad-hoc objective with §3 `run` and decides to keep it: - -```bash -kane-cli run "Search for noise-cancelling headphones on amazon.com" --name amazon-search -``` - -On exit, kane-cli writes `/.testmuai/tests/amazon-search_test.md`. Move that file into the user's repo and re-run it with `testmd run`. Without `--name`, an ad-hoc `run` is ephemeral and nothing is written. 
- -### CI invocation - -```bash -kane-cli testmd run ./tests/checkout_test.md \ - --agent \ - --headless \ - --on-lock-conflict wait \ - --retry +```text +"go to {{app_url}}/dashboard, + store the welcome message as 'welcome_text', + assert the user role in the sidebar is 'Admin'" ``` -- `--agent` — NDJSON to stdout (auto-enabled when stdin is not a TTY; pass explicitly anyway). -- `--headless` — no window. -- `--on-lock-conflict wait` — block instead of failing if a teammate is editing the same test. -- `--retry` — automatically recover transient replay failures. - -Exit codes follow §3 with new semantics: -- `2` now includes parse errors and `--on-lock-conflict fail` -- `3` now includes `--on-lock-conflict wait` timeout - -### Parse errors (when writing a `_test.md`) - -Parse errors abort **before** any browser launch with exit `2`. Common ones and the fix: +### Dos and don'ts -| Message | Fix | +| ✅ Do | ❌ Don't | |---|---| -| `frontmatter is missing closing '---'` | Add the trailing `---` | -| `invalid YAML in frontmatter` | Re-validate the YAML block | -| `step body must be exactly one of prose / @import` | Split into two steps | -| `step config on @import may only contain 'optional'` | Remove other keys from the yaml block | -| `cannot @import a test file` | Imports may only reference helpers (not ending in `_test.md`) | -| `cyclic reference` | Restructure helpers to break the loop | -| `chrome config is global-only` | Move Chrome key to root frontmatter | -| `'' is run-level and cannot be set per-step` | Move `mode` / `on_lock_conflict` to root frontmatter | -| `unknown config key` | Remove or fix the key | -| `auth/identity keys are CLI-only` | Pass `username` / `access_key` as CLI flags, not in frontmatter | - -When the user reports a parse error, fix the file before retrying — don't loop on the same error. - ---- - -## 8. Failure Handling & Log Inspection - -When a run fails, diagnose before suggesting fixes. 
- -### Log Locations - -The `run_end` event provides `session_dir` and `run_dir` paths. Use those directly. - -``` -{session_dir}/ -├── session.json # Session metadata, run list, upload status -├── tui.log # Timeline: session start, run start/end, errors -└── runs/{n}/ - └── run-test/ - └── actions.ndjson # Step-by-step record of agent actions -``` - -### Debugging Flow - -1. **Parse the `run_end` event** from stdout — it has `status`, `reason`, and `summary` plus the `session_dir` / `run_dir` paths. -2. **Read `actions.ndjson`** in `{run_dir}/run-test/` — each line is one agent action with its intent and outcome. -3. **Check `tui.log`** in `{session_dir}/` — for session-level issues (Chrome launch, auth, upload). - -### Common Failure Patterns - -| Symptom | Likely Cause | Fix | -|---------|-------------|-----| -| 🔄 Agent repeats same action | Stuck in a loop / page didn't change | Rephrase objective, add explicit wait or assertion | -| 🎯 Agent clicks wrong element | Ambiguous UI, multiple similar elements | Be more specific: "click the **blue** 'Submit' button in the **checkout form**" | -| 👁️ Agent says done but didn't finish | Objective too vague | Add explicit assertions: "assert the confirmation page shows order number" | -| 💀 Exit code 2, no steps | Auth or Chrome failure | Check `kane-cli whoami`, verify Chrome is available | -| ⏱️ Exit code 3 | Timeout or cancelled | Increase `--timeout` or `--max-steps`, or split into smaller objectives | -| 🚫 "CDP endpoint not reachable" | Chrome not running | Let kane-cli manage Chrome (remove `--cdp-endpoint`) | +| Imperative verbs: "go to", "click", "store as" | Vague verbs: "check out", "look at", "explore" | +| Specific: "click the 'Add to Cart' button" | Vague: "add the item" | +| Name extractions: "store X as 'price'" | Hope for values: "tell me the price" | +| `{{variables}}` for credentials/URLs | Hardcode secrets in the objective | +| Always include starting URL | Assume the agent knows where to start | +| 
Split mega-objectives (>15 steps) into multiple runs | Cram everything into one | --- -## 9. Parallel Execution - -For multiple independent browser tasks, decompose and run in parallel using the Agent tool. - -### When to Split - -- **>15 steps** — long runs drift and get stuck -- **Independent flows** — login test and search test don't depend on each other -- **Different pages/features** — settings vs checkout vs admin -- **Different user roles** — admin flow vs regular user flow - -### How to Split - -Each sub-objective must be **self-contained**: navigates to its own URL, authenticates independently, asserts its own outcomes. No sub-objective depends on another having run first. - -### Execution Pattern +## 5. Parsing `--agent` output — essentials -1. Decompose the user's request into N independent sub-objectives -2. Spawn N Agent tool calls in a **single message** — each runs: - ```bash - kane-cli run "Go to and " --agent --headless --timeout 120 - ``` -3. Each agent parses the NDJSON output, waits for `run_end`, returns: status, steps, duration, summary, session path -4. After ALL agents complete, format the batch summary +> Internal reference only. Never expose these field names to the user — translate them per §1. -### Agent Prompt Template +Stdout is NDJSON, one event per line. There are two shapes: -``` -Run this kane-cli browser test and report results: +- **Progress events** (most events) have `step` (1-based), `status` (`passed`/`failed`), `remark` — and **no `type` field**. +- **Typed events** have a `type` field: `bifurcation`, `child_agent_start`, `child_agent_end`, `ask_user`, `error`, and finally `run_end`. - kane-cli run "Go to and " --agent --headless --timeout 120 +Parsing strategy: -After the command completes: -1. Capture the exit code -2. Parse the run_end NDJSON event from stdout -3. If failed, read the failing step's screenshot from run_dir -4. 
Return: {status, steps, duration, summary, session_dir, failure_step, screenshot_path} +```text +for each line: + if obj.type === "run_end" → terminal, stop parsing + else if obj.type exists → typed flow event (rare) + else if obj.step exists → progress event → narrate per §1.3 ``` -### Batch Summary Format - -```markdown -## 🧪 Test Suite: - -| # | Test | Status | Steps | Time | What happened | -|---|------|--------|-------|------|---------| -| 1 | Login + dashboard | ✅ | 5 | 12s | Welcome banner visible | -| 2 | Product search | ✅ | 7 | 18s | 3 results for 'shoes' | -| 3 | Checkout flow | ❌ | 9 | 25s | Payment form did not load | -| 4 | Admin CSV export | ✅ | 6 | 15s | CSV downloaded (42 rows) | - -### 📊 Overall -- **Pass rate:** 3/4 (75%) -- **Total steps:** 27 · **Total time:** 1m10s - -### ❌ Failures -**#3 Checkout flow** — Payment form did not load after clicking "Credit Card". -📸 [screenshot of the failure shown inline] -``` +`run_end` is the only event with a stable cross-version schema — build all post-run logic on it. -Status icons: ✅ passed · ❌ failed · ⚠️ stuck/timeout - -**Do not** show raw file paths (like `~/.testmuai/kaneai/sessions/...`) in the summary. Instead, read the screenshot and show it inline, or offer to inspect logs only if the user asks. +For full event schemas (`bifurcation` flow fields, `child_agent_*`, `ask_user` semantics, `cancel`/`user_response` outbound events, complete `run_end` field list), Read `references/parsing.md`. --- -## 10. Configuration & Reference - -### Config Commands - -```bash -kane-cli config show # Show all current settings -kane-cli config set-window x # Browser window size (e.g. 
1920x1080) -kane-cli config chrome-profile # Chrome profile path (or interactive picker in TTY) -kane-cli config project # TMS project ID (or interactive picker in TTY) -kane-cli config folder # TMS folder ID (or interactive picker in TTY) -``` - -### Feedback - -Submit feedback on a completed test run: -```bash -kane-cli feedback --test-id --feedback-type --details "..." -``` - -### Directory Structure +## 6. When to read which reference -``` -~/.testmuai/kaneai/ -├── tui-config.json # Persistent CLI settings -├── config.json # Shared auth configuration -├── global-memory.md # Global agent context -├── chrome-profile/ # Default Chrome user profile -├── profiles/ # Stored credentials -│ └── {profile}/{env}/ -│ └── credentials -├── sessions/ # Session history -│ └── {session-id}/ -│ ├── session.json # Metadata, run list, upload status -│ ├── tui.log # Session event log -│ ├── runs/{n}/ -│ │ └── run-test/ -│ │ └── actions.ndjson # Step-by-step record of agent actions -│ └── code-export/ # (when --code-export) generated code files -└── variables/ # Global variable files - └── *.json - -# Project-local overrides (in cwd): -.testmuai/ -├── context.md # Project-specific agent context -└── variables/ - └── *.json # Project-specific variables -``` - -### Chrome Management - -kane-cli auto-launches Chrome with CDP (DevTools Protocol) on ports 9222–9230. Chrome runs as a detached process and outlives the CLI. - -- `--headless` — runs Chrome in headless mode (no visible window) -- `--cdp-endpoint ` — connect to an already-running Chrome instance -- `--ws-endpoint ` — connect to a remote browser (LambdaTest grid) - -If Chrome fails to launch, ensure Google Chrome is installed and no other process is using CDP ports 9222–9230. 
+| Situation | Read | +|---|---| +| User wants to save/persist/re-run a test | `references/testmd.md` | +| Run failed, need to diagnose | `references/debug.md` | +| Multiple independent browser tasks | `references/parallel.md` | +| Need full NDJSON event schema | `references/parsing.md` | +| First-time install, auth, or full config | `references/setup-and-config.md` | diff --git a/skill-installer/skills/references/debug.md b/skill-installer/skills/references/debug.md new file mode 100644 index 0000000..d599f31 --- /dev/null +++ b/skill-installer/skills/references/debug.md @@ -0,0 +1,45 @@ + + +# Failure Handling & Log Inspection + +When a run fails, diagnose before suggesting fixes. + +## Log Locations + +The `run_end` event provides `session_dir` and `run_dir` paths. Use those directly. + +```text +{session_dir}/ +├── session.json # Session metadata, run list, upload status +├── tui.log # Timeline: session start, run start/end, errors +└── runs/{n}/ + └── run-test/ + └── actions.ndjson # Step-by-step record of agent actions +``` + +## Debugging Flow + +1. **Parse the `run_end` event** from stdout — it has `status`, `reason`, and `summary` plus the `session_dir` / `run_dir` paths. +2. **Read `actions.ndjson`** in `{run_dir}/run-test/` — each line is one agent action with its intent and outcome. +3. **Check `tui.log`** in `{session_dir}/` — for session-level issues (Chrome launch, auth, upload). 
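You can script the debugging flow above once you have the parsed `run_end` event. This is a minimal sketch that assumes the directory layout shown under Log Locations. `load_actions` and `last_action` are hypothetical helpers. The per-action field names are not documented here, so each line comes back as a plain dict rather than a typed record.

```python
import json
import os
from pathlib import Path
from typing import Dict, List, Optional


def load_actions(run_end: Dict) -> List[Dict]:
    """Parse {run_dir}/run-test/actions.ndjson for a finished run.

    run_end["run_dir"] may begin with "~", so it is expanded before use.
    """
    run_dir = Path(os.path.expanduser(run_end["run_dir"]))
    actions_path = run_dir / "run-test" / "actions.ndjson"
    actions: List[Dict] = []
    with actions_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                actions.append(json.loads(line))
    return actions


def last_action(run_end: Dict) -> Optional[Dict]:
    """Return the final recorded action (often where a failed run stalled)."""
    actions = load_actions(run_end)
    return actions[-1] if actions else None
```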
+ +## Common Failure Patterns + +| Symptom | Likely Cause | Fix | +|---------|-------------|-----| +| 🔄 Agent repeats same action | Stuck in a loop / page didn't change | Rephrase objective, add explicit wait or assertion | +| 🎯 Agent clicks wrong element | Ambiguous UI, multiple similar elements | Be more specific: "click the **blue** 'Submit' button in the **checkout form**" | +| 👁️ Agent says done but didn't finish | Objective too vague | Add explicit assertions: "assert the confirmation page shows order number" | +| 💀 Exit code 2, no steps | Auth or Chrome failure | Check `kane-cli whoami`, verify Chrome is available | +| ⏱️ Exit code 3 | Timeout or cancelled | Increase `--timeout` or `--max-steps`, or split into smaller objectives | +| 🚫 "CDP endpoint not reachable" | Chrome not running | Let kane-cli manage Chrome (remove `--cdp-endpoint`) | + +## Filing a bug report + +If the failure looks like a **kane-cli bug** (not auth, timeout, or a vague objective), offer to file a report: + +> This looks like it might be a bug in kane-cli. Want me to file a report? + +File at: **https://github.com/LambdaTest/kane-cli/issues**. Gather the details automatically — don't ask the user to dig through log files. + +**Do NOT suggest bug reports for:** auth issues, low timeouts, vague objectives, or website errors (500s, CAPTCHAs). diff --git a/skill-installer/skills/references/objectives-cookbook.md b/skill-installer/skills/references/objectives-cookbook.md new file mode 100644 index 0000000..c4ee42f --- /dev/null +++ b/skill-installer/skills/references/objectives-cookbook.md @@ -0,0 +1,372 @@ + + +# Writing Kane-CLI Objectives — Pattern Cookbook + +Read this whenever you're constructing the prose objective for `kane-cli run ""` or the body of a `## Step` in a `_test.md` file. Both surfaces feed the same agent and accept the same grammar. + +--- + +## 1. 
Anatomy of a good objective + +Three properties make an objective reliable: + +- **Specific** — name the site, the action, and the field values where they matter. +- **Action-oriented** — lead with a verb (`go to`, `search`, `open`, `fill`, `click`, `verify`). +- **Has a success criterion** — state what "done" looks like so the agent knows when to stop. + +Bad → better: + +| | Objective | +|---|---| +| ❌ | Test the login page. | +| ✅ | Open `https://app.example.com/login`, log in as `{{tester}}`, and verify the dashboard URL contains `/home`. | + +The bad version leaves "test" undefined and gives the agent no end state. The better version names the URL, the credentials, and the assertion that closes the loop. + +--- + +## 2. Action verbs — quick catalog + +Reference list. Use these in your prose; the agent recognizes them all. + +| Category | Verbs | +|---|---| +| **Navigation** | go to, open, navigate to, visit, reload, go back, switch to tab/window | +| **Input** | type, fill, enter, paste, clear, select (dropdown), check (checkbox), uncheck, toggle | +| **Click/hover** | click, double-click, right-click, hover, long-press | +| **Scroll/drag** | scroll to, scroll down/up, drag to, drop on | +| **Wait** | wait for, wait until, pause for | +| **File** | upload, attach, download | +| **Misc** | dismiss, accept dialog, switch frame, take screenshot | + +Always include a **starting URL** in the first action when the agent needs to navigate. Never assume the agent knows where to start. + +--- + +## 3. Assertions, extractions, and if/else — using checkpoints + +Checkpoints are the agent's verification primitives. There are three kinds, and each one works with every analyze method below: + +| Kind | Phrasing | What happens | +|---|---|---| +| **Assertion** | "Assert: …", "Verify …", "Confirm …" | Fails the run if the condition is false. | +| **Extraction** | "Store …", "Extract …", "Get …" | Saves a value into `run_end.final_state` for later use.
| +| **If/Else** | "If … then … else …" | Branches the run based on a condition. | + +### 3.1 Analyze methods — where the agent looks + +The agent automatically picks the right method based on phrasing. To get the method you want, use the language column. + +| Method | Use it for | Phrasing the agent recognizes | +|---|---|---| +| **Visual** (default) | Visible text, prices, labels, counts, color names, visibility | "the price …", "the heading …", "is visible", "displays", "is shown" | +| **Textual (DOM)** | Element states, CSS properties, HTML attributes, exact CSS color values | "is disabled / enabled / checked / readonly", "the placeholder of …", "the aria-label of …", "the font-size of …", "rgb(…)" / "#hex" | +| **URL** | Address bar — path, query, fragment, redirects | "URL contains …", "URL path is …", "URL has param …", "redirected to …" | +| **Title** | Browser tab `document.title` | "page title contains …", "title is …" | +| **DevTools** | Things not visible on screen — network, console, performance, cookies, localStorage | see §3.2 below | + +### 3.2 DevTools analyze methods + +Five subdomains. Each one is the right choice when the data you care about lives in the browser's internals rather than on the page. + +#### Network (HTTP traffic) + +The agent captures every HTTP request/response per step. **Resets each step** — assert on traffic in the same step it happens (or extract and carry forward). + +Queryable fields: `method`, `url`, `domain`, `path`, `query_params`, `resource_type`, `request_headers`, `request_body`, `response_status`, `response_headers`, `response_body`, `timing.duration_ms`, `timing.ttfb_ms`, `failed`, `failure_reason`. 
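You never write this logic yourself, because the agent evaluates these assertions. As a mental model only, the sketch below shows what three of the example checks amount to over captured requests. The record shape is an assumption for illustration, not kane-cli's internal format: a dict that uses the field names listed above, with `timing` as a nested dict.

```python
from typing import Dict, List


def server_errors(requests: List[Dict]) -> List[Dict]:
    """'no API calls returned 5xx': requests whose status is 500-599."""
    return [
        r for r in requests
        if r.get("response_status") is not None and 500 <= r["response_status"] <= 599
    ]


def slow_requests(requests: List[Dict], limit_ms: float) -> List[Dict]:
    """'all API responses completed in under N ms': requests at or over the limit."""
    return [r for r in requests if r.get("timing", {}).get("duration_ms", 0) >= limit_ms]


def failed_requests(requests: List[Dict]) -> List[Dict]:
    """'no network requests failed': requests flagged failed=True (e.g. connection errors)."""
    return [r for r in requests if r.get("failed")]
```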
+ +```text +Assert: no API calls returned 5xx status codes +Assert: the POST /api/login returned HTTP status 200 +Assert: all API responses completed in under 2 seconds +Assert: no network requests failed with connection errors +Assert: the /posts endpoint returned at least 10 items in the response body + +Store the response body of the POST /api/login request +Extract the status code of the last API call to /api/users +Store all API request URLs + +If the /api/auth returned 200 then proceed to dashboard, else show error message +``` + +Limits: up to 5,000 requests per step, response bodies capped at 64KB, binary content (images/fonts/videos) skipped. + +#### Console (browser console output) + +Captures every `console.log/warn/error/info/debug` and every uncaught JS exception. **Resets each step**. Top frame only — iframes (payment widgets, third-party embeds) are not captured. + +Levels normalize to: `log`, `warning`, `error`, `info`, `debug`. `errors` includes both `console.error()` and uncaught exceptions; `exceptions` is just the uncaught-exception subset (where `is_exception: true`). + +```text +Assert: no console errors on the page +Assert: no uncaught JavaScript exceptions +Assert: no JS errors after clicking Submit +Assert: console contains "Amplitude SDK triggered" +Assert: no console warnings + +Store all console error messages +Extract the first console error text + +If console contains "feature_flag_enabled" then use new flow, else use legacy flow +``` + +#### Performance (Core Web Vitals) + +Point-in-time read of the **last full page navigation's** metrics. Place the assertion after the page has loaded; use a wait step if the page needs time to settle. 
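An assertion such as "page performance meets Core Web Vitals thresholds" comes down to comparing each metric with the "good" limits in the table that follows. This is a minimal sketch that assumes the metrics arrive as a dict keyed by metric name. That shape is hypothetical; in practice the agent reads the metrics from the browser and does the comparison for you. INP is reported as missing when no interaction has happened, which matches the table's note that it requires user interaction.

```python
from typing import Dict, Optional

# "Good" limits from the table in this section (strictly less than).
GOOD_THRESHOLDS: Dict[str, float] = {
    "LCP": 2500,   # ms
    "CLS": 0.1,    # unitless layout-shift score
    "INP": 200,    # ms; needs a user interaction to be measured
    "FCP": 1800,   # ms
    "TTFB": 800,   # ms
}


def vitals_report(metrics: Dict[str, Optional[float]]) -> Dict[str, str]:
    """Label each metric 'good', 'needs attention', or 'missing'."""
    report: Dict[str, str] = {}
    for name, limit in GOOD_THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            report[name] = "missing"
        elif value < limit:
            report[name] = "good"
        else:
            report[name] = "needs attention"
    return report
```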
+ +Available metrics with good thresholds: + +| Metric | Measures | Good | +|---|---|---| +| **LCP** | Largest Contentful Paint | < 2,500ms | +| **CLS** | Cumulative Layout Shift | < 0.1 | +| **INP** | Interaction to Next Paint (requires user interaction) | < 200ms | +| **FCP** | First Contentful Paint | < 1,800ms | +| **TTFB** | Time to First Byte | < 800ms | + +```text +Assert: page LCP is under 2500ms +Assert: CLS is below 0.1 +Assert: TTFB is under 800ms +Assert: page performance meets Core Web Vitals thresholds + +Store the page LCP value +Extract all web vitals metrics +``` + +#### Cookies + +Snapshot at assertion time. Sees `httpOnly` cookies too (unlike `document.cookie`). Cookies persist across steps; asserting on a different domain may show different cookies. + +Fields: `name`, `value`, `domain`, `path`, `expires`, `http_only`, `secure`, `same_site` (`Strict`/`Lax`/`None`). + +```text +Assert: a cookie named "session_id" exists +Assert: the session cookie is httpOnly +Assert: no cookies are set without the Secure flag +Assert: the auth cookie has sameSite set to "Strict" + +Store all cookies +Extract the value of the "session_id" cookie + +If a cookie named "auth_token" exists then go to dashboard, else go to login +``` + +#### localStorage + +Snapshot at assertion time. Per-origin (protocol + domain + port). Persists across steps as long as you stay on the same origin. Values are always strings — if the app stores JSON, the value is the raw JSON string but the agent will parse it to drill into fields. 
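The JSON drill-down described above amounts to decoding the stored string and reading one field. For example, asserting that the "theme" field inside the `user_prefs` item is "dark" works this way. This is a minimal sketch: the snapshot dict and the `storage_field` helper are hypothetical, and the agent performs this step itself.

```python
import json
from typing import Any, Dict, Optional


def storage_field(storage: Dict[str, str], key: str, field: str) -> Optional[Any]:
    """Read `field` from a localStorage item whose string value holds JSON.

    Returns None if the key is absent, the value is not valid JSON,
    or the decoded value is not an object.
    """
    raw = storage.get(key)
    if raw is None:
        return None
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError:
        return None  # plain string such as a token, not JSON
    if isinstance(decoded, dict):
        return decoded.get(field)
    return None
```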
+ +```text +Assert: auth_token exists in localStorage +Assert: the theme preference in localStorage is "dark" +Assert: localStorage has fewer than 10 items +Assert: the "theme" field in the user_prefs localStorage item is "dark" + +Store all localStorage items +Extract the auth_token from localStorage +Get all localStorage keys + +If localStorage has "onboarding_complete" then show dashboard, else start onboarding +``` + +### 3.3 Operators + +Assertions support these comparisons. Phrase them naturally — the agent maps to the right operator. + +| Operator | Meaning | Example | +|---|---|---| +| `equals` | Exact match | "price equals $29.99", "title is 'Home'" | +| `contains` | Substring match | "URL contains /checkout" | +| `not_contains` | Does not contain | "title not contains 'Error'" | +| `gt` / `gte` | Greater than / or equal | "items greater than 5" | +| `lt` / `lte` | Less than / or equal | "LCP less than 2500" | +| `not_equals` | Not equal | "status not equals 'failed'" | + +### 3.4 Picking the right method when in doubt + +- "Is the price $29.99?" — **Visual** (it's on screen). +- "Is the submit button disabled?" — **Textual/DOM** (state, not visible text). +- "Does this red background match exactly `rgb(220, 38, 38)`?" — **Textual/DOM** (exact CSS). +- "Are we on the checkout page?" — **URL** (address bar). +- "Did the page send any failed API calls?" — **DevTools/Network**. +- "Are there console errors?" — **DevTools/Console**. +- "Is the page fast?" — **DevTools/Performance** (LCP/FCP/TTFB). +- "Did the login set a session cookie?" — **DevTools/Cookies**. +- "Did the app store the auth token?" — **DevTools/localStorage**. + +If you're not sure which method, default to **Visual** — that's what the agent does too. + +--- + +## 4. Extraction — the "store as" rule + +Vague phrasing like "read", "tell me", "report" does NOT reliably persist data. The agent may *observe* the value but won't *capture* it into `run_end.final_state`. 
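Any post-run logic reads extracted values by name from `run_end.final_state`. A quick check after each run therefore catches objectives that observed a value without storing it. This sketch assumes the `run_end` line has already been parsed into a dict. `missing_extractions` is a hypothetical helper, not part of kane-cli.

```python
from typing import Dict, Iterable, List


def missing_extractions(run_end: Dict, expected: Iterable[str]) -> List[str]:
    """Names requested with "store ... as '<name>'" that are absent from final_state."""
    final_state = run_end.get("final_state") or {}
    return [name for name in expected if name not in final_state]
```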
+ +```text +❌ "go to example.com and read the page title" +❌ "go to example.com and tell me the price" + +✅ "go to example.com, store the page title as 'page_title'" +✅ "go to example.com, store the price of the first item as 'price'" +``` + +For DevTools extractions, the same rule applies — use "store" or "extract": + +```text +✅ "store the response body of the POST /api/login as 'login_response'" +✅ "extract the value of the session_id cookie as 'session'" +``` + +Stored values land in `run_end.final_state` and feed the second results table per `SKILL.md §1.4`. + +--- + +## 5. Chaining — action → extraction → assertion + +Multi-clause objectives are fine — and often preferable to splitting into multiple steps when the operations are tightly coupled. + +```text +"go to {{app_url}}/dashboard, + store the welcome message as 'welcome_text', + store the user role in the sidebar as 'role', + assert the role is 'Admin'" +``` + +```text +"open https://shop.example.com, + add the first 'Wireless Headphones' result to the cart, + navigate to the cart, + store the cart total as 'total', + assert the cart contains exactly one item" +``` + +```text +"go to {{app_url}}/api-health, + store the API response body as 'health', + assert no console errors, + assert no API calls returned 5xx" +``` + +When chaining, keep each clause as a complete instruction. The agent processes them in order. + +### Splitting vs. chaining — when to break into multiple steps + +| Chain in one objective | Split into separate steps | +|---|---| +| ≤ 15 clauses, related state | > 15 reasoning steps expected | +| All happen on one page or flow | Different flows / different user roles | +| Extraction needed for the assertion in the same objective | Each step is independently testable | + +For `_test.md` step bodies, each step is its own objective — split aggressively. For one-shot `kane-cli run`, chain when the operations share state. + +--- + +## 6. 
Variables and context + +Use `{{name}}` syntax for values that should be parameterized: + +```text +"Log in as {{username}} with password {{password}}, then verify the dashboard loads" +``` + +**Always parameterize:** credentials, API keys, tokens, environment-specific URLs. +**OK to hardcode:** one-off URLs, static UI text, navigation paths. + +Mark credentials with `secret: true` in the variables JSON so they're masked in logs and routed to the secrets store: + +```json +{ + "username": { "value": "alice", "secret": false }, + "password": { "value": "s3cret!", "secret": true } +} +``` + +For the full variables-loading precedence and context-file behavior, Read `references/setup-and-config.md`. + +--- + +## 7. Conditional and negative patterns + +Conditional objectives let the agent handle optional UI states without failing: + +```text +"go to {{app_url}}, if a cookie banner appears then dismiss it, then assert the homepage loads" + +"open the dashboard, if a 'What's new' modal is visible then close it, then click Settings" +``` + +Negative assertions verify the *absence* of something: + +```text +"after submitting, assert no error message or red banner is visible" +"assert no console errors after clicking Save" +"assert no API calls failed during the checkout flow" +``` + +Positional assertions check where something is on the page: + +```text +"assert 'Settings' appears in the left sidebar navigation" +"assert the 'Cancel' button is on the right side of the modal footer" +``` + +--- + +## 8. Common pitfalls + +| ❌ Don't | ✅ Do | Why | +|---|---|---| +| "Test the checkout flow" | "Go to /cart, click Checkout, fill the address form with {{tester}}, click Pay, assert the order confirmation page loads" | "Test" has no end state — the agent doesn't know when to stop. | +| "Add the item" | "Click the 'Add to Cart' button on the first product card" | Vague target — agent may click the wrong element. 
| +| "Tell me the price" | "Store the cart total as 'total'" | Vague verbs don't extract — use "store" / "extract" / "get". | +| Hardcode credentials in the objective | Use `{{username}}` / `{{password}}` from `--variables-file` | Credentials in plain text leak into logs and TMS. | +| Omit the URL | "Go to https://example.com/login first, then …" | Agent doesn't know where to start. | +| Cram 25 operations into one objective | Split at logical boundaries (login, navigate, action, verify) | Long runs drift and stall. | +| "Check the page is fast" | "Assert LCP is under 2500ms and CLS is below 0.1" | Use the explicit web-vital metric, not a vague "fast." | +| "Make sure no errors" | "Assert no console errors and no API calls returned 5xx" | Be explicit about which kind of error you're checking. | + +--- + +## 9. Worked end-to-end examples + +### Example A — Single-page assertion suite + +```text +"go to https://shop.example.com/products/42, + assert the product title is 'Wireless Headphones', + assert the price is $129.99, + store the SKU as 'sku', + assert URL contains /products/42, + assert page LCP is under 2500ms, + assert no console errors" +``` + +This exercises Visual (title, price), Extraction (SKU), URL, Performance, and Console — all in one objective. 
+ +### Example B — Login + dashboard verification + +```text +"open https://app.example.com/login, + log in with email {{tester.email}} and password {{tester.password}}, + assert the URL redirected to /dashboard, + assert a cookie named 'session_id' exists and is httpOnly, + assert no API calls returned 5xx during login, + store the user role from the sidebar as 'role', + assert the role is 'Admin'" +``` + +### Example C — testmd step body (same grammar) + +In a `_test.md` file: + +```markdown +## Verify checkout flow happy path +Open https://shop.example.com, log in as {{tester}}, add the first +'Wireless Headphones' result to the cart, navigate to checkout, +fill the shipping address with {{tester.address}}, click Pay. +Assert the order confirmation page loads. +Assert no console errors and no API calls returned 5xx. +Store the order number as 'order_id'. +``` + +The step body is exactly the same grammar as `kane-cli run`. Everything in this cookbook applies. diff --git a/skill-installer/skills/references/parallel.md b/skill-installer/skills/references/parallel.md new file mode 100644 index 0000000..e9a6cbc --- /dev/null +++ b/skill-installer/skills/references/parallel.md @@ -0,0 +1,65 @@ + + +# Parallel Execution + +For multiple independent browser tasks, decompose and run in parallel using the Agent tool. + +## When to Split + +- **>15 steps** — long runs drift and get stuck +- **Independent flows** — login test and search test don't depend on each other +- **Different pages/features** — settings vs checkout vs admin +- **Different user roles** — admin flow vs regular user flow + +## How to Split + +Each sub-objective must be **self-contained**: navigates to its own URL, authenticates independently, asserts its own outcomes. No sub-objective depends on another having run first. + +## Execution Pattern + +1. Decompose the user's request into N independent sub-objectives +2. 
Spawn N Agent tool calls in a **single message** — each runs: + ```bash + kane-cli run "Go to and " --agent --headless --timeout 120 + ``` +3. Each agent parses the NDJSON output, waits for `run_end`, returns: status, steps, duration, summary, session path +4. After ALL agents complete, format the batch summary + +## Agent Prompt Template + +```text +Run this kane-cli browser test and report results: + + kane-cli run "Go to and " --agent --headless --timeout 120 + +After the command completes: +1. Capture the exit code +2. Parse the run_end NDJSON event from stdout +3. If failed, read the failing step's screenshot from run_dir +4. Return: {status, steps, duration, summary, session_dir, failure_step, screenshot_path} +``` + +## Batch Summary Format + +```markdown +## 🧪 Test Suite: + +| # | Test | Status | Steps | Time | What happened | +|---|------|--------|-------|------|---------| +| 1 | Login + dashboard | ✅ | 5 | 12s | Welcome banner visible | +| 2 | Product search | ✅ | 7 | 18s | 3 results for 'shoes' | +| 3 | Checkout flow | ❌ | 9 | 25s | Payment form did not load | +| 4 | Admin CSV export | ✅ | 6 | 15s | CSV downloaded (42 rows) | + +### 📊 Overall +- **Pass rate:** 3/4 (75%) +- **Total steps:** 27 · **Total time:** 1m10s + +### ❌ Failures +**#3 Checkout flow** — Payment form did not load after clicking "Credit Card". +📸 [screenshot of the failure shown inline] +``` + +Status icons: ✅ passed · ❌ failed · ⚠️ stuck/timeout + +**Do not** show raw file paths (like `~/.testmuai/kaneai/sessions/...`) in the summary. Instead, read the screenshot and show it inline, or offer to inspect logs only if the user asks. 
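The "Overall" lines in the batch summary are simple aggregates over what each sub-agent returns. This sketch assumes every result is a dict with `status`, `steps`, and `duration` in seconds, as requested by the prompt template above. The helper names are hypothetical. Any status other than `"passed"`, including a stuck or timed-out run, counts against the pass rate.

```python
from typing import Dict, List


def format_duration(seconds: float) -> str:
    """Render seconds as '45s' or '1m10s', matching the summary format."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m{secs}s" if minutes else f"{secs}s"


def overall_lines(results: List[Dict]) -> List[str]:
    """Build the two bullet lines of the Overall section."""
    total = len(results)
    passed = sum(1 for r in results if r["status"] == "passed")
    rate = round(100 * passed / total) if total else 0
    steps = sum(r["steps"] for r in results)
    seconds = sum(r["duration"] for r in results)
    return [
        f"- **Pass rate:** {passed}/{total} ({rate}%)",
        f"- **Total steps:** {steps} · **Total time:** {format_duration(seconds)}",
    ]
```

The sample table has four rows: 5, 7, 9, and 6 steps, taking 12, 18, 25, and 15 seconds, with three passes. For those rows, this yields `3/4 (75%)`, `27` steps, and `1m10s`, which matches the sample summary.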
diff --git a/skill-installer/skills/references/parsing.md b/skill-installer/skills/references/parsing.md new file mode 100644 index 0000000..517b817 --- /dev/null +++ b/skill-installer/skills/references/parsing.md @@ -0,0 +1,101 @@ + + +# Parsing --agent Output + +> **Internal reference only.** Everything in this section (field names, event types, JSON structure) is for you to parse programmatically. **Never expose these internal terms to the user.** The user should see plain-language summaries, not `run_end`, `final_state`, `bifurcation`, `NDJSON`, `session_dir`, or any raw JSON fields. + +With `--agent`, kane-cli outputs one JSON object per line to **stdout**. Progress UI renders to **stderr**. + +## Event Types + +**Progress events** (bulk of the output — one per step): + +```json +{"step": 1, "status": "passed", "remark": "Navigated to amazon.in"} +{"step": 2, "status": "passed", "remark": "Typed 'laptop' in search box"} +{"step": 3, "status": "failed", "remark": "Could not find Add to Cart button"} +``` + +| Field | Type | Description | +|-------|------|-------------| +| `step` | number | Step index (1-based) | +| `status` | string | `"passed"` or `"failed"` | +| `remark` | string | What the agent did or why it failed | + +These are **untyped** — they have no `type` field. Do **not** key on `event.type === 'step_start'` or `'step_end'`; those event types are not emitted. + +**Flow events:** + +| Event (`type` field) | Key Fields | Purpose | +|-------|-----------|---------| +| `bifurcation` | `flows[]`, `count` | Agent split objective into sub-flows | +| `child_agent_start` | `child_id`, `objective`, `parent_step` | Child agent spawned | +| `child_agent_end` | `child_id`, `success`, `steps_taken`, `summary` | Child agent finished | +| `ask_user` | `question`, `step_index`, `options?` | Agent needs user input | +| `error` | `message` | Error occurred | + +**Note:** There is no `run_start` event — the first line is either a `bifurcation` or a progress object. 
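The parsing strategy described in this section can be turned into a small consumer. This minimal Python sketch assumes the NDJSON lines arrive as strings, for example from a subprocess's stdout. `classify` and `read_events` are hypothetical names. As the section stresses, only `run_end` should drive automation.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def classify(event: Dict) -> str:
    """Map one decoded NDJSON object to terminal / typed / progress / unknown."""
    if event.get("type") == "run_end":
        return "terminal"
    if "type" in event:
        return "typed"     # bifurcation, child_agent_*, ask_user, error
    if "step" in event:
        return "progress"  # untyped step/status/remark object
    return "unknown"


def read_events(lines: Iterable[str]) -> Tuple[List[Dict], Optional[Dict]]:
    """Collect progress events and return them with the run_end event (or None)."""
    progress: List[Dict] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        kind = classify(event)
        if kind == "terminal":
            return progress, event  # run_end is always the last line
        if kind == "progress":
            progress.append(event)
    return progress, None  # stream ended without run_end
```

For a live run, pass the `stdout` of `subprocess.Popen(["kane-cli", "run", objective, "--agent"], stdout=subprocess.PIPE, text=True)` to `read_events`. Here `objective` stands for your prompt string. Stderr carries only the progress UI.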
+ +**Note:** `ask_user` is auto-disabled when stdin is not a TTY. Since agents typically run kane-cli as a subprocess, ask_user events will not be emitted. Write objectives that don't require interactive input. + +## Parsing Strategy + +Since progress events lack a `type` field, distinguish them from typed events like this: + +``` +for each line of NDJSON: + if obj.type === "run_end" → terminal event, stop parsing + if obj.type === "bifurcation" → flow split + if obj.type exists → other typed event + if obj.step exists → progress event (step/status/remark) +``` + +**Build automation on `run_end`** — it is the only event guaranteed to have a stable schema across versions. Use progress events for live status display only. + +**Terminal event** (always the last line): + +```json +{ + "type": "run_end", + "status": "passed", + "summary": "Searched for laptop and added first result to cart", + "one_liner": "Searched for laptop on Amazon and added to cart", + "reason": "Objective completed", + "duration": 45.2, + "credits": 12, + "final_state": { + "price": "$29.99", + "product_name": "Wireless Headphones" + }, + "context": { + "memory": {}, + "variables": {}, + "pointer": "(passed) Searched for laptop and added first result to cart" + }, + "session_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "run_dir": "~/.testmuai/kaneai/sessions/a1b2c3d4-e5f6-7890-abcd-ef1234567890/runs/0", + "test_url": "https://test-manager.lambdatest.com/projects/123/test-cases/456" +} +``` + +Key `run_end` fields: +- `status` — `"passed"` or `"failed"` +- `summary` — what the agent did +- `one_liner` — short summary for display +- `reason` — why it stopped +- `credits` — credits consumed by the run (when reported) +- `final_state` — extracted values from "store as" objectives +- `test_url` — link to KaneAI dashboard (if upload succeeded) +- `session_dir` / `run_dir` — paths to log files + +## Responding to `ask_user` (if stdin is a TTY) + +```json +{"type": 
"user_response", "answer": "Medium size"} + ``` + + To cancel a run: + + ```json +{"type": "cancel"} + ``` diff --git a/skill-installer/skills/references/setup-and-config.md b/skill-installer/skills/references/setup-and-config.md new file mode 100644 index 0000000..81a4fe6 --- /dev/null +++ b/skill-installer/skills/references/setup-and-config.md @@ -0,0 +1,140 @@ + + +# kane-cli Setup, Variables, and Config Reference + +## Install and auth + +Before first use, verify installation and auth. + +### Install + +```bash +npm install -g @testmuai/kane-cli +``` + +### Check Auth Status + +```bash +kane-cli whoami +``` + +If this shows "not configured" or returns an error, log in: + +### Login (Basic Auth) + +```bash +kane-cli login --username --access-key +``` + +This creates the default profile with basic auth, auto-selects the KaneAI project, and marks setup complete. Credentials come from the user's TestmuAI dashboard (Settings → Keys). + +Optional flag: +- `--profile ` — profile name (default: the last selected profile; check which one with `config show`) + +### Login (OAuth) + +```bash +kane-cli login --oauth +``` + +This opens the browser for OAuth consent and waits for the callback. Works in both TTY and non-TTY (agent) modes. + +### Login (Interactive — TTY only) + +In a terminal, run `kane-cli login` with no flags for the interactive wizard (auth method → project picker → folder picker). If the user needs this, ask them to run it directly: + +> Please run `! kane-cli login` and complete the sign-in. + +### Verify + +```bash +kane-cli whoami # Auth status +kane-cli config show # Current configuration +``` + +## Variables — full precedence chain + +Variables parameterize objectives with reusable values and secrets. Use `{{key}}` syntax in objectives.
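To make the `{{key}}` placeholders and the precedence rules concrete, here is a minimal Python model. It is not kane-cli's implementation: the merge, the substitution regex, and the `****` mask are assumptions chosen to mirror the documented behavior (later sources win, and `secret: true` values are hidden in logs). It also builds an inline `--variables` argument, assuming the inline JSON uses the same `{value, secret}` shape as the variable files. All function names are hypothetical.

```python
import json
import re

# Placeholder syntax from the docs: {{key}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def merge_layers(*layers: dict) -> dict:
    """Merge variable maps in loading order; later layers win per key."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


def render(objective: str, variables: dict, mask_secrets: bool = False) -> str:
    """Substitute {{key}} placeholders; unknown keys are left as-is."""
    def substitute(match):
        entry = variables.get(match.group(1))
        if entry is None:
            return match.group(0)
        if mask_secrets and entry.get("secret"):
            return "****"  # assumed mask; kane-cli's log format may differ
        return str(entry["value"])

    return PLACEHOLDER.sub(substitute, objective)


def inline_variables_args(variables: dict) -> list:
    """argv fragment for an inline --variables override."""
    return ["--variables", json.dumps(variables)]
```

Passing the override as an argv list, for example `["kane-cli", "run", "Log in as {{username}}", "--agent", *inline_variables_args(v)]`, avoids shell-quoting the JSON. kane-cli still performs the real substitution itself.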
+ + **Format:** +```json +{ + "username": { "value": "alice", "secret": false }, + "password": { "value": "s3cret!", "secret": true } +} +``` + +`secret: true` masks the value in logs and routes it to TestmuAI's secrets store instead of syncing it as a plain TMS variable. + +**Loading order** (later wins): +1. `~/.testmuai/kaneai/variables/*.json` (global, alphabetical) +2. `{cwd}/.testmuai/variables/*.json` (local project overrides) +3. `--variables-file ` +4. `--variables '{...}'` (inline JSON) + +**Always parameterize:** credentials, API keys, tokens, environment-specific URLs. +**OK to hardcode:** one-off URLs, static UI text, navigation paths. + +## Context files + +Context files provide additional instructions to the agent: +- **Global:** `~/.testmuai/kaneai/global-memory.md` — shared across all runs +- **Local:** `.testmuai/context.md` in cwd — project-specific + +Override per-run with `--global-context` / `--local-context` flags. + +## Config commands + +```bash +kane-cli config show # Show all current settings +kane-cli config set-window x # Browser window size (e.g. 1920x1080) +kane-cli config chrome-profile # Chrome profile path (or interactive picker in TTY) +kane-cli config project # TMS project ID (or interactive picker in TTY) +kane-cli config folder # TMS folder ID (or interactive picker in TTY) +``` + +### Feedback + +Submit feedback on a completed test run: +```bash +kane-cli feedback --test-id --feedback-type --details "..."
+``` + +## Directory structure + +```text +~/.testmuai/kaneai/ +├── tui-config.json # Persistent CLI settings +├── config.json # Shared auth configuration +├── global-memory.md # Global agent context +├── chrome-profile/ # Default Chrome user profile +├── profiles/ # Stored credentials +│ └── {profile}/{env}/ +│ └── credentials +├── sessions/ # Session history +│ └── {session-id}/ +│ ├── session.json # Metadata, run list, upload status +│ ├── tui.log # Session event log +│ ├── runs/{n}/ +│ │ └── run-test/ +│ │ └── actions.ndjson # Step-by-step record of agent actions +│ └── code-export/ # (when --code-export) generated code files +└── variables/ # Global variable files + └── *.json + +# Project-local overrides (in cwd): +.testmuai/ +├── context.md # Project-specific agent context +└── variables/ + └── *.json # Project-specific variables +``` + +## Chrome management + +kane-cli auto-launches Chrome with CDP (DevTools Protocol) on ports 9222–9230. Chrome runs as a detached process and outlives the CLI. + +- `--headless` — runs Chrome in headless mode (no visible window) +- `--cdp-endpoint ` — connect to an already-running Chrome instance +- `--ws-endpoint ` — connect to a remote browser (LambdaTest grid) + +If Chrome fails to launch, ensure Google Chrome is installed and no other process is using CDP ports 9222–9230. diff --git a/skill-installer/skills/references/testmd.md b/skill-installer/skills/references/testmd.md new file mode 100644 index 0000000..5ee1fbb --- /dev/null +++ b/skill-installer/skills/references/testmd.md @@ -0,0 +1,217 @@ + + +# Saving & Replaying Tests with testmd + +The §3 `run` command is the **primary** mode — one-shot, ephemeral. `testmd` is the **secondary** mode: tests live as `_test.md` files on disk, each step is cached on the first run, and every later run **replays from cache** with no LLM cost. + +Use `testmd` whenever the user wants the test to persist. 
The decision is binary — once a test exists as a file, every later invocation is `testmd run`, never `run`. + + ## When to switch from `run` to `testmd` + + | User says | Use | +|---|---| +| "save this test", "commit this", "keep this", "add this to the suite" | `testmd` | +| "regression test", "smoke test", "make this replayable" | `testmd` | +| "this is a test", "test the X flow end-to-end" (suite-shaped) | `testmd` | +| "run this once", "check if X works right now", "try X" | `run` (§3) | +| "search for", "click", "fill", "verify" (one-shot) | `run` (§3) | + +If unclear, ask: "Do you want me to save this test so you can re-run it later?" + +## Quick start + +Write the file (any path; filename must end in `_test.md`): + +```markdown +--- +mode: testing +max_steps: 30 +--- + +# Amazon search + +## Open Amazon +Open https://www.amazon.com. + +## Search for headphones +Type "wireless headphones" into the search box and submit. +Verify at least one product result is visible. +``` + +Run it: + +```bash +kane-cli testmd run amazon_test.md --agent +``` + +## File format + +Four parts in order: + +1. **YAML frontmatter** — between `--- ... ---` at the very top. +2. **`# Title`** — decorative; everything before the first `## ` is ignored. +3. **`## H2` step headings** — one per step. The agent reads the step body, not the heading. +4. **Step body** — either prose **or** a single `@import ` line. Never both. Prose bodies are objectives with the same grammar as `kane-cli run` — for the full pattern catalog (action verbs, assertion analyze methods, checkpoint types, chaining, worked examples), read `references/objectives-cookbook.md`. + +Per-step `yaml` overrides go immediately under the heading, in a fenced block: + +````markdown +## Submit the form +```yaml +timeout: 90 +optional: true +``` +Click submit and verify the confirmation banner.
+```` + + **Frontmatter keys to use:** + + | Key | Scope | Description | +|---|---|---| +| `mode` | root | `action` (halts on auth walls) or `testing` (default — pushes through so negative-test assertions can fire) | +| `max_steps` | root + step | Max agent reasoning steps. Default `30`. | +| `timeout` | root + step | Hard per-step time limit, in seconds. | +| `headless` | root | No browser window. | +| `variables` | root + step | `{{name}}` params, same shape as §3, with `secret: true` for credentials | +| `global_context` / `local_context` | root + step | Inline Markdown or path | +| `code_export` / `code_language` | root + step | Generate Playwright code after the run; language `python` or `javascript` | + +Files ending in `_test.md` are tests (valid entry points). Any other `.md` is a helper — reachable only via `@import`. + +## The replay & cascade rule (CRITICAL) + +On the **first** run of a test, the agent authors each step and saves a recording. On **every later run**, each step replays from its recording — no agent, no LLM cost, much faster. + +A step replays only if **all** of these hold: +- A recording for that step exists, +- Its prose is unchanged since the recording, +- Its `yaml` block is unchanged, +- No earlier step in the file invalidated it. + +**Editing step N re-authors step N AND every step after it in the same file.** Each step starts where the previous step left off (URL, login, tabs). When step 3 changes, step 4 cannot safely replay against state that no longer exists. + +Consequences when editing tests: +- A one-line tweak at the top of a 20-step test re-authors all 20 steps on the next run. +- Edits never re-author earlier steps, so keep changes as late in the file as possible; only the edited step and the steps after it are re-recorded. +- `--author` forces full authoring for one run (debugging only). +- `rm -rf output-/` wipes the cache entirely.
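The cascade can be modeled as a small replay planner. Here is a minimal Python sketch of that decision. It assumes a recording can be identified by a fingerprint of the step's prose plus its `yaml` block. kane-cli's real recording format is internal and not documented here, so `Step`, `fingerprint`, and `plan` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    prose: str
    yaml: str = ""  # raw per-step yaml block, empty if none


def fingerprint(step: Step) -> str:
    """Hypothetical identity of a recording: hash of prose + yaml block."""
    return hashlib.sha256((step.prose + "\0" + step.yaml).encode()).hexdigest()


def plan(steps: List[Step], recorded: List[Optional[str]], force_author: bool = False) -> List[str]:
    """Decide 'replay' or 'author' per step, applying the cascade rule."""
    decisions: List[str] = []
    invalidated = force_author  # --author: treat every step as invalidated
    for i, step in enumerate(steps):
        rec = recorded[i] if i < len(recorded) else None
        if invalidated or rec is None or rec != fingerprint(step):
            invalidated = True  # cascade: all later steps must re-author too
            decisions.append("author")
        else:
            decisions.append("replay")
    return decisions
```

In a 3-step test, editing step 2 yields `["replay", "author", "author"]`. This is the same pattern as the 20-step example: everything from the first changed step onward is re-authored.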
+ +## `@import` for reusing flows + +Extract a repeating flow (login, setup, cookie banner dismissal) into a helper file: + +```markdown +## Sign in +@import ./helpers/login.md +``` + +Rules: +- Helper filename **must not** end in `_test.md`. +- Path resolves relative to the **importing file**, not the shell's cwd. +- The step body must be exactly `@import ` — no mixed prose, no extra lines. +- The step's `yaml` block may contain **only** `optional`. Other keys are rejected. +- `optional: true` on `@import` is allowed only at the root file, not on a nested import. + +Variables and context propagate into helpers. Chrome / `mode` / auth do not (root-only). + +Editing a helper re-authors that step in **every test that imports it**, plus everything after the import in those tests. Same cascade rule. + +## Commands + +| Command | Use | +|---|---| +| `kane-cli testmd run --agent [flags]` | Run a test | +| `kane-cli testmd list` | List `*_test.md` files under cwd (NDJSON when non-TTY) | +| `kane-cli testmd status ` | Test Manager identity + local-sync state | +| `kane-cli testmd export [--code-language python\|javascript]` | Regenerate code export from existing recordings (no browser launch) | +| `kane-cli testmd delete ` | Local-only delete: removes source + `output-/`. Does NOT delete from Test Manager. | + +**Flags on `testmd run` that don't exist on §3 `run`:** + +| Flag | Default | Description | +|---|---|---| +| `--name ` | none | Persist the run under this name. Regex `[a-zA-Z0-9_-]+`. | +| `--on-lock-conflict ` | none | Behavior when another user holds the test's edit lock. 
`readonly` = replay-only / no upload, `fail` = exit 2, `wait` = block until released | +| `--retry` | off | On replay failure, restart with a shrinking replay window | +| `--retry-count ` | `3` | Max retry restarts before falling back to full re-author | +| `--author` | off | Force authoring every step (skip replay decision) | + +All §3 `run` flags also apply (`--agent`, `--headless`, `--max-steps`, `--timeout`, `--variables`, etc.). + +Flag wins over frontmatter for everything **except** `variables` — the file owns variables; you can add new keys via flags but cannot override file-defined ones. + +## Output: `output-/` and `Result.md` + +After a run: + +```text +amazon_test.md +output-amazon/ + Result.md # human-readable run report + .internal/ # cached recordings — do not edit + playwright-python-code/ # only if code_export enabled +``` + +**`output-/` is commit-safe and should be committed to git.** That's how teammates and CI replay the same recordings. + +For tests using `@import`, helper recordings land next to the helper file in `helper-output---/` directories. Also commit-safe. + +**`Result.md`** opens in any Markdown viewer. It contains: +- Frontmatter — `status`, `started`, `duration_s`, `session_id` +- One entry per root step with one of `✓ passed`, `✗ failed`, `⏭ skipped`, optionally suffixed `(optional)` when a soft-failing step failed but the run continued +- For `@import` steps that failed, a path to the failing sub-step inside the helper + +When the user asks "did the test pass?" or "where did it fail?" for a previously-run test, read `Result.md` rather than re-running the test. + +## Recording a `_test.md` from a live session + +If the user runs an ad-hoc objective with §3 `run` and decides to keep it: + +```bash +kane-cli run "Search for noise-cancelling headphones on amazon.com" --name amazon-search +``` + +On exit, kane-cli writes `/.testmuai/tests/amazon-search_test.md`. Move that file into the user's repo and re-run it with `testmd run`. 
Without `--name`, an ad-hoc `run` is ephemeral and nothing is written. + +## CI invocation + +```bash +kane-cli testmd run ./tests/checkout_test.md \ + --agent \ + --headless \ + --on-lock-conflict wait \ + --retry +``` + +- `--agent` — NDJSON to stdout (auto-enabled when stdin is not a TTY; pass explicitly anyway). +- `--headless` — no window. +- `--on-lock-conflict wait` — block instead of failing if a teammate is editing the same test. +- `--retry` — automatically recover transient replay failures. + +Exit codes: + +| Code | Meaning | +|------|---------| +| 0 | ✅ Passed | +| 1 | ❌ Failed | +| 2 | ⚠️ Error (auth, setup, infra) — for `testmd`, also includes parse errors and `--on-lock-conflict fail` | +| 3 | ⏱️ Timeout or cancelled — for `testmd`, also includes `--on-lock-conflict wait` timeout | + +## Parse errors (when writing a `_test.md`) + +Parse errors abort **before** any browser launch with exit `2`. Common ones and the fix: + +| Message | Fix | +|---|---| +| `frontmatter is missing closing '---'` | Add the trailing `---` | +| `invalid YAML in frontmatter` | Re-validate the YAML block | +| `step body must be exactly one of prose / @import` | Split into two steps | +| `step config on @import may only contain 'optional'` | Remove other keys from the yaml block | +| `cannot @import a test file` | Imports may only reference helpers (not ending in `_test.md`) | +| `cyclic reference` | Restructure helpers to break the loop | +| `chrome config is global-only` | Move Chrome key to root frontmatter | +| `'' is run-level and cannot be set per-step` | Move `mode` / `on_lock_conflict` to root frontmatter | +| `unknown config key` | Remove or fix the key | +| `auth/identity keys are CLI-only` | Pass `username` / `access_key` as CLI flags, not in frontmatter | + +When the user reports a parse error, fix the file before retrying — don't loop on the same error.
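For a CI wrapper, here is a minimal Python sketch. It runs the command from the CI invocation section and translates the exit code using the exit-code table. The helper names are hypothetical; only the flags and code meanings come from this reference. `run_ci` needs `kane-cli` on `PATH` plus valid credentials.

```python
import subprocess
from typing import List

# Exit-code meanings from the exit-code table (testmd variants included).
EXIT_CODES = {
    0: ("✅", "Passed"),
    1: ("❌", "Failed"),
    2: ("⚠️", "Error: auth, setup, infra, parse error, or lock conflict with fail"),
    3: ("⏱️", "Timeout or cancelled, including an --on-lock-conflict wait timeout"),
}


def ci_command(test_file: str) -> List[str]:
    """The CI invocation from this reference, as an argv list."""
    return [
        "kane-cli", "testmd", "run", test_file,
        "--agent", "--headless", "--on-lock-conflict", "wait", "--retry",
    ]


def describe(code: int) -> str:
    """One-line verdict for an exit code; unknown codes are flagged."""
    icon, meaning = EXIT_CODES.get(code, ("❓", f"Unexpected exit code {code}"))
    return f"{icon} {meaning}"


def run_ci(test_file: str) -> int:
    """Run one test and print a one-line verdict; requires kane-cli on PATH."""
    proc = subprocess.run(ci_command(test_file), capture_output=True, text=True)
    print(describe(proc.returncode))
    return proc.returncode
```

In a pipeline you would typically call `sys.exit(run_ci("./tests/checkout_test.md"))` so the job fails on any non-zero code. `proc.stdout` still holds the NDJSON stream if you also want the run summary.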