Files changed in this PR:

- `.github/agents/cosmos-benchmark.agent.md` (1 addition)
- `.github/skills/cosmos-benchmark-analyze` (1 addition)
- `.github/skills/cosmos-benchmark-run` (1 addition)
- `.github/skills/cosmos-benchmark-setup-resources` (1 addition)
- `.github/skills/skill-creator` (1 addition)
- `sdk/cosmos/azure-cosmos-benchmark/.gitignore` (5 additions)

## sdk/cosmos/azure-cosmos-benchmark/.gitignore
# Benchmark config (contains secrets, tenant configs, VM connection info)
benchmark-config/

# Results directory
results/
## .github/agents/cosmos-benchmark.agent.md
---
name: Cosmos Benchmark
description: Cosmos DB benchmark agent — set up resources, run benchmarks, and analyze results. Supports both single-tenant and multi-tenant configurations. Use for benchmark/DR drill workflows.
tools: ['readFile', 'listDir', 'runInTerminal', 'search', 'grep', 'fileSearch', 'agent']
argument-hint: "setup resources, run benchmark, or analyze results"
---

# Cosmos Benchmark Agent

You are a Cosmos DB benchmark specialist. You help with the full benchmark/DR drill lifecycle: provisioning infrastructure, running benchmarks, and analyzing results.

## Routing

Determine user intent and follow the matching workflow:

| User wants to... | Skill to load |
|---|---|
| Set up resources (create/reuse Cosmos accounts, App Insights, VMs, install tools) | Read `sdk/cosmos/azure-cosmos-benchmark/copilot/skills/cosmos-benchmark-setup-resources/SKILL.md` |
| Run a benchmark (clone repo, build, configure, execute scenarios) | Read `sdk/cosmos/azure-cosmos-benchmark/copilot/skills/cosmos-benchmark-run/SKILL.md` |
| Analyze results (CSV metrics, compare runs, heap/thread dumps, reports, Kusto) | Read `sdk/cosmos/azure-cosmos-benchmark/copilot/skills/cosmos-benchmark-analyze/SKILL.md` |

When a skill references files in its `references/` directory, read them from the skill's directory (e.g., `sdk/cosmos/azure-cosmos-benchmark/copilot/skills/cosmos-benchmark-analyze/references/thresholds.md`).

## Subagent Usage

For complex multi-step workflows, use subagents to keep context clean:

- **Analyze after run**: Spawn a subagent to analyze results so run context doesn't pollute analysis.
- **Parallel analysis**: Spawn parallel subagents for multiple result directories.
- **Parallel resource creation**: During setup resources, the `provision-all.sh` script handles parallelism automatically.

## Benchmark Modes

The framework supports two modes — the choice is purely configuration:

- **Single-tenant**: Pass connection details directly via CLI flags
- **Multi-tenant**: Pass `-tenantsFile tenants.json` with multiple account configurations

Both use the same JAR, orchestrator, and monitoring infrastructure.

## Workflow Chaining

After completing one task, suggest the natural next step:

- After **setup resources** → suggest **run**
- After **run** → suggest **analyze**
- After **analyze** (if baseline exists) → suggest comparing with previous run
## cosmos-benchmark-analyze/SKILL.md
---
name: cosmos-benchmark-analyze
description: Analyze Cosmos DB benchmark results — download from VM, generate markdown reports with time-series charts and comparison tables, apply pass/fail thresholds. Triggers on "analyze results", "compare runs", "leak check", "did it pass", "generate report", "regression check", or result directories.
---

# Analyze Benchmark Results

Download results, generate a markdown report with metrics analysis, time-series charts, and multi-run comparison.

## Step 1 — Download Results

Download results from the VM to the local config directory:

```bash
# List available runs on VM
bash scripts/download-results.sh --config-dir "$CONFIG_DIR" --list

# Download a specific run
bash scripts/download-results.sh --config-dir "$CONFIG_DIR" --run-name <run-name>

# Download all runs
bash scripts/download-results.sh --config-dir "$CONFIG_DIR" --all
```

Results are saved to `$CONFIG_DIR/results/<run-name>/`. Each run directory contains:

```
<run-name>/
├── monitor.csv # JVM metrics (threads, heap, FDs, GC, RSS, CPU)
├── metrics/ # Codahale CSV metrics (throughput, latency per operation)
│ ├── #Successful Operations.csv
│ ├── #Unsuccessful Operations.csv
│ └── ... # Per-tenant and per-operation variants
├── git-info.json # branch, commit SHA
├── gc.log # G1GC log
└── benchmark.log # Benchmark stdout/stderr
```
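Before generating a report it can help to confirm a download is complete. The snippet below is a hypothetical helper (`missing_artifacts` is not one of the repo's scripts) that checks a run directory against the layout shown above:

```python
import os

# Hypothetical helper (not part of the repo's scripts): check that a
# downloaded run directory contains the artifacts listed in the tree above.
EXPECTED_FILES = ["monitor.csv", "git-info.json", "gc.log", "benchmark.log"]

def missing_artifacts(run_dir):
    """Return the expected artifacts absent from run_dir."""
    missing = [f for f in EXPECTED_FILES
               if not os.path.isfile(os.path.join(run_dir, f))]
    if not os.path.isdir(os.path.join(run_dir, "metrics")):
        missing.append("metrics/")
    return missing
```

A run with a non-empty result here was likely interrupted or only partially copied from the VM.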

## Step 2 — Generate Report

Generate a markdown report from the downloaded results:

```bash
python3 scripts/generate-report.py \
--results-dir "$CONFIG_DIR/results" \
--output "$CONFIG_DIR/results/report.md"
```

To analyze specific runs only:

```bash
python3 scripts/generate-report.py \
--results-dir "$CONFIG_DIR/results" \
--runs "20260302-SIMPLE-main,20260302-SIMPLE-fix-leak"
```

### What the report contains

#### Per-run summary

For each run, the report includes:
- **Git info**: branch, commit
- **JVM metrics table**: baseline, peak, and final values for threads, heap, RSS, FDs, GC
- **Pass/fail verdict**: thread leak (delta ≤2) and memory leak (ratio ≤1.1) checks
- **Throughput table**: Codahale metrics (ops/sec mean, 1m, 5m rates) from `metrics/*.csv`
- **Time-series SVG charts**: inline sparklines for threads, heap, FDs, RSS, GC count, CPU over time

#### Multi-run comparison table (when ≥2 runs)

If multiple runs are present, the report includes:
- **Side-by-side metrics comparison**: threads, heap, heap ratio, thread delta, FDs, GC, RSS for each run
- **Throughput comparison**: ops/sec for each operation across runs

### Metrics analyzed

From **`monitor.csv`** (JVM-level, sampled every 60s):
| Metric | Description |
|---|---|
| threads | Live thread count |
| heap_used_kb | Used heap (S1U+EU+OU from jstat) |
| heap_max_kb | Max heap capacity |
| rss_kb | Resident set size |
| fds | Open file descriptors |
| cpu_pct | CPU usage percentage |
| gc_count | Cumulative GC count |
| gc_time_ms | Cumulative GC time |

From **`metrics/*.csv`** (Codahale, per-operation):
| File | Metrics |
|---|---|
| `#Successful Operations.csv` | count, mean_rate, m1_rate, m5_rate |
| `#Unsuccessful Operations.csv` | count, mean_rate, m1_rate, m5_rate |
| Per-tenant/operation variants | Same columns per operation type |
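A minimal sketch of reading one of these meter CSVs, assuming the stock Codahale `CsvReporter` layout (a header row such as `t,count,mean_rate,m1_rate,m5_rate,...` followed by one sample per line); the actual parsing lives in `generate-report.py`:

```python
import csv

def final_rates(csv_path):
    """Return the last sample's count and rates from a Codahale meter CSV.

    Assumes the stock CsvReporter layout: a header row
    (t,count,mean_rate,m1_rate,m5_rate,...) then one sample per line.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return None
    last = rows[-1]
    return {"count": int(float(last["count"])),
            "mean_rate": float(last["mean_rate"]),
            "m1_rate": float(last["m1_rate"])}
```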

### Pass/fail thresholds

See `references/thresholds.md` for full details:

| Check | Threshold | Verdict |
|---|---|---|
| Thread delta (final − baseline) | ≤ 2 | ✅ / 🔴 |
| Heap ratio (final / baseline) | ≤ 1.1 | ✅ / 🔴 |
| P99 latency scaling | < 5× at N=100 vs N=1 | 🟡 warn |
| Throughput scaling | > 0.7× at N=100 vs N=1 | 🟡 warn |
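The two hard-fail checks reduce to simple arithmetic over the baseline and final `monitor.csv` rows. The real verdict logic lives in `scripts/generate-report.py`; this is a minimal sketch of the math:

```python
# Thresholds as documented in references/thresholds.md.
THREAD_DELTA_MAX = 2
HEAP_RATIO_MAX = 1.1

def leak_verdicts(baseline, final):
    """baseline and final are monitor.csv rows as dicts of numbers."""
    thread_delta = final["threads"] - baseline["threads"]
    heap_ratio = final["heap_used_kb"] / baseline["heap_used_kb"]
    return {"thread_delta": thread_delta,
            "heap_ratio": heap_ratio,
            "thread_leak": thread_delta > THREAD_DELTA_MAX,
            "memory_leak": heap_ratio > HEAP_RATIO_MAX}
```

For example, going from 40 to 45 threads trips the thread-leak check (delta 5 > 2), while heap growth from 100 MB to 105 MB passes (ratio 1.05 ≤ 1.1).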

## Step 3 — Thread Dump Analysis (optional)

If thread dumps were captured during the run (via `capture-diagnostics.sh`), look for:
- **Thread count growth**: compare total counts across dumps
- **Stuck threads**: same thread in same stack across dumps
- **Leaked pools**: threads that should have been shut down after client close

Key Cosmos SDK thread name patterns:
- `cosmos-parallel-*` — SDK parallel scheduler
- `reactor-http-*` — Reactor Netty event loop
- `boundedElastic-*` — Reactor bounded elastic pool
- `globalEndpointManager-*` — Cosmos endpoint refresh
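One way to compare dumps is to tally threads per pool prefix and diff the tallies. This is a hypothetical helper, not part of the repo's scripts; it relies on jstack printing each thread's name as a quoted string at the start of its stack entry:

```python
import re
from collections import Counter

# Hypothetical helper: tally threads per Cosmos SDK pool prefix in a
# jstack dump so counts can be compared across successive dumps.
PREFIXES = ("cosmos-parallel-", "reactor-http-",
            "boundedElastic-", "globalEndpointManager-")

def count_by_prefix(dump_text):
    counts = Counter()
    # jstack prints: "thread-name" #id daemon prio=... at line start
    for name in re.findall(r'^"([^"]+)"', dump_text, re.MULTILINE):
        prefix = next((p for p in PREFIXES if name.startswith(p)), None)
        if prefix:
            counts[prefix] += 1
    return counts
```

Run it over each dump in order; a prefix whose count only grows, especially after client close, points at a leaked pool.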

## Scripts Reference

| Script | Purpose |
|---|---|
| `scripts/download-results.sh` | Download results from VM to `$CONFIG_DIR/results/`. |
| `scripts/generate-report.py` | Generate markdown report with metrics, charts, and comparison tables. |
| `references/thresholds.md` | Pass/fail thresholds and monitor.csv column definitions. |
## cosmos-benchmark-analyze/references/thresholds.md
# Pass/Fail Thresholds

## Hard Fail

| Metric | Threshold | Verdict |
|---|---|---|
| Threads after close > baseline + 2 | Hard fail | 🔴 LEAK DETECTED |
| Heap after close > baseline × 1.1 | Hard fail | 🔴 MEMORY LEAK |

## Soft Warn

| Metric | Threshold | Verdict |
|---|---|---|
| P99 latency at N=100 > 5× P99 at N=1 | Soft warn | 🟡 INVESTIGATE |
| Throughput at N=100 < 0.7× throughput at N=1 | Soft warn | 🟡 INVESTIGATE |
| GC pause max > 200ms | Soft warn | 🟡 Tune GC |

## CSV Columns (monitor.csv from monitor.sh)

| Column | Type | Description |
|---|---|---|
| `timestamp` | ISO 8601 | Snapshot time (UTC) |
| `threads` | int | Live thread count (from /proc/PID/task) |
| `fds` | int | Open file descriptors (from /proc/PID/fd) |
| `rss_kb` | int | Resident set size in KB |
| `cpu_pct` | float | CPU usage percentage |
| `heap_used_kb` | long | Used heap (S1U+EU+OU from jstat) |
| `heap_max_kb` | long | Max heap capacity (S0C+S1C+EC+OC from jstat) |
| `gc_count` | int | Cumulative GC count (YGC+FGC+CGC) |
| `gc_time_ms` | long | Cumulative GC time in ms |

## Key Snapshots

Identify snapshots using lifecycle events from the benchmark log file (pattern: `[LIFECYCLE] <event> timestamp=<ISO>`):

- **Baseline** = first monitor.csv row after `PRE_CREATE` lifecycle event
- **Peak** = row with highest `heap_used_kb`
- **Final** = last monitor.csv row (after `POST_CLOSE` lifecycle event + settle time)
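A sketch of selecting those three rows, assuming all timestamps are uniform ISO-8601 UTC strings (in which case plain string comparison orders them correctly and no datetime parsing is needed):

```python
import csv

def key_snapshots(monitor_csv, pre_create_ts):
    """Pick the baseline, peak, and final rows from monitor.csv.

    pre_create_ts is the PRE_CREATE lifecycle timestamp from the
    benchmark log, in the same ISO-8601 UTC format as monitor.csv.
    """
    with open(monitor_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    # Baseline: first sample at or after the PRE_CREATE event.
    baseline = next(r for r in rows if r["timestamp"] >= pre_create_ts)
    # Peak: sample with the highest used heap.
    peak = max(rows, key=lambda r: int(r["heap_used_kb"]))
    # Final: last sample (after POST_CLOSE plus settle time).
    final = rows[-1]
    return baseline, peak, final
```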

## Computed Metrics

- `thread_delta = final.threads - baseline.threads`
- `heap_ratio = final.heap_used_kb / baseline.heap_used_kb`

## Status Indicators

- ✅ = passed / improved (delta ≤ 0 or within threshold)
- 🟡 = marginal (<10% change)
- 🔴 = failed / regressed (>10% worse or threshold exceeded)
## scripts/download-results.sh
#!/bin/bash
# download-results.sh — Download benchmark results from VM to local machine
#
# Usage:
# ./download-results.sh --config-dir <path> --run-name <name> [--output-dir ./results]
# ./download-results.sh --config-dir <path> --all [--output-dir ./results]
# ./download-results.sh --config-dir <path> --list
#
# Modes:
# --run-name <name> Download a specific run directory
# --all Download all runs from the VM
# --list List available runs on the VM (no download)

set -euo pipefail

CONFIG_DIR=""
RUN_NAME=""
OUTPUT_DIR=""
ALL=false
LIST=false
REMOTE_RESULTS="~/azure-sdk-for-java/sdk/cosmos/azure-cosmos-benchmark/results"

while [[ $# -gt 0 ]]; do
case $1 in
--config-dir) CONFIG_DIR="$2"; shift 2 ;;
--run-name) RUN_NAME="$2"; shift 2 ;;
--output-dir) OUTPUT_DIR="$2"; shift 2 ;;
--all) ALL=true; shift ;;
--list) LIST=true; shift ;;
*) echo "Unknown option: $1" >&2; exit 1 ;;
esac
done

if [[ -z "$CONFIG_DIR" ]]; then
echo "Usage: $0 --config-dir <path> (--run-name <name> | --all | --list)" >&2
exit 1
fi

# Default output to $CONFIG_DIR/results
[[ -z "$OUTPUT_DIR" ]] && OUTPUT_DIR="$CONFIG_DIR/results"

VM_IP=$(cat "$CONFIG_DIR/vm-ip")
VM_USER=$(cat "$CONFIG_DIR/vm-user")
VM_KEY=$(cat "$CONFIG_DIR/vm-key")
SCP_CMD="scp -i $VM_KEY -o StrictHostKeyChecking=no -r"
SSH_CMD="ssh -i $VM_KEY -o StrictHostKeyChecking=no $VM_USER@$VM_IP"

if [[ "$LIST" == "true" ]]; then
echo "Available runs on VM ($VM_IP):"
$SSH_CMD "ls -1d $REMOTE_RESULTS/*/ 2>/dev/null | while read d; do
NAME=\$(basename \$d)
HAS_MONITOR=\$(test -f \$d/monitor.csv && echo '📊' || echo '❌')
GIT_INFO=''
if [[ -f \$d/git-info.json ]]; then
GIT_INFO=\$(python3 -c \"import json; d=json.load(open('\$d/git-info.json')); print(f\\\"branch={d.get('branch','?')} commit={d.get('commit','?')}\\\")\" 2>/dev/null || echo '')
fi
echo \" \$HAS_MONITOR \$NAME \$GIT_INFO\"
done" 2>/dev/null || echo " (no runs found)"
exit 0
fi

mkdir -p "$OUTPUT_DIR"

if [[ -n "$RUN_NAME" ]]; then
echo "Downloading: $RUN_NAME"
$SCP_CMD "$VM_USER@$VM_IP:$REMOTE_RESULTS/$RUN_NAME" "$OUTPUT_DIR/"
echo "✅ Downloaded to $OUTPUT_DIR/$RUN_NAME"

elif [[ "$ALL" == "true" ]]; then
echo "Downloading all runs from VM..."
RUNS=$($SSH_CMD "ls -1d $REMOTE_RESULTS/*/ 2>/dev/null | xargs -I{} basename {}" || echo "")
if [[ -z "$RUNS" ]]; then
echo "No runs found on VM"
exit 0
fi
COUNT=0
for RUN in $RUNS; do
echo " Downloading: $RUN"
$SCP_CMD "$VM_USER@$VM_IP:$REMOTE_RESULTS/$RUN" "$OUTPUT_DIR/"
COUNT=$((COUNT + 1))
done
echo "✅ Downloaded $COUNT run(s) to $OUTPUT_DIR/"

else
echo "Provide --run-name, --all, or --list" >&2
exit 1
fi