Files changed in this PR:

- `.github/agents/cosmos-benchmark.agent.md` (1 addition)
- `.github/skills/cosmos-benchmark-analyze` (1 addition)
- `.github/skills/cosmos-benchmark-run` (1 addition)
- `.github/skills/cosmos-benchmark-setup-resources` (1 addition)
- `.github/skills/skill-creator` (1 addition)
- `sdk/cosmos/azure-cosmos-benchmark/.gitignore` (5 additions)

## sdk/cosmos/azure-cosmos-benchmark/.gitignore
# Benchmark config (contains secrets, tenant configs, VM connection info)
benchmark-config/

# Results directory
results/
## .github/agents/cosmos-benchmark.agent.md
---
name: Cosmos Benchmark
description: Cosmos DB benchmark agent — set up resources, run benchmarks, and analyze results. Supports both single-tenant and multi-tenant configurations. Use for benchmark/DR drill workflows.
tools: ['readFile', 'listDir', 'runInTerminal', 'search', 'grep', 'fileSearch', 'agent']
argument-hint: "setup resources, run benchmark, or analyze results"
---

# Cosmos Benchmark Agent

You are a Cosmos DB benchmark specialist. You help with the full benchmark/DR drill lifecycle: provisioning infrastructure, running benchmarks, and analyzing results.

## Routing

Determine user intent and follow the matching workflow:

| User wants to... | Skill to load |
|---|---|
| Set up resources (create/reuse Cosmos accounts, App Insights, VMs, install tools) | Read `sdk/cosmos/azure-cosmos-benchmark/copilot/skills/cosmos-benchmark-setup-resources/SKILL.md` |
| Run a benchmark (clone repo, build, configure, execute scenarios) | Read `sdk/cosmos/azure-cosmos-benchmark/copilot/skills/cosmos-benchmark-run/SKILL.md` |
| Analyze results (CSV metrics, compare runs, heap/thread dumps, reports, Kusto) | Read `sdk/cosmos/azure-cosmos-benchmark/copilot/skills/cosmos-benchmark-analyze/SKILL.md` |

When a skill references files in its `references/` directory, read them from the skill's directory (e.g., `sdk/cosmos/azure-cosmos-benchmark/copilot/skills/cosmos-benchmark-analyze/references/thresholds.md`).

## Subagent Usage

For complex multi-step workflows, use subagents to keep context clean:

- **Analyze after run**: Spawn a subagent to analyze results so run context doesn't pollute analysis.
- **Parallel analysis**: Spawn parallel subagents for multiple result directories.
- **Parallel resource creation**: During setup resources, the `provision-all.sh` script handles parallelism automatically.

## Benchmark Modes

The framework supports two modes — the choice is purely configuration:

- **Single-tenant**: Pass connection details directly via CLI flags
- **Multi-tenant**: Pass `-tenantsFile tenants.json` with multiple account configurations

Both use the same JAR, orchestrator, and monitoring infrastructure.

## Workflow Chaining

After completing one task, suggest the natural next step:

- After **setup resources** → suggest **run**
- After **run** → suggest **analyze**
- After **analyze** (if baseline exists) → suggest comparing with previous run
## cosmos-benchmark-analyze/SKILL.md
---
name: cosmos-benchmark-analyze
description: Analyze Cosmos DB benchmark results — download from VM, generate markdown reports with time-series charts and comparison tables, apply pass/fail thresholds. Triggers on "analyze results", "compare runs", "leak check", "did it pass", "generate report", "regression check", or result directories.
---

# Analyze Benchmark Results

Download results, generate a markdown report with metrics analysis, time-series charts, and multi-run comparison.

## Step 1 — Download Results

Download results from the VM to the local config directory:

```bash
# List available runs on VM
bash scripts/download-results.sh --config-dir "$CONFIG_DIR" --list

# Download a specific run
bash scripts/download-results.sh --config-dir "$CONFIG_DIR" --run-name <run-name>

# Download all runs
bash scripts/download-results.sh --config-dir "$CONFIG_DIR" --all
```

Results are saved to `$CONFIG_DIR/results/<run-name>/`. Each run directory contains:

```
<run-name>/
├── monitor.csv # JVM metrics (threads, heap, FDs, GC, RSS, CPU)
├── metrics/ # Codahale CSV metrics (throughput, latency per operation)
│ ├── #Successful Operations.csv
│ ├── #Unsuccessful Operations.csv
│ └── ... # Per-tenant and per-operation variants
├── git-info.json # branch, commit SHA
├── gc.log # G1GC log
└── benchmark.log # Benchmark stdout/stderr
```
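Before generating a report it can help to confirm a download is complete. The snippet below is a hypothetical helper (`missing_artifacts` is not one of the repo's scripts) that checks a run directory against the layout shown above:

```python
import os

# Hypothetical helper (not part of the repo's scripts): check that a
# downloaded run directory contains the artifacts listed in the tree above.
EXPECTED_FILES = ["monitor.csv", "git-info.json", "gc.log", "benchmark.log"]

def missing_artifacts(run_dir):
    """Return the expected artifacts absent from run_dir."""
    missing = [f for f in EXPECTED_FILES
               if not os.path.isfile(os.path.join(run_dir, f))]
    if not os.path.isdir(os.path.join(run_dir, "metrics")):
        missing.append("metrics/")
    return missing
```

A run with a non-empty result here was likely interrupted or only partially copied from the VM.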

## Step 2 — Generate Report

Generate a markdown report from the downloaded results:

```bash
python3 scripts/generate-report.py \
--results-dir "$CONFIG_DIR/results" \
--output "$CONFIG_DIR/results/report.md"
```

To analyze specific runs only:

```bash
python3 scripts/generate-report.py \
--results-dir "$CONFIG_DIR/results" \
--runs "20260302-SIMPLE-main,20260302-SIMPLE-fix-leak"
```

### What the report contains

#### Per-run summary

For each run, the report includes:
- **Git info**: branch, commit
- **JVM metrics table**: baseline, peak, and final values for threads, heap, RSS, FDs, GC
- **Pass/fail verdict**: thread leak (delta ≤2) and memory leak (ratio ≤1.1) checks
- **Throughput table**: Codahale metrics (ops/sec mean, 1m, 5m rates) from `metrics/*.csv`
- **Time-series SVG charts**: inline sparklines for threads, heap, FDs, RSS, GC count, CPU over time

#### Multi-run comparison table (when ≥2 runs)

If multiple runs are present, the report includes:
- **Side-by-side metrics comparison**: threads, heap, heap ratio, thread delta, FDs, GC, RSS for each run
- **Throughput comparison**: ops/sec for each operation across runs

### Metrics analyzed

From **`monitor.csv`** (JVM-level, sampled every 60s):
| Metric | Description |
|---|---|
| threads | Live thread count |
| heap_used_kb | Used heap (S1U+EU+OU from jstat) |
| heap_max_kb | Max heap capacity |
| rss_kb | Resident set size |
| fds | Open file descriptors |
| cpu_pct | CPU usage percentage |
| gc_count | Cumulative GC count |
| gc_time_ms | Cumulative GC time |

From **`metrics/*.csv`** (Codahale, per-operation):
| File | Metrics |
|---|---|
| `#Successful Operations.csv` | count, mean_rate, m1_rate, m5_rate |
| `#Unsuccessful Operations.csv` | count, mean_rate, m1_rate, m5_rate |
| Per-tenant/operation variants | Same columns per operation type |
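A minimal sketch of reading one of these meter CSVs, assuming the stock Codahale `CsvReporter` layout (a header row such as `t,count,mean_rate,m1_rate,m5_rate,...` followed by one sample per line); the actual parsing lives in `generate-report.py`:

```python
import csv

def final_rates(csv_path):
    """Return the last sample's count and rates from a Codahale meter CSV.

    Assumes the stock CsvReporter layout: a header row
    (t,count,mean_rate,m1_rate,m5_rate,...) then one sample per line.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return None
    last = rows[-1]
    return {"count": int(float(last["count"])),
            "mean_rate": float(last["mean_rate"]),
            "m1_rate": float(last["m1_rate"])}
```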

### Pass/fail thresholds

See `references/thresholds.md` for full details:

| Check | Threshold | Verdict |
|---|---|---|
| Thread delta (final − baseline) | ≤ 2 | ✅ / 🔴 |
| Heap ratio (final / baseline) | ≤ 1.1 | ✅ / 🔴 |
| P99 latency scaling | < 5× at N=100 vs N=1 | 🟡 warn |
| Throughput scaling | > 0.7× at N=100 vs N=1 | 🟡 warn |
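The two hard-fail checks reduce to simple arithmetic over the baseline and final `monitor.csv` rows. The real verdict logic lives in `scripts/generate-report.py`; this is a minimal sketch of the math:

```python
# Thresholds as documented in references/thresholds.md.
THREAD_DELTA_MAX = 2
HEAP_RATIO_MAX = 1.1

def leak_verdicts(baseline, final):
    """baseline and final are monitor.csv rows as dicts of numbers."""
    thread_delta = final["threads"] - baseline["threads"]
    heap_ratio = final["heap_used_kb"] / baseline["heap_used_kb"]
    return {"thread_delta": thread_delta,
            "heap_ratio": heap_ratio,
            "thread_leak": thread_delta > THREAD_DELTA_MAX,
            "memory_leak": heap_ratio > HEAP_RATIO_MAX}
```

For example, going from 40 to 45 threads trips the thread-leak check (delta 5 > 2), while heap growth from 100 MB to 105 MB passes (ratio 1.05 ≤ 1.1).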

## Step 3 — Thread Dump Analysis (optional)

If thread dumps were captured during the run (via `capture-diagnostics.sh`), look for:
- **Thread count growth**: compare total counts across dumps
- **Stuck threads**: same thread in same stack across dumps
- **Leaked pools**: threads that should have been shut down after client close

Key Cosmos SDK thread name patterns:
- `cosmos-parallel-*` — SDK parallel scheduler
- `reactor-http-*` — Reactor Netty event loop
- `boundedElastic-*` — Reactor bounded elastic pool
- `globalEndpointManager-*` — Cosmos endpoint refresh
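One way to compare dumps is to tally threads per pool prefix and diff the tallies. This is a hypothetical helper, not part of the repo's scripts; it relies on jstack printing each thread's name as a quoted string at the start of its stack entry:

```python
import re
from collections import Counter

# Hypothetical helper: tally threads per Cosmos SDK pool prefix in a
# jstack dump so counts can be compared across successive dumps.
PREFIXES = ("cosmos-parallel-", "reactor-http-",
            "boundedElastic-", "globalEndpointManager-")

def count_by_prefix(dump_text):
    counts = Counter()
    # jstack prints: "thread-name" #id daemon prio=... at line start
    for name in re.findall(r'^"([^"]+)"', dump_text, re.MULTILINE):
        prefix = next((p for p in PREFIXES if name.startswith(p)), None)
        if prefix:
            counts[prefix] += 1
    return counts
```

Run it over each dump in order; a prefix whose count only grows, especially after client close, points at a leaked pool.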

## Scripts Reference

| Script | Purpose |
|---|---|
| `scripts/download-results.sh` | Download results from VM to `$CONFIG_DIR/results/`. |
| `scripts/generate-report.py` | Generate markdown report with metrics, charts, and comparison tables. |
| `references/thresholds.md` | Pass/fail thresholds and monitor.csv column definitions. |
## cosmos-benchmark-analyze/references/thresholds.md
# Pass/Fail Thresholds

## Hard Fail

| Metric | Threshold | Verdict |
|---|---|---|
| Threads after close > baseline + 2 | Hard fail | 🔴 LEAK DETECTED |
| Heap after close > baseline × 1.1 | Hard fail | 🔴 MEMORY LEAK |

## Soft Warn

| Metric | Threshold | Verdict |
|---|---|---|
| P99 latency at N=100 > 5× P99 at N=1 | Soft warn | 🟡 INVESTIGATE |
| Throughput at N=100 < 0.7× throughput at N=1 | Soft warn | 🟡 INVESTIGATE |
| GC pause max > 200ms | Soft warn | 🟡 Tune GC |

## CSV Columns (monitor.csv from monitor.sh)

| Column | Type | Description |
|---|---|---|
| `timestamp` | ISO 8601 | Snapshot time (UTC) |
| `threads` | int | Live thread count (from /proc/PID/task) |
| `fds` | int | Open file descriptors (from /proc/PID/fd) |
| `rss_kb` | int | Resident set size in KB |
| `cpu_pct` | float | CPU usage percentage |
| `heap_used_kb` | long | Used heap (S1U+EU+OU from jstat) |
| `heap_max_kb` | long | Max heap capacity (S0C+S1C+EC+OC from jstat) |
| `gc_count` | int | Cumulative GC count (YGC+FGC+CGC) |
| `gc_time_ms` | long | Cumulative GC time in ms |

## Key Snapshots

Identify snapshots using lifecycle events from the benchmark log file (pattern: `[LIFECYCLE] <event> timestamp=<ISO>`):

- **Baseline** = first monitor.csv row after `PRE_CREATE` lifecycle event
- **Peak** = row with highest `heap_used_kb`
- **Final** = last monitor.csv row (after `POST_CLOSE` lifecycle event + settle time)
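A sketch of selecting those three rows, assuming all timestamps are uniform ISO-8601 UTC strings (in which case plain string comparison orders them correctly and no datetime parsing is needed):

```python
import csv

def key_snapshots(monitor_csv, pre_create_ts):
    """Pick the baseline, peak, and final rows from monitor.csv.

    pre_create_ts is the PRE_CREATE lifecycle timestamp from the
    benchmark log, in the same ISO-8601 UTC format as monitor.csv.
    """
    with open(monitor_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    # Baseline: first sample at or after the PRE_CREATE event.
    baseline = next(r for r in rows if r["timestamp"] >= pre_create_ts)
    # Peak: sample with the highest used heap.
    peak = max(rows, key=lambda r: int(r["heap_used_kb"]))
    # Final: last sample (after POST_CLOSE plus settle time).
    final = rows[-1]
    return baseline, peak, final
```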

## Computed Metrics

- `thread_delta = final.threads - baseline.threads`
- `heap_ratio = final.heap_used_kb / baseline.heap_used_kb`

## Status Indicators

- ✅ = passed / improved (delta ≤ 0 or within threshold)
- 🟡 = marginal (<10% change)
- 🔴 = failed / regressed (>10% worse or threshold exceeded)
## scripts/download-results.sh
#!/bin/bash
# download-results.sh — Download benchmark results from VM to local machine
#
# Usage:
# ./download-results.sh --config-dir <path> --run-name <name> [--output-dir ./results]
# ./download-results.sh --config-dir <path> --all [--output-dir ./results]
# ./download-results.sh --config-dir <path> --list
#
# Modes:
# --run-name <name> Download a specific run directory
# --all Download all runs from the VM
# --list List available runs on the VM (no download)

set -euo pipefail

CONFIG_DIR=""
RUN_NAME=""
OUTPUT_DIR=""
ALL=false
LIST=false
REMOTE_RESULTS="~/azure-sdk-for-java/sdk/cosmos/azure-cosmos-benchmark/results"

while [[ $# -gt 0 ]]; do
case $1 in
--config-dir) CONFIG_DIR="$2"; shift 2 ;;
--run-name) RUN_NAME="$2"; shift 2 ;;
--output-dir) OUTPUT_DIR="$2"; shift 2 ;;
--all) ALL=true; shift ;;
--list) LIST=true; shift ;;
*) echo "Unknown option: $1" >&2; exit 1 ;;
esac
done

if [[ -z "$CONFIG_DIR" ]]; then
echo "Usage: $0 --config-dir <path> (--run-name <name> | --all | --list)" >&2
exit 1
fi

# Default output to $CONFIG_DIR/results
[[ -z "$OUTPUT_DIR" ]] && OUTPUT_DIR="$CONFIG_DIR/results"

VM_IP=$(cat "$CONFIG_DIR/vm-ip")
VM_USER=$(cat "$CONFIG_DIR/vm-user")
VM_KEY=$(cat "$CONFIG_DIR/vm-key")
SCP_CMD="scp -i $VM_KEY -o StrictHostKeyChecking=no -r"
SSH_CMD="ssh -i $VM_KEY -o StrictHostKeyChecking=no $VM_USER@$VM_IP"

if [[ "$LIST" == "true" ]]; then
echo "Available runs on VM ($VM_IP):"
$SSH_CMD "ls -1d $REMOTE_RESULTS/*/ 2>/dev/null | while read d; do
NAME=\$(basename \$d)
HAS_MONITOR=\$(test -f \$d/monitor.csv && echo '📊' || echo '❌')
GIT_INFO=''
if [[ -f \$d/git-info.json ]]; then
GIT_INFO=\$(python3 -c \"import json; d=json.load(open('\$d/git-info.json')); print(f\\\"branch={d.get('branch','?')} commit={d.get('commit','?')}\\\")\" 2>/dev/null || echo '')
fi
echo \" \$HAS_MONITOR \$NAME \$GIT_INFO\"
done" 2>/dev/null || echo " (no runs found)"
exit 0
fi

mkdir -p "$OUTPUT_DIR"

if [[ -n "$RUN_NAME" ]]; then
echo "Downloading: $RUN_NAME"
$SCP_CMD "$VM_USER@$VM_IP:$REMOTE_RESULTS/$RUN_NAME" "$OUTPUT_DIR/"
echo "✅ Downloaded to $OUTPUT_DIR/$RUN_NAME"

elif [[ "$ALL" == "true" ]]; then
echo "Downloading all runs from VM..."
RUNS=$($SSH_CMD "ls -1d $REMOTE_RESULTS/*/ 2>/dev/null | xargs -I{} basename {}" || echo "")
if [[ -z "$RUNS" ]]; then
echo "No runs found on VM"
exit 0
fi
COUNT=0
for RUN in $RUNS; do
echo " Downloading: $RUN"
$SCP_CMD "$VM_USER@$VM_IP:$REMOTE_RESULTS/$RUN" "$OUTPUT_DIR/"
COUNT=$((COUNT + 1))
done
echo "✅ Downloaded $COUNT run(s) to $OUTPUT_DIR/"

else
echo "Provide --run-name, --all, or --list" >&2
exit 1
fi