code · pull · Feb 23, 2026 · Feb 23, 2026 · Feb 23, 2026
diff --git a/.agents/skills/analyze-github-action-logs/SKILL.md b/.agents/skills/analyze-github-action-logs/SKILL.md
@@ -0,0 +1,115 @@
+---
+name: analyze-github-action-logs
+description: Analyze recent GitHub Actions workflow runs to identify patterns, mistakes, and improvements. Use when asked to "analyze workflow logs", "review action runs", or "analyze GitHub Actions".
+compatibility: Requires gh CLI and access to the GitHub repository.
+---
+
+# Analyze GitHub Action Logs
+
+Fetch and analyze recent GitHub Actions runs for a given workflow. Review agent/step performance, identify wasted effort and mistakes, and produce a report with actionable improvements.
+
+## Input
+
+You need:
+
+- **`workflow`** (required) — The workflow file name or ID (e.g., `issue-triage.yml`, `deploy.yml`).
+- **`repo`** (optional) — The GitHub repository in `OWNER/REPO` format. Defaults to `withastro/astro`.
+- **`count`** (optional) — Number of recent completed runs to analyze. Defaults to `5`.
+
+## Step 1: List Recent Runs
+
+Fetch the most recent completed runs for the workflow. Filter by `--status=completed`:
+
+```bash
+gh run list --workflow=<workflow> -R <repo> --status=completed -L <count>
+```
+
+Present the list to orient yourself: run IDs, titles, conclusion (success/failure), and duration. Pick the runs to analyze: prefer a mix of successes and failures if available, and prefer runs that exercised more steps (longer runs tend to go through more stages, while shorter runs may exit early).
+
+## Step 2: Fetch Logs
+
+For each run you want to analyze, save the full log to a temp file:
+
+```bash
+gh run view <run_id> -R <repo> --log > /tmp/actions-run-<run_id>.log
+```
+
+## Step 3: Identify Step/Skill Boundaries
+
+Search each log file for markers that indicate where each step or skill starts and ends. The markers depend on the workflow — look for patterns like:
+
+- **Flue skill markers**: `[flue] skill("..."): starting` / `completed`
+- **GitHub Actions step markers**: Step name headers in the log output
+- **Custom markers**: Any `START`/`END` or similar delimiters the workflow uses
+
+```bash
+grep -n "skill(\|step\|START\|END\|starting\|completed" /tmp/actions-run-<run_id>.log | head -50
+```
+
+From this, determine which line ranges correspond to each step/skill. Also find any result markers:
+
+```bash
+grep -n "RESULT_START\|RESULT_END\|extractResult" /tmp/actions-run-<run_id>.log
+```
+
+Note: Some log files may contain binary/null bytes. Use `grep -a` if needed.
+
+## Step 4: Analyze Each Step (Use Subagents)
+
+For each step/skill that ran, **launch a subagent** to analyze that section's log. This is critical to avoid polluting your context with thousands of log lines.
+
+For each subagent, provide:
+
+1. The log file path and the line range for that step
+2. If skill instruction files exist for the workflow, tell the subagent to read them first for context
+3. The run title/context so the subagent understands what was being done
+4. The analysis criteria below
+
+### Analysis Criteria
+
+Tell each subagent to evaluate:
+
+1. **Correctness** — Was the step's final result/verdict correct?
+2. **Efficiency** — How long did it take? What's a reasonable baseline? Where was time wasted?
+3. **Mistakes** — Wrong tool calls, failed commands retried without changes, unnecessary rebuilds, etc.
+4. **Instruction compliance** — If skill instructions exist, did the agent follow them? Where did it deviate?
+5. **Scope creep** — Did the agent do work that belongs in a different step?
+6. **Suggestions** — Specific, actionable changes that would prevent the issues found.
+
+Tell each subagent to return a structured response with: Summary, Time Analysis, Issues Found (with estimated time wasted for each), and Suggestions for Improvement.
+
+## Step 5: Consolidate Report
+
+After all subagents return, synthesize their findings into a single report. Structure it as:
+
+### Per-Run Summary Table
+
+For each run analyzed, include a table:
+
+| Step/Skill | Time | Result | Time Wasted | Top Issue |
+| ---------- | ---- | ------ | ----------- | --------- |
+
+### Cross-Cutting Patterns
+
+Identify issues that appeared across multiple runs or multiple steps. These are the highest-value improvements. Common patterns to look for:
+
+- **TodoWrite abuse** — Agent wasting time on task list management during automated runs
+- **Server management failures** — Port conflicts, failed process kills, stale log files
+- **Tool misuse** — Using `curl` instead of `gh`, `jq` not found, etc.
+- **Scope creep** — One step doing work that belongs in another
+- **Unnecessary rebuilds** — Building packages multiple times without changes
+- **Test timeouts** — Running slow E2E/Playwright tests that time out
+- **Instruction violations** — Agent doing something the instructions explicitly forbid
+- **Redundant work** — Re-reading files, re-running searches, re-installing dependencies
+
+### Prioritized Recommendations
+
+Rank your improvement suggestions by estimated time savings across all runs. For each recommendation:
+
+1. **What to change** — Which file(s) to edit and what to add/modify
+2. **Why** — What pattern it addresses, with evidence from the runs
+3. **Estimated impact** — How much time it would save per run
+
+## Output
+
+Present the full consolidated report. Do NOT edit any workflow or skill files — only report findings and recommendations. The user will decide which changes to apply.
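The line-range bookkeeping in Step 3 of this skill can be sketched in shell. This is a minimal illustration, not part of the skill itself: the sample log content and the `skill_ranges` helper are hypothetical, and it assumes every skill emits the `[flue] skill("..."): starting` / `completed` marker pair exactly once and at the end of the line. Real output from `gh run view --log` carries job, step, and timestamp prefixes, which the pattern below tolerates.

```shell
# Hypothetical sample log standing in for /tmp/actions-run-<run_id>.log.
log=$(mktemp)
cat > "$log" <<'EOF'
setup line
[flue] skill("reproduce"): starting
pnpm install
pnpm run build
[flue] skill("reproduce"): completed
[flue] skill("diagnose"): starting
git blame src/index.ts
[flue] skill("diagnose"): completed
EOF

# Pair each "starting" marker with its "completed" marker and print
# "<skill> <first_line>-<last_line>" for every skill found.
# -a treats the file as text in case it contains null bytes.
skill_ranges() {
  grep -anE 'skill\("[^"]*"\): (starting|completed)' "$1" |
    awk -F: '
      {
        name = $0
        sub(/^.*skill\("/, "", name)   # drop everything up to the skill name
        sub(/".*$/, "", name)          # drop everything after it
      }
      /starting$/  { start[name] = $1 }
      /completed$/ { if (name in start) print name, start[name] "-" $1 }
    '
}

ranges=$(skill_ranges "$log")
echo "$ranges"

# Extract only the "reproduce" section, e.g. to hand to a subagent.
range=$(echo "$ranges" | awk '$1 == "reproduce" { print $2 }')
section=$(sed -n "${range%-*},${range#*-}p" "$log")
echo "$section"
```

Passing a path plus a range such as `2-5` to each subagent, rather than the log text, keeps the orchestrating context small, which is the stated goal of Step 4.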
diff --git a/.agents/skills/triage/comment.md b/.agents/skills/triage/comment.md
@@ -4,6 +4,8 @@ Generate a GitHub issue comment from triage findings.
 
 **CRITICAL: You MUST always read `report.md` and produce a GitHub comment as your final output, regardless of what input files are available. Even if `report.md` is missing or empty, you must still produce a comment. In that case, produce a minimal comment stating that automated triage could not be completed.**
 
+**SCOPE: Your job is comment generation only. Finish your work once you've completed this workflow. Do NOT go further than this: reproduction, diagnosis, and fixing of the issue are out of scope at this stage.**
+
 ## Prerequisites
 
 These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -41,16 +43,16 @@ The **Fix** line in the template has three possible forms. Choose the one that m
 
 The **Priority** line communicates the severity of this issue to maintainers. Its goal is to answer the question: **"How bad is it?"**
 
-Select exactly ONE priority label from the `priorityLabels` arg. Use the label descriptions to guide your decision, combined with the triage report's root cause and impact analysis. Render the chosen label name in square brackets, in bold, formatted with the `- ` prefix removed (Example: `**[P2: Has Workaround].**). Then, follow it with 1-2 sentences explaining **why** you chose that priority. Answer: "who is likely to be affected and under what conditions?". If you are unsure, use your best judgment based on the label descriptions and the triage findings.
+Select exactly ONE priority label from the `priorityLabels` arg. Use the label descriptions to guide your decision, combined with the triage report's root cause and impact analysis. Render it in bold, with the `- ` prefix removed, like this: `**Priority P2: Has Workaround.**` Then, follow it with 1-2 sentences explaining _why_ you chose that priority. Answer: "who is likely to be affected and under what conditions?". If you are unsure, use your best judgment based on the label descriptions and the triage findings.
 
 ### Template
 
 ```markdown
 **[I was able to reproduce this issue. / I was unable to reproduce this issue.]** [2-3 sentences describing the root cause, result, and key observations.]
 
-**Fix:** **[See "Fix" Instructions above.]** [1-2 sentences describing the solution, where/when it was already fixed, or guidance on where a fix might be.] [If `branchName` is non-null: [View Suggested Fix](https://github.com/withastro/astro/compare/{branchName}?expand=1)]
+**[See "Fix" Instructions above.]** [1-2 sentences describing the solution, where/when it was already fixed, or guidance on where a fix might be.] [If `branchName` is non-null: [View Suggested Fix](https://github.com/withastro/astro/compare/{branchName}?expand=1)]
 
-**Priority:** **[See "Priority" Instructions above.]** [1-2 sentences explaining why this priority was chosen, who is likely to be affected, and under what conditions (this section should answer the question: "how bad is it?")]
+**[See "Priority" Instructions above.]** [1-2 sentences explaining why this priority was chosen, who is likely to be affected, and under what conditions (this section should answer the question: "how bad is it?")]
 
 <details>
 <summary><em>Full Triage Report</em></summary>

diff --git a/.agents/skills/triage/diagnose.md b/.agents/skills/triage/diagnose.md
@@ -4,6 +4,8 @@ Find the root cause of a reproduced bug in the Astro source code.
 
 **CRITICAL: You MUST always read `report.md` and append to `report.md` before finishing, regardless of outcome. Even if you cannot identify the root cause, hit errors, or the investigation is inconclusive — always update `report.md` with your findings. The orchestrator and downstream skills depend on this file to determine what happened.**
 
+**SCOPE: Your job is diagnosis only. Finish your work once you've completed this workflow. Do NOT go further than this (no larger verification of the issue, no fixing of the issue, etc.).**
+
 ## Prerequisites
 
 These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.

diff --git a/.agents/skills/triage/reproduce.md b/.agents/skills/triage/reproduce.md
@@ -4,6 +4,8 @@ Reproduce a GitHub issue to determine if a bug is valid and reproducible.
 
 **CRITICAL: You MUST always read `report.md` and write `report.md` to the triage directory before finishing, regardless of outcome. Even if you encounter errors, cannot reproduce the bug, hit unexpected problems, or need to skip — always write `report.md`. The orchestrator and downstream skills depend on this file to determine what happened. If you finish without writing it, the entire pipeline fails silently.**
 
+**SCOPE: Your job is reproduction only. Finish your work once you've completed this workflow. Do NOT go further than this (no larger diagnosis of the issue, no fixing of the issue, etc.).**
+
 ## Prerequisites
 
 These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -67,7 +69,7 @@ Skip if the bug is specific to Bun or Deno. Our sandbox only supports Node.js.
 
 ### Maintainer Override (`maintainer-override`)
 
-Skip if a repository maintainer has commented that this issue should not be reproduced here. To determine if a commenter is a maintainer, check the `author_association` field on their comment in `issueDetails` — values of `MEMBER`, `COLLABORATOR`, or `OWNER` indicate a maintainer.
+Skip if a repository maintainer has commented that this issue should not be reproduced here. To determine if a commenter is a maintainer, check the `authorAssociation` field on their comment in `issueDetails` — values of `MEMBER`, `COLLABORATOR`, or `OWNER` indicate a maintainer.
 
 ## Step 3: Set Up Reproduction Project
 

diff --git a/.agents/skills/triage/verify.md b/.agents/skills/triage/verify.md
@@ -4,6 +4,8 @@ Verify whether a GitHub issue describes an actual bug or a misunderstanding of i
 
 **CRITICAL: You MUST always read `report.md` and append to `report.md` before finishing, regardless of outcome. Even if you cannot reach a conclusion — always update `report.md` with your findings. The orchestrator and downstream skills depend on this file to determine what happened.**
 
+**SCOPE: Your job is verification only. Finish your work once you've completed this workflow. Do NOT go further than this (no fixing of the issue, etc.).**
+
 ## Prerequisites
 
 These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -46,28 +48,29 @@ Look at the relevant source code in `packages/`. Pay close attention to:
 - **Comments explaining "why"** — If a developer left a comment explaining why the code works a certain way, that is strong evidence of intentional design. Treat these comments as authoritative unless they are clearly outdated.
 - **Explicit conditionals and early returns** — Code that explicitly checks for the reported scenario and handles it differently than the reporter expects is likely intentional.
 - **Named constants and configuration** — Behavior controlled by a named config option or constant was probably a deliberate choice.
-- **Git blame on key lines** — If `report.md` identifies specific files and line numbers, run `git blame` on the relevant lines to find the commit that introduced the behavior. Then read the full commit message with `git show --no-patch <commit>` and review the associated PR if referenced. You can fetch PR details with `curl -s "https://api.github.com/repos/withastro/astro/pulls/<number>"`. A commit message or PR description that explains the rationale is strong evidence of intentional design.
 
-### 2c: Search prior GitHub issues and PRs
+### 2c: Git blame on key lines
+
+If `report.md` identifies specific files and line numbers, run `git blame` on the relevant lines to find the commit that introduced the behavior. Then read the full commit message with `git show --no-patch <commit>` and review the associated PR if referenced. You can fetch PR details with `gh pr view <number>`. A commit message, PR description, or PR comment from the author explaining the rationale is strong evidence of intentional design.
+
+### 2d: Search prior GitHub issues and PRs
 
 Search for prior issues and PRs that discuss the same behavior using the GitHub API. This can reveal whether the behavior was previously discussed, intentionally introduced, or already reported and closed as "not a bug."
 
 ```bash
 # Search issues for keywords related to the reported behavior
-curl -s "https://api.github.com/search/issues?q=<url-encoded-keywords>+repo:withastro/astro+is:issue&per_page=10"
+gh search issues "<keywords>" --repo withastro/astro --limit 10
 # Search PRs that may have introduced or discussed the behavior
-curl -s "https://api.github.com/search/issues?q=<url-encoded-keywords>+repo:withastro/astro+is:pr&per_page=10"
+gh search prs "<keywords>" --repo withastro/astro --limit 10
 # Read a specific issue for context
-curl -s "https://api.github.com/repos/withastro/astro/issues/<number>"
-# Read issue comments
-curl -s "https://api.github.com/repos/withastro/astro/issues/<number>/comments"
+gh issue view <number> --comments
 # Read a specific PR for context
-curl -s "https://api.github.com/repos/withastro/astro/pulls/<number>"
+gh pr view <number> --comments
 ```
 
 If you find a closed issue where a maintainer explained why the behavior is intentional, or a PR that deliberately introduced it, that is strong evidence of intended behavior.
 
-### 2d: Distinguish bugs from non-bugs
+### 2e: Distinguish bugs from non-bugs
 
 This is the most important and most error-prone step. For triage purposes, the definitions are:
 

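The blame-then-show lookup described in step 2c of `verify.md` can be sketched end to end in a throwaway repository. Everything here is illustrative: the repository, the `config.ts` file, and the commit message are hypothetical stand-ins for the Astro checkout, and `--porcelain` and `--format=%B` are additions that make the output easy to capture in a script, not flags the skill itself prescribes.

```shell
# Throwaway repository standing in for the real checkout (hypothetical content).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
printf 'export const trailingSlash = "ignore";\n' > config.ts
git add config.ts
git -c user.name="Example" -c user.email="example@example.com" \
  commit -q -m "Keep trailing slashes by default on purpose"

# Porcelain blame output begins with "<sha> <orig-line> <final-line> <count>",
# so the first field of the first line is the commit that introduced line 1.
commit=$(git blame -L 1,1 --porcelain config.ts | head -n 1 | cut -d' ' -f1)

# Read the full commit message without the patch.
message=$(git show --no-patch --format=%B "$commit")
echo "$message"
```

If the message references a PR, `gh pr view <number>` would be the follow-up lookup, as the step describes.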
diff --git a/.flue/sandbox/AGENTS.md b/.flue/sandbox/AGENTS.md
@@ -0,0 +1,4 @@
+# Sandbox Environment (CI)
+
+- You are running inside a Docker container in CI.
+- Always use `CI=true` with `pnpm install` (no TTY available).
diff --git a/.flue/Dockerfile.sandbox → .flue/sandbox/Dockerfile b/.flue/Dockerfile.sandbox → .flue/sandbox/Dockerfile
@@ -7,7 +7,7 @@ ENV DEBIAN_FRONTEND=noninteractive
 # The slim image includes Node.js and npm but not git, curl, or wget.
 RUN apt-get update \
     && apt-get install -y --no-install-recommends \
-       ca-certificates curl wget git \
+       ca-certificates curl wget git jq lsof procps \
     && rm -rf /var/lib/apt/lists/*
 
 # --- pnpm ---
@@ -44,14 +44,29 @@ RUN apt-get update \
     && chmod -R o+rx /opt/pw-browsers \
     && npm uninstall -g playwright
 
-# NOTE: gh CLI is intentionally NOT installed in the sandbox due to lack of tokens.
+# --- GitHub CLI (for read-only public repo operations without auth) ---
+RUN (type -p wget >/dev/null || (apt-get update && apt-get install wget -y)) \
+    && mkdir -p -m 755 /etc/apt/keyrings \
+    && out=$(mktemp) && wget -nv -O$out https://cli.github.com/packages/githubcli-archive-keyring.gpg \
+    && cat $out | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
+    && chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
+    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
+       | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
+    && apt-get update \
+    && apt-get install gh -y \
+    && rm -rf /var/lib/apt/lists/*
 
 # --- Compatibility fixes ---
 # Allow any directory as a git safe.directory. The host workspace is bind-mounted
 # at its original host path (e.g. /home/runner/work/astro/astro) and the container
 # runs as a non-root UID via --user, so git would otherwise refuse to operate.
 RUN git config --system --add safe.directory '*'
 
+# --- Global OpenCode rules for CI sessions ---
+# The flue CLI sets HOME=/tmp at runtime, so OpenCode reads global rules from
+# /tmp/.config/opencode/AGENTS.md. This injects CI-specific instructions.
+COPY .flue/sandbox/AGENTS.md /tmp/.config/opencode/AGENTS.md
+
 EXPOSE 48765
 
 # Default: start OpenCode server listening on all interfaces