115 changes: 115 additions & 0 deletions .agents/skills/analyze-github-action-logs/SKILL.md
@@ -0,0 +1,115 @@
---
name: analyze-github-action-logs
description: Analyze recent GitHub Actions workflow runs to identify patterns, mistakes, and improvements. Use when asked to "analyze workflow logs", "review action runs", or "analyze GitHub Actions".
compatibility: Requires gh CLI and access to the GitHub repository.
---

# Analyze GitHub Action Logs

Fetch and analyze recent GitHub Actions runs for a given workflow. Review agent/step performance, identify wasted effort and mistakes, and produce a report with actionable improvements.

## Input

You need:

- **`workflow`** (required) — The workflow file name or ID (e.g., `issue-triage.yml`, `deploy.yml`).
- **`repo`** (optional) — The GitHub repository in `OWNER/REPO` format. Defaults to `withastro/astro`.
- **`count`** (optional) — Number of recent completed runs to analyze. Defaults to `5`.

## Step 1: List Recent Runs

Fetch the most recent completed runs for the workflow. Filter by `--status=completed`:

```bash
gh run list --workflow=<workflow> -R <repo> --status=completed -L <count>
```

Present the list to orient yourself: run IDs, titles, status (success/failure), and duration. Pick the runs to analyze — prefer a mix of successes and failures if available, and prefer runs that exercised more steps (longer runs tend to go through more stages, while shorter runs may exit early).
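The listing step could be wrapped in a small helper that applies the defaults from the Input section and asks `gh` for JSON, which makes run IDs, titles, and conclusions easier to scan. This is a minimal sketch: the function name `list_runs` and the `GH` override variable are hypothetical conveniences, not part of the skill.

```bash
# Sketch: list recent completed runs as JSON.
# GH is a hypothetical override (e.g. GH=echo) to preview the command
# without contacting GitHub; it defaults to the real gh CLI.
list_runs() (
  workflow=$1
  repo=${2:-withastro/astro}   # default repo from the Input section
  count=${3:-5}                # default count from the Input section
  "${GH:-gh}" run list \
    --workflow="$workflow" -R "$repo" --status=completed -L "$count" \
    --json databaseId,displayTitle,conclusion,startedAt,updatedAt
)

# Usage (requires gh and network access):
#   list_runs issue-triage.yml
#   list_runs deploy.yml OWNER/REPO 10
```

Comparing `startedAt` with `updatedAt` gives a rough duration for each run, which helps when picking longer runs that exercised more steps.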

## Step 2: Fetch Logs

For each run you want to analyze, save the full log to a temp file:

```bash
gh run view <run_id> -R <repo> --log > /tmp/actions-run-<run_id>.log
```
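When several runs are selected, the same command can be looped over the run IDs. The sketch below assumes a hypothetical helper named `fetch_logs` and an output directory argument; the file naming follows the `actions-run-<run_id>.log` pattern used above.

```bash
# Sketch: save the full log of each run to <out_dir>/actions-run-<run_id>.log.
# GH is a hypothetical override for dry runs; it defaults to the real gh CLI.
fetch_logs() (
  repo=$1
  out_dir=$2
  shift 2
  for run_id in "$@"; do
    "${GH:-gh}" run view "$run_id" -R "$repo" --log \
      > "$out_dir/actions-run-$run_id.log"
  done
)

# Usage (requires gh and network access):
#   fetch_logs withastro/astro /tmp <run_id_1> <run_id_2>
```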

## Step 3: Identify Step/Skill Boundaries

Search each log file for markers that indicate where each step or skill starts and ends. The markers depend on the workflow — look for patterns like:

- **Flue skill markers**: `[flue] skill("..."): starting` / `completed`
- **GitHub Actions step markers**: Step name headers in the log output
- **Custom markers**: Any `START`/`END` or similar delimiters the workflow uses

```bash
grep -n "skill(\|step\|START\|END\|starting\|completed" /tmp/actions-run-<run_id>.log | head -50
```

From this, determine which line ranges correspond to each step/skill. Also find any result markers:

```bash
grep -n "RESULT_START\|RESULT_END\|extractResult" /tmp/actions-run-<run_id>.log
```

Note: Some log files may contain binary/null bytes. Use `grep -a` if needed.
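For workflows that emit the Flue markers, the start and end lines can be paired mechanically. The following is a sketch under the assumption that each marker appears on a single line exactly in the form `[flue] skill("name"): starting` or `[flue] skill("name"): completed`; the helper name `skill_ranges` is hypothetical, and other marker styles would need a different pattern.

```bash
# Sketch: print "<skill> <start_line> <end_line>" for each flue skill.
skill_ranges() (
  log_file=$1
  # -a treats the log as text even if it contains null bytes;
  # -n prefixes each match with its line number.
  grep -aEn 'skill\("[^"]*"\): (starting|completed)' "$log_file" |
    awk -F: '
      {
        line_no = $1            # line number added by grep -n
        name = $0
        sub(/.*skill\("/, "", name)   # drop everything up to the name
        sub(/"\).*/, "", name)        # drop everything after the name
        if ($0 ~ /\): starting/) {
          start[name] = line_no
        } else if (name in start) {
          print name, start[name], line_no
          delete start[name]
        }
      }'
)

# Usage:
#   skill_ranges /tmp/actions-run-<run_id>.log
```

A skill that starts but never logs `completed` produces no output line, which is itself a useful signal that the step was cut short.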

## Step 4: Analyze Each Step (Use Subagents)

For each step/skill that ran, **launch a subagent** to analyze that section's log. This is critical to avoid polluting your context with thousands of log lines.

For each subagent, provide:

1. The log file path and the line range for that step
2. If skill instruction files exist for the workflow, tell the subagent to read them first for context
3. The run title/context so the subagent understands what was being done
4. The analysis criteria below
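If a subagent works better with a standalone file than with a path plus line range, the range can be cut out first. This is a minimal sketch; `slice_log` is a hypothetical helper name and the output path in the usage comment is only an example.

```bash
# Sketch: print lines <start>..<end> (inclusive) of a log file.
slice_log() (
  log_file=$1
  start=$2
  end=$3
  # Print the range, then quit at <end> so large logs are not read to the end.
  sed -n "${start},${end}p;${end}q" "$log_file"
)

# Usage:
#   slice_log /tmp/actions-run-<run_id>.log 120 480 > /tmp/step-reproduce.log
```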

### Analysis Criteria

Tell each subagent to evaluate:

1. **Correctness** — Was the step's final result/verdict correct?
2. **Efficiency** — How long did it take? What's a reasonable baseline? Where was time wasted?
3. **Mistakes** — Wrong tool calls, failed commands retried without changes, unnecessary rebuilds, etc.
4. **Instruction compliance** — If skill instructions exist, did the agent follow them? Where did it deviate?
5. **Scope creep** — Did the agent do work that belongs in a different step?
6. **Suggestions** — Specific, actionable changes that would prevent the issues found.

Tell each subagent to return a structured response with: Summary, Time Analysis, Issues Found (with estimated time wasted for each), and Suggestions for Improvement.

## Step 5: Consolidate Report

After all subagents return, synthesize their findings into a single report. Structure it as:

### Per-Run Summary Table

For each run analyzed, include a table:

| Step/Skill | Time | Result | Time Wasted | Top Issue |
| ---------- | ---- | ------ | ----------- | --------- |

### Cross-Cutting Patterns

Identify issues that appeared across multiple runs or multiple steps. These are the highest-value improvements. Common patterns to look for:

- **TodoWrite abuse** — Agent wasting time on task list management during automated runs
- **Server management failures** — Port conflicts, failed process kills, stale log files
- **Tool misuse** — Using `curl` instead of `gh`, `jq` not found, etc.
- **Scope creep** — One step doing work that belongs in another
- **Unnecessary rebuilds** — Building packages multiple times without changes
- **Test timeouts** — Running slow E2E/Playwright tests that time out
- **Instruction violations** — Agent doing something the instructions explicitly forbid
- **Redundant work** — Re-reading files, re-running searches, re-installing dependencies
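A quick way to check whether a suspected pattern really spans multiple runs is to count how many saved logs mention a telltale string. The sketch below assumes a literal search string is a good enough proxy for the pattern (for example `TodoWrite`); the helper name `runs_matching` is hypothetical.

```bash
# Sketch: count how many log files contain a literal string at least once.
runs_matching() (
  pattern=$1
  shift
  # -a: treat logs as text; -l: list each matching file once; -F: literal match.
  grep -alF -- "$pattern" "$@" | wc -l | tr -d ' '
)

# Usage:
#   runs_matching TodoWrite /tmp/actions-run-*.log
```

A count close to the number of analyzed runs suggests a cross-cutting issue; the subagent reports are still needed to judge how much time each occurrence cost.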

### Prioritized Recommendations

Rank your improvement suggestions by estimated time savings across all runs. For each recommendation:

1. **What to change** — Which file(s) to edit and what to add/modify
2. **Why** — What pattern it addresses, with evidence from the runs
3. **Estimated impact** — How much time it would save per run
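The ranking can be done by summing the estimated time wasted per pattern across all subagent reports. This sketch assumes the findings have been collected as tab-separated `pattern<TAB>seconds` lines, which is an illustrative format and not something the subagents are required to produce; `rank_by_time_wasted` is a hypothetical helper name.

```bash
# Sketch: total the seconds wasted per pattern and sort, largest first.
# stdin: one finding per line, "<pattern>\t<seconds wasted>".
rank_by_time_wasted() (
  tab=$(printf '\t')
  awk -F'\t' '
    { total[$1] += $2 }
    END { for (k in total) printf "%d\t%s\n", total[k], k }
  ' | sort -t "$tab" -k1,1nr -k2,2
)

# Usage:
#   rank_by_time_wasted < /tmp/findings.tsv
```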

## Output

Present the full consolidated report. Do NOT edit any workflow or skill files — only report findings and recommendations. The user will decide which changes to apply.
8 changes: 5 additions & 3 deletions .agents/skills/triage/comment.md
@@ -4,6 +4,8 @@ Generate a GitHub issue comment from triage findings.

**CRITICAL: You MUST always read `report.md` and produce a GitHub comment as your final output, regardless of what input files are available. Even if `report.md` is missing or empty, you must still produce a comment. In that case, produce a minimal comment stating that automated triage could not be completed.**

**SCOPE: Your job is comment generation only. Finish your work once you've completed this workflow, and do NOT go further. Do not attempt to reproduce, diagnose, or fix the issue at this stage.**

## Prerequisites

These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -41,16 +43,16 @@ The **Fix** line in the template has three possible forms. Choose the one that m

The **Priority** line communicates the severity of this issue to maintainers. Its goal is to answer the question: **"How bad is it?"**

Select exactly ONE priority label from the `priorityLabels` arg. Use the label descriptions to guide your decision, combined with the triage report's root cause and impact analysis. Render the chosen label name in square brackets, in bold, formatted with the `- ` prefix removed (Example: `**[P2: Has Workaround].**). Then, follow it with 1-2 sentences explaining **why** you chose that priority. Answer: "who is likely to be affected and under what conditions?". If you are unsure, use your best judgment based on the label descriptions and the triage findings.
Select exactly ONE priority label from the `priorityLabels` arg. Use the label descriptions to guide your decision, combined with the triage report's root cause and impact analysis. Render it in bold, with the `- ` prefix removed, like this: `**Priority P2: Has Workaround.**` Then follow it with 1-2 sentences explaining _why_ you chose that priority, answering the question "Who is likely to be affected, and under what conditions?" If you are unsure, use your best judgment based on the label descriptions and the triage findings.

### Template

```markdown
**[I was able to reproduce this issue. / I was unable to reproduce this issue.]** [2-3 sentences describing the root cause, result, and key observations.]

**Fix:** **[See "Fix" Instructions above.]** [1-2 sentences describing the solution, where/when it was already fixed, or guidance on where a fix might be.] [If `branchName` is non-null: [View Suggested Fix](https://github.com/withastro/astro/compare/{branchName}?expand=1)]
**[See "Fix" Instructions above.]** [1-2 sentences describing the solution, where/when it was already fixed, or guidance on where a fix might be.] [If `branchName` is non-null: [View Suggested Fix](https://github.com/withastro/astro/compare/{branchName}?expand=1)]

**Priority:** **[See "Priority" Instructions above.]** [1-2 sentences explaining why this priority was chosen, who is likely to be affected, and under what conditions (this section should answer the question: "how bad is it?")]
**[See "Priority" Instructions above.]** [1-2 sentences explaining why this priority was chosen, who is likely to be affected, and under what conditions (this section should answer the question: "how bad is it?")]

<details>
<summary><em>Full Triage Report</em></summary>
2 changes: 2 additions & 0 deletions .agents/skills/triage/diagnose.md
@@ -4,6 +4,8 @@ Find the root cause of a reproduced bug in the Astro source code.

**CRITICAL: You MUST always read `report.md` and append to `report.md` before finishing, regardless of outcome. Even if you cannot identify the root cause, hit errors, or the investigation is inconclusive — always update `report.md` with your findings. The orchestrator and downstream skills depend on this file to determine what happened.**

**SCOPE: Your job is diagnosis only. Finish your work once you've completed this workflow. Do NOT go further than this (no broader verification of the issue, no fixing of the issue, etc.).**

## Prerequisites

These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
4 changes: 3 additions & 1 deletion .agents/skills/triage/reproduce.md
@@ -4,6 +4,8 @@ Reproduce a GitHub issue to determine if a bug is valid and reproducible.

**CRITICAL: You MUST always read `report.md` and write `report.md` to the triage directory before finishing, regardless of outcome. Even if you encounter errors, cannot reproduce the bug, hit unexpected problems, or need to skip — always write `report.md`. The orchestrator and downstream skills depend on this file to determine what happened. If you finish without writing it, the entire pipeline fails silently.**

**SCOPE: Your job is reproduction only. Finish your work once you've completed this workflow. Do NOT go further than this (no deeper diagnosis of the issue, no fixing of the issue, etc.).**

## Prerequisites

These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -67,7 +69,7 @@ Skip if the bug is specific to Bun or Deno. Our sandbox only supports Node.js.

### Maintainer Override (`maintainer-override`)

Skip if a repository maintainer has commented that this issue should not be reproduced here. To determine if a commenter is a maintainer, check the `author_association` field on their comment in `issueDetails` — values of `MEMBER`, `COLLABORATOR`, or `OWNER` indicate a maintainer.
Skip if a repository maintainer has commented that this issue should not be reproduced here. To determine if a commenter is a maintainer, check the `authorAssociation` field on their comment in `issueDetails` — values of `MEMBER`, `COLLABORATOR`, or `OWNER` indicate a maintainer.
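
The maintainer check reduces to a membership test on the `authorAssociation` value. A minimal sketch, assuming the value has already been extracted from `issueDetails`; the function name `is_maintainer` is hypothetical.

```bash
# Sketch: succeed (exit 0) only for associations that indicate a maintainer.
is_maintainer() {
  case "$1" in
    MEMBER|COLLABORATOR|OWNER) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if is_maintainer "$association"; then echo "maintainer comment"; fi
```

Any other value, such as `CONTRIBUTOR` or `NONE`, is treated as a non-maintainer.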

## Step 3: Set Up Reproduction Project

21 changes: 12 additions & 9 deletions .agents/skills/triage/verify.md
@@ -4,6 +4,8 @@ Verify whether a GitHub issue describes an actual bug or a misunderstanding of i

**CRITICAL: You MUST always read `report.md` and append to `report.md` before finishing, regardless of outcome. Even if you cannot reach a conclusion — always update `report.md` with your findings. The orchestrator and downstream skills depend on this file to determine what happened.**

**SCOPE: Your job is verification only. Finish your work once you've completed this workflow. Do NOT go further than this (no fixing of the issue, etc.).**

## Prerequisites

These variables are referenced throughout this skill. They may be passed as args by an orchestrator, or inferred from the conversation when run standalone.
@@ -46,28 +48,29 @@ Look at the relevant source code in `packages/`. Pay close attention to:
- **Comments explaining "why"** — If a developer left a comment explaining why the code works a certain way, that is strong evidence of intentional design. Treat these comments as authoritative unless they are clearly outdated.
- **Explicit conditionals and early returns** — Code that explicitly checks for the reported scenario and handles it differently than the reporter expects is likely intentional.
- **Named constants and configuration** — Behavior controlled by a named config option or constant was probably a deliberate choice.
- **Git blame on key lines** — If `report.md` identifies specific files and line numbers, run `git blame` on the relevant lines to find the commit that introduced the behavior. Then read the full commit message with `git show --no-patch <commit>` and review the associated PR if referenced. You can fetch PR details with `curl -s "https://api.github.com/repos/withastro/astro/pulls/<number>"`. A commit message or PR description that explains the rationale is strong evidence of intentional design.

### 2c: Search prior GitHub issues and PRs
### 2c: Git blame on key lines

If `report.md` identifies specific files and line numbers, run `git blame` on the relevant lines to find the commit that introduced the behavior. Then read the full commit message with `git show --no-patch <commit>` and review the associated PR if referenced. You can fetch PR details with `gh pr view <number>`. A commit message, PR description, or PR comment from the author explaining the rationale is strong evidence of intentional design.

### 2d: Search prior GitHub issues and PRs

Search for prior issues and PRs that discuss the same behavior using the GitHub API. This can reveal whether the behavior was previously discussed, intentionally introduced, or already reported and closed as "not a bug."

```bash
# Search issues for keywords related to the reported behavior
curl -s "https://api.github.com/search/issues?q=<url-encoded-keywords>+repo:withastro/astro+is:issue&per_page=10"
gh search issues "<keywords>" --repo withastro/astro --limit 10
# Search PRs that may have introduced or discussed the behavior
curl -s "https://api.github.com/search/issues?q=<url-encoded-keywords>+repo:withastro/astro+is:pr&per_page=10"
gh search prs "<keywords>" --repo withastro/astro --limit 10
# Read a specific issue for context
curl -s "https://api.github.com/repos/withastro/astro/issues/<number>"
# Read issue comments
curl -s "https://api.github.com/repos/withastro/astro/issues/<number>/comments"
gh issue view <number> --comments
# Read a specific PR for context
curl -s "https://api.github.com/repos/withastro/astro/pulls/<number>"
gh pr view <number> --comments
```

If you find a closed issue where a maintainer explained why the behavior is intentional, or a PR that deliberately introduced it, that is strong evidence of intended behavior.

### 2d: Distinguish bugs from non-bugs
### 2e: Distinguish bugs from non-bugs

This is the most important and most error-prone step. For triage purposes, the definitions are:

4 changes: 4 additions & 0 deletions .flue/sandbox/AGENTS.md
@@ -0,0 +1,4 @@
# Sandbox Environment (CI)

- You are running inside a Docker container in CI.
- Always set `CI=true` when running `pnpm install` (for example, `CI=true pnpm install`), because no TTY is available.
19 changes: 17 additions & 2 deletions .flue/Dockerfile.sandbox → .flue/sandbox/Dockerfile
@@ -7,7 +7,7 @@ ENV DEBIAN_FRONTEND=noninteractive
# The slim image includes Node.js and npm but not git, curl, or wget.
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
ca-certificates curl wget git \
ca-certificates curl wget git jq lsof procps \
&& rm -rf /var/lib/apt/lists/*

# --- pnpm ---
@@ -44,14 +44,29 @@ RUN apt-get update \
&& chmod -R o+rx /opt/pw-browsers \
&& npm uninstall -g playwright

# NOTE: gh CLI is intentionally NOT installed in the sandbox due to lack of tokens.
# --- GitHub CLI (for read-only public repo operations without auth) ---
RUN (type -p wget >/dev/null || (apt-get update && apt-get install wget -y)) \
&& mkdir -p -m 755 /etc/apt/keyrings \
&& out=$(mktemp) && wget -nv -O$out https://cli.github.com/packages/githubcli-archive-keyring.gpg \
&& cat $out | tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
&& chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
| tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& apt-get update \
&& apt-get install gh -y \
&& rm -rf /var/lib/apt/lists/*

# --- Compatibility fixes ---
# Allow any directory as a git safe.directory. The host workspace is bind-mounted
# at its original host path (e.g. /home/runner/work/astro/astro) and the container
# runs as a non-root UID via --user, so git would otherwise refuse to operate.
RUN git config --system --add safe.directory '*'

# --- Global OpenCode rules for CI sessions ---
# The flue CLI sets HOME=/tmp at runtime, so OpenCode reads global rules from
# /tmp/.config/opencode/AGENTS.md. This injects CI-specific instructions.
COPY .flue/sandbox/AGENTS.md /tmp/.config/opencode/AGENTS.md

EXPOSE 48765

# Default: start OpenCode server listening on all interfaces