Add private runner support to git-ape-onboarding by arnaudlh · Pull Request #179 · Azure/git-ape

arnaudlh · 2026-06-15T05:16:19Z

What & why

Supports private GitHub Actions runners for the Git-Ape workflows: pick a runner type (ACI / ACA / AKS) and migrate from public → private with a single workflow variable, no code changes.

Bootstrap model: start public, switch to private later

Private runners are themselves Azure resources deployed by a Git-Ape workflow, so the first deploy (including the one that creates the runner host) must run on a GitHub-hosted runner. Onboarding therefore scaffolds workflows that default to ubuntu-latest; going private is a later, additive step:

runs-on: ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}

`GIT_APE_RUNNER_LABEL`	Effect
unset (default)	GitHub-hosted `ubuntu-latest`, no infrastructure
set to a label	Self-hosted runners registered with that label

Reversible with gh variable delete GIT_APE_RUNNER_LABEL.
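
The fallback reads as a first-truthy selection. Below is a minimal Python sketch of that resolution, assuming an unset or empty workflow variable evaluates as falsy in the `||` expression. The function name `resolve_runs_on` and the label `git-ape-private` are hypothetical and not part of this PR.

```python
def resolve_runs_on(runner_label=None, default="ubuntu-latest"):
    """Mimic `vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest'`.

    An unset (None) or empty label falls back to the GitHub-hosted default.
    """
    return runner_label or default


# Unset variable: GitHub-hosted runner.
print(resolve_runs_on())
# Variable deleted or blank: same fallback.
print(resolve_runs_on(""))
# Variable set (illustrative label): self-hosted runners with that label.
print(resolve_runs_on("git-ape-private"))
```

Because the selection happens in the expression, deleting the variable is enough to return to `ubuntu-latest` without touching the workflow files.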

Changes

Workflows — parametrized runs-on in git-ape-plan/deploy/destroy/verify templates; added a "runner configuration" report step to verify.yml.
Runner IaC (templates/runners/, on-demand reference — deliberately not scaffolded):
- aci/ and aca/ — ARM template.json + parameters.json (ephemeral runners, optional user-assigned identity, optional VNet injection; ACA uses the KEDA github-runner scaler with scale-to-zero).
- aks/ — ARC gha-runner-scale-set Helm values.yaml + README.
- README.md — type × platform matrix, security model, provisioning flow, drift caveat.
Playbook / agent — SKILL Step 11 (runner selection & provisioning, re-runnable later); onboarding agent gains a runner-type input; copilot-instructions.md gets a GitHub Actions Runners section (mirror kept byte-identical).
Docs & evals — regenerated website docs; added a positive private-runner eval task.

Security

UAMI for Azure access (no keys); the GitHub registration credential is the only secret, sourced from Key Vault (securestring + Key Vault reference); ephemeral runners by default; the runner label must match GIT_APE_RUNNER_LABEL.

Validation

Live server-side ARM validate + what-if on ACI & ACA templates exercising both branches of every conditional (isOrgScope, hasIdentity, isVnet) against a real user-assigned identity and real delegated subnets — all Succeeded, temp resources torn down.
Docusaurus production build, actionlint on all 4 workflows, template-sync (bash + pwsh) and scaffold-parity checks, docs generator idempotent.

Closes #181

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Support all private GitHub Actions runner options from Azure/git-ape-private#12: select runner type (ACI/ACA/AKS) and migrate public->private via a single GIT_APE_RUNNER_LABEL workflow variable.

- Parametrize runs-on in plan/deploy/destroy/verify workflow templates to ${{ vars.GIT_APE_RUNNER_LABEL || 'ubuntu-latest' }}; add a runner configuration report step to verify.yml.
- Add on-demand runner IaC under templates/runners/: ARM template.json + parameters.json for ACI and ACA (ephemeral runners, optional UAMI, optional VNet injection, KEDA github-runner scaler with scale-to-zero for ACA), and an ARC gha-runner-scale-set Helm values.yaml for AKS.
- Update SKILL.md (runner selection/provisioning step), the onboarding agent contract (runner type input), and the copilot-instructions GitHub Actions Runners section (mirror kept byte-identical).
- Regenerate website docs and add a positive private-runner eval task.

Validated: server-side az deployment validate + what-if on ACI/ACA templates (both branches of every conditional), Docusaurus build, actionlint, template-sync and scaffold-parity checks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-15T05:17:03Z

🤖 Waza agent evals (advisory)

ℹ️ No agents evaluated. Changed agent(s) have no eval directory: git-ape-onboarding

Ran 0 agent evals against claude-sonnet-4.6. Each eval consumes ~5 premium Copilot requests; results are non-blocking — investigate failures via the workflow logs and the per-agent waza-agent-results-* artifacts.

How this works: This workflow auto-syncs the canonical .github/agents/<name>.agent.md into the sibling mirror inside .github/evals/agents/<name>/ before each run, so the score below reflects the version of the agent in this PR — not whatever was committed when the eval was first wired up.

📊 Agent file token comparison vs main (advisory)

No .agent.md files changed vs main (or token-compare returned no entries).

No agents in scope for this PR.

github-actions · 2026-06-15T05:20:40Z

🧪 Waza skill evals (advisory)

🎯 Diff-scoped run: 1 changed skill(s): git-ape-onboarding. Touch .waza.yaml or trigger workflow_dispatch to run the full matrix.

Ran 4 matrix legs in parallel (skills × models). Results are non-blocking — investigate failures via the workflow logs and the per-leg waza-results-* artifacts.

Legend: Models flagged baseline: true in .github/evals/manifest.yaml (currently: gpt-5.4) run with --baseline (A/B mode) to cap quota. All other models run standard. Judge model is fixed at claude-opus-4.7 across all legs.

📊 Token comparison vs main (advisory)

{
  "baseRef": "main",
  "headRef": "WORKING",
  "threshold": 10,
  "passed": true,
  "timestamp": "2026-06-17T11:07:44.986945434Z",
  "summary": {
    "totalBefore": 0,
    "totalAfter": 44126,
    "totalDiff": 44126,
    "percentChange": 100,
    "filesAdded": 15,
    "filesRemoved": 0,
    "filesModified": 0,
    "filesIncreased": 15,
    "filesDecreased": 0
  },
  "files": [
    {
      "file": ".github/skills/azure-cost-estimator/SKILL.md",
      "before": null,
      "after": {
        "tokens": 3231,
        "characters": 11940,
        "lines": 345
      },
      "diff": 3231,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-deployment-preflight/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1448,
        "characters": 6281,
        "lines": 212
      },
      "diff": 1448,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-drift-detector/SKILL.md",
      "before": null,
      "after": {
        "tokens": 3179,
        "characters": 13149,
        "lines": 460
      },
      "diff": 3179,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-integration-tester/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1563,
        "characters": 6807,
        "lines": 248
      },
      "diff": 1563,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-naming-research/SKILL.md",
      "before": null,
      "after": {
        "tokens": 486,
        "characters": 2108,
        "lines": 44
      },
      "diff": 486,
      "percentChange": 100,
      "status": "added",
      "limit": 500
    },
    {
      "file": ".github/skills/azure-policy-advisor/SKILL.md",
      "before": null,
      "after": {
        "tokens": 4751,
        "characters": 21485,
        "lines": 368
      },
      "diff": 4751,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-resource-availability/SKILL.md",
      "before": null,
      "after": {
        "tokens": 2413,
        "characters": 9881,
        "lines": 308
      },
      "diff": 2413,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-resource-visualizer/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1494,
        "characters": 6179,
        "lines": 192
      },
      "diff": 1494,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-rest-api-reference/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1831,
        "characters": 8430,
        "lines": 200
      },
      "diff": 1831,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-role-selector/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1280,
        "characters": 5641,
        "lines": 162
      },
      "diff": 1280,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-security-analyzer/SKILL.md",
      "before": null,
      "after": {
        "tokens": 5326,
        "characters": 21419,
        "lines": 451
      },
      "diff": 5326,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-stack-deploy/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1912,
        "characters": 7525,
        "lines": 159
      },
      "diff": 1912,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-stack-destroy/SKILL.md",
      "before": null,
      "after": {
        "tokens": 2644,
        "characters": 10670,
        "lines": 180
      },
      "diff": 2644,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/git-ape-onboarding/SKILL.md",
      "before": null,
      "after": {
        "tokens": 10428,
        "characters": 41881,
        "lines": 851
      },
      "diff": 10428,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/prereq-check/SKILL.md",
      "before": null,
      "after": {
        "tokens": 2140,
        "characters": 8023,
        "lines": 147
      },
      "diff": 2140,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    }
  ]
}
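
A report of this shape can be summarized offline. The sketch below assumes the JSON layout shown above (`files[].after.tokens`, `limit`, and an optional `overLimit` flag) and runs on a two-entry excerpt of it. `over_limit_files` is a hypothetical helper, not a waza command.

```python
import json

# Two entries excerpted from the report above, trimmed to the fields used here.
SAMPLE = """
{
  "files": [
    {"file": ".github/skills/azure-naming-research/SKILL.md",
     "after": {"tokens": 486}, "limit": 500},
    {"file": ".github/skills/git-ape-onboarding/SKILL.md",
     "after": {"tokens": 10428}, "limit": 500, "overLimit": true}
  ]
}
"""


def over_limit_files(report):
    """Return (path, tokens over the limit) pairs, largest overage first."""
    rows = []
    for entry in report["files"]:
        # Entries within budget omit the overLimit key entirely.
        if entry.get("overLimit", False):
            rows.append((entry["file"], entry["after"]["tokens"] - entry["limit"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


report = json.loads(SAMPLE)
for path, excess in over_limit_files(report):
    print(f"{path}: {excess} tokens over")
```

For the git-ape-onboarding entry this gives an overage of 9928 tokens, the same figure the readiness check reports further down.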

Skill: `git-ape-onboarding`

📈 Score (per model) + Suggestions/Recommendations

Model: claude-opus-4.6

Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: claude-opus-4.6
Judge Model: claude-opus-4.7
Parallel: 4 workers

✓ [1/5] Negative — Storage service comparison (off-topic)
✓ [5/5] Positive — Scaffold honors skip-with-notice on collision
✓ [3/5] Positive — Multi-environment onboarding
✓ [2/5] Positive — First-time repo setup
✓ [4/5] Positive — Onboard with private VNet-injected runner

🧪 Waza Eval Results

Status: ✅ Passed | Score: 0.90 | Duration: 58.841s

Tests: 5 total, 5 passed, 0 failed, 0 errors
Success Rate: 100.0%
Score Range: 0.56 - 1.00 (σ=0.1721)

Task Results

Task	Score	Status	Graders
Negative — Storage service comparison (off-topic)	0.56	✅	budget, trigger_relevance_negative
Positive — First-time repo setup	1.00	✅	answer_quality, budget, trigger_relevance_positive
Positive — Multi-environment onboarding	0.96	✅	answer_quality, budget, trigger_relevance_positive
Positive — Onboard with private VNet-injected runner	1.00	✅	answer_quality, budget, trigger_relevance_positive
Positive — Scaffold honors skip-with-notice on collision	0.98	✅	answer_quality, budget, trigger_relevance_positive
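
The overall score of 0.90 matches the arithmetic mean of the five per-task scores. That is an inference from the rounded figures in the table, not documented waza behavior, and the sketch below does not attempt to reproduce the reported σ.

```python
from statistics import mean

# Per-task scores copied from the table above (claude-opus-4.6 leg).
task_scores = {
    "Negative: Storage service comparison (off-topic)": 0.56,
    "Positive: First-time repo setup": 1.00,
    "Positive: Multi-environment onboarding": 0.96,
    "Positive: Onboard with private VNet-injected runner": 1.00,
    "Positive: Scaffold honors skip-with-notice on collision": 0.98,
}

# Assumption: the headline score is the unweighted mean of the task scores.
overall = mean(task_scores.values())
print(f"{overall:.2f}")
```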

Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: claude-opus-4.6

Results saved to: .waza-results/git-ape-onboarding-claude-opus-4.6.json
JUnit XML saved to: .waza-results/git-ape-onboarding-claude-opus-4.6.junit.xml

Model: claude-sonnet-4.6

Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: claude-sonnet-4.6
Judge Model: claude-opus-4.7
Parallel: 4 workers

✓ [1/5] Negative — Storage service comparison (off-topic)
✓ [5/5] Positive — Scaffold honors skip-with-notice on collision
✓ [2/5] Positive — First-time repo setup
✓ [3/5] Positive — Multi-environment onboarding
✓ [4/5] Positive — Onboard with private VNet-injected runner

🧪 Waza Eval Results

Status: ✅ Passed | Score: 0.90 | Duration: 50.332s

Tests: 5 total, 5 passed, 0 failed, 0 errors
Success Rate: 100.0%
Score Range: 0.56 - 1.00 (σ=0.1721)

Task Results

Task	Score	Status	Graders
Negative — Storage service comparison (off-topic)	0.56	✅	budget, trigger_relevance_negative
Positive — First-time repo setup	1.00	✅	answer_quality, budget, trigger_relevance_positive
Positive — Multi-environment onboarding	0.96	✅	answer_quality, budget, trigger_relevance_positive
Positive — Onboard with private VNet-injected runner	1.00	✅	answer_quality, budget, trigger_relevance_positive
Positive — Scaffold honors skip-with-notice on collision	0.98	✅	answer_quality, budget, trigger_relevance_positive

Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: claude-sonnet-4.6

Results saved to: .waza-results/git-ape-onboarding-claude-sonnet-4.6.json
JUnit XML saved to: .waza-results/git-ape-onboarding-claude-sonnet-4.6.junit.xml

Model: gpt-5.3-codex

Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: gpt-5.3-codex
Judge Model: claude-opus-4.7
Parallel: 4 workers

✓ [5/5] Positive — Scaffold honors skip-with-notice on collision
✓ [1/5] Negative — Storage service comparison (off-topic)
✗ [3/5] Positive — Multi-environment onboarding
✗ [2/5] Positive — First-time repo setup
✗ [4/5] Positive — Onboard with private VNet-injected runner

🧪 Waza Eval Results

Status: ❌ Failed | Score: 0.70 | Duration: 58.171s

Tests: 5 total, 2 passed, 3 failed, 0 errors
Success Rate: 40.0%
Score Range: 0.56 - 0.98 (σ=0.1468)

Task Results

Task	Score	Status	Graders
Negative — Storage service comparison (off-topic)	0.56	✅	budget, trigger_relevance_negative
Positive — First-time repo setup	0.67	❌	answer_quality, budget, trigger_relevance_positive
Positive — Multi-environment onboarding	0.62	❌	answer_quality, budget, trigger_relevance_positive
Positive — Onboard with private VNet-injected runner	0.67	❌	answer_quality, budget, trigger_relevance_positive
Positive — Scaffold honors skip-with-notice on collision	0.98	✅	answer_quality, budget, trigger_relevance_positive

Failed Task Details

Positive — First-time repo setup

Run 1/1 (failed):

❌ answer_quality (0.00): fail: Missing criterion 1: no prereq check results presented: Criteria 2, 3, 4 are met: the agent surfaced a blocking gate (couldn't run checks), asked for 5 required inputs (repo URL, subscription ID, RBAC role, mode, default branch), and did not claim to have configured anything. However, criterion 1 fails: the agent did not present prereq check results (no table of tool versions, no Azure/GitHub auth status). The bash invocation returned "unexpected user permission response" and the agent fell back to asking the user to manually self-confirm prereqs, without producing any inspected environment data. No evidence the agent actually inspected the environment for az, gh, jq, or git versions or auth state.
✅ budget (1.00): All behavior checks passed
✅ trigger_relevance_positive (1.00): Prompt is trigger-aligned (score 1.00 >= 0.50)

Positive — Multi-environment onboarding

Run 1/1 (failed):

❌ answer_quality (0.00): fail: Missing prereq check execution and not a clearly gated input-collection turn: Criterion 1 missing: the assistant did not present prereq check results (no tool/auth status table, no inspection of the local environment). It only told the user to run /prereq-check themselves rather than executing it or inspecting az/gh/jq/git versions and auth status.

Criterion 2 missing: because prereqs were never actually evaluated, no auth/prereq gate was surfaced (nor confirmed all-pass).

Criterion 3 partially met but weak: inputs are listed in a single sentence ("You need: repo URL, staging subscription ID, RBAC role, mode, branch") rather than being requested as numbered questions or an explicit input block awaiting answers. It reads more like a checklist preamble to an immediate walkthrough than a gated request.

Criterion 4 met: response explicitly mentions the azure-deploy-staging environment, a new staging federated-credential subject (fc-azure-deploy-staging), per-environment scoped secrets/variables, and staging-scoped RBAC.

Overall the response jumps into a 9-step walkthrough with concrete az/gh commands instead of a gated step-1 handoff, which violates the skill's first-turn rule.

✅ budget (1.00): All behavior checks passed
✅ trigger_relevance_positive (0.87): Prompt is trigger-aligned (score 0.87 >= 0.50)

Positive — Onboard with private VNet-injected runner

Run 1/1 (failed):

❌ answer_quality (0.00): fail: Missing runner acknowledgement: Criteria 1, 2, 4 met: the reply is a gated handoff (prereq confirmation table + input request before execution), requests 5 required inputs (repo URL, subscription ID, RBAC role, mode, default branch), and does not falsely claim any provisioning has happened. However, criterion 3 fails: the user explicitly asked for VNet-injected self-hosted runners on Azure Container Apps, but the reply never acknowledges the private/ACA runner choice, never mentions the GIT_APE_RUNNER_LABEL switch, and never points at templates/runners/. Runner type should also have been collected as one of the gated inputs given the user's explicit request.
✅ budget (1.00): All behavior checks passed
✅ trigger_relevance_positive (1.00): Prompt is trigger-aligned (score 1.00 >= 0.50)

Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: gpt-5.3-codex

Results saved to: .waza-results/git-ape-onboarding-gpt-5.3-codex.json

Model: gpt-5.4 *(baseline — A/B mode)*

Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: gpt-5.4
Judge Model: claude-opus-4.7
Parallel: 4 workers

════════════════════════════════════════════════════════════════
PASS 1: Skills-Enabled Run
════════════════════════════════════════════════════════════════
✓ [1/5] Negative — Storage service comparison (off-topic)
✓ [5/5] Positive — Scaffold honors skip-with-notice on collision
✗ [2/5] Positive — First-time repo setup
✗ [3/5] Positive — Multi-environment onboarding
✓ [4/5] Positive — Onboard with private VNet-injected runner

════════════════════════════════════════════════════════════════
PASS 2: Skills Baseline (skills stripped)
════════════════════════════════════════════════════════════════
✓ [1/5] Negative — Storage service comparison (off-topic)
✗ [3/5] Positive — Multi-environment onboarding
✗ [5/5] Positive — Scaffold honors skip-with-notice on collision
✗ [2/5] Positive — First-time repo setup
[ERROR] waiting for session.idle: context deadline exceeded

✗ [4/5] Positive — Onboard with private VNet-injected runner

════════════════════════════════════════════════════════════════
SKILL IMPACT ANALYSIS
════════════════════════════════════════════════════════════════
Overall Performance Delta:
With Skills: 60.0% (3/5 tasks passed)
Without Skills: 20.0% (1/5 tasks passed)
Impact: +40.0 percentage points

Per-Task Breakdown:
• Negative — Storage service comparison (off-topic) [NEUTRAL] 100% → 100% (+0pp)
• Positive — First-time repo setup [NEUTRAL] 0% → 0% (+0pp)
• Positive — Multi-environment onboarding [NEUTRAL] 0% → 0% (+0pp)
• Positive — Onboard with private VNet-injected runner [IMPROVED] 0% → 100% (+100pp)
• Positive — Scaffold honors skip-with-notice on collision [IMPROVED] 0% → 100% (+100pp)

Verdict: Skills have POSITIVE IMPACT (improved 2/5 tasks)
════════════════════════════════════════════════════════════════
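
The impact figures follow from simple pass-rate arithmetic. Here is a minimal sketch using the pass/fail outcomes listed in the two passes above. The short task keys and the helper `pass_rate` are illustrative names, not waza identifiers.

```python
def pass_rate(results):
    """Percentage of tasks that passed."""
    return 100.0 * sum(results.values()) / len(results)


# Outcomes from PASS 1 (skills enabled) and PASS 2 (skills stripped).
with_skills = {
    "negative-storage": True,
    "first-time-setup": False,
    "multi-environment": False,
    "private-runner": True,
    "scaffold-collision": True,
}
without_skills = {
    "negative-storage": True,
    "first-time-setup": False,
    "multi-environment": False,
    "private-runner": False,
    "scaffold-collision": False,
}

delta = pass_rate(with_skills) - pass_rate(without_skills)
# A task counts as improved when it passes only with skills enabled.
improved = [task for task in with_skills if with_skills[task] and not without_skills[task]]

print(f"With skills: {pass_rate(with_skills):.1f}%")
print(f"Without skills: {pass_rate(without_skills):.1f}%")
print(f"Impact: {delta:+.1f} percentage points, improved {len(improved)}/5 tasks")
```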

🧪 Waza Eval Results

Status: ❌ Failed | Score: 0.76 | Duration: 52.696s

Tests: 5 total, 3 passed, 2 failed, 0 errors
Success Rate: 60.0%
Score Range: 0.56 - 1.00 (σ=0.1874)

Task Results

Task	Score	Status	Graders
Negative — Storage service comparison (off-topic)	0.56	✅	budget, trigger_relevance_negative
Positive — First-time repo setup	0.67	❌	answer_quality, budget, trigger_relevance_positive
Positive — Multi-environment onboarding	0.62	❌	answer_quality, budget, trigger_relevance_positive
Positive — Onboard with private VNet-injected runner	1.00	✅	answer_quality, budget, trigger_relevance_positive
Positive — Scaffold honors skip-with-notice on collision	0.98	✅	answer_quality, budget, trigger_relevance_positive

Failed Task Details

Positive — First-time repo setup

Run 1/1 (failed):

❌ answer_quality (0.00): fail: Missing prereq results presentation and onboarding input collection: Criterion 1 FAIL: The agent's bash invocation errored ("unexpected user permission response") and no prereq results table/list was presented — no tool versions, no Azure auth status, no GitHub auth status. The agent only listed what the user should run manually. Criterion 3 FAIL: The agent requested zero of the required onboarding inputs (target repo URL, Azure subscription ID, RBAC role, region/project, onboarding mode). It only asked the user to confirm prereqs. Criteria 2 and 4 pass (gate is surfaced; no fabricated configuration claims).
✅ budget (1.00): All behavior checks passed
✅ trigger_relevance_positive (1.00): Prompt is trigger-aligned (score 1.00 >= 0.50)

Positive — Multi-environment onboarding

Run 1/1 (failed):

❌ answer_quality (0.00): fail: Missing prereq check execution and results: Criterion 1 not met: the assistant told the user to run /prereq-check but did not actually present prereq results (no tool/auth status table or inspection of az/gh/jq/git versions and auth context). Criterion 2 not met: because no prereq inspection was performed, the auth/prereq gate was not surfaced — we cannot confirm prereqs pass. Criteria 3 and 4 are met (asked for repo, staging subscription, role, branch; mentioned azure-deploy-staging env, separate federated credential, per-env subscription variable, and reuse-vs-new app reg).
✅ budget (1.00): All behavior checks passed
✅ trigger_relevance_positive (0.87): Prompt is trigger-aligned (score 0.87 >= 0.50)

Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: gpt-5.4

Results saved to: .waza-results/git-ape-onboarding-gpt-5.4.json
JUnit XML saved to: .waza-results/git-ape-onboarding-gpt-5.4.junit.xml

🔢 Tokens (count + profile)

📊 git-ape-onboarding: 10,428 tokens (detailed ✓), 38 sections, 50 code blocks
   ⚠️  token count 10428 exceeds 3000

🎯 Quality (5-dim table)

DIMENSION          SCORE  FEEDBACK
────────────────────────────────────────────
clarity            █████  Exceptional structure throughout — invariants, execution modes, numbered playbook, and the 'first-turn rule' all make agent behavior unambiguous. Code blocks, tables, and ✅/⊝ status conventions are consistently applied and easy to follow.
completeness       █████  Remarkably thorough: prerequisites, multi-platform runner variants, Windows-specific edge cases, KEDA cold-start, OIDC subject format detection, disabled subscriptions, verification commands, and idempotency on re-run are all explicitly addressed. Almost no gaps.
trigger_precision  ████░  USE FOR / DO NOT USE FOR are clearly stated in both the frontmatter and the 'When to Use' section. The boundary between first-time onboarding and updating an already-onboarded repo (e.g., adding a new environment) is slightly underspecified — a concrete example or a 're-onboard' pattern would sharpen this edge.
scope_coverage     █████  Scope is tightly and explicitly defined: the numbered 'What It Configures' list, optional vs. required steps, scaffold skip-with-notice behavior, and the explicit prohibition on git commits all establish clear capability boundaries. Nothing is left implicit.
anti_patterns      ████░  Avoids vague instructions, conflicting directives, and missing error handling extremely well. The one minor issue is that 'Command Playbook' and 'Suggested Agent Flow' partially duplicate each other, which could cause an agent to execute steps twice on a re-run; consolidating them or adding a clearer cross-reference would eliminate this risk.
────────────────────────────────────────────
Overall: 4.6/5.0

This is a high-quality, production-grade skill document. It excels at completeness and scope definition, with exceptional clarity through consistent conventions (invariants, first-turn gate, safe-execution rules). The only actionable improvements are sharpening the re-onboarding trigger boundary and deduplicating the playbook/agent-flow overlap to reduce the risk of double-execution on retry.

✅ Check (compliance summary)

ℹ️ waza check expects eval.yaml colocated with SKILL.md. This repo separates them into .github/evals/git-ape-onboarding/eval.yaml, so the "Evaluation Suite: Not Found" line below is a false negative — the eval actually ran (see the Score section above).

🔍 Skill Readiness Check
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Skill: git-ape-onboarding

📋 Compliance Score: Medium-High
   ⚠️  Good, but could be improved. Missing routing clarity.

   Issues found:
   ❌  SKILL.md is 10428 tokens (hard limit 500)

📐 Spec Compliance: 9/9 checks passed
   ✅  Meets agentskills.io specification.

📎 Links: 7/16 valid
   ⚠️  9 link issue(s) found.
   ❌  [templates/copilot-instructions.md] → .github/skills/azure-stack-deploy/SKILL.md: target does not exist
   ❌  [templates/copilot-instructions.md] → website/docs/deployment/state.md: target does not exist
   ❌  [templates/copilot-instructions.md] → .github/skills/azure-stack-destroy/SKILL.md: target does not exist
   ⚠️  [templates/runners/README.md] → ./aci: target is a directory, not a file
   ⚠️  [templates/runners/README.md] → ./aca: target is a directory, not a file
   ⚠️  [templates/runners/README.md] → ./aks: target is a directory, not a file
   ⚠️  [templates/runners/README.md] → ./aci: target is a directory, not a file
   ⚠️  [templates/runners/README.md] → ./aca: target is a directory, not a file
   ⚠️  [templates/runners/README.md] → ./aks: target is a directory, not a file

📊 Token Budget: 10428 / 500 tokens
   ❌  Exceeds limit by 9928 tokens. Consider reducing content.

🧪 Evaluation Suite: Found
   ✅  eval.yaml detected. Run 'waza run eval.yaml' to test.

📐 Schema Validation: Passed
   ✅  eval.yaml schema valid
   ✅  5 task file(s) validated

💡 Advisory Checks
   ✅  [module-count] Found 0 reference module(s)
   ❌  [complexity] Complexity: comprehensive (10428 tokens, 0 modules)
   ❌  [negative-delta-risk] Negative delta risk patterns detected: excessive constraints (39 constraint keywords found)
   ✅  [procedural-content] Description contains procedural language
   ❌  [over-specificity] Over-specificity detected: absolute Unix paths, IP addresses, hardcoded URLs with paths
   ❌  [cross-model-density] Advisory 16: word count is 61 (>60 may reduce cross-model effectiveness); first sentence doesn't lead with action verb (reduces clarity)
   ❌  [body-structure] Advisory 17: body structure quality — no examples section found
   ❌  [progressive-disclosure] Advisory 18: progressive disclosure — SKILL.md body is 845 lines (>500 lines reduces scannability; consider moving detail to references/)
   ✅  [scope-reduction] Capability scope: 13 signal(s) detected (10 level-2 heading(s), 13 numbered procedure(s))

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📈 Overall Readiness
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚠️  Your skill needs some work before submission.

🎯 Next Steps
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

To improve your skill:

1. Add routing clarity (e.g., **UTILITY SKILL**, INVOKES:, FOR SINGLE OPERATIONS:)
2. Run 'waza dev' for interactive compliance improvement
3. Fix 3 broken link(s) — targets do not exist
4. Fix 6 link(s) pointing to directories instead of files
5. Reduce SKILL.md by 9928 tokens. Run 'waza tokens suggest' for optimization tips

- Add Dockerfile based on ghcr.io/actions/runner (GitHub official) with az, gh, jq
- Update ACA/ACI templates: default image warning about missing tools
- Document KEDA cold-start workaround (minExecutions=1)
- Add Known Gotchas: missing tools, KEDA cold start, stale workflow files
- Expand Step 11 with full provisioning flow (ACR + image + registry creds)
- Update mermaid diagram to include ACR/image build step
- Remove all references to community image myoung34/github-runner

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Replace 'VNet-injected' terminology with 'hosted compute networking' (GitHub's official term for GitHub-managed runners in Azure VNet)
- Add full Step 11a playbook: Azure networking → GitHub.Network resource → network config (using GitHubId tag) → runner group → hosted runner
- Consolidate all required GitHub token scopes into single upfront auth call (admin:org, admin:enterprise, manage_runners:org, read:enterprise, write:network_configurations) to avoid repeated device-code prompts
- Ask org vs enterprise scope upfront (determines businessId and API paths)
- Add 4 new Known Gotchas documenting non-obvious API behaviors:
  - network_settings_ids expects GitHubId tag, not Azure resource ID
  - businessId is immutable and scope-specific
  - Repeated auth prompts from missing scopes
  - Image/size IDs are numeric/GitHub-specific
- Restructure README: Option 1 (hosted compute) vs Option 2 (self-hosted)
- Keep self-hosted (ACI/ACA/AKS) as Step 11b for custom image scenarios

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

# Conflicts:
#   .github/skills/git-ape-onboarding/SKILL.md
#   website/docs/skills/git-ape-onboarding.md

The merge that combined the drift-detector (Step 10) and runner (Step 12) features renumbered the SKILL.md playbook, but the agent.md cross-references were left pointing at the pre-merge numbers. Compliance is now Step 11 (was Step 10) and runner selection is Step 12 (was Step 11). Regenerated docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The ACI/ACA templates drive registration through the env-var contract of a containerized runner (ACCESS_TOKEN, REPO_URL/ORG_NAME, RUNNER_SCOPE, LABELS, EPHEMERAL, ...) with no command override, but the image they pointed at, ghcr.io/actions/runner:latest, does not exist (404 on ghcr), and the official runner image ships no registration entrypoint. Result: runners never came online on ACI/ACA.

Fixes, keeping everything on GitHub-official images (no third-party base):

- Add entrypoint.sh that exchanges the PAT/App token for a registration token, configures an ephemeral runner, and deregisters on shutdown. Honors the same env-var contract the templates already set.
- Dockerfile: base on the real ghcr.io/actions/actions-runner:latest, install curl/ca-certificates for the entrypoint, COPY + wire ENTRYPOINT.
- ACI/ACA templates: correct runnerImage default to actions-runner and clarify that the custom image is required (tools + registration).
- AKS (consistency): values.yaml now uses the custom image with imagePullSecrets (stock image lacks az/gh/jq); README documents the ACR build + pull secret. ARC overrides the command, so the baked entrypoint is unused on AKS.
- Update README/SKILL image references and regenerate docs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
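
The token exchange described above targets GitHub's REST endpoints for runner registration tokens. The Python sketch below only shows how the endpoint could be chosen from the env-var contract. It is not the PR's entrypoint.sh, which is a shell script. The `RUNNER_SCOPE` values `repo` and `org` and the helper name `registration_token_url` are assumptions made for illustration.

```python
from urllib.parse import urlparse

API_ROOT = "https://api.github.com"


def registration_token_url(env):
    """Pick the registration-token endpoint for the configured scope.

    Assumes RUNNER_SCOPE is "org" or "repo" (hypothetical values);
    ORG_NAME and REPO_URL follow the contract named in the commit message.
    """
    scope = env.get("RUNNER_SCOPE", "repo")
    if scope == "org":
        return f"{API_ROOT}/orgs/{env['ORG_NAME']}/actions/runners/registration-token"
    # Repo scope: derive owner/repo from the repository URL.
    path = urlparse(env["REPO_URL"]).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    owner, repo = path.split("/")[:2]
    return f"{API_ROOT}/repos/{owner}/{repo}/actions/runners/registration-token"


print(registration_token_url({
    "RUNNER_SCOPE": "repo",
    "REPO_URL": "https://github.com/Azure/git-ape",
}))
```

A real entrypoint would send an authenticated POST to this URL using the value of ACCESS_TOKEN and pass the returned token to the runner's configuration step.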

…rkarounds

- ACA template: add acrServer param + identity-based registry auth (no admin creds)
- Dockerfile: add sed CRLF strip after COPY entrypoint.sh
- README: rewrite build/pull sections for cloud build + managed identity
- SKILL.md Step 12b: managed identity flow, --no-logs on Windows
- SKILL.md: 3 new Known Gotchas (CRLF, az acr build Windows crash, ACA env delay)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-17T10:37:24Z

⚠️ Documentation Staleness Warning

Source files (agents, skills, workflows, or config) changed in this PR, but the generated documentation is out of date.

Changed docs that need regeneration:

website/docs/skills/git-ape-onboarding.md

To fix: Run the following command and commit the results:

node scripts/generate-docs.js

This is an advisory check — it does not block the PR.

…-hosted runners

Registration tokens from the GitHub API expire in ~1 hour, causing KEDA polling and ephemeral runner registration to fail with 401. The agent now asks the user for a long-lived PAT before deploying ACA/ACI runners.

- Add Step 4 (Collect GitHub PAT) to SKILL.md Step 12b
- Add Known Gotcha documenting registration token failure mode
- Update parameter comments and descriptions in ACA/ACI templates
- Update Suggested Agent Flow to mention PAT collection

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot mentioned this pull request Jun 17, 2026

[repo-status] Daily Status — June 17, 2026 #190

Closed

arnaudlh added this to the v0.4.0 milestone Jun 17, 2026

arnaudlh and others added 6 commits June 17, 2026 14:40

Merge remote-tracking branch 'origin/main' into pr-179-resolve

b89685e

# Conflicts:
#   .github/skills/git-ape-onboarding/SKILL.md
#   website/docs/skills/git-ape-onboarding.md

arnaudlh wants to merge 8 commits into main from arnaudlh/private-runners-onboarding
