Skip to content

Add optional drift-detector onboarding to git-ape-onboarding#188

Merged
arnaudlh merged 3 commits into
mainfrom
arnaudlh/onboard-drift-detector
Jun 16, 2026
Merged

Add optional drift-detector onboarding to git-ape-onboarding#188
arnaudlh merged 3 commits into
mainfrom
arnaudlh/onboard-drift-detector

Conversation

@arnaudlh

@arnaudlh arnaudlh commented Jun 15, 2026

Copy link
Copy Markdown
Member

Closes #187

Summary

The /git-ape-onboarding skill scaffolds the drift-detection workflow (git-ape-drift.lock.yml) but never provisions the credential it needs, so the workflow fails its first preflight gate:

❌ Error: None of the following secrets are set: COPILOT_GITHUB_TOKEN

git-ape-drift is a gh-aw GitHub Agentic Workflow running on the Copilot engine; its compiled .lock.yml has a hard "Validate COPILOT_GITHUB_TOKEN secret" gate with no fallback to GITHUB_TOKEN. This PR makes onboarding optionally provision that token, as a skippable step.

Changes

Source

  • SKILL.md — new optional Step 10: Onboard the drift detector workflow (provision COPILOT_GITHUB_TOKEN as a repository secret, PAT with an active Copilot seat, gated on asking the user first). Renumbered compliance→11 / verify→12. Added to "What It Configures", Suggested Agent Flow, and Verification Commands.
  • templates/workflows/git-ape-verify.yml — non-blocking COPILOT_GITHUB_TOKEN check: emits a ::warning:: if absent but never counts toward MISSING, so plan/deploy/destroy verification still passes when drift detection isn't enabled.

Docs

  • website/docs/getting-started/onboarding.md — hand-written (Optional) Enable drift detection subsection with a smoke test.
  • website/docs/skills/git-ape-onboarding.md, website/docs/skills/overview.md + website/docs/workflows/git-ape-verify.md — regenerated via node scripts/generate-docs.js to mirror the sources.

Additional scope beyond #187 — onboarding agent-flow hardening (commit aff5c52)

Disclosing per review Finding 2. Separately from the token work, commit aff5c52 also hardens the behavioral quality of the git-ape-onboarding skill. These are isolated in their own commit and gated by the eval suite:

  1. Frontmatter description rewrite — adds explicit USE FOR: / DO NOT USE FOR: trigger phrases so the agent router fires the skill on the right intents (first-time setup, multi-env onboarding) and not on adjacent ones (re-deploy, secret rotation, drift-detection alone).
  2. Safe-Execution rule README: add a monkey emoji 🐒 #8 — idempotency on re-run — on re-invocation after a partial failure, resume from the last failing step and surface already-provisioned resources as ⊝ Already exists instead of creating duplicate Entra apps / federated credentials / role assignments.
  3. "Suggested Agent Flow" rewrite — adds a first-turn gated handoff (must surface prereq results + collect inputs, not a walkthrough), enumerates the 5 mandatory inputs (repo URL, subscription ID, RBAC role, mode, branch), and makes the prereq check a hard gate.

Why keep them here: they are framework-aligned trigger-precision + safe-execution improvements to the same skill this PR already edits, and they move the onboarding eval 0.68 → 0.70 (+0.016) overall, +0.064 on first-time repo setup, green across all 4 models. Per reviewer's offer ("disclose or split"), keeping them in-PR and disclosing here.

Review follow-ups

  • Finding 1 (stale doc mirror) — fixed. Re-ran node scripts/generate-docs.js; committed only the two onboarding pages (git-ape-onboarding.md, overview.md) that the final SKILL.md requires. The unrelated gh-aw v0.78.3 → v0.79.8 lock-doc churn (daily-repo-status, issue-triage) is left out, keeping PR scope clean.
  • Finding 2 (undisclosed scope) — disclosed in the section above.

Validation

  • actionlint passes clean on the edited git-ape-verify.yml template.
  • Docs regenerated with the repo generator; trailing-newline state of SKILL.md preserved.
  • git-ape-onboarding evals green across all 4 models.

Note on scope

The generator also re-touches two unrelated lock docs (daily-repo-status-lock.md, issue-triage-agent-lock.md) — pre-existing staleness (lock files already at gh-aw setup v0.79.8, committed docs at v0.78.3). Those are reverted to keep this PR focused. The Docs Check workflow is advisory/non-blocking and may flag them; happy to fold those regenerations in if preferred.

Onboarding now optionally provisions COPILOT_GITHUB_TOKEN so the agentic
drift-detection workflow (git-ape-drift.lock.yml) can run. That workflow
runs on the Copilot engine and fails its preflight gate without this
token, with no fallback to the built-in GITHUB_TOKEN.

- SKILL.md: optional Step 10 to provision the token (gated on user
  consent), renumbered compliance/verify steps, and updated the config
  list, suggested agent flow, and verification commands
- git-ape-verify.yml: non-blocking warning when the token is absent
  (emits ::warning:: but never counts toward MISSING)
- docs: optional "Enable drift detection" subsection in the
  getting-started guide; regenerated skill + verify workflow doc pages

Tracks #187

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@arnaudlh arnaudlh added enhancement New feature or request agentic-workflows labels Jun 15, 2026
@arnaudlh arnaudlh requested a review from sendtoshailesh June 15, 2026 09:26
@arnaudlh arnaudlh added this to the v0.3.0 milestone Jun 15, 2026
@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

⚠️ Documentation Staleness Warning

Source files (agents, skills, workflows, or config) changed in this PR, but the generated documentation is out of date.

Changed docs that need regeneration:

  • website/docs/workflows/daily-repo-status-lock.md
  • website/docs/workflows/issue-triage-agent-lock.md

To fix: Run the following command and commit the results:

node scripts/generate-docs.js

This is an advisory check — it does not block the PR.

@github-actions

github-actions Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

🧪 Waza skill evals (advisory)

🎯 Diff-scoped run. diff-scoped: 1 changed skill(s) — git-ape-onboarding Touch .waza.yaml or trigger workflow_dispatch to run the full matrix.

Ran 4 matrix legs in parallel (skills × models). Results are non-blocking — investigate failures via the workflow logs and the per-leg waza-results-* artifacts.

Legend: Models flagged baseline: true in .github/evals/manifest.yaml (currently: gpt-5.4) run with --baseline (A/B mode) to cap quota. All other models run standard. Judge model is fixed at claude-opus-4.7 across all legs.

📊 Token comparison vs main (advisory)
{
  "baseRef": "main",
  "headRef": "WORKING",
  "threshold": 10,
  "passed": true,
  "timestamp": "2026-06-16T09:23:50.419527552Z",
  "summary": {
    "totalBefore": 0,
    "totalAfter": 38094,
    "totalDiff": 38094,
    "percentChange": 100,
    "filesAdded": 15,
    "filesRemoved": 0,
    "filesModified": 0,
    "filesIncreased": 15,
    "filesDecreased": 0
  },
  "files": [
    {
      "file": ".github/skills/azure-cost-estimator/SKILL.md",
      "before": null,
      "after": {
        "tokens": 3231,
        "characters": 11940,
        "lines": 345
      },
      "diff": 3231,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-deployment-preflight/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1448,
        "characters": 6281,
        "lines": 212
      },
      "diff": 1448,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-drift-detector/SKILL.md",
      "before": null,
      "after": {
        "tokens": 3179,
        "characters": 13149,
        "lines": 460
      },
      "diff": 3179,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-integration-tester/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1563,
        "characters": 6807,
        "lines": 248
      },
      "diff": 1563,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-naming-research/SKILL.md",
      "before": null,
      "after": {
        "tokens": 486,
        "characters": 2108,
        "lines": 44
      },
      "diff": 486,
      "percentChange": 100,
      "status": "added",
      "limit": 500
    },
    {
      "file": ".github/skills/azure-policy-advisor/SKILL.md",
      "before": null,
      "after": {
        "tokens": 4751,
        "characters": 21485,
        "lines": 368
      },
      "diff": 4751,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-resource-availability/SKILL.md",
      "before": null,
      "after": {
        "tokens": 2413,
        "characters": 9881,
        "lines": 308
      },
      "diff": 2413,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-resource-visualizer/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1494,
        "characters": 6179,
        "lines": 192
      },
      "diff": 1494,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-rest-api-reference/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1831,
        "characters": 8430,
        "lines": 200
      },
      "diff": 1831,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-role-selector/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1280,
        "characters": 5641,
        "lines": 162
      },
      "diff": 1280,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-security-analyzer/SKILL.md",
      "before": null,
      "after": {
        "tokens": 5326,
        "characters": 21419,
        "lines": 451
      },
      "diff": 5326,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-stack-deploy/SKILL.md",
      "before": null,
      "after": {
        "tokens": 1912,
        "characters": 7525,
        "lines": 159
      },
      "diff": 1912,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/azure-stack-destroy/SKILL.md",
      "before": null,
      "after": {
        "tokens": 2644,
        "characters": 10670,
        "lines": 180
      },
      "diff": 2644,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/git-ape-onboarding/SKILL.md",
      "before": null,
      "after": {
        "tokens": 4396,
        "characters": 17979,
        "lines": 338
      },
      "diff": 4396,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    },
    {
      "file": ".github/skills/prereq-check/SKILL.md",
      "before": null,
      "after": {
        "tokens": 2140,
        "characters": 8023,
        "lines": 147
      },
      "diff": 2140,
      "percentChange": 100,
      "status": "added",
      "limit": 500,
      "overLimit": true
    }
  ]
}

Skill: git-ape-onboarding

📈 Score (per model) + Suggestions/Recommendations
Model: claude-opus-4.6

Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: claude-opus-4.6
Judge Model: claude-opus-4.7
Parallel: 4 workers

✓ [1/4] Negative — Storage service comparison (off-topic)
✓ [4/4] Positive — Scaffold honors skip-with-notice on collision
✓ [3/4] Positive — Multi-environment onboarding
✓ [2/4] Positive — First-time repo setup

🧪 Waza Eval Results

Status: ✅ Passed | Score: 0.86 | Duration: 42.5s

  • Tests: 4 total, 4 passed, 0 failed, 0 errors
  • Success Rate: 100.0%
  • Score Range: 0.56 - 1.00 (σ=0.1798)

Task Results

Task Score Status Graders
Negative — Storage service comparison (off-topic) 0.56 budget, trigger_relevance_negative
Positive — First-time repo setup 1.00 answer_quality, budget, trigger_relevance_positive
Positive — Multi-environment onboarding 0.91 answer_quality, budget, trigger_relevance_positive
Positive — Scaffold honors skip-with-notice on collision 0.98 answer_quality, budget, trigger_relevance_positive

Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: claude-opus-4.6

Results saved to: .waza-results/git-ape-onboarding-claude-opus-4.6.json
JUnit XML saved to: .waza-results/git-ape-onboarding-claude-opus-4.6.junit.xml

Model: claude-sonnet-4.6

Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: claude-sonnet-4.6
Judge Model: claude-opus-4.7
Parallel: 4 workers

✓ [1/4] Negative — Storage service comparison (off-topic)
✓ [4/4] Positive — Scaffold honors skip-with-notice on collision
✓ [3/4] Positive — Multi-environment onboarding
✓ [2/4] Positive — First-time repo setup

🧪 Waza Eval Results

Status: ✅ Passed | Score: 0.86 | Duration: 1m24.144s

  • Tests: 4 total, 4 passed, 0 failed, 0 errors
  • Success Rate: 100.0%
  • Score Range: 0.56 - 1.00 (σ=0.1798)

Task Results

Task Score Status Graders
Negative — Storage service comparison (off-topic) 0.56 budget, trigger_relevance_negative
Positive — First-time repo setup 1.00 answer_quality, budget, trigger_relevance_positive
Positive — Multi-environment onboarding 0.91 answer_quality, budget, trigger_relevance_positive
Positive — Scaffold honors skip-with-notice on collision 0.98 answer_quality, budget, trigger_relevance_positive

Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: claude-sonnet-4.6

Results saved to: .waza-results/git-ape-onboarding-claude-sonnet-4.6.json
JUnit XML saved to: .waza-results/git-ape-onboarding-claude-sonnet-4.6.junit.xml

Model: gpt-5.3-codex

Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: gpt-5.3-codex
Judge Model: claude-opus-4.7
Parallel: 4 workers

✓ [1/4] Negative — Storage service comparison (off-topic)
✓ [4/4] Positive — Scaffold honors skip-with-notice on collision
✓ [3/4] Positive — Multi-environment onboarding
✗ [2/4] Positive — First-time repo setup

🧪 Waza Eval Results

Status: ❌ Failed | Score: 0.78 | Duration: 49.603s

  • Tests: 4 total, 3 passed, 1 failed, 0 errors
  • Success Rate: 75.0%
  • Score Range: 0.56 - 0.98 (σ=0.1736)

Task Results

Task Score Status Graders
Negative — Storage service comparison (off-topic) 0.56 budget, trigger_relevance_negative
Positive — First-time repo setup 0.67 answer_quality, budget, trigger_relevance_positive
Positive — Multi-environment onboarding 0.91 answer_quality, budget, trigger_relevance_positive
Positive — Scaffold honors skip-with-notice on collision 0.98 answer_quality, budget, trigger_relevance_positive

Failed Task Details

Positive — First-time repo setup

Run 1/1 (failed):

  • answer_quality (0.00): fail: : Criterion 1 not met: the agent did not present actual prereq check results (tool versions, auth status). Shell execution failed with "unexpected user permission response" and the agent surfaced a blank checklist for the user to fill in manually rather than inspected versions/auth states. Criteria 2 (gate surfaced as "blocked at the prereq gate"), 3 (5 inputs requested in numbered list), and 4 (no false claims of configuration) are satisfied.
  • budget (1.00): All behavior checks passed
  • trigger_relevance_positive (1.00): Prompt is trigger-aligned (score 1.00 >= 0.50)

Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: gpt-5.3-codex

Results saved to: .waza-results/git-ape-onboarding-gpt-5.3-codex.json

Model: gpt-5.4 *(baseline — A/B mode)*

Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: gpt-5.4
Judge Model: claude-opus-4.7
Parallel: 4 workers

════════════════════════════════════════════════════════════════
PASS 1: Skills-Enabled Run
════════════════════════════════════════════════════════════════
✓ [1/4] Negative — Storage service comparison (off-topic)
✓ [4/4] Positive — Scaffold honors skip-with-notice on collision
✓ [2/4] Positive — First-time repo setup
✓ [3/4] Positive — Multi-environment onboarding

════════════════════════════════════════════════════════════════
PASS 2: Skills Baseline (skills stripped)
════════════════════════════════════════════════════════════════
✓ [1/4] Negative — Storage service comparison (off-topic)
✗ [4/4] Positive — Scaffold honors skip-with-notice on collision
✗ [3/4] Positive — Multi-environment onboarding
✗ [2/4] Positive — First-time repo setup

════════════════════════════════════════════════════════════════
SKILL IMPACT ANALYSIS
════════════════════════════════════════════════════════════════
Overall Performance Delta:
With Skills: 100.0% (4/4 tasks passed)
Without Skills: 25.0% (1/4 tasks passed)
Impact: +75.0 percentage points

Per-Task Breakdown:
• Negative — Storage service comparison (off-topic) [NEUTRAL] 100% → 100% (+0pp)
• Positive — First-time repo setup [IMPROVED] 0% → 100% (+100pp)
• Positive — Multi-environment onboarding [IMPROVED] 0% → 100% (+100pp)
• Positive — Scaffold honors skip-with-notice on collision [IMPROVED] 0% → 100% (+100pp)

Verdict: Skills have POSITIVE IMPACT (improved 3/4 tasks)
════════════════════════════════════════════════════════════════

🧪 Waza Eval Results

Status: ✅ Passed | Score: 0.86 | Duration: 41.489s

  • Tests: 4 total, 4 passed, 0 failed, 0 errors
  • Success Rate: 100.0%
  • Score Range: 0.56 - 1.00 (σ=0.1798)

Task Results

Task Score Status Graders
Negative — Storage service comparison (off-topic) 0.56 budget, trigger_relevance_negative
Positive — First-time repo setup 1.00 answer_quality, budget, trigger_relevance_positive
Positive — Multi-environment onboarding 0.91 answer_quality, budget, trigger_relevance_positive
Positive — Scaffold honors skip-with-notice on collision 0.98 answer_quality, budget, trigger_relevance_positive

Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: gpt-5.4

Results saved to: .waza-results/git-ape-onboarding-gpt-5.4.json
JUnit XML saved to: .waza-results/git-ape-onboarding-gpt-5.4.junit.xml

🔢 Tokens (count + profile)

📊 git-ape-onboarding: 4,396 tokens (detailed ✓), 18 sections, 18 code blocks
   ⚠️  token count 4396 exceeds 3000

🎯 Quality (5-dim table)

DIMENSION          SCORE  FEEDBACK
────────────────────────────────────────────
clarity            █████  Instructions are exceptionally well-ordered with numbered steps, named invariants, a canonical command playbook, and clear visual markers (✓/⊝). The 'first-turn rule' in Suggested Agent Flow eliminates ambiguity about when the agent may proceed.
completeness       █████  Covers prereq validation, single/multi-env modes, OIDC subject format variance, idempotency on re-run, optional drift detector, compliance preferences, safe-execution rules, verification commands, and two known gotchas with remediation steps. Edge cases (disabled subscriptions, org OIDC overrides, non-main branches, partial failures) are all explicitly addressed.
trigger_precision  ████░  USE FOR and DO NOT USE FOR triggers in the description and body are specific and non-overlapping. Minor gap: 'rotating or updating an existing secret or federated credential' is mentioned as out-of-scope but no alternative skill is named for that path, leaving routing slightly incomplete.
scope_coverage     █████  Scope is tightly defined — bootstrapping only, not deploying. Capabilities list in 'What It Configures' is exhaustive and numbered. Limitations are explicit (no overwrite, no git operations, no deployment). The boundary between this skill and git-ape is clearly articulated.
anti_patterns      ████░  Avoids vague instructions, conflicting directives, and missing error handling. The hard gate on prereq-check before collecting inputs is excellent. Minor: Step 11's conditional logic for copilot-instructions.md (exists/doesn't exist/exists-but-no-section) is slightly complex and could benefit from a decision table, but the prose handles it correctly.
────────────────────────────────────────────
Overall: 4.6/5.0

Exceptionally well-structured skill with strong invariants, idempotency guarantees, explicit edge case handling, and a clear agent execution model. The hard prereq gate, canonical command playbook, and dual-shell scaffold scripts demonstrate production-grade thinking. Minor improvements: name an alternative skill for secret rotation, and consider a decision table for the copilot-instructions.md update logic to reduce cognitive load.
✅ Check (compliance summary) (60 lines — click to expand)

ℹ️ waza check expects eval.yaml colocated with SKILL.md. This repo separates them into .github/evals/git-ape-onboarding/eval.yaml, so the "Evaluation Suite: Not Found" line below is a false negative — the eval actually ran (see the Score section above).

🔍 Skill Readiness Check
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Skill: git-ape-onboarding

📋 Compliance Score: Medium-High
   ⚠️  Good, but could be improved. Missing routing clarity.

   Issues found:
   ❌  SKILL.md is 4396 tokens (hard limit 500)

📐 Spec Compliance: 9/9 checks passed
   ✅  Meets agentskills.io specification.

📎 Links: 3/6 valid
   ⚠️  3 link issue(s) found.
   ❌  [templates/copilot-instructions.md] → .github/skills/azure-stack-deploy/SKILL.md: target does not exist
   ❌  [templates/copilot-instructions.md] → website/docs/deployment/state.md: target does not exist
   ❌  [templates/copilot-instructions.md] → .github/skills/azure-stack-destroy/SKILL.md: target does not exist

📊 Token Budget: 4396 / 500 tokens
   ❌  Exceeds limit by 3896 tokens. Consider reducing content.

🧪 Evaluation Suite: Found
   ✅  eval.yaml detected. Run 'waza run eval.yaml' to test.

📐 Schema Validation: Passed
   ✅  eval.yaml schema valid
   ✅  4 task file(s) validated

💡 Advisory Checks
   ✅  [module-count] Found 0 reference module(s)
   ❌  [complexity] Complexity: comprehensive (4396 tokens, 0 modules)
   ❌  [negative-delta-risk] Negative delta risk patterns detected: excessive constraints (17 constraint keywords found)
   ✅  [procedural-content] Description contains procedural language
   ✅  [over-specificity] No over-specificity patterns detected
   ❌  [cross-model-density] Advisory 16: word count is 61 (>60 may reduce cross-model effectiveness); first sentence doesn't lead with action verb (reduces clarity)
   ❌  [body-structure] Advisory 17: body structure quality — no examples section found; no error handling or troubleshooting section found
   ✅  [progressive-disclosure] Content structure supports progressive disclosure
   ✅  [scope-reduction] Capability scope: 10 signal(s) detected (10 level-2 heading(s), 6 numbered procedure(s))

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📈 Overall Readiness
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚠️  Your skill needs some work before submission.

🎯 Next Steps
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

To improve your skill:

1. Add routing clarity (e.g., **UTILITY SKILL**, INVOKES:, FOR SINGLE OPERATIONS:)
2. Run 'waza dev' for interactive compliance improvement
3. Fix 3 broken link(s) — targets do not exist
4. Reduce SKILL.md by 3896 tokens. Run 'waza tokens suggest' for optimization tips

…d trigger precision

- Add first-turn rule: response must be gated handoff (prereq + inputs), not a walkthrough
- Strengthen Step 1: surface full results table; explicit checklist fallback when CLI unavailable; hard gate before advancing
- Strengthen Step 2: enumerate all 5 required inputs (repo URL, subscription ID, RBAC role, mode, branch)
- Add DO NOT USE FOR to When to Use (re-deploy, secret rotation, drift-detection alone)
- Update frontmatter description with USE FOR / DO NOT USE FOR trigger phrases
- Add Safe-Execution Rule 8: idempotency on re-run (surface existing resources as ⊝ Already exists)

Eval delta: 0.68 → 0.70 (+0.016); first-time repo setup +0.064

@sendtoshailesh sendtoshailesh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: solid core change, two things to address before merge

Thanks for this — the core feature is correct and maps cleanly to #187.

✅ What's right

  • git-ape-verify.yml adds the COPILOT_GITHUB_TOKEN check as optional + non-blocking (::warning::, never increments MISSING) — plan/deploy/destroy/verify stay green without it. actionlint-clean.
  • SKILL.md Step 10 is well-designed: repo secret, gated on asking the user, PAT-with-Copilot-seat requirement, never echo the token.
  • Evals green across all 4 models.
  • The two unrelated lock-doc touches (daily-repo-status, issue-triage) are pure gh-aw v0.78.3 → v0.79.8 version noise — pre-existing, correctly excluded, and disclosed. 👍

⚠️ Finding 1 — stale doc mirror (please fix before merge)

The committed docs are out of sync with the final SKILL.md. Running node scripts/generate-docs.js produces a deterministic diff on two tracked files:

  • website/docs/skills/git-ape-onboarding.md
  • website/docs/skills/overview.md

Root cause: the docs were regenerated after the Step-10 drift edits (those are present), but not after the later SKILL.md edits — First-turn rule, Collect the required inputs, and Idempotency on re-run are absent from the published doc page. So the public skill page misrepresents the skill's description and agent flow. check-docs is advisory/non-blocking, so it won't gate this.

Fix: re-run the generator and commit only those two onboarding files (leave the unrelated lock docs out, as you already did).

⚠️ Finding 2 — undisclosed scope beyond #187 (disclose or split)

Beyond the drift-detector work, the SKILL.md diff also silently includes:

  1. A full frontmatter description rewrite (USE FOR / DO NOT USE FOR clauses).
  2. A new Safe-Execution rule #8 — idempotency on re-run.
  3. A near-total "Suggested Agent Flow" rewrite — first-turn gated handoff, 5 mandatory inputs, hard gates.

None of these are in #187 (scoped to the token) or the PR description. They look like reasonable, framework-aligned improvements and the evals pass — but they materially change agent behavior and are hard to spot under a PR titled "add drift-detector onboarding." Could you either document + justify them in the PR body, or split them into their own PR? (Note: these are the same edits missing from the regenerated docs in Finding 1 — which is what flagged that they were added after the last doc regen.)

Verdict: core feature is mergeable and correct; neither finding mechanically blocks (mergeable + advisory check), but both matter for doc accuracy and clean history. Happy to re-review once the docs are regenerated.

Re-runs node scripts/generate-docs.js so the published skill pages mirror
the final SKILL.md. The earlier doc regen captured only the Step-10 drift
edits; the later SKILL.md edits (frontmatter description rewrite, the DO
NOT USE FOR block, Safe-Execution rule 8 idempotency, and the Suggested
Agent Flow first-turn rule + required-inputs rewrite) were missing from
the published pages.

Scoped to the two onboarding files; the unrelated gh-aw v0.78.3 -> v0.79.8
lock-doc churn (daily-repo-status, issue-triage) is left out, matching the
existing PR scope.

Addresses review Finding 1 on #188.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@arnaudlh

Copy link
Copy Markdown
Member Author

@sendtoshailesh thanks for the thorough review — both findings are addressed in f2e8459.

Finding 1 (stale doc mirror) — fixed. Re-ran node scripts/generate-docs.js and committed only the two onboarding pages the final SKILL.md requires:

  • website/docs/skills/git-ape-onboarding.md
  • website/docs/skills/overview.md

They now carry the previously-missing First-turn rule, Collect the required inputs, the DO NOT USE FOR block, Safe-Execution rule #8 (idempotency), and the frontmatter description rewrite. Verified deterministic: a fresh generator run now leaves the onboarding docs untouched — only the unrelated gh-aw v0.78.3 → v0.79.8 lock-doc churn (daily-repo-status, issue-triage) remains, which I again left out to keep scope clean.

Finding 2 (undisclosed scope) — disclosed. Updated the PR body with an "Additional scope beyond #187" section documenting the three behavioral changes in commit aff5c52 (frontmatter USE FOR/DO NOT USE FOR rewrite, Safe-Execution rule #8 idempotency, and the Suggested Agent Flow rewrite) plus rationale — trigger-precision + safe-execution hardening to the same skill this PR edits; eval 0.68 → 0.70 (+0.064 on first-time repo setup), green across all 4 models. Went with disclose-in-PR rather than split since they're scoped to this skill and eval-gated; happy to split into a follow-up PR instead if you'd prefer cleaner history.

Re-requesting review 🙏

@arnaudlh arnaudlh requested a review from sendtoshailesh June 16, 2026 09:24

@sendtoshailesh sendtoshailesh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review @ f2e8459 — both findings resolved ✅

Thanks for the fast turnaround and the clear disclosure.

Finding 1 — stale doc mirror → fixed. Independently verified: a fresh node scripts/generate-docs.js now leaves git-ape-onboarding.md and overview.md untouched (deterministic, zero drift). The previously-missing First-turn rule, Collect the required inputs, DO NOT USE FOR block, Safe-Execution rule #8, and the frontmatter description are all present in the published pages now. The two unrelated gh-aw v0.78.3 → v0.79.8 lock docs remain correctly excluded.

Finding 2 — scope → disclosed. The new "Additional scope beyond #187" section documents all three behavioral changes (commit aff5c52) with rationale and eval impact (0.68 → 0.70, +0.064 on first-time setup). Isolating them in their own commit + disclosing is a reasonable call given they're scoped to this skill and eval-gated — no need to split for me.

Also confirmed: all CI green on f2e8459 (incl. evals across all 4 models, scaffold parity, template↔docs sync); mergeable (the BLOCKED state is just the pending required approval).

One optional, non-blocking note for later (not for this PR): the SKILL.md additions push it further over the framework's ~500-line / ~5k-token guidance — the advisory flags ~3.9k tokens of headroom. Worth considering moving some L1 prose (e.g. the Step-10 token-requirements detail) into an references/ L2 file in a future pass. Purely housekeeping; the eval gain shows the current content is pulling its weight.

LGTM from my side. 👍

@arnaudlh arnaudlh merged commit 04975e4 into main Jun 16, 2026
17 checks passed
@arnaudlh arnaudlh deleted the arnaudlh/onboard-drift-detector branch June 16, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agentic-workflows enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Onboarding: optionally provision COPILOT_GITHUB_TOKEN for the drift detector workflow

2 participants