Add enterprise distribution mode (.github-private) to git-ape-onboarding #184
arnaudlh wants to merge 3 commits
Adds a second mode to the git-ape-onboarding skill that distributes Git-Ape
org/enterprise-wide via a `.github-private` repo, so the whole plugin (agents +
skills + azure-mcp) auto-installs on user authentication.
Why the plugin route: Git-Ape bundles agents + skills + an MCP server, so the
`.github-private` agents/ route (agents only) would ship them broken. Distribute
via `.github/copilot/managed-settings.json` (enterprise-managed plugin standards)
instead. Standalone org/enterprise skills are still "coming soon" per GitHub.
Changes:
- templates/github-private/: canonical README.md, .github/copilot/
managed-settings.json (enables git-ape@git-ape), agents/.gitkeep
- scripts/scaffold-enterprise.{sh,ps1}: byte-identical bash/pwsh parity
scaffolder mirroring scaffold-repo conventions (explicit MAPPINGS allow-list,
skip-with-notice, no git ops, Created/Skipped summary)
- git-ape-onboarding-template-check.yml: new scaffold-enterprise-parity-smoke
job + watched paths
- SKILL.md: broadened description, Onboarding Modes overview, enterprise
playbook (CLI steps + UI-only AI-controls/ruleset hand-off), safe-execution
rules, agent flow
- templates/README.md: document new templates + enterprise scaffolder
(scaffold-only, not mirrored) + parity requirement
github-private templates are scaffold-only (never mirrored into this repo).
Azure access remains a separate per-repo onboarding step.
Closes #183
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
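The scaffolder conventions listed in this commit message (explicit allow-list, skip-with-notice, no git operations, Created/Skipped summary) can be illustrated with a minimal Python sketch. This is not the code of `scripts/scaffold-enterprise.{sh,ps1}`: the `MAPPINGS` entries follow the three template files named above, while the `scaffold` and `demo` functions and the summary shape are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Explicit allow-list: only these template files are ever copied.
# Paths follow the commit message; the real scripts may map them differently.
MAPPINGS = [
    "README.md",
    ".github/copilot/managed-settings.json",
    "agents/.gitkeep",
]


def scaffold(template_dir: Path, target_dir: Path) -> dict:
    """Copy allow-listed files, skipping (never overwriting) existing ones."""
    summary = {"created": [], "skipped": []}
    for rel in MAPPINGS:
        src, dst = template_dir / rel, target_dir / rel
        if dst.exists():
            # Skip-with-notice: report the collision and leave the file alone.
            print(f"Skipped (already exists): {rel}")
            summary["skipped"].append(rel)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        print(f"Created: {rel}")
        summary["created"].append(rel)
    # No git operations here: committing is left to the caller.
    return summary


def demo() -> tuple:
    """Run the scaffolder twice against throwaway directories."""
    with tempfile.TemporaryDirectory() as tmp:
        templates, target = Path(tmp) / "templates", Path(tmp) / "target"
        for rel in MAPPINGS:
            path = templates / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("{}\n" if rel.endswith(".json") else "")
        first = scaffold(templates, target)
        second = scaffold(templates, target)  # everything collides now
        return first, second
```

Calling `demo()` creates all three files on the first run and skips all three on the second, which is the idempotent behavior the skip-with-notice rule is meant to give.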
🧪 Waza skill evals (advisory)
Ran 4 matrix legs in parallel (skills × models). Results are non-blocking — investigate failures via the workflow logs and the per-leg
📊 Token comparison vs
| Task | Score | Status | Graders |
|---|---|---|---|
| Negative — Storage service comparison (off-topic) | 0.56 | ✅ | budget, trigger_relevance_negative |
| Positive — First-time repo setup | 1.00 | ✅ | answer_quality, budget, trigger_relevance_positive |
| Positive — Multi-environment onboarding | 0.62 | ❌ | answer_quality, budget, trigger_relevance_positive |
| Positive — Scaffold honors skip-with-notice on collision | 0.98 | ✅ | answer_quality, budget, trigger_relevance_positive |
Failed Task Details
Positive — Multi-environment onboarding
Run 1/1 (failed):
- ❌ answer_quality (0.00): fail: Missing prereq gate and status presentation: Criterion 1 FAIL: No prereq status table or equivalent inspection was presented. The agent attempted to run check-tools.sh and a manual command check, both returned "unexpected user permission response" errors, but the agent did not render any pass/fail table for az/gh/jq/git versions. Criterion 2 FAIL: No explicit prereq gate was surfaced. Instead of stopping at the blocking failure (couldn't verify tools/auth), the agent proceeded directly to give full step-by-step instructions with only a passive "confirm locally" suggestion. The skill rule "Stop at first blocking failure" was violated. Criterion 3 PASS: The agent requested 3 inputs (staging subscription ID, App Registration client ID with reuse implied, target repo). Criterion 4 PASS: Multi-environment awareness is clear — mentions azure-deploy-staging environment name, creating a new federated-credential entry scoped to that environment, and per-environment secrets/RBAC on the staging subscription.
- ✅ budget (1.00): All behavior checks passed
- ✅ trigger_relevance_positive (0.87): Prompt is trigger-aligned (score 0.87 >= 0.50)
Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: claude-opus-4.6
Results saved to: .waza-results/git-ape-onboarding-claude-opus-4.6.json
Model: claude-sonnet-4.6
Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: claude-sonnet-4.6
Judge Model: claude-opus-4.7
Parallel: 4 workers
✓ [1/4] Negative — Storage service comparison (off-topic)
✓ [4/4] Positive — Scaffold honors skip-with-notice on collision
✓ [3/4] Positive — Multi-environment onboarding
✗ [2/4] Positive — First-time repo setup
🧪 Waza Eval Results
Status: ❌ Failed | Score: 0.79 | Duration: 37.631s
- Tests: 4 total, 3 passed, 1 failed, 0 errors
- Success Rate: 75.0%
- Score Range: 0.56 - 0.98 (σ=0.1829)
Task Results
| Task | Score | Status | Graders |
|---|---|---|---|
| Negative — Storage service comparison (off-topic) | 0.56 | ✅ | budget, trigger_relevance_negative |
| Positive — First-time repo setup | 0.67 | ❌ | answer_quality, budget, trigger_relevance_positive |
| Positive — Multi-environment onboarding | 0.96 | ✅ | answer_quality, budget, trigger_relevance_positive |
| Positive — Scaffold honors skip-with-notice on collision | 0.98 | ✅ | answer_quality, budget, trigger_relevance_positive |
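As a rough cross-check, the headline numbers for this run can be recomputed from the task table. The sketch assumes the overall score is the plain mean of the four task scores and that σ is a population standard deviation; the log confirms neither, and because the table shows rounded scores the recomputed σ only approximates the reported 0.1829.

```python
from statistics import mean, pstdev

# (score, passed) per task, copied from the claude-sonnet-4.6 table above.
results = [
    (0.56, True),   # Negative: storage service comparison
    (0.67, False),  # Positive: first-time repo setup
    (0.96, True),   # Positive: multi-environment onboarding
    (0.98, True),   # Positive: scaffold skip-with-notice
]

scores = [score for score, _ in results]
overall = mean(scores)  # assumed aggregation: plain mean
success_rate = 100 * sum(ok for _, ok in results) / len(results)
spread = pstdev(scores)  # assumed: population standard deviation

print(
    f"score={overall:.2f} success={success_rate:.1f}% "
    f"range={min(scores):.2f}-{max(scores):.2f} sigma~{spread:.2f}"
)
```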
Failed Task Details
Positive — First-time repo setup
Run 1/1 (failed):
- ❌ answer_quality (0.00): fail: Missing criterion 3: Criteria 1, 2, 4 met: agent attempted prereq checks (blocked by environment), surfaced the gate, and did not falsely claim any configuration. However, criterion 3 fails: the agent deferred input collection ("Once you confirm... I'll collect the five required inputs") rather than asking for at least three of the required inputs (repo URL, subscription ID, RBAC role, mode, default branch) in this turn. No numbered list or question block requesting inputs was presented.
- ✅ budget (1.00): All behavior checks passed
- ✅ trigger_relevance_positive (1.00): Prompt is trigger-aligned (score 1.00 >= 0.50)
Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: claude-sonnet-4.6
Results saved to: .waza-results/git-ape-onboarding-claude-sonnet-4.6.json
Model: gpt-5.3-codex
Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: gpt-5.3-codex
Judge Model: claude-opus-4.7
Parallel: 4 workers
✓ [1/4] Negative — Storage service comparison (off-topic)
✓ [4/4] Positive — Scaffold honors skip-with-notice on collision
✗ [3/4] Positive — Multi-environment onboarding
✓ [2/4] Positive — First-time repo setup
🧪 Waza Eval Results
Status: ❌ Failed | Score: 0.79 | Duration: 52.575s
- Tests: 4 total, 3 passed, 1 failed, 0 errors
- Success Rate: 75.0%
- Score Range: 0.56 - 1.00 (σ=0.2022)
Task Results
| Task | Score | Status | Graders |
|---|---|---|---|
| Negative — Storage service comparison (off-topic) | 0.56 | ✅ | budget, trigger_relevance_negative |
| Positive — First-time repo setup | 1.00 | ✅ | answer_quality, budget, trigger_relevance_positive |
| Positive — Multi-environment onboarding | 0.62 | ❌ | answer_quality, budget, trigger_relevance_positive |
| Positive — Scaffold honors skip-with-notice on collision | 0.98 | ✅ | answer_quality, budget, trigger_relevance_positive |
Failed Task Details
Positive — Multi-environment onboarding
Run 1/1 (failed):
- ❌ answer_quality (0.00): fail: Missing prereq results presentation: Criterion 1 not met: the assistant attempted to run the prereq-check script but the bash tool returned "unexpected user permission response" and the agent did not present any status table or inspection results. Instead it punted by asking the user to paste outputs. Criteria 2 (gate surfaced via "can't execute prereq commands"), 3 (5 numbered inputs requested), and 4 (mentions azure-deploy-staging + per-env subscription mapping) are satisfied.
- ✅ budget (1.00): All behavior checks passed
- ✅ trigger_relevance_positive (0.87): Prompt is trigger-aligned (score 0.87 >= 0.50)
Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: gpt-5.3-codex
Results saved to: .waza-results/git-ape-onboarding-gpt-5.3-codex.json
Model: gpt-5.4 *(baseline — A/B mode)*
Running benchmark: git-ape-onboarding-eval
Skill: git-ape-onboarding
Engine: copilot-sdk
Model: gpt-5.4
Judge Model: claude-opus-4.7
Parallel: 4 workers
════════════════════════════════════════════════════════════════
PASS 1: Skills-Enabled Run
════════════════════════════════════════════════════════════════
✓ [1/4] Negative — Storage service comparison (off-topic)
✓ [4/4] Positive — Scaffold honors skip-with-notice on collision
✗ [2/4] Positive — First-time repo setup
✓ [3/4] Positive — Multi-environment onboarding
════════════════════════════════════════════════════════════════
PASS 2: Skills Baseline (skills stripped)
════════════════════════════════════════════════════════════════
✓ [1/4] Negative — Storage service comparison (off-topic)
✓ [4/4] Positive — Scaffold honors skip-with-notice on collision
✗ [3/4] Positive — Multi-environment onboarding
[ERROR] waiting for session.idle: context deadline exceeded
✗ [2/4] Positive — First-time repo setup
════════════════════════════════════════════════════════════════
SKILL IMPACT ANALYSIS
════════════════════════════════════════════════════════════════
Overall Performance Delta:
With Skills: 75.0% (3/4 tasks passed)
Without Skills: 50.0% (2/4 tasks passed)
Impact: +25.0 percentage points
Per-Task Breakdown:
• Negative — Storage service comparison (off-topic) [NEUTRAL] 100% → 100% (+0pp)
• Positive — First-time repo setup [NEUTRAL] 0% → 0% (+0pp)
• Positive — Multi-environment onboarding [IMPROVED] 0% → 100% (+100pp)
• Positive — Scaffold honors skip-with-notice on collision [NEUTRAL] 100% → 100% (+0pp)
Verdict: Skills have POSITIVE IMPACT (improved 1/4 tasks)
════════════════════════════════════════════════════════════════
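The delta reported in the skill impact analysis is plain pass-rate arithmetic. A minimal sketch, assuming each task contributes one pass/fail result per pass; the short task keys and the `pass_rate` helper are illustrative and not Waza's internals.

```python
# Pass/fail per task, transcribed from the skill impact analysis above.
with_skills = {
    "storage-comparison": True,
    "first-time-setup": False,
    "multi-environment": True,
    "skip-with-notice": True,
}
without_skills = {
    "storage-comparison": True,
    "first-time-setup": False,
    "multi-environment": False,
    "skip-with-notice": True,
}


def pass_rate(results: dict) -> float:
    """Percentage of tasks that passed."""
    return 100 * sum(results.values()) / len(results)


impact = pass_rate(with_skills) - pass_rate(without_skills)
improved = [t for t in with_skills if with_skills[t] and not without_skills[t]]

print(f"With skills: {pass_rate(with_skills):.1f}%")
print(f"Without skills: {pass_rate(without_skills):.1f}%")
print(f"Impact: {impact:+.1f} percentage points, improved: {improved}")
```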
🧪 Waza Eval Results
Status: ❌ Failed | Score: 0.79 | Duration: 38.42s
- Tests: 4 total, 3 passed, 1 failed, 0 errors
- Success Rate: 75.0%
- Score Range: 0.56 - 0.98 (σ=0.1829)
Task Results
| Task | Score | Status | Graders |
|---|---|---|---|
| Negative — Storage service comparison (off-topic) | 0.56 | ✅ | budget, trigger_relevance_negative |
| Positive — First-time repo setup | 0.67 | ❌ | answer_quality, budget, trigger_relevance_positive |
| Positive — Multi-environment onboarding | 0.96 | ✅ | answer_quality, budget, trigger_relevance_positive |
| Positive — Scaffold honors skip-with-notice on collision | 0.98 | ✅ | answer_quality, budget, trigger_relevance_positive |
Failed Task Details
Positive — First-time repo setup
Run 1/1 (failed):
- ❌ answer_quality (0.00): fail: Missing criterion 1: no prereq inspection results presented: Criteria 3 and 4 are met (the agent asked for repo URL, subscription ID, RBAC role, mode, and branch; and did not claim any OIDC/RBAC/scaffold work was done). Criterion 2 is arguably surfaced by stating the session couldn't execute the prereq check. However, criterion 1 fails: the agent did NOT present a status table or list of tool versions and auth status. Its bash tool call returned "unexpected user permission response" and instead of retrying or showing partial results, it punted entirely to the user with "please confirm manually." No az/gh/jq/git versions, no auth status rows, no ✅/❌ markers per tool were rendered — so there is no evidence the environment was actually inspected.
- ✅ budget (1.00): All behavior checks passed
- ✅ trigger_relevance_positive (1.00): Prompt is trigger-aligned (score 1.00 >= 0.50)
Benchmark: git-ape-onboarding-eval | Skill: git-ape-onboarding | Model: gpt-5.4
Results saved to: .waza-results/git-ape-onboarding-gpt-5.4.json
JUnit XML saved to: .waza-results/git-ape-onboarding-gpt-5.4.junit.xml
🔢 Tokens (count + profile)
📊 git-ape-onboarding: 6,395 tokens (detailed ✓), 29 sections, 24 code blocks
⚠️ token count 6395 exceeds 3000
🎯 Quality (5-dim table)
DIMENSION SCORE FEEDBACK
────────────────────────────────────────────
clarity ████░ Structure is excellent — numbered playbook, invariants, two-mode separation, and agent flow sections are all well-ordered and easy to follow. Minor deduction: the skill is very long and some users may struggle to locate the right section quickly; a condensed TL;DR or section index at the top would help.
completeness █████ Exceptionally thorough: covers OIDC subject auto-detection, disabled-subscription edge case, idempotency on re-run, cross-platform (bash/PowerShell) parity, optional drift detection with clear opt-in gate, compliance preference capture, and enterprise-mode UI hand-off. Hard to find a missing scenario.
trigger_precision ████░ USE FOR / DO NOT USE FOR are stated in both the frontmatter description and the 'When to Use' section, which is good redundancy. Slight ambiguity: 'drift detection alone' is excluded, yet drift detection is an optional sub-step inside the skill — a user asking only about drift setup might incorrectly invoke or avoid this skill; clarify that the exclusion applies to standalone drift infrastructure provisioning outside onboarding context.
scope_coverage █████ Boundaries are exceptionally explicit: enterprise mode is scoped to tooling-distribution-only (not Azure access), UI-only steps are flagged and handed off rather than faked, and the scaffold helper's skip-with-notice behavior defines exactly what the skill will and won't touch on the filesystem.
anti_patterns ████░ The 'First-turn rule' hard gate, idempotency rule, and Safe-Execution Rules effectively prevent most common agent mistakes (rushing ahead, duplicating resources, printing secrets). One minor gap: the skill references ./templates/ and scaffold scripts as if they exist on disk, but provides no fallback or error path if a user runs the skill outside the Git-Ape plugin context where those scripts are absent.
────────────────────────────────────────────
Overall: 4.4/5.0
A high-quality, production-grade skill document. It stands out for its completeness (edge cases, dual-mode support, cross-platform parity) and anti-pattern prevention (hard gates, idempotency, safe-execution rules). The two areas to improve are: (1) adding a brief navigation index given the document's length, and (2) clarifying the 'drift detection alone' DO NOT USE trigger to avoid routing confusion, and providing a fallback if scaffold scripts are missing from the execution environment.
✅ Check (compliance summary)
ℹ️ `waza check` expects `eval.yaml` colocated with `SKILL.md`. This repo separates them into `.github/evals/git-ape-onboarding/eval.yaml`, so the "Evaluation Suite: Not Found" line below is a false negative: the eval actually ran (see the Score section above).
🔍 Skill Readiness Check
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Skill: git-ape-onboarding
📋 Compliance Score: Medium-High
⚠️ Good, but could be improved. Missing routing clarity.
Issues found:
❌ SKILL.md is 6395 tokens (hard limit 500)
📐 Spec Compliance: 9/9 checks passed
✅ Meets agentskills.io specification.
📎 Links: 11/15 valid
⚠️ 4 link issue(s) found.
❌ [templates/copilot-instructions.md] → .github/skills/azure-stack-deploy/SKILL.md: target does not exist
❌ [templates/copilot-instructions.md] → website/docs/deployment/state.md: target does not exist
❌ [templates/copilot-instructions.md] → .github/skills/azure-stack-destroy/SKILL.md: target does not exist
⚠️ [templates/github-private/README.md] → agents/: target is a directory, not a file
📊 Token Budget: 6395 / 500 tokens
❌ Exceeds limit by 5895 tokens. Consider reducing content.
🧪 Evaluation Suite: Found
✅ eval.yaml detected. Run 'waza run eval.yaml' to test.
📐 Schema Validation: Passed
✅ eval.yaml schema valid
✅ 4 task file(s) validated
💡 Advisory Checks
✅ [module-count] Found 0 reference module(s)
❌ [complexity] Complexity: comprehensive (6395 tokens, 0 modules)
❌ [negative-delta-risk] Negative delta risk patterns detected: excessive constraints (19 constraint keywords found)
✅ [procedural-content] Description contains procedural language
❌ [over-specificity] Over-specificity detected: absolute Windows paths
❌ [cross-model-density] Advisory 16: word count is 79 (>60 may reduce cross-model effectiveness); first sentence doesn't lead with action verb (reduces clarity)
❌ [body-structure] Advisory 17: body structure quality — no examples section found
❌ [progressive-disclosure] Advisory 18: progressive disclosure — SKILL.md body is 504 lines (>500 lines reduces scannability; consider moving detail to references/)
✅ [scope-reduction] Capability scope: 12 signal(s) detected (12 level-2 heading(s), 9 numbered procedure(s))
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📈 Overall Readiness
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ Your skill needs some work before submission.
🎯 Next Steps
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
To improve your skill:
1. Add routing clarity (e.g., **UTILITY SKILL**, INVOKES:, FOR SINGLE OPERATIONS:)
2. Run 'waza dev' for interactive compliance improvement
3. Fix 3 broken link(s) — targets do not exist
4. Fix 1 link(s) pointing to directories instead of files
5. Reduce SKILL.md by 5895 tokens. Run 'waza tokens suggest' for optimization tips
…nterprise-mode # Conflicts: # .github/skills/git-ape-onboarding/SKILL.md
sendtoshailesh left a comment
Review: strong feature, one fix before merge (stale generated docs)
Really nice addition — clean mapping to #183, and the invariants are respected. I verified it end-to-end.
✅ What's correct
- Maps cleanly to #183: every scope item delivered, no undisclosed behavioral changes. (The drift-detector / "First-turn rule" lines that show in the diff are just #188 content sitting below the merge-base; the branch includes #188, so there's no regression; `MERGEABLE`.)
- The two scaffolders are sound and match `scaffold-repo` conventions: explicit MAPPINGS allow-list, skip-with-notice on collision, no git ops, Created/Skipped summary, stderr for errors. Ran the bash side locally → emits the 3 expected files; `managed-settings.json` is valid JSON. Output parity is structurally guaranteed (both `cp`/`Copy-Item` the same templates) and your new CI parity job confirms byte-identity. `shellcheck --severity=warning` clean.
- New CI job mirrors the existing parity smoke and adds `jq empty` JSON validation; actionlint-clean; watched paths updated.
- No-mirror invariant holds: `github-private` is absent from `sync-templates.{sh,ps1}`, not present at repo root, and `sync-templates.sh check` stays green.
- `managed-settings.json` matches the documented plugin-route schema (`Azure/git-ape` marketplace + `git-ape@git-ape` enabled).
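The byte-identity claim in this list amounts to comparing the two scaffolders' output trees file by file. A minimal Python sketch of such a comparison, assuming each scaffolder has already written its output to its own directory; `tree_bytes`, `trees_identical`, and the directory names are hypothetical, and the PR's CI job may implement the check differently.

```python
import tempfile
from pathlib import Path


def tree_bytes(root: Path) -> dict:
    """Map each file's relative POSIX path to its raw bytes."""
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def trees_identical(a: Path, b: Path) -> bool:
    """True when both trees hold the same files with identical bytes."""
    return tree_bytes(a) == tree_bytes(b)


def demo() -> tuple:
    """Compare two throwaway trees before and after a one-byte change."""
    with tempfile.TemporaryDirectory() as tmp:
        bash_out, pwsh_out = Path(tmp) / "bash", Path(tmp) / "pwsh"
        for out in (bash_out, pwsh_out):
            (out / "agents").mkdir(parents=True)
            (out / "agents" / ".gitkeep").write_bytes(b"")
            (out / "README.md").write_bytes(b"# .github-private\n")
        before = trees_identical(bash_out, pwsh_out)
        # A CRLF line ending is enough to break byte-identity.
        (pwsh_out / "README.md").write_bytes(b"# .github-private\r\n")
        after = trees_identical(bash_out, pwsh_out)
        return before, after
```

Comparing raw bytes rather than decoded text is deliberate here: line-ending or encoding differences between a bash and a PowerShell copy would otherwise go unnoticed.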
`SKILL.md` grew +173 lines but its generated mirror wasn't regenerated. A deterministic `node scripts/generate-docs.js` run produces drift on 4 PR-owned files:
- `website/docs/skills/git-ape-onboarding.md` (+174: the whole Mode: Enterprise Distribution content is missing from the published page)
- `website/docs/skills/overview.md` (broadened description)
- `website/docs/workflows/overview.md` (new `scaffold-enterprise-parity-smoke` job row)
- `website/docs/workflows/git-ape-onboarding-template-check.md` (new job section)
I confirmed the drift is this PR's content (the committed doc already carries #188's content; only the enterprise additions are absent). check-docs is advisory/non-blocking so it won't gate this, but the public skill/workflow pages currently misrepresent the skill.
Fix: re-run `node scripts/generate-docs.js` and commit those 4 files. Leave the 2 unrelated gh-aw `v0.78.3 → v0.79.8` lock docs (`daily-repo-status`, `issue-triage`) out, same as you did on #188.
Verdict: correct, well-scoped, and invariant-safe — mergeable except for the doc regen. Happy to re-review once the 4 docs are regenerated. 👍
Re-run `node scripts/generate-docs.js` to sync the generated website docs with the SKILL.md and workflow source changes in this PR:
- website/docs/skills/git-ape-onboarding.md (adds the Mode: Enterprise Distribution content and broadened description)
- website/docs/skills/overview.md (broadened skill description)
- website/docs/workflows/overview.md (new scaffold-enterprise-parity-smoke job in the jobs column)
- website/docs/workflows/git-ape-onboarding-template-check.md (new scaffold-enterprise-parity-smoke job section)
Excludes the 2 unrelated gh-aw setup-action lock docs (daily-repo-status, issue-triage; v0.78.3 -> v0.79.8) that drift from main, keeping this PR scoped.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thanks @sendtoshailesh — addressed Finding 1 (stale generated docs) in 0d459b5. Re-ran
Left the 2 unrelated gh-aw
Summary

Adds a second mode to the `git-ape-onboarding` skill that distributes Git-Ape org/enterprise-wide via a `.github-private` repository, alongside the existing per-repository CI/CD onboarding. With it, the whole Git-Ape plugin (agents + 13 skills + `azure-mcp`) auto-installs on user authentication, with no per-user `gh plugin install`.

Closes #183.

Why the plugin route (not `agents/` alone)

`.github-private` exposes two distinct features:
- `agents/AGENT-NAME.md`: agents only.
- `.github/copilot/managed-settings.json`: auto-installs whole plugins.

Git-Ape bundles agents + skills + an MCP server, so the `agents/`-only route would ship the orchestrators broken (missing skills/MCP). This change distributes via `managed-settings.json`:

    {
      "extraKnownMarketplaces": {
        "git-ape": {
          "source": { "source": "github", "repo": "Azure/git-ape" }
        }
      },
      "enabledPlugins": { "git-ape@git-ape": true }
    }

Changes

- `templates/github-private/`: canonical `README.md`, `.github/copilot/managed-settings.json`, `agents/.gitkeep`.
- `scripts/scaffold-enterprise.{sh,ps1}`: byte-identical bash/pwsh parity scaffolder mirroring `scaffold-repo` conventions (explicit MAPPINGS allow-list, skip-with-notice on collision, no git ops, Created/Skipped summary).
- `git-ape-onboarding-template-check.yml`: new `scaffold-enterprise-parity-smoke` job + watched paths.
- `SKILL.md`: broadened description, "Onboarding Modes" overview, full enterprise playbook (CLI steps + UI-only AI-controls/ruleset hand-off), safe-execution rules, agent flow.
- `templates/README.md`: documents the new templates + enterprise scaffolder (scaffold-only, not mirrored) + parity requirement.

Manual (UI-only) hand-off

Performed by an enterprise owner; the skill hands these off and never claims to automate them:
`.github-private`.

Invariants preserved

- `github-private/**` is scaffold-only: never mirrored into this repo (sync check stays green; confirmed not referenced by `sync-templates.{sh,ps1}`).

Verification

- `scaffold-enterprise.sh` vs `.ps1` → `diff -r` byte-identical
- `managed-settings.json` valid JSON; `git-ape@git-ape` enabled; marketplace `Azure/git-ape`
- `sync-templates.sh check` still green (no mirror drift)
- `github-private` not present in sync mappings (scaffold-only)
- `SKILL.md` cross-link anchors resolve
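
The `managed-settings.json` checks in the verification list can be expressed as a short Python sketch. The JSON literal is the one shown in this description; the `check_managed_settings` helper is hypothetical and only mirrors the three stated checks (valid JSON, `git-ape@git-ape` enabled, marketplace `Azure/git-ape`).

```python
import json

# Settings content as given in the PR description.
MANAGED_SETTINGS = """
{
  "extraKnownMarketplaces": {
    "git-ape": {"source": {"source": "github", "repo": "Azure/git-ape"}}
  },
  "enabledPlugins": {"git-ape@git-ape": true}
}
"""


def check_managed_settings(text: str) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    try:
        settings = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = []
    if settings.get("enabledPlugins", {}).get("git-ape@git-ape") is not True:
        problems.append("git-ape@git-ape is not enabled")
    source = (
        settings.get("extraKnownMarketplaces", {})
        .get("git-ape", {})
        .get("source", {})
    )
    if source.get("repo") != "Azure/git-ape":
        problems.append("marketplace repo is not Azure/git-ape")
    return problems


print(check_managed_settings(MANAGED_SETTINGS) or "all checks passed")
```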