From 189567b3059e2cdfcb01795c43cf355b5361b484 Mon Sep 17 00:00:00 2001 From: Nick Sullivan Date: Wed, 18 Feb 2026 23:17:09 -0600 Subject: [PATCH 1/5] Add spec-quality-first philosophy to autonomous development workflow Inspired by the "5 Levels of AI Coding" framework, shift both the autonomous development rule and autotask command from implementation-centric to outcome-centric thinking. The key insight: spec quality is the bottleneck in autonomous development, not implementation speed. Changes: - autonomous-development-workflow.mdc v3.0: Add spec quality framing, outcome evaluation, thin duplicated content to let autotask own operational detail - autotask.md v2.2: Add spec quality gate in task-preparation scaled by complexity, outcome evaluation in completion-verification Co-Authored-By: Claude Opus 4.6 --- .../rules/autonomous-development-workflow.mdc | 41 ++++++++++++------- plugins/core/commands/autotask.md | 35 ++++++++++++++-- 2 files changed, 58 insertions(+), 18 deletions(-) diff --git a/.cursor/rules/autonomous-development-workflow.mdc b/.cursor/rules/autonomous-development-workflow.mdc index 74980ba..5967e35 100644 --- a/.cursor/rules/autonomous-development-workflow.mdc +++ b/.cursor/rules/autonomous-development-workflow.mdc @@ -1,7 +1,7 @@ --- description: When completing tasks autonomously without human supervision alwaysApply: false -version: 2.0.0 +version: 3.0.0 --- # Autonomous Development Workflow @@ -9,9 +9,20 @@ version: 2.0.0 For AI agents completing tasks without human supervision. The goal: deliver a clean pull request that passes all checks and gets merged without back-and-forth. +## Spec Quality Is the Bottleneck + +Invest more time understanding the problem than writing the implementation. A precise +understanding of the problem produces better code than any amount of iteration on an +ambiguous one. If the task description is ambiguous, clarify before spending tokens on +implementation. 
+ ## Before Implementation -Read all cursor rules in `rules/`. These define the project's standards. Every +Assess whether you have enough clarity to implement correctly. Can you articulate the +problem being solved, what "done" looks like, the edge cases, and the assumptions you're +making? If not, ask before proceeding. + +Load project standards via `/load-rules` or read rules in `.cursor/rules/`. Every applicable rule must be followed. If `CLAUDE.md` or `AGENTS.md` exist in the project root, read those for additional @@ -41,20 +52,22 @@ ruff format . # Auto-format code pytest # Run tests ``` -If we added functionality, we add tests following project patterns. Aim for 95% -coverage - solid testing without obsessing over every edge case. +If we added functionality, we add tests following project patterns. Aim for 95% coverage +-- solid testing without obsessing over every edge case. -Only commit and push when all validation passes. Green checks make for happy merges! ✅ +Only commit and push when all validation passes. -## Self-Review +## Evaluate Outcomes -Run `git diff` and review every change, as a senior developer would. +Verify the implementation delivers what was asked for -- not just technically correct, but +useful. Consider whether a user would understand the resulting behavior without +explanation. Check that no unnecessary complexity was introduced. -Read through ALL cursor rules again and verify our code follows each applicable -guideline. +Run `git diff` and review the changes at the feature level. The question isn't "is every +line correct" -- it's "does this changeset solve the problem cleanly?" -Would we approve this in code review? If not, keep iterating until it's something we're -pleased with. +Read through applicable rules and verify compliance. Document any assumptions or choices +the reviewer needs to understand.
## Submission @@ -72,6 +85,6 @@ Ask first: major architectural changes, changes that would result in data loss, ## Success -A successful autonomous PR means: all automated checks pass, code follows all cursor -rules, tests are green, and the developer merges it without requesting changes. Use the -tooling to get there - it's our friend! 🎉 +A successful autonomous PR means: the implementation solves the stated problem, all +automated checks pass, code follows project standards, and the developer merges it +without requesting changes. diff --git a/plugins/core/commands/autotask.md b/plugins/core/commands/autotask.md index ac5703e..954363c 100644 --- a/plugins/core/commands/autotask.md +++ b/plugins/core/commands/autotask.md @@ -1,7 +1,7 @@ --- # prettier-ignore description: "Execute development task autonomously from description to PR-ready - handles implementation, testing, and git workflow without supervision" -version: 2.1.0 +version: 2.2.0 --- # /autotask - Autonomous Task Execution @@ -118,9 +118,31 @@ Doing exploratory work yourself fills context with raw data. This is about working at the right level. -Ensure task clarity before implementation. If the task description is unclear or -ambiguous, use AskUserQuestion to clarify requirements. If clear, proceed to planning or -implementation based on complexity level. +Spec quality determines implementation quality. Ambiguous specs produce software that +fills gaps with AI guesses instead of customer-centric decisions. + +Before implementation, evaluate the task description: + +- **Problem clarity**: Can you articulate what user pain this solves? If not, you don't + understand the task yet. +- **Acceptance criteria**: What does "done" look like? Not "it works" — specific + behavioral expectations. +- **Edge cases**: What inputs, states, or conditions could break the expected behavior? +- **Unstated assumptions**: What are you assuming about the system, the user, or the + context? Document them.
+ +**quick**: If the task changes a single file with no behavioral impact (typo fixes, +comment updates, config tweaks), proceed without deep evaluation. + +**balanced**: Verify you can describe the expected behavior in concrete terms before +writing code. If you can't, clarify with the user. + +**deep**: Write a brief spec (problem, expected behavior, edge cases, acceptance +criteria) and validate it before implementation. This is the most valuable step in the +entire workflow — a precise spec saves rewrites. + +If the task description is unclear or ambiguous, use AskUserQuestion to clarify +requirements. If clear, proceed to planning or implementation based on complexity level. @@ -233,6 +255,11 @@ Autotask is complete when ALL are true: - Review bots have completed (or confirmed none configured) - /address-pr-comments executed - All "Fix" items resolved or documented +- The implementation solves the stated problem (not just passes tests) + +Before reporting completion, step back and evaluate: If a user encounters this feature +tomorrow, will it make sense? Does it do what was asked, or did implementation drift +from the original intent? Tests verify correctness — this final check verifies value.
Report format: From 7964929a1a73eaf57638605fdc91b5e3ba2f4d19 Mon Sep 17 00:00:00 2001 From: Nick Sullivan Date: Wed, 18 Feb 2026 23:20:43 -0600 Subject: [PATCH 2/5] Address claude-review feedback on autonomous workflow - Fix /load-rules as prescriptive path with explicit fallback - Document assumptions target clarified to PR description - Quick gate: behavioral impact + localized (not file count) - Trim redundant trailing sentences from task-preparation Co-Authored-By: Claude Opus 4.6 --- .cursor/rules/autonomous-development-workflow.mdc | 8 ++++---- plugins/core/commands/autotask.md | 5 ++--- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/.cursor/rules/autonomous-development-workflow.mdc b/.cursor/rules/autonomous-development-workflow.mdc index 5967e35..341f67b 100644 --- a/.cursor/rules/autonomous-development-workflow.mdc +++ b/.cursor/rules/autonomous-development-workflow.mdc @@ -22,8 +22,8 @@ Assess whether you have enough clarity to implement correctly. Can you articulat problem being solved, what "done" looks like, the edge cases, and the assumptions you're making? If not, ask before proceeding. -Load project standards via `/load-rules` or read rules in `.cursor/rules/`. Every -applicable rule must be followed. +Load project standards via `/load-rules`. If that's not available, fall back to reading +rules in `.cursor/rules/` directly. Every applicable rule must be followed. If `CLAUDE.md` or `AGENTS.md` exist in the project root, read those for additional context. @@ -66,8 +66,8 @@ explanation. Check that no unnecessary complexity was introduced. Run `git diff` and review the changes at the feature level. The question isn't "is every line correct" -- it's "does this changeset solve the problem cleanly?" -Read through applicable rules and verify compliance. Document any assumptions or choices -the reviewer needs to understand. +Read through applicable rules and verify compliance. 
Document any assumptions or +non-obvious choices in the PR description so reviewers understand the decisions made. ## Submission diff --git a/plugins/core/commands/autotask.md b/plugins/core/commands/autotask.md index 954363c..626665e 100644 --- a/plugins/core/commands/autotask.md +++ b/plugins/core/commands/autotask.md @@ -131,7 +131,7 @@ Before implementation, evaluate the task description: - **Unstated assumptions**: What are you assuming about the system, the user, or the context? Document them. -**quick**: If the task changes a single file with no behavioral impact (typo fixes, +**quick**: If the task has no behavioral impact and is a localized change (typo fixes, comment updates, config tweaks), proceed without deep evaluation. **balanced**: Verify you can describe the expected behavior in concrete terms before @@ -141,8 +141,7 @@ writing code. If you can't, clarify with the user. criteria) and validate it before implementation. This is the most valuable step in the entire workflow — a precise spec saves rewrites. -If the task description is unclear or ambiguous, use AskUserQuestion to clarify -requirements. If clear, proceed to planning or implementation based on complexity level. +If ambiguity remains after evaluation, use AskUserQuestion before proceeding. From f65657ca72ad9669065941bdea9d1241c412bdc8 Mon Sep 17 00:00:00 2001 From: Nick Sullivan Date: Wed, 18 Feb 2026 23:27:01 -0600 Subject: [PATCH 3/5] 🔧 Apply prompt engineering review fixes and bump versions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Autotask v2.3.0: Fix motivation-based language in bot-feedback-loop, simplify error-recovery to boundaries, replace ghost code-reviewer agent with real names, normalize sub-agents→agents terminology, remove redundant Key Principles/Configuration/Notes sections. Marketplace v9.17.0: Bump all three version locations.
Co-Authored-By: Claude Opus 4.6 --- .claude-plugin/marketplace.json | 4 +- plugins/core/.claude-plugin/plugin.json | 2 +- plugins/core/commands/autotask.md | 83 +++++++------------------ 3 files changed, 26 insertions(+), 63 deletions(-) diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 5132963..5452f22 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -6,7 +6,7 @@ }, "metadata": { "description": "Professional AI coding configurations, agents, skills, and context for Claude Code and Cursor", - "version": "9.16.0", + "version": "9.17.0", "license": "MIT", "repository": "https://github.com/TechNickAI/ai-coding-config" }, @@ -15,7 +15,7 @@ "name": "ai-coding-config", "source": "./plugins/core", "description": "Commands, agents, skills, and context for AI-assisted development workflows", - "version": "9.16.0", + "version": "9.17.0", "tags": ["commands", "agents", "skills", "workflows", "essential"] } ] diff --git a/plugins/core/.claude-plugin/plugin.json b/plugins/core/.claude-plugin/plugin.json index 3bfc41f..ffe6e1c 100644 --- a/plugins/core/.claude-plugin/plugin.json +++ b/plugins/core/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "ai-coding-config", - "version": "9.16.0", + "version": "9.17.0", "description": "Commands, agents, skills, and context for AI-assisted development workflows", "author": { "name": "TechNickAI", diff --git a/plugins/core/commands/autotask.md b/plugins/core/commands/autotask.md index 626665e..7c7811b 100644 --- a/plugins/core/commands/autotask.md +++ b/plugins/core/commands/autotask.md @@ -1,7 +1,7 @@ --- # prettier-ignore description: "Execute development task autonomously from description to PR-ready - handles implementation, testing, and git workflow without supervision" -version: 2.2.0 +version: 2.3.0 --- # /autotask - Autonomous Task Execution @@ -59,7 +59,7 @@ Signals: "quick fix", "simple change", trivial scope, typo, single function Standard multi-file 
implementation, some design decisions. - Light planning with /load-rules -- Delegate exploration to sub-agents +- Delegate exploration to agents - Targeted testing for changed code - /multi-review with 2-3 domain-relevant agents - Create PR → /address-pr-comments → completion @@ -70,7 +70,7 @@ Signals: Most tasks land here when auto-detected Architectural changes, new patterns, high-risk, multiple valid approaches. -- Full exploration via sub-agents +- Full exploration via agents - Use /brainstorm-synthesis for hard architectural decisions during exploration - Create detailed plan document incorporating synthesis results - **Review the PLAN with /multi-review** before implementation (architecture-auditor, @@ -107,13 +107,13 @@ For worktree creation, use /setup-environment. When the right choice isn't obvio Your context window is precious. Preserve it through delegation. -Delegate to sub-agents: codebase exploration, pattern searching, documentation research, +Delegate to agents: codebase exploration, pattern searching, documentation research, multi-file analysis, any task requiring multiple search/read rounds. Keep in main context: orchestration, decision-making, user communication, synthesizing results, state management, phase transitions. -Sub-agents work with fresh context optimized for their task and return concise results. +Agents work with fresh context optimized for their task and return concise results. Doing exploratory work yourself fills context with raw data. This is about working at the right level. @@ -149,20 +149,21 @@ Scale planning to complexity: **quick**: Skip to implementation. -**balanced**: Load relevant rules with /load-rules. Brief exploration via sub-agent if +**balanced**: Load relevant rules with /load-rules. Brief exploration via agent if needed. Create implementation outline. -**deep**: Full exploration via sub-agents. Create detailed plan document. Run -/multi-review on the PLAN with architecture-focused agents.
Incorporate feedback before -writing code. Document design decisions with rationale. +**deep**: Full exploration via agents. Create detailed plan document. Run /multi-review +on the PLAN with architecture-focused agents. Incorporate feedback before writing code. +Document design decisions with rationale. Execute using appropriate agents based on task type: - debugger: Root cause analysis, reproduces issues - autonomous-developer: Implementation work, writes tests -- ux-designer: User-facing text, accessibility, UX consistency -- code-reviewer: Architecture review, design patterns, security +- ux-designer: User-facing text, UX consistency +- architecture-auditor: Architecture review, design patterns +- security-reviewer: Security analysis, injection, auth - prompt-engineer: Prompt optimization - Explore: Investigation, research, trade-off evaluation @@ -229,7 +230,9 @@ Why this approach. **Validation Performed**: Tests run. Verification steps taken. -This phase is MANDATORY. Autotask is not complete without it. +Bot feedback catches issues the author missed — security vulnerabilities, real bugs, +style violations. Addressing it before declaring completion prevents shipping known +defects. This phase completes the autotask workflow. After PR creation, poll for bot analysis using `gh pr checks`: @@ -240,10 +243,9 @@ After PR creation, poll for bot analysis using `gh pr checks`: If checks complete sooner, proceed immediately. If timeout reached with checks still pending, proceed with available feedback and note incomplete checks. -Execute /address-pr-comments on the PR. This is not optional. - -Fix valuable feedback (security issues, real bugs, good suggestions). Decline with -WONTFIX and rationale where bot lacks context. Iterate until critical issues resolved. +Execute /address-pr-comments on the PR. Fix valuable feedback (security issues, real +bugs, good suggestions). Decline with WONTFIX and rationale where bot lacks context.
+Iterate until critical issues resolved. @@ -284,48 +286,9 @@ Report format: -**Git failures**: Merge conflicts → pause for user resolution. Push rejected → pull and -rebase if safe, ask if not. Hook failures → fix the issue, never use --no-verify. - -**GitHub CLI failures**: Auth issues → run `gh auth status`, inform user. Rate limits → -log and suggest waiting. PR creation fails → check branch exists remotely, retry once. - -**Sub-agent failures**: Log which agent failed. Retry once with simplified scope. If -still fails, continue without that input and note the gap. - -For issues you cannot resolve autonomously, inform user with clear options and context. -Never swallow errors silently. - -## Key Principles - -- Feature branch workflow: Work on branch, deliver via PR -- Complexity scaling: Effort matches task scope -- Context preservation: Delegate exploration, orchestrate at top level -- Mandatory completion: Task not done until bot feedback addressed -- Smart environment detection: Auto-detect when worktree needed -- Git hooks do validation: Leverage existing infrastructure -- PR-centric: Everything leads to mergeable pull request -- Decision transparency: Every autonomous choice documented in PR - -## Requirements - -- GitHub CLI (`gh`) installed and authenticated -- Node.js/npm -- Project standards accessible via /load-rules - -## Configuration - -Adapts to project structure: - -- Detects git hooks (husky, pre-commit) -- Detects test runners (jest, mocha, vitest, etc.) -- Finds linting configs (eslint, prettier, etc.) -- Uses available build scripts -- Respects project-specific conventions - -## Notes +Recover from failures without bypassing safety checks. Never use --no-verify. Never +silently swallow errors. Retry once before escalating.
-- Creates real commits and PRs -- Environment auto-detected; asks when ambiguous -- Recognizes multi-repo workflows and existing worktrees -- Bot feedback handling is autonomous and mandatory +When blocked on something you cannot resolve autonomously (merge conflicts requiring +human judgment, auth failures, persistent CI issues), inform the user with clear options +and context. From dbc42e419c75e4bb60dd249016e16b09bba98718 Mon Sep 17 00:00:00 2001 From: Nick Sullivan Date: Wed, 18 Feb 2026 23:29:10 -0600 Subject: [PATCH 4/5] 🔧 Address claude-review feedback round 2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - customer-centric → user-centric (broader than commercial context) - Remove config tweaks from quick gate examples (config changes can have behavioral impact) - Soften ambiguity gate to acknowledge quick path explicitly - Lead Evaluate Outcomes with git diff (concrete action before philosophy) - Fix missing newline before Co-Authored-By: Claude Opus 4.6 --- .cursor/rules/autonomous-development-workflow.mdc | 11 +++++------ plugins/core/commands/autotask.md | 4 ++-- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/.cursor/rules/autonomous-development-workflow.mdc b/.cursor/rules/autonomous-development-workflow.mdc index 341f67b..e620def 100644 --- a/.cursor/rules/autonomous-development-workflow.mdc +++ b/.cursor/rules/autonomous-development-workflow.mdc @@ -13,8 +13,8 @@ request that passes all checks and gets merged without back-and-forth. Invest more time understanding the problem than writing the implementation. A precise understanding of the problem produces better code than any amount of iteration on an -ambiguous one. If the task description is ambiguous, clarify before spending tokens on -implementation. +ambiguous one. If the task description is ambiguous, clarify before implementing.
Quick, +localized changes with no behavioral impact may proceed directly. ## Before Implementation @@ -59,13 +59,12 @@ Only commit and push when all validation passes. ## Evaluate Outcomes -Verify the implementation delivers what was asked for -- not just technically correct, but -useful. Consider whether a user would understand the resulting behavior without -explanation. Check that no unnecessary complexity was introduced. - Run `git diff` and review the changes at the feature level. The question isn't "is every line correct" -- it's "does this changeset solve the problem cleanly?" +Verify the implementation delivers what was asked for -- not just technically correct, but +useful. Check that no unnecessary complexity was introduced. + Read through applicable rules and verify compliance. Document any assumptions or non-obvious choices in the PR description so reviewers understand the decisions made. diff --git a/plugins/core/commands/autotask.md b/plugins/core/commands/autotask.md index 7c7811b..afa0379 100644 --- a/plugins/core/commands/autotask.md +++ b/plugins/core/commands/autotask.md @@ -119,7 +119,7 @@ the right level. Spec quality determines implementation quality. Ambiguous specs produce software that -fills gaps with AI guesses instead of customer-centric decisions. +fills gaps with AI guesses instead of user-centric decisions. Before implementation, evaluate the task description: @@ -132,7 +132,7 @@ Before implementation, evaluate the task description: context? Document them. **quick**: If the task has no behavioral impact and is a localized change (typo fixes, -comment updates, config tweaks), proceed without deep evaluation. +comment updates), proceed without deep evaluation. **balanced**: Verify you can describe the expected behavior in concrete terms before writing code. If you can't, clarify with the user. 
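The bot-feedback step in `autotask.md` (shown as context in patch 3) has the agent poll `gh pr checks` after opening the PR, move on as soon as checks finish, and, if a timeout passes with checks still pending, proceed with the feedback available and note the incomplete checks. Below is a minimal Python sketch of that loop, not code from the plugin. It assumes `gh pr checks` exits with code 8 while checks are pending, as the gh manual documents (confirm against your installed version). The names `wait_for_checks` and `gh_checks_status` are hypothetical, and so are any timeout and interval values you pass, because the patch context does not show the values autotask uses.

```python
import subprocess
import time
from typing import Callable

# Assumption: `gh pr checks` exits 0 when all checks pass and 8 while checks are pending.
# Any other exit code is treated here as "finished" (failed checks or a gh error).
PENDING_EXIT_CODE = 8


def gh_checks_status(pr: str) -> int:
    """Run `gh pr checks <pr>` and return its exit code (needs an authenticated gh)."""
    result = subprocess.run(["gh", "pr", "checks", pr], capture_output=True, text=True)
    return result.returncode


def wait_for_checks(
    pr: str,
    timeout_s: float,
    interval_s: float,
    status: Callable[[str], int] = gh_checks_status,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, int]:
    """Poll until checks stop pending or the timeout elapses (hypothetical helper).

    Returns (completed, last_exit_code). completed=False means the timeout was hit
    with checks still pending: proceed with available feedback and note the gap.
    """
    waited = 0.0
    while True:
        code = status(pr)
        if code != PENDING_EXIT_CODE:
            return True, code  # checks finished (passed or failed): proceed immediately
        if waited >= timeout_s:
            return False, code  # still pending at the timeout
        sleep(interval_s)
        waited += interval_s  # approximate elapsed time by summing sleep intervals
```

Injecting `status` and `sleep` keeps the loop testable without a GitHub remote. Elapsed time is approximated by summing the sleep intervals, so the time each `gh` call takes is ignored.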
From 79f837e8973ddc6334e58e7361ef63f6577fb7ca Mon Sep 17 00:00:00 2001 From: Nick Sullivan Date: Wed, 18 Feb 2026 23:34:42 -0600 Subject: [PATCH 5/5] 🔧 Address round 3 claude-review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add explicit fallthrough for quick gate with behavioral impact: "Otherwise, treat as balanced" - Remove hardcoded .cursor/rules/ path from fallback (tool-agnostic now) - Update PR description to correct version number (v2.1→v2.3 not v2.1→v2.2) and document design decisions WONTFIX: error-recovery oversimplification (--no-verify still explicit, other condensation intentional per prompt engineering review), removed requirements/config sections (agent-facing not user-facing), accessibility removed from ux-designer (project policy), completion check wording (both angles covered) Co-Authored-By: Claude Opus 4.6 --- .cursor/rules/autonomous-development-workflow.mdc | 2 +- plugins/core/commands/autotask.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.cursor/rules/autonomous-development-workflow.mdc b/.cursor/rules/autonomous-development-workflow.mdc index e620def..26183db 100644 --- a/.cursor/rules/autonomous-development-workflow.mdc +++ b/.cursor/rules/autonomous-development-workflow.mdc @@ -23,7 +23,7 @@ problem being solved, what "done" looks like, the edge cases, and the assumption making? If not, ask before proceeding. Load project standards via `/load-rules`. If that's not available, fall back to reading -rules in `.cursor/rules/` directly. Every applicable rule must be followed. +applicable rules directly. Every applicable rule must be followed. If `CLAUDE.md` or `AGENTS.md` exist in the project root, read those for additional context.
diff --git a/plugins/core/commands/autotask.md b/plugins/core/commands/autotask.md index afa0379..2f69a2f 100644 --- a/plugins/core/commands/autotask.md +++ b/plugins/core/commands/autotask.md @@ -132,7 +132,7 @@ Before implementation, evaluate the task description: context? Document them. **quick**: If the task has no behavioral impact and is a localized change (typo fixes, -comment updates), proceed without deep evaluation. +comment updates), proceed without deep evaluation. Otherwise, treat as balanced. **balanced**: Verify you can describe the expected behavior in concrete terms before writing code. If you can't, clarify with the user.
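With patch 5 applied, the quick/balanced/deep spec gate in `autotask.md` is still prose for an agent, but its control flow reduces to a small decision rule. Quick applies only to a localized change with no behavioral impact, and any other task labeled quick falls through to balanced. The Python model below is an illustrative sketch of that rule, not part of the plugin. `Task`, `effective_level`, and `spec_step` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    QUICK = "quick"
    BALANCED = "balanced"
    DEEP = "deep"


@dataclass
class Task:
    # Hypothetical model of what the agent knows about a task before implementing it.
    requested_level: Level
    behavioral_impact: bool  # does the change alter what the software does?
    localized: bool          # is the change confined to one small area?


def effective_level(task: Task) -> Level:
    """Apply the quick gate: quick holds only for localized, non-behavioral changes."""
    if task.requested_level is Level.QUICK:
        if not task.behavioral_impact and task.localized:
            return Level.QUICK
        return Level.BALANCED  # "Otherwise, treat as balanced."
    return task.requested_level


def spec_step(level: Level) -> str:
    """Summarize the spec-quality work each level asks for before writing code."""
    return {
        Level.QUICK: "proceed without deep evaluation",
        Level.BALANCED: "state expected behavior concretely; clarify with the user if you can't",
        Level.DEEP: "write and validate a brief spec: problem, behavior, edge cases, acceptance criteria",
    }[level]
```

The fallthrough keeps a mislabeled task, such as a config change that turns out to alter behavior, from skipping the balanced check.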