diff --git a/CHANGELOG.md b/CHANGELOG.md index 10fe904..993e87d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,18 @@ Please choose versions by [Semantic Versioning](http://semver.org/). * MINOR version when you add functionality in a backwards-compatible manner, and * PATCH version when you make backwards-compatible bug fixes. +## Unreleased + +- feat(launch-goal): add new `/vault-cli:launch-goal` interview-driven goal framing command — discovery → fan-out exploration (5 parallel semantic searches + duplicate-check gate) → 3-lens framing (parallel subagents → top-3 candidates) → sharpen → draft-to-disk with `status: draft` + Obsidian link → parallel verify (Adversarial Laziness Test + outcome traceability + hedge-word grep) → audit fan-out (goal-auditor + graph-auditor + late dup-check) → status flip on PASS. Resolves the "create-goal jumps straight to writing" failure mode by forcing an outcome-sentence confirmation gate before file creation. Mirrors `/launch-agent` shape; positions as the rigorous front door beside `create-goal`'s template fast-path +- docs(goal-writing): add § "Tooling-Category Exception" extending Title sniff test — artifact-shaped titles ("Multi-Provider Claude Code Proxy", "Goal-Writing Assistant") accepted for goals where the artifact IS the deliverable; rule: don't bounce tool-existence framings back to outcome-only +- docs(goal-writing): add § "Tasks as Business-Value Milestones" — tasks are shippable outcomes, not WBS slices; explicit decomposition hierarchy (Goal → linked Tasks (wikilinks, separate files) → inline Subtasks (checkboxes in task file)); MUST render as `[[Wikilinks]]`, NOT bold text; 1-8 is soft cap NOT floor; foundation/skeleton work allowed when explicitly framed +- docs(goal-writing): add § "Evidence Shape per Success Criterion" — borrowed from `dark-factory/docs/rules/spec-writing.md`; every SC declares observable evidence (exit code / log line / file content / state transition / metric delta / negative evidence / file artifact); kills "tests pass" / "it works" vagueness +- docs(goal-writing): add § "Adversarial Laziness Test" — borrowed from `dark-factory/docs/rules/spec-writing.md`; read SCs assuming laziest possible implementation; if `[x]` everywhere tomorrow would feel "done" without doing the work, SCs are under-specified +- docs(goal-writing): add § "Anti-pattern: soak-time DoD on personal-laptop tools" under Definition of Done — operator IS the runtime monitor for laptop tools; prefer exercise-now verification ("all paths reached in one session") over time-based bake ("runs N days without incident"); soak-time reserved for prod services with silent-degradation risk; updates Tasks-section bullet in Required sections to mandate `[[Wikilink]]` format +- docs(task-writing): add § "Subtask Hierarchy" under Task Structure — Goal → linked Tasks (wikilink files) → inline Subtasks (checkboxes inside task file); subtasks are atomic work units with no independent identity, no separate files; decision rule: shippable milestone → N tasks, sequential steps within one milestone → N inline subtasks; never recreate file-link hierarchy below task level +- feat(goal-creator): step 9 `# Tasks` body composition now requires `[[Wikilink]]` format when tasks supplied at creation time (NOT bold text + description); step 12 audit checks expanded to grep for bold-text task entries and flag soak-time DoD anti-pattern phrases +- feat(goal-auditor): item 8 "Tasks Quality" extended with WARN flags for (a) bold-text tasks instead of `[[Wikilinks]]` (disables Obsidian auto-create-on-click), (b) WBS-shaped task titles (≥3 tasks starting with `Implement`/`Define`/`Add `/`Refactor`/`Migrate`/`Wire`/`Configure`); count guidance updated to "1-8 soft cap, NOT a floor" — don't flag 1-3 tasks as under-count. Item 12 "Definition of Done Quality" extended with WARN flag for soak-time DoD phrases (`runs for N hours/days`, `one real working day's worth`, `no regressions for a week`) on tooling-category goals; don't flag on production-service goals where soak-time is appropriate + ## v0.88.0 - feat(auditors): goal-auditor + task-auditor flag missing/empty `# Definition of Done` section as MAJOR; severity matrix uses `DOD_REQUIRED_AS_OF=2026-06-26` constant with grandfathering (pages `created` before cutoff → WARN, not MAJOR); task-auditor adds dev → prod ladder check for multi-environment shipping-class artifacts (detection requires explicit `dev` + `prod` co-occurrence, not container keywords); 4 new dishonest-tick phrases added to anti-pattern list (`tested on dev only`, `ci passed = tested`, `auto-release tagged ≠ shipped`, `deferred to follow-up goal`) diff --git a/agents/goal-auditor.md b/agents/goal-auditor.md index 66ddd30..40e88c5 100644 --- a/agents/goal-auditor.md +++ b/agents/goal-auditor.md @@ -82,8 +82,10 @@ Expert Obsidian goal auditor specializing in evaluating goal pages against the G - **Comprehensive**: 3-5 criteria covering key outcomes ### 8. Tasks Quality -- **Count**: 4-8 major tasks -- **Linked**: Major tasks link to standalone task pages `[[Task Name]]` +- **Count**: 1-8 major tasks. The 4-8 range is a soft cap, NOT a floor — small goals can have 1 task ("Implement the proxy"). Don't flag 1-3 tasks as under-count; only flag >8 as over-count. See `docs/goal-writing.md` § Tasks as Business-Value Milestones. +- **Linked as wikilinks (WARN if not)**: tasks MUST render as `[[Wikilink Task Title]]`, NOT bold text + description. Detect with `grep -nE '^\s*[0-9]+\.\s+\*\*[^[]' ` in the `# Tasks` section — any match is bold-text-task and disables Obsidian auto-create-on-click. Recommendation: "Convert bold-text task entries to `[[Wikilinks]]` so clicking in Obsidian auto-creates the task file." +- **Business-value milestones (WARN if WBS-shaped)**: each task delivers a *shippable improvement*, not a code-change slice. Detect WBS-shaped titles by leading-verb pattern: titles starting with `Implement`, `Define`, `Add` (when followed by a noun like "schema", "field", "adapter"), `Refactor`, `Migrate`, `Wire`, `Configure` are likely WBS rows. If ≥3 of the goal's tasks fit this pattern, flag as WARN with: "Tasks read as code-change slices, not business-value milestones. Consider collapsing into 1-3 shippable-milestone tasks; move the code-change breakdown to inline subtasks inside each task file (see `docs/goal-writing.md` § Tasks as Business-Value Milestones)." +- **Foundation work allowed when explicitly framed**: tasks like "Set Up Project Skeleton" advance no SC by design. Don't flag as orphan if the task body explicitly says "foundation; enables iteration" or similar — accept the framing. Otherwise apply the orphan rule from Task-Goal Alignment below. - **Structured**: Logical order or phased approach ### 9. SMART Compliance @@ -119,6 +121,30 @@ The `# Definition of Done` section is the closure gate. Goals that lack it (or h **Reference checks:** The DoD section should reference `[[Goal Closure Checklist]]` (generic 6-section structure) and/or `[[Closure Patterns]]` (per-artifact blocks) — recommend, don't require. +**Soak-time DoD anti-pattern (WARN):** flag DoD checkboxes whose evidence is time-based bake. The check is **gated on category** and runs in two ordered steps — gate first, grep second. + +**Step 1 — category gate (apply ONLY if BOTH hold):** + +1. **Tooling category match** — frontmatter `category: tooling`, OR title/summary contains a personal-laptop-tool keyword (`slash command`, `CLI`, `script`, `daemon`, `plugin`, `extension`, `proxy`, `wrapper`, `helper`) — **case-insensitive substring match**. +2. **No production signal** — title, summary, and DoD do NOT contain any of: `prod`, `production`, `k8s`, `kubernetes`, `multi-user`, `customers`, `tenants`, `trading hot path`, `live traffic`, `SLA`, `uptime` — **case-insensitive substring match**. + +If EITHER step-1 condition fails → do NOT scan; skip soak-time check entirely. Soak-time DoD is appropriate on production-service goals (k8s, multi-user, trading hot path) and on goals where prod-shape is named explicitly. + +**Step 2 — phrase grep (run only after step 1 passes):** + +Detect by phrase matching in the DoD section (case-insensitive): + +- `runs for N (hour|day|week)s? without (incident|regression|workaround)` and variants +- `one (real )?(working )?day's? worth of` +- `no regressions for a (week|day|N (hours|days|weeks))` +- `runs unattended for N (hours|days|weeks)` +- `runs reliably (over|for) N` +- `stable for N (days|weeks)` + +On any hit, emit WARN with: "Soak-time DoD on a personal-laptop tool — the operator IS the runtime monitor and notices breakage immediately. Replace with exercise-now verification ('all paths reached in one session, evidence: log line per path'). See `docs/goal-writing.md` § Anti-pattern: soak-time DoD on personal-laptop tools." + +**Rationale for the two-step gate:** phrase-grep alone over-triggers — a k8s prod goal can legitimately say "runs for 24h without incident on prod" and that's correct DoD for that artifact type. The category gate enforces "this check applies ONLY when the artifact is a personal-laptop tool"; the prod-signal carve-out catches the edge case of a tooling-category goal that still has prod implications (e.g. a CLI used by a multi-user service). + ## Goal Scope Fit (CRITICAL — flag at top of report if mismatch) **Goals exist to organize coherent multi-task achievement.** Bloated goals (10+ tasks, mixed concerns) hide scope creep; thin goals (1-2 tasks) are usually a single task in disguise. Evaluate on these signals: diff --git a/agents/goal-creator.md b/agents/goal-creator.md index a391ef2..4ccea3d 100644 --- a/agents/goal-creator.md +++ b/agents/goal-creator.md @@ -134,7 +134,7 @@ If a template body was loaded in step 7, use it. Otherwise, generate the standar 5. `# Status Summary` — Progress / Current / Next / Blockers placeholders 6. `# Success Criteria` — measurable outcomes as checkboxes 7. `# Non-goals` — explicit out-of-scope items (3–7 concrete deferrals; placeholder if author drafting). Forces scope articulation at write-time; mirrors dark-factory spec convention. See Goal Writing Guide section 7. -8. `# Tasks` — placeholder for linked subtasks (created later) +8. `# Tasks` — placeholder for linked tasks (created later). When tasks ARE supplied at creation time, render each as a `[[Wikilink Task Title]]` (one per line, numbered or bulleted), NOT bold text + description — Obsidian auto-creates the task file when the operator clicks the wikilink (primary task-creation path); bold-text disables this path. Tasks are **business-value milestones**, not WBS slices — see `docs/goal-writing.md` § Tasks as Business-Value Milestones. The 1-8 range is a soft cap, NOT a floor: small goals can have 1 task; don't pad. Title-Case names, no `/`, `.`, backticks, `:`, `*`, `?`, `"`, `<`, `>`, `|`. 9. `# Related` — themes / related goals / docs ## 10. Check for filename collision @@ -165,6 +165,8 @@ Run a light self-audit against the file: - Title is outcome-shaped (no verb-first activity openings: "Build X" / "Refactor Y" / "Set up Z" / "Migrate Y") - Summary first sentence is outcome-shaped (no `via X` / `by doing Y` / `through Z`, AND no verb-first activity opening: "Build X" / "Refactor Y" / "Set up Z" / "Split X and build Y") - Body has Success Criteria + Tasks sections (or template body) +- Tasks (if supplied) are `[[Wikilinks]]`, NOT bold text + description (`grep -E '^\s*[0-9]+\.\s+\*\*' | head -1` should return empty in the `# Tasks` section — bold-text task entries break Obsidian auto-create-on-click) +- DoD has no soak-time anti-pattern items (no "runs for N hours/days", "no regressions for a week"); for personal-laptop tools, prefer exercise-now verification — see `docs/goal-writing.md` § Anti-pattern: soak-time DoD - If `timeline` is set, validate it is ≤ 4 weeks - No accidental empty sections diff --git a/commands/launch-goal.md b/commands/launch-goal.md new file mode 100644 index 0000000..df25018 --- /dev/null +++ b/commands/launch-goal.md @@ -0,0 +1,276 @@ +--- +description: Interview-driven goal framing — discovery → fan-out exploration → outcome candidates → sharpen → draft-to-disk → parallel verify → audit + status flip. Parallels /launch-agent. Resolves the "create-goal jumps straight to writing" failure mode by forcing an outcome-sentence confirmation gate before the file is touched. +argument-hint: "[rough idea]" +allowed-tools: [Task, Read, Write, Edit, Glob, Grep, Bash, AskUserQuestion, mcp__semantic-search__search_related] +--- + + +Frame a goal through interview-driven discovery before drafting the file. Uses fan-out / fan-in patterns at exploration, framing, verification, and audit to surface more signal in parallel without lengthening the user-facing path. + + + +- `docs/goal-writing.md` — generic contract (Title sniff test, Summary sniff test, Tasks as business-value milestones, Soak-time DoD anti-pattern, Evidence shape per SC, Adversarial Laziness Test, Scope Check, required sections) +- `docs/task-writing.md` — subtask hierarchy (linked Tasks → inline Subtasks) +- Per-vault writing-guide extension (if present in the vault's `knowledge_dir`) — vault-specific examples and conventions +- `~/Documents/workspaces/dark-factory/docs/rules/spec-writing.md` — source of evidence-shape vocab + Adversarial Laziness Test (borrowed concepts, not the whole framework) +- `[[Goal Closure Checklist]]`, `[[Closure Patterns]]` — referenced from the Definition of Done section +- `$ARGUMENTS` — optional rough idea; if empty, start Phase 1 from scratch + + + +- **KISS over rigor.** Goals are conversation anchors, not autonomous-agent contracts. Borrow the cheap-and-high-signal spec patterns (evidence shape, laziness test); skip the spec-grade ceremony (formal decomposition matrix, hedge-word audits). +- **Fan-out where parallel branches genuinely produce different signal** (exploration, framing, verification, audit). **Stay linear** where the work is a single user-focused thread (interview, section drafting). +- **No silent advance through gates.** Outcome confirmation, draft approval, forcing-test pass, scope check — each is an explicit user-visible checkpoint. + + + + +## Phase 1 — Understand (linear, conversational) + +Goal: surface the real outcome, the user it serves, and a list of scope-creep candidates for Phase 4 Non-goals. + +Ask ONE question per turn (not a list). Keep going until you have enough signal — typically 3-5 turns. Always include the **"so that..." probe** somewhere: + +> "If this ships, what is true that isn't today — and why does that matter?" + +If the user struggles to answer the "so that" cleanly, that's a red flag the goal isn't well-understood yet. Loop one more probe before advancing. + +Other starter prompts (pick what fits): +- "What's broken or missing today?" +- "Who feels the change — you, an operator, an agent, a downstream user?" +- "How will you know it worked?" (proto-success-criteria signal) +- Listen for adjacent-but-out-of-scope work — note these as Phase 4 Non-goal candidates. + +**Hard gate to Phase 2**: you can state the rough outcome in one sentence without using "via", "by", "through", or naming a mechanism. If you can't, ask one more question. + +## Phase 2 — Explore (fan-out: 5 parallel semantic searches + linkage check) + +**Fan-out**: in a single message, issue parallel `mcp__semantic-search__search_related` calls: + +1. `" theme"` → parent theme candidates from the vault's `themes_dir` +2. `" objective"` → parent objective candidates from the vault's `objectives_dir` +3. `""` → existing or duplicate goals from the vault's `goals_dir` +4. `" adjacent"` → sibling/related goals worth linking +5. `" example"` → vault examples or guide entries worth quoting back during framing + +**Fan-in**: build a context bundle: +- Top theme candidate(s) (≤3) +- Top objective candidate(s) (≤2) +- Goals scoring ≥ similarity threshold → flagged as **possible duplicates** +- Linked-goal candidates (for `# Related` section) +- Example phrasings for Phase 3 framing + +### Duplicate-check gate (hard) + +If the search returned any goal with high outcome-overlap (similar title OR similar Summary first sentence), STOP and present via AskUserQuestion: + +- **1 (Recommended on close match)** — Extend the existing goal `[[]]` instead of creating a new one (open it, suggest where to add the new SC/task) +- **2** — Create a separate goal (the existing one is adjacent but distinct; explain in one sentence why) +- **3** — Abort (the existing goal already covers this) + +Do NOT advance past this gate silently. If 2 is picked, mention the adjacent goal in `# Related` automatically. + +### Optional: code or web exploration + +Only when the goal references an external system the user can't characterize from memory (rare). Skip by default. + +## Phase 3 — Frame (fan-out: 3 lens subagents → fan-in: top-3 candidates) + +**Fan-out**: spawn 3 parallel `Task` calls (general-purpose agent), each with a different framing lens. Each returns ONE outcome sentence and a one-line rationale. + +- **Lens A — user-impact**: "Frame this goal as an outcome from the perspective of who feels the change. Name the actor + the new capability or relief, no mechanism. ≤1 sentence." +- **Lens B — system-state**: "Frame this goal as a before/after world state. Name what is true after that isn't now. No mechanism. ≤1 sentence." +- **Lens C — theme-alignment**: "Given parent theme `<top theme candidate>`, frame this goal as the specific delta on that theme this goal will deliver. No mechanism. ≤1 sentence." + +Pass each subagent: the Phase 1 transcript + the Phase 2 context bundle + the outcome-vs-mechanism sniff-test rules from `docs/goal-writing.md` § Summary. + +**Fan-in** (main loop): +- Discard any candidate that fails the sniff test (mechanism leak: "via", "by", "through", verb-first activity opening like "Build / Refactor / Set up / Migrate") +- Dedupe semantic overlap (keep the strongest phrasing) +- If <3 distinct sniff-test-passing candidates survive, run one more Phase 1 probe and retry — do NOT show weak options +- Rank: prefer the lens most aligned with the user's "so that…" answer from Phase 1 + +Present top 3 via AskUserQuestion (single-select), #1 recommended, with "Other (counter-propose)" implicit. + +- "Other" with counter-proposal → run through sniff test → if passes, lock; if fails, rewrite and re-confirm +- **Tooling-category exception**: when the user counter-proposes a tool-existence-shaped framing ("I have an assistant that …" / "I have a proxy that …"), don't bounce back to outcome-only — accept it as a tooling goal (the artifact IS the outcome), tighten phrasing if needed, and lock. See `docs/goal-writing.md` § Tooling-Category Exception. + +**Hard gate to Phase 4**: user confirmed an outcome sentence. **No silent advance.** + +## Phase 4 — Sharpen (linear) + +Derive from the locked outcome + Phase 1 transcript + Phase 2 context bundle. + +- **Title** — outcome-shaped, passes the Title sniff test from `docs/goal-writing.md` § Title (mechanism table). Tooling-category exception accepted for artifact-shaped titles when the artifact IS the outcome. +- **Success Criteria** — 3-5 binary checkboxes; each declares an **evidence shape** (one phrase, vocab from `docs/goal-writing.md` § Evidence Shape per Success Criterion): + + | Shape | Example | + |---|---| + | exit code | "`make precommit` exits 0" | + | log line | "log line `request_id=<uuid> status=ok`" | + | file content / diff | "`grep -n 'pattern' file.md` returns ≥1 line" | + | HTTP response | "`GET /api/x` returns 200" | + | state transition | "frontmatter `status` transitions `next → in_progress`" | + | metric delta | "counter `foo_total` increments by N" | + | negative evidence | "`grep ERROR run.log` returns 0 lines" | + | file artifact | "task file under `tasks_dir/` exists with frontmatter `goal: [[X]]`" | + + Not closure steps. Not "tests pass". Not "it works". + +- **Definition of Done** — ≥2 binary closure checkboxes. Reference `[[Goal Closure Checklist]]` + the matching `[[Closure Patterns]]` block for the artifact type (k8s service / CLI tool / docs). Add ≥2 project-specific extras inline. **Anti-pattern: avoid soak-time DoD** ("runs N hours/days without incident", "one real working day's worth of use", "no regressions for a week") for personal-laptop tools the operator drives interactively. The operator IS the runtime monitor and notices breakage immediately. Prefer **exercise-now** verification ("all paths reached in one session, evidence: log line per path") over time-based bake. Soak-time DoD is appropriate only for production services with silent-degradation risk (prod k8s, multi-user, trading hot path) — flag explicitly when used. See `docs/goal-writing.md` § Soak-Time DoD Anti-Pattern. + +- **Non-goals** — surface the Phase 1 scope-creep candidates via AskUserQuestion (multiSelect): *"Which of these are OUT of scope?"* Target 3-7 concrete deferrals. Link follow-up goals/tasks where the deferred work will live (use the Phase 2 adjacent-goal results to suggest links). + +- **Tasks** — **business-value milestones**, not code-change slices. Each task delivers a shippable improvement: "Allow Claude Code to pass through the proxy" YES; "Implement config-driven routing core" NO — that's a WBS slice. The 4-8 range from `docs/goal-writing.md` is a soft cap, NOT a floor: **small goals can have 1 task** ("Implement the proxy"). Don't pad to hit a number. Each task visibly traces to ≥1 SC; foundation/skeleton work that enables but doesn't advance an SC is allowed if explicitly framed ("foundation; enables iteration"). **Decomposition hierarchy** (encode this when drafting): + + ``` + Goal (file) → linked Tasks (wikilinks, separate files) → inline Subtasks (checkboxes inside the task file) + ``` + + **Decision rule**: if each could be a shippable milestone → N separate tasks. If sequential steps within one milestone → 1 task with N inline subtasks. Implementation breakdown (schemas, adapters, refactors, types) lives as inline subtasks inside the task file (or in a dark-factory spec for code-heavy work) — **never as sibling task files**. Subtasks are atomic work units, no independent identity, no separate files. Don't recreate file-link hierarchy below the task level. See `docs/goal-writing.md` § Tasks as Business-Value Milestones + `docs/task-writing.md` § Subtask Hierarchy. + + **Format rules:** + - MUST render as `[[Wikilink Task Title]]` in the goal file body, NOT bold text + description + - Obsidian auto-creates the task file when the operator clicks the wikilink — primary task-creation path + - Closing summary surfaces `/vault-cli:create-task "<title>"` as the alternative CLI path + - Title-Case names, no `/`, `.`, backticks, `:`, `*`, `?`, `"`, `<`, `>`, `|` + - Optional one-line context after the wikilink (e.g. `1. [[Task Title]] — context (→ SC2)`) + +- **Parent theme + objective** — pre-fill from Phase 2 top candidates; confirm via AskUserQuestion if multiple plausible matches. + +- **Impact** (one paragraph) — strategic value + theme connection. **Lead the paragraph with the user's verbatim "so that" answer from Phase 1 if it's memorable.** The Phase 3 lens subagents tend to sanitize human framing into clinical phrasing — preserve the original voice; do not rewrite for tone. + +- **Status Summary** — `Progress: 0%` / `Current: Goal drafted` / `Next: <first task title>` / `Blockers: None`. + +### Write draft to disk + show Obsidian link + +**Do NOT render the full draft in chat.** Markdown walls in chat are unreadable, lose clickable wikilinks, and force the user to dictate edits back through chat. Write the file to disk with `status: draft` and let the user review in Obsidian's native rendering. + +1. **Resolve target vault** via `vault-cli config list --output json` (don't hardcode paths) +2. **Check for filename collision** via Glob `<vault.path>/<goals_dir>/<Title>.md`. If the file already exists, NEVER silently overwrite — present via AskUserQuestion: + - **1 (Recommended)** — Pick a different title (loops back to title generation in Phase 4 sharpen) + - **2** — Append disambiguating suffix (e.g. `- v2`, `- Base`, `- <year>`) and retry + - **3** — Open the existing goal (likely it's the goal you meant to extend — see Phase 2 duplicate gate) + - **4** — Abort + + Mirrors `goal-creator` step 10 — collision guard is mandatory before any write. + +3. **Write `<goals_dir>/<Title>.md`** with all Phase 4 content, frontmatter `status: draft` (NOT `in_progress` yet — flipped on audit PASS in Phase 6). Canonical section order from `docs/goal-writing.md` § Required sections: + - Frontmatter: `status: draft`, `page_type: goal`, `themes:` (confirmed in Phase 4), `objective:` (confirmed in Phase 4), `created: <today>`, optional `category`, `priority`, `timeline` + - `Tags: [[Goal]]` (+ theme tags) + - Summary paragraph (the Phase 3 locked sentence + one optional quantification sentence) + - `# Impact` + - `# Status Summary` + - `# Success Criteria` (with evidence shapes inline) + - `# Definition of Done` + - `# Non-goals` + - `# Tasks` (as `[[Wikilinks]]`, NOT bold text) + - `# Related` (linked sibling goals from Phase 2) +4. **Show the link inline** (single message, no chat-render of file content): + + ``` + Draft written: [<Title>](obsidian://open?vault=<vault>&file=<encoded-path>) + Review in Obsidian — edit any section directly. Say "go" to advance to verify + audit. + ``` + +5. **Wait for user "go"** before advancing. + +**Hard gate to Phase 5**: file written on disk with `status: draft` + user said "go" (file may have been edited in Obsidian between draft-write and go — Phase 6 re-reads before audit). + +## Phase 5 — Verify (fan-out: 3 parallel forcing tests → fan-in: PASS or fix-list) + +**Fan-out**: in a single message, run 3 parallel forcing tests as `Task` calls (or inline if simpler): + +1. **Adversarial Laziness Test** — *"If the operator wrote `[x]` on every Success Criterion tomorrow without doing the actual work, would the goal feel done?"* If yes → list which SCs are under-specified. See `docs/goal-writing.md` § Adversarial Laziness Test. +2. **Outcome traceability** — *"Does every Success Criterion verify the locked outcome sentence (Phase 3)?"* If no → list SCs that drifted off-target. +3. **Hedge-word grep** — scan Summary + SCs + Tasks for `should / appropriate / reasonable / as needed / if necessary / proper / correct`. Distinguish deferral from descriptive use — flag only deferrals. + +**Fan-in**: consolidate into one of: +- **PASS** — all three tests clean → advance +- **FIX** — list specific edits needed → loop back to Phase 4 draft (re-show, re-approve) + +### Scope check (linear, follows verify) + +After verify passes, run the Scope Check from `docs/goal-writing.md` § Scope Check: +- Task count ≤ 8 (soft cap, not floor — 1 task is fine for small goals) +- Tasks-to-criteria ratio ≤ 2.5× +- All tasks share one mental model (one operator outcome, one domain) +- Title + Summary both pass their sniff tests + +If 3+ signals fail → goal is over-scoped. Use AskUserQuestion (KISS bias: collapse before split): + +- **1 (Recommended)** — Collapse fragmented tasks into broader milestones. Goals usually want fewer-broader tasks, not more granular ones. Most ratio failures dissolve when 3 small tasks become 1 named milestone. +- **2** — Add Success Criteria the existing tasks already serve. Often the tasks are fine; the SC list under-counts what's actually being delivered. +- **3** — Split into N goals. Use ONLY when the goal is genuinely multi-outcome (two separate "so that" answers). On split, loop Phase 4 per split (Phase 1-3 stay reusable — the umbrella context still applies). + +## Phase 6 — Audit + status flip + +### Re-read draft + audit + flip status on PASS + +The file already exists from Phase 4 with `status: draft`. The user may have edited it in Obsidian between draft-write and "go" — re-read what's actually on disk so audit operates on the user's edits, not the original draft. + +1. **Re-read** `<goals_dir>/<Title>.md` to capture any in-Obsidian edits +2. **Run audit fan-out** (see below) — on the disk content, not the original draft +3. **On audit PASS** (0 MAJOR): Edit frontmatter `status: draft` → `status: in_progress` +4. **On audit FAIL** (≥1 MAJOR): leave `status: draft`; surface MAJOR findings; user can fix in Obsidian directly (or via chat) and say "re-audit" — loop back to step 1 + +### Audit (fan-out: parallel auditors + late dup-check) + +In a single message, run in parallel: + +- `Task(subagent_type: "vault-cli:goal-auditor", ...)` — full guide-compliance audit +- `Task(subagent_type: "vault-cli:graph-auditor", ...)` (or `hierarchy-auditor` where available) — orphan + broken wikilink check, theme/objective backlink integrity +- `mcp__semantic-search__search_related` against the just-written outcome sentence → final dup-check (catches duplicates the rough-idea search missed once the outcome is locked) + +**Fan-in**: consolidate MAJOR (must-fix) and WARN (note) findings into a single block. + +### Print closing summary — rich launchpad + +Final output is the operator's action panel. Goal as clickable `obsidian://` URL (file exists). Tasks as plain `[[wikilinks]]` — match what's in the goal file body, and clicking them in Obsidian auto-creates the task file (primary path). The closing summary surfaces `/vault-cli:create-task` as the alternative CLI path for operators who prefer scripted creation. Related links as clickable `obsidian://` URLs (files exist). + +``` +✅ Goal: [<Title>](obsidian://open?vault=<vault>&file=<encoded-path>) + +Audit: MAJOR <n> / WARN <n><one-line WARN summary if non-trivial> +Theme backlink: [[<primary theme>]] # Sub-Goals ✓ + +📋 Tasks to create (<N>, in dependency order): + 1. [[<Task 1 title>]] + 2. [[<Task 2 title>]] + ... + +Related (existing — click to navigate): + - [<Linked Title>](obsidian://...) — <one-line context> + - ... + +Next: + → /vault-cli:create-task "<Task 1 title>" (first in dependency order) + → or /vault-cli:create-task <pick-by-name> for a different task + (if MAJOR > 0: fix in Obsidian + say "re-audit" — task creation gates on clean audit) +``` + +**URL encoding rules**: +- Use `%20` for spaces, `%2F` for `/` +- Strip `.md` from file paths in `obsidian://` URLs +- Vault name is case-sensitive — use the value from `vault-cli config list` + +**Skip the Related block** if there are no existing-file links to surface (all-new namespace). Empty sections are noise. + +</process> + +<success_criteria> +- User explicitly confirmed an outcome sentence in Phase 3 (no silent advance) +- Duplicate-check gate ran in Phase 2 — either no high-overlap match, OR user chose extend/separate/abort +- Goal file exists with all 9 required sections populated from the interview +- Each Success Criterion declares an evidence shape inline +- Tasks rendered as `[[Wikilinks]]`, NOT bold text + description +- Phase 5 forcing tests (laziness + traceability + hedge) all PASS before audit +- Final audit (Phase 6 fan-out) returns 0 MAJOR findings; on PASS, `status` flips `draft` → `in_progress` +</success_criteria> + +<notes> +- **Mirrors `/launch-agent`** in shape (interview → scaffold → checklist), differs in scope (one markdown file vs. a whole repo). +- **Position vs `create-goal`**: `create-goal` is the template fast-path (scaffolds from frontmatter; no discovery). `launch-goal` is the rigorous front door (discovery + framing + verify + audit). Both keep their seats; pick `create-goal` when you already know the outcome, `launch-goal` when you're still framing. +- **KISS guardrail:** if a phase starts feeling like ceremony in real use, cut it. Adversarial Laziness Test + evidence shape + outcome traceability are the three load-bearing forcing functions; everything else is supporting machinery. +- **Voice:** stay terse during the interview. One question per turn, no preambles. Sharp questions, not a form. +</notes> diff --git a/docs/goal-writing.md b/docs/goal-writing.md index 3b10ad9..2db1387 100644 --- a/docs/goal-writing.md +++ b/docs/goal-writing.md @@ -65,6 +65,20 @@ When the title still describes a mechanism after one rewrite, the goal itself ma This is the goal-level form of the **problem-vs-solution** principle from `task-writing.md` — same idea, scoped to weeks-of-work surface. +### Tooling-Category Exception + +For goals under `category: tooling` whose deliverable IS the tool, an artifact-shaped title and tool-existence summary pass the sniff test even though they read as mechanism on the surface. Examples: + +| Title | Verdict | Why | +|---|---|---| +| "Multi-Provider Claude Code Proxy" | Accepted under tooling exception | The proxy IS the deliverable — naming the artifact names the outcome | +| "Goal-Writing Assistant" | Accepted under tooling exception | The assistant IS the deliverable — reaching for it by default IS the world-after-ship | +| "Release Agent - Base" | Accepted under tooling exception | The `- Base` suffix scopes the artifact to MVP (already canonical example above) | + +**Rule:** when the user's framing is `"I have a <tool> that does X"` / `"<Artifact> exists and I reach for it"`, accept it as a tooling goal — the artifact IS the outcome. **Don't bounce back to outcome-only.** Still apply Summary discipline (lead with the state-of-the-world the artifact creates, not "I am building X"), but accept the artifact noun in the title. + +When NOT to apply: goals where the tool is a means to a separate outcome (e.g. "Reduce backlog by 50%" via a tool — outcome is the backlog reduction; tool is the mechanism). Tooling category is for goals where shipping the artifact = the win. + ## Summary (First Sentence) The first sentence of the goal body is the **outcome statement** — the same rule as Title & Filename, scoped to one sentence. Tells the reader what's true when the goal is done, in plain language. The *how* (mechanism, architecture, refactor steps) belongs in `# Impact` (as an "Approach" lead paragraph) or in linked design docs — never in the opening sentence. @@ -116,7 +130,7 @@ In order: 5. `# Success Criteria` — 3-5 binary, measurable checkbox outcomes — *what we want when done* 6. `# Definition of Done` — closure steps that verify completion — *how we know we're done* (peer to Success Criteria; see [Definition of Done](#definition-of-done)) 7. `# Non-goals` — 3-7 concrete deferrals (what's out of scope; link follow-up tasks/goals) -8. `# Tasks` — 4-8 linked task pages, logical order +8. `# Tasks` — 1-8 task wikilinks (`[[Wikilink Task Title]]`, NOT bold text + description), business-value milestones in logical order. See [Tasks as Business-Value Milestones](#tasks-as-business-value-milestones). The 4-8 range is a soft cap, NOT a floor — 1 task is fine for small goals; don't pad to hit a number. 9. `# Related` — themes / sister goals / docs Optional: `# Risk Management` (appendix for high-stakes goals). @@ -198,6 +212,20 @@ Goals authored before this section existed sometimes embed closure steps INSIDE DoD is the gate, not the plan. +### Anti-pattern: soak-time DoD on personal-laptop tools + +For goals shipping personal-laptop tools the operator drives interactively, **the operator IS the runtime monitor.** Avoid time-based bake-in DoD items: + +| Soak-time (weak) | Exercise-now (strong) | Why | +|---|---|---| +| "Runs for 1 working day with no manual workarounds" | "All 4 providers reached in one Claude Code session — evidence: `[route] <provider> 200` log line per provider, same PID" | Operator notices breakage immediately; time-based bake adds no signal beyond exercise-now | +| "No regressions for a week of normal use" | "Three real goal-drafting sessions complete end-to-end with 0 MAJOR audit findings" | "Regression" requires immediate exercise to detect; a week of no-use doesn't bake anything | +| "Service runs unattended for 24h" | (acceptable only on prod services with silent-degradation risk — k8s, multi-user, trading hot path) | The operator's attention IS the heartbeat for laptop tools | + +**Rule:** for personal-laptop tools (CLI tools, slash commands, single-user daemons, scripts), prefer **exercise-now** verification ("all paths reached", "first-pass audit clean", "real-task completed end-to-end") over **time-based bake** ("runs for N hours/days without incident"). Soak-time is appropriate ONLY when silent degradation is a real risk (production services, multi-user, autonomous overnight jobs) — flag explicitly when used. + +This anti-pattern mirrors the [Adversarial Laziness Test](#adversarial-laziness-test): a soak-time DoD item passes by *not breaking*, which the laziest implementation already achieves (just don't deploy and time passes anyway). Replace with an evidence-shape SC or a concrete closure step. + ## Scope Check Before approving a goal, verify these signals: @@ -211,6 +239,130 @@ Before approving a goal, verify these signals: If 3+ smells fail → goal is over-scoped. Split into multiple goals or move items to Non-goals. +## Tasks as Business-Value Milestones + +Tasks under `# Tasks` are **business-value milestones**, not code-change slices (WBS rows). Each task delivers a *shippable improvement* — a usable state of the world. + +### Decision rule + +> If each item could be a shippable milestone → N separate tasks. +> If sequential steps within one milestone → 1 task with N inline subtasks. + +| Business-value milestone (right) | WBS slice (wrong) | Why | +|---|---|---| +| "Allow Claude Code to pass through the proxy" | "Implement config-driven routing core" | First names a shippable state ("I can use it for one provider"); second is one slice of effort | +| "Add config + other providers" | "Add four provider adapters" / "Carry over fallback semantics" | First names a usable extension; second is implementation breakdown that lives inside the task | +| "Set up project skeleton at GitHub" | "Define provider config schema for YAML" | First is foundation (explicitly framed); second is a code-change inside the foundation work | +| "Dogfood `/launch-goal` on 3 real goal ideas and log observations" | "Run launch-goal once" / "Write observations file" | First is the milestone; second is the inline subtasks | + +### Decomposition hierarchy (explicit) + +``` +Goal (file) → linked Tasks (wikilinks, separate files) → inline Subtasks (checkboxes inside the task file) +``` + +- **Goal-level tasks** are `[[Wikilinks]]` to separate task files. +- **Task-level subtasks** are checkboxes INSIDE the task file (`- [ ] …`). Atomic work units, no independent identity, no separate files. +- **Never** create sibling task files for subtask-shaped work. Don't recreate file-link hierarchy below the task level. See `task-writing.md` § Subtask Hierarchy for the task-side view. + +### Format mandate + +In the goal file body, the `# Tasks` section MUST render each task as a `[[Wikilink Task Title]]`: + +```markdown +# Tasks + +1. [[Allow Claude Code to Pass Through the Proxy]] +2. [[Add Config and Other Providers to the Proxy]] — context (→ SC2, SC3) +``` + +NOT bold text + description: + +```markdown +# Tasks + +1. **Allow Claude Code to pass through the proxy** — single-provider end-to-end… +2. **Add config and other providers** — yaml schema, mapping, adapters… +``` + +Why: Obsidian renders `[[Wikilinks]]` as clickable; clicking auto-creates the task file with the title. Bold-text + description disables the auto-create path and forces manual `/vault-cli:create-task "<title>"` invocation. The wikilink form preserves both paths. + +**Title rules** (applied at goal-write time): +- Title-Case +- No `/`, `.`, backticks, `:`, `*`, `?`, `"`, `<`, `>`, `|` (Obsidian filename rules) +- Optional one-line context after the wikilink: `1. [[Task Title]] — context (→ SC2)` + +### Foundation/skeleton work + +Tasks that enable but don't directly advance an SC are allowed when **explicitly framed** as foundation: + +```markdown +1. [[Set Up Multi-Provider Proxy Project Skeleton]] — GitHub repo, license, CI, install.sh (foundation; enables iteration) +``` + +The audit accepts this when the framing makes "foundation" explicit; otherwise it flags as orphan (task advances no SC). + +### Soft cap, not floor + +The 1-8 range is upper-bounded. **There is no minimum** — small goals can have 1 task ("Implement the proxy"). Don't pad to hit a number. If the goal genuinely has only one shippable milestone, the file has one entry under `# Tasks`. + +## Evidence Shape per Success Criterion + +Borrowed from `~/Documents/workspaces/dark-factory/docs/rules/spec-writing.md` § "Evidence Shape per Acceptance Criterion." Every Success Criterion must declare **what the operator will observe to confirm pass.** + +### Acceptable evidence shapes + +| Shape | Example phrasing in SC | +|---|---| +| Exit code | "`make precommit` exits 0" | +| Stdout / stderr match | "stdout contains `processed: 42`" | +| Log line | "log line `request_id=<uuid> status=ok`" | +| File presence | "`ls path/to/file` succeeds" | +| File content (diff / grep) | "`grep -n 'pattern' file.md` returns ≥1 line" | +| HTTP response | "`GET /api/x` returns 200 with body matching `{...}`" | +| State transition | "frontmatter `status` transitions `next → in_progress`" | +| Metric delta | "counter `foo_total{label=x}` increments by N after action" | +| Negative evidence | "`grep ERROR run.log` returns 0 lines during the test window" | +| File artifact | "task file under `tasks_dir/` exists with frontmatter `goal: [[X]]`" | + +### Bad SCs (no evidence shape) + +- ❌ "Tests pass" — what test, what assertion, evidence shape? +- ❌ "It works" / "Functionality verified" — narration, not observable +- ❌ "Code is clean" / "Performance improved" — aspirational; needs a metric + threshold +- ❌ "Documented properly" — what file, what content, grep target? + +### Good SCs (evidence shape declared) + +- ✅ "After `/vault-cli:create-task`, `cat tasks/<id>.md` shows `phase: todo` in frontmatter" +- ✅ "`kubectl -n dev logs <pod> | grep 'job spawned'` returns ≥1 match" +- ✅ "Across 3 dogfood runs, each goal's first-pass `/vault-cli:audit-goal` returns `MAJOR: 0`" + +The point isn't to inline test scripts — it's to make the SC's *observable target* unambiguous. The reader (and the operator later doing the work) should know exactly what to check. + +## Adversarial Laziness Test + +Borrowed from `dark-factory/docs/rules/spec-writing.md` § "Adversarial Laziness Test." Before approving the goal, read your Success Criteria assuming the laziest possible implementation that still ticks every box. + +> If the operator wrote `[x]` on every SC tomorrow **without doing the actual work**, would the goal feel done? + +If yes — the SCs are under-specified. + +### Examples + +| SC | Laziest "satisfaction" | Verdict | +|---|---|---| +| "`/launch-goal` works" | `echo "works" > log` | Under-specified — needs evidence shape (which paths exercised? what audit verdict?) | +| "Goal file exists" | `touch <goal>.md` | Under-specified — needs minimum content (which sections? evidence shapes in SCs?) | +| "All 4 providers reachable from one session" | Hard to fake — requires 4 distinct `[route] <provider> 200` log lines in same PID, observable | Specified well | +| "0 MAJOR findings from `/vault-cli:audit-goal`" | Audit is mechanical — can't bypass | Specified well | + +### Fix pattern + +Replace artifact-existence SCs (`<file> exists`) with **behavior** SCs (`<file> contains section X with content matching Y`). Replace narration SCs (`it works`) with **observation** SCs (`grep <pattern> returns ≥1 line`). + +This test runs at the same scope as the [Soak-Time DoD anti-pattern](#anti-pattern-soak-time-dod-on-personal-laptop-tools) — both reject "passes by not doing anything" SCs/DoDs. Soak-time is the time-based variant; laziness is the action-based variant. + ## Preflight Checklist Before approving: diff --git a/docs/task-writing.md b/docs/task-writing.md index 52717eb..26896d9 100644 --- a/docs/task-writing.md +++ b/docs/task-writing.md @@ -151,6 +151,37 @@ Adopted by parallel with goals' Non-goals convention. Explicit deferrals prevent - Each item is a *concrete* deferral or alternative - Link a follow-up task if the deferred work is real +### Subtask Hierarchy + +The decomposition hierarchy is explicit: + +``` +Goal (file) → linked Tasks (wikilinks, separate files) → inline Subtasks (checkboxes inside the task file) +``` + +- **Goal-level tasks** are `[[Wikilinks]]` to separate task files (see `goal-writing.md` § Tasks as Business-Value Milestones). +- **Task-level subtasks** are checkboxes (`- [ ] …`) INSIDE this task file's `# Tasks` section. They are **atomic work units** — small, sequential, no independent identity, no separate files. +- **Never** create sibling task files for subtask-shaped work. Don't recreate file-link hierarchy below the task level. If a subtask grows large enough to need its own file, it isn't a subtask anymore — promote it to a sibling task under the parent goal. + +**Decision rule** (same as the goal-writing side): + +> If each item could be a shippable milestone → N separate tasks under the parent goal. +> If sequential steps within one milestone → N inline subtasks inside this task file. + +**What belongs as an inline subtask:** + +- A single command or short sequence the operator can run (`make precommit`, `gh pr merge`) +- An implementation step inside one milestone (`define schema`, `wire route`, `add fallback`) +- A verification step (`run scenario on dev`, `confirm `kubectl get pod` Running`) + +**What does NOT belong as an inline subtask:** + +- A separately-shippable improvement that has its own SC trace at the goal level → it's a sibling task under the goal +- A months-long effort → it's a sibling goal, not a subtask +- Implementation breakdown for code-heavy work — that lives in a dark-factory spec referenced from the task body, not as 15 inline checkboxes + +The granularity rule under "Required sections" item 7 still applies: subtasks are session-sized work blocks (3-6 items), not single-command steps unless the command IS the meaningful unit. + ## Definition of Done Every task has TWO sides: