
Spec 004: Phase 2 (Project Bootstrap) end-to-end testing#109

Merged
jeremymanning merged 20 commits into main from 008-phase2-project-bootstrap-testing
May 6, 2026

Conversation

@jeremymanning
Member

Summary

Validates Phase 2 of the llmXive pipeline end-to-end on iter2/iter3 siblings of spec 003's carry-forward projects (PROJ-261, PROJ-262), per issue #46 and sub-issue #62.

Lands four production fixes:

  • Extend sibling spawner's ALLOWED_START_STAGES to include validated (FR-003a)
  • Skip-if-exists guard on project_initializer's constitution write (FR-011 / Q3)
  • Fail-fast FileNotFoundError on missing idea file (P2-D03 / Constitution Principle V)
  • Tighten project_initializer prompt v1.0.0 → v1.1.0 to forbid external citations and HTML comments (P2-D04 and P2-D05, from the US2 audit)

Diagnostic

Full report at notes/2026-05-05-phase2-diagnostic.md. Carry-forward manifest at specs/004-phase2-project-bootstrap-testing/carry-forward.yaml names two iter3 siblings as input substrate for spec 005 (Phase 3 testing):

  • PROJ-261-evaluating-the-impact-of-code-duplicatio-iter3 (CS)
  • PROJ-262-predicting-molecular-dipole-moments-with-iter3 (chemistry)

Defects (all 5 fixed in-PR)

| ID | Severity | Defect | Resolution |
|---|---|---|---|
| P2-D01 | HIGH | constitution write was overwrite-unconditional | Fixed e8e09f7 (skip-if-exists guard) |
| P2-D02 | HIGH | spawner allowlist missing validated | Fixed e5e423c |
| P2-D03 | HIGH | silent fallback on missing idea file | Fixed e8e09f7 (raises FileNotFoundError); verified by US4 Scenario 2 |
| P2-D04 | MEDIUM | LLM preserved template HTML comments | Fixed 8f2fe48 (prompt v1.1.0); verified by iter3 audit |
| P2-D05 | CRITICAL | LLM introduced Figshare DOI as citation | Fixed 8f2fe48 (prompt v1.1.0 enumerates forbidden citation forms); verified by iter3 audit |

Test plan

  • All 4 tests/phase1/test_idempotency.py tests pass
  • All 11 tests/phase1/test_citation_resolver.py tests pass (regression)
  • Manual verification: each iter3 sibling's constitution passes US2 audit (6/6 contract items)
  • Manual verification: all three induced-failure scenarios produce loud + recorded failures with state unchanged
  • Manual verification: idempotency check on PROJ-261-iter3: sha256 manifests taken before and after a second init_speckit_in run are byte-equal (diff exit 0)
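
The idempotency check above can be approximated with a small helper. This is a minimal sketch, not the harness in tests/phase1/test_idempotency.py: the function name `sha256_manifest` is hypothetical, and it assumes the check compares a per-file sha256 map of the `.specify/` tree taken before and after the second run.

```python
import hashlib
from pathlib import Path


def sha256_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative POSIX path) to the sha256 of its bytes.

    Hypothetical helper for illustration; the real harness may differ.
    """
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Usage would be to call it once before and once after the second initializer run and assert that the two dictionaries are equal; any added, removed, or modified file makes them differ.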

Per-issue acceptance verdict

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jeremymanning and others added 12 commits May 5, 2026 21:32
#46 #62)

Spec 003 / D10 introduced the 'validated' stage AFTER the spawner was
written, so the allowlist was out-of-date. Phase 2 testing requires
spawning siblings at validated to route them to project_initializer
(per STAGE_TO_AGENT[VALIDATED] in src/llmxive/pipeline/graph.py:70).

One-line set extension; no other change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r (FR-011 Q3 P2-D03, #46 #62)

Two in-PR HIGH-defect fixes from spec 004's Phase 2 diagnostic plan:

1. Skip-if-exists guard before constitution write (FR-011 / Q3 / spec
   004 research.md Decision 2). Re-rendering a governance document
   silently mutates downstream Constitution Checks; the new guard
   matches the init_speckit_in skip-if-dir-exists pattern at
   src/llmxive/speckit/runner.py:114.

2. Fail-fast on missing idea file (P2-D03 / Constitution Principle V).
   The previous defensive `if idea_path.exists()` masked missing
   inputs and produced constitutions untethered from any idea body.
   Now raises FileNotFoundError immediately (caught by US4 induced-
   failure scenario 2 verification).

Plus: 4-test pytest harness at tests/phase1/test_idempotency.py
proving SC-009 (full .specify/ tree byte-equality after second
project_initializer invocation). All 4 tests pass in 0.08s.
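
For readers who want the shape of the two guards, here is a minimal sketch. The function names `write_constitution_once` and `read_idea_or_fail` are hypothetical and do not come from project_initializer.py; only the behavior (skip the write if the file exists, raise FileNotFoundError on a missing idea file) is taken from the description above.

```python
from pathlib import Path


def write_constitution_once(path: Path, rendered: str) -> bool:
    """Write the constitution only if it does not exist yet.

    Returns True if the file was written, False if the write was skipped.
    Hypothetical name; illustrates the skip-if-exists guard.
    """
    if path.exists():
        return False  # never re-render an existing governance document
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(rendered, encoding="utf-8")
    return True


def read_idea_or_fail(idea_path: Path) -> str:
    """Fail fast on a missing idea file instead of continuing without one.

    Hypothetical name; illustrates the fail-fast guard.
    """
    if not idea_path.is_file():
        raise FileNotFoundError(f"idea file not found: {idea_path}")
    return idea_path.read_text(encoding="utf-8")
```

The design point is that both guards run before any LLM call, so a second invocation is a no-op and a missing input surfaces as a loud error.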

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…001, #46 #62)

Spawned via tests/phase1/sibling_project.py at --start-stage validated
(now allowed per FR-003a / commit e5e423c). Both siblings have:
  - sha256-verified byte-identical clone of canonical's idea file
  - fresh state YAML at current_stage: validated
  - no .specify/ scaffold yet (project_initializer produces it next)

Substrate for US1 happy-path runs (T017/T018).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s (US1, #46 #62)

Both PROJ-261-iter2 and PROJ-262-iter2 advanced from validated → project_initialized
in <90s wall-clock against the real Dartmouth Chat backend (qwen.qwen3.5-122b).

PROJ-261-iter2 run:
  - run_id: e9a3dfce-8435-455f-bf7a-8e4206ffb754
  - duration: 63s (01:35:25 → 01:36:28)
  - constitution: .specify/memory/constitution.md (LLM-rendered)
  - scaffold: 4 scripts + 5 templates (mechanical via init_speckit_in)

PROJ-262-iter2 run:
  - run_id: 4a04a919-0a1c-46f9-a9a3-fab5a96200ce
  - duration: 72s (01:36:33 → 01:37:45)
  - same artifact set

Both run-log entries: outcome=success, no failure_reason. State YAMLs
both at current_stage=project_initialized.

Substrate for US2 (constitution audit) and US3 (idempotency check).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ns + HTML comments (P2-D04 P2-D05, US2 §4, #46 #62)

US2 audit on iter2 happy-path siblings surfaced two defects:

- P2-D05 (CRITICAL per spec.md SC-011): PROJ-262-iter2's constitution
  introduced an external citation — `DOI: 10.6084/m9.figshare.9981994`
  for the QM9 dataset — into Reproducibility Requirements. The prompt
  said "DO NOT introduce external citations" but didn't define what
  qualifies, and the LLM treated the DOI as a data-source identifier.
- P2-D04 (MEDIUM): PROJ-261-iter2's constitution preserved the HTML
  comment block from the constitution template explaining substitution
  tokens. The comments are scaffolding for the LLM, not content for
  the rendered document.

Prompt v1.0.0 → v1.1.0 (MINOR — adds new behavior constraints, doesn't
break the output contract):
  - Enumerate forbidden citation forms explicitly: DOIs, arXiv IDs,
    URLs, Figshare/Zenodo/OSF/HF record IDs.
  - Allow naming datasets by name without their canonical pointers.
  - Forbid HTML comment blocks in output (strip template scaffolding).

Phase 7 next: spawn iter3 siblings of both PROJ-261 and PROJ-262, re-run
project_initializer with the patched prompt, re-audit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…titutions pass full US2 audit (Phase 7, P2-D04 P2-D05 fixed, #46 #62)

After commit 8f2fe48 tightened agents/prompts/project_initializer.md
to forbid external citations + HTML comments (prompt v1.0.0 → v1.1.0),
spawned iter3 siblings of both PROJ-261 and PROJ-262, re-ran
project_initializer, re-audited:

PROJ-261-iter3 (computer science):
  6/6 contract items PASS
  - No HTML comment block (P2-D04 fixed)
  - Domain-specific principles VI (Model & Compute Integrity) + VII
    (Code Licensing & Compliance) — both well-grounded
  - Reproducibility Requirements names codeparrot/github-code
    (allowed per v1.1.0 — dataset name, not citation)

PROJ-262-iter3 (chemistry):
  6/6 contract items PASS
  - No DOI / arXiv / URL anywhere in body (P2-D05 fixed)
  - Domain-specific principles VI (Physical Consistency) + VII
    (Benchmark Integrity) — both grounded in chemistry domain
  - Reproducibility Requirements references QM9 by name only

Both run-log entries: outcome=success, prompt_version=1.1.0.
Both state YAMLs at current_stage=project_initialized.

Phase 7 iteration loop converged in 1 iteration (well under FR-005
5-cycle cap). iter3 siblings are the carry-forward candidates for
spec 005.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#62)

All three deliberate failure scenarios from US4 / Q2 clarification
exercised on dedicated sibling iters; each produced a loud + recorded
failure with state unchanged.

Scenario 1 (backend unreachable) — PROJ-261-iter4:
  - Set DARTMOUTH_CHAT_API_KEY=invalid for one orchestrator run
  - Result: every backend in chain failed (dartmouth/HF/local)
  - failure_reason quotes all three backend errors
  - State current_stage=validated (unchanged); no .specify/

Scenario 2 (idea file missing) — PROJ-262-iter4:
  - Spawned then deleted idea/<slug>.md before orchestrator
  - The new fail-fast guard (T008 / commit e8e09f7) raised
    FileNotFoundError immediately; no LLM call made
  - failure_reason: "FileNotFoundError: project_initializer requires at
    least one input (idea file path); got ctx.inputs=[]"

Scenario 3 (template file missing) — PROJ-261-iter5:
  - Renamed agents/templates/research_project_constitution.md to .bak
    for one run; restored after
  - render_prompt() raised FileNotFoundError before LLM invocation
  - State unchanged; template restored to canonical path; git clean

All three siblings marked archived_at: 2026-05-06T01:46:00Z (FR-019).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… FR-013 FR-017, #46 #62)

§ 1-8 of the diagnostic report at notes/2026-05-05-phase2-diagnostic.md
covers: inputs, agent behavior on 7 runs (4 happy-path iter2/iter3 +
3 induced-failure iter4/iter5), constitution audits (with verbatim
fail/pass per the 6-item contract), full sha256-tree idempotency
verification (US3 / SC-009 / pytest 4/4), defects table (5 P2-D## all
fixed in-PR), iteration diff for the v1.0.0→v1.1.0 prompt patch, and
carry-forward decision.

Carry-forward manifest names two iter3 siblings as the substrate for
spec 005:
  - PROJ-261-evaluating-the-impact-of-code-duplicatio-iter3 (CS)
  - PROJ-262-predicting-molecular-dipole-moments-with-iter3 (chemistry)

Both at current_stage=project_initialized; both pass the full US2
audit cleanly under prompt v1.1.0. Schema follows spec 003's manifest
with one new field per data-model.md E7 (phase2_iter2_id) recording
which iter2 sibling produced the audited constitution.

Per-issue verdict (§ 6):
  Issue #62 (project_initializer): all 3 acceptance boxes PASS
  Issue #46 (Phase 2 parent):       all 5 acceptance boxes PASS

No CRITICAL/HIGH defects remain unresolved. No follow-up issues
opened — all 5 P2-D## defects fixed in this PR (commits e5e423c,
e8e09f7, 8f2fe48).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…8a, #46 #62)

iter3 siblings (with prompt v1.1.0) carried forward instead of iter2.
Mark iter2 siblings archived per FR-019 — never deleted, just flagged
for clarity. They remain readable for spec 005 if it wants to inspect
the iter2-vs-iter3 diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ruff --fix on the two files spec 004 touched:
  - src/llmxive/agents/project_initializer.py: I001 import sort,
    UP017 datetime.timezone.utc → datetime.UTC alias.
  - tests/phase1/test_idempotency.py: I001 import sort.

Behavior unchanged. All 15 tests in tests/phase1/ still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#46 #62)

All 69 tasks in tasks.md marked [X]. spec.md Status: Draft → In Review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commits artifacts the spec-kit workflow generated but never auto-staged:
  - plan.md, research.md, data-model.md, quickstart.md, 4× contracts/,
    requirements.md (the spec-004 design substrate)
  - CLAUDE.md SPECKIT plan reference updated 003 → 004
  - .history.jsonl run-history files (one per iter project)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jeremymanning and others added 4 commits May 5, 2026 21:53
… require principle grounding (P2-D06, #46 #62)

Deep audit re-check on iter3 surfaced a MEDIUM defect missed in the
shallow audit: PROJ-261-iter3's added Principle VII "Code Licensing &
Compliance" claims things about GPL / restrictive licensing that have
NO basis in the project's idea body (which is about clone density vs
LLM perplexity, not licensing). The prompt's "Adapt Core Principles
to the specific research domain" instruction permitted the LLM to
extrapolate too freely and invent generic-good-practice principles
that don't govern the actual research scope.

Prompt v1.1.0 → v1.2.0 (MINOR): added explicit grounding requirement —
each new principle must trace claims back to specific idea-body
sections (Methodology / Expected results / Motivation / Research
question). Forbid fabrication of generic-good-practice principles
(licensing, deployment, maintenance) that don't address the project's
specific research. Require new principles to reference idea-body's
named datasets/models/methods when codifying domain norms.

Phase 7 next: spawn iter4 siblings, verify both projects' VI/VII
principles are grounded in their idea bodies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… deep audit (Phase 7 round 2, P2-D06 fixed, #46 #62)

Deep re-audit on iter3 (after merge of v1.1.0 patch) surfaced a
MEDIUM defect missed in the original audit: PROJ-261-iter3's added
Principle VII "Code Licensing & Compliance" claimed things about GPL
that have no basis in the project's idea body. The v1.1.0 prompt
allowed too-liberal extrapolation; v1.2.0 added explicit grounding
requirements (every claim must trace to a specific idea-body section).

iter6 re-runs (with the v1.2.0 prompt) produce principles that are all
grounded in the idea body:

PROJ-261-iter6:
  - VI "Statistical Correlation Integrity" — grounds in idea
    Methodology + Expected results (p < 0.05 threshold, Spearman's
    rank correlation)
  - VII "Clone Detection Consistency" — grounds in idea Methodology
    (AST-based clone detector, codeparrot/github-code subset)
  - No fabricated principles. No license/compliance fabrication.

PROJ-262-iter6:
  - VI "3D Geometry Preservation" — grounds in idea Methodology
    sketch + Expected results, with explicit "This principle is
    grounded in..." annotations citing specific idea sections
  - VII "Chemical Interpretability" — grounds in idea Research
    question + Motivation with quoted text
  - The LLM followed the v1.2.0 instruction closely (it added
    grounding annotations to the constitution body on its own).

Regression checks all pass:
  - No {{token}} leaks
  - No DOI / arXiv / URL citations
  - No HTML comments
  - All 4 inherited principles (I-IV) byte-identical to template
  - Principle V differs only in substituted project_id (expected)
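
The first three regression checks lend themselves to a mechanical scan. The sketch below is an assumption about how such a scan could look, not the audit actually used; `audit_constitution` and the regexes are illustrative, and the patterns are deliberately loose (for example, the DOI pattern will include trailing punctuation in its match).

```python
import re

# Illustrative patterns only; the real audit may use different rules.
CHECKS = {
    "token_leak": re.compile(r"\{\{[^{}]*\}\}"),
    "html_comment": re.compile(r"<!--.*?-->", re.DOTALL),
    "doi": re.compile(r"\b10\.\d{4,9}/\S+"),
    "arxiv": re.compile(r"\barXiv:\s*\d{4}\.\d{4,5}", re.IGNORECASE),
    "url": re.compile(r"https?://\S+"),
}


def audit_constitution(text: str) -> dict[str, list[str]]:
    """Return the offending matches per failed check (empty dict means clean)."""
    findings = {}
    for name, pattern in CHECKS.items():
        hits = pattern.findall(text)
        if hits:
            findings[name] = hits
    return findings
```

Run against the DOI line quoted for P2-D05, this reports a single `doi` finding, while a body that names QM9 without a pointer comes back clean.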

Quality monitoring: iter3 → iter6 strictly IMPROVED. No regression.
1 iteration cycle to converge on the new defect (well under FR-005
5-cycle cap; total cycles for this spec: 2).

iter3 siblings now archived (superseded by iter6); iter6 are the
carry-forward candidates for spec 005.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…6 selection (P2-D06, #46 #62)

Diagnostic report:
  - § 4: P2-D06 added (MEDIUM, fixed in commit 7c5cc08)
  - § 5: round-2 iteration diff (v1.1.0 → v1.2.0)
  - § 3.6 + § 3.7: deep audit subsections for iter6 PROJ-261 + PROJ-262
  - § 8: re-selection of iter6 siblings as carry-forward (was iter3)

carry-forward.yaml now names iter6 siblings:
  - PROJ-261-evaluating-the-impact-of-code-duplicatio-iter6
  - PROJ-262-predicting-molecular-dipole-moments-with-iter6
Each with project_initializer iterations: 3 (iter2 + iter3 + iter6).

Quality monitoring: strictly monotone improvement across iter2 →
iter3 → iter6. No regressions. Total Phase 7 cycles for spec 004:
2 (well under FR-005 5-cycle cap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremymanning
Member Author

Phase 7 round 2 complete (P2-D06 fixed)

After the user's request for high-quality verification, a deep re-audit on iter3 surfaced one more defect:

  • P2-D06 (MEDIUM): PROJ-261-iter3's added principle "Code Licensing & Compliance" had no basis in the project's idea body (which is about clone density vs LLM perplexity, not licensing). Fabricated grounding.

Patched the project_initializer prompt v1.1.0 → v1.2.0 to require explicit principle-grounding (every claim must trace to a specific idea-body section). Spawned iter6 siblings, re-ran:

  • PROJ-261-iter6: VI "Statistical Correlation Integrity" + VII "Clone Detection Consistency" — both grounded in idea's Methodology + Expected results.
  • PROJ-262-iter6: VI "3D Geometry Preservation" + VII "Chemical Interpretability" — LLM included explicit "This principle is grounded in..." annotations directly in the constitution body, citing specific idea sections.

Carry-forward manifest updated to point to iter6. Phase 7 total: 2 iteration cycles, strictly monotone quality improvement, no regressions. All 15 pytest tests still pass.

Latest commit: 5f72de2.

…ng forward (#46 #62)

Removes the proliferating PROJ-NNN-<slug>-iterN sibling directories
(9 of them across PROJ-261 and PROJ-262 from spec 004's iteration
loops + induced-failure scenarios) and promotes iter6's audited
constitutions onto the canonical paths in place.

Convention change documented at notes/2026-05-06-iteration-convention-change.md:
  - Iterate in place on canonical PROJ-NNN-<slug>/.
  - Use git commits + log notes to track iteration trail (one commit
    per iteration with a descriptive message).
  - Diagnostic reports gain a § 5 "Iteration log" indexing commits
    by phase + agent + iteration number.

What's removed:
  - All 9 PROJ-261-...-iter[2-6] and PROJ-262-...-iter[2-4,6] dirs
  - All state/projects/PROJ-26*-iter*.yaml files
  - All state/projects/PROJ-26*-iter*.history.jsonl files
  Run-log JSONL entries kept in state/run-log/2026-05/ as historical
  audit evidence.

What's promoted:
  - projects/PROJ-261-evaluating-the-impact-of-code-duplicatio/.specify/memory/constitution.md
    ← copy of iter6 audited content, project_id substituted to canonical
  - projects/PROJ-262-predicting-molecular-dipole-moments-with/.specify/memory/constitution.md
    ← same

What's preserved:
  - tests/phase1/sibling_project.py (deprecation banner added; spec
    003's historical reproducibility holds)
  - All phase2/spec-004 commits (the iteration trail v1.0.0 → v1.1.0
    → v1.2.0 remains browsable via git log on the prompt path)
  - spec 003 + spec 004 diagnostic reports' historical references
    to siblings (they describe historical state)

Verification:
  - find projects/PROJ-26*-iter* → empty
  - ls state/projects/ | grep iter → empty
  - pytest tests/phase1/ → 15/15 pass
  - canonical constitutions hold the audited iter6 content with
    -iter6 stripped from substituted project_ids
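
The promotion step (copy the audited constitution, strip the sibling suffix from project IDs) can be sketched as follows. The helper names and the regex are assumptions for illustration; the commit does not say how the substitution was performed.

```python
import re
from pathlib import Path

# Matches a project ID followed by a sibling suffix such as "-iter6".
# Assumed ID shape: PROJ-<digits>-<slug>; illustrative only.
_ITER_SUFFIX = re.compile(r"(PROJ-\d+-[A-Za-z0-9-]+?)-iter\d+\b")


def strip_iter_suffix(text: str) -> str:
    """Rewrite PROJ-NNN-<slug>-iterN references to the canonical PROJ-NNN-<slug>."""
    return _ITER_SUFFIX.sub(r"\1", text)


def promote_constitution(sibling_file: Path, canonical_file: Path) -> None:
    """Copy the sibling's constitution onto the canonical path with IDs rewritten."""
    canonical_file.parent.mkdir(parents=True, exist_ok=True)
    text = sibling_file.read_text(encoding="utf-8")
    canonical_file.write_text(strip_iter_suffix(text), encoding="utf-8")
```

Text that contains no `-iterN` suffix passes through unchanged, so running the promotion twice yields the same file.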

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremymanning
Member Author

Convention change: sibling-iter directories retired in favor of in-place iteration

After review, the -iterN proliferation cluttered the project trees: spec 004's testing alone left 9 sibling directories across just 2 carry-forward projects, plus duplicate state YAMLs, duplicate .specify/ scaffolds, and history files for each. Latest commit (30aa5a8) cleans this up:

Removed: all PROJ-261-...-iterN/ and PROJ-262-...-iterN/ directories + their state files. Run-log entries kept in state/run-log/ as audit evidence.

Promoted: iter6's audited constitution copied in place onto each canonical (projects/PROJ-NNN-<slug>/.specify/memory/constitution.md), with the -iter6 suffix stripped from substituted project_id references.

Going forward: future phase-test specs (005+) iterate in place on canonicals, with git commits + log notes as the iteration trail. Sibling spawner deprecated (banner added; preserved for spec 003's historical reproducibility).

Documented at notes/2026-05-06-iteration-convention-change.md. Diagnostic report and carry-forward manifest both updated to reference canonicals.

Verification: find projects/PROJ-26*-iter* → empty; pytest tests/phase1/ → 15/15 PASS; canonical constitutions hold audited iter6 content.

jeremymanning and others added 2 commits May 6, 2026 08:04
…ROJ-261/PROJ-262 (Q1B Q3A, #46 #62)

Two duplicate PROJ-NNN groups existed on main from concurrent cron
runs racing in cli._cmd_brainstorm:
  - PROJ-261: evaluating-... (carry-forward) + investigating-...
  - PROJ-262: predicting-... (carry-forward) + quantifying-...

Q1B (race-condition fix):

  New src/llmxive/state/project_id_lock.py provides:
    - project_id_lock(repo_root): exclusive fcntl.flock on
      state/.brainstorm.lock; held only during disk-snapshot + state-
      YAML write (microseconds, NOT during LLM call).
    - next_available_proj_num(repo_root, starting_num=1): scans
      state/projects/ AND projects/ for used PROJ-NNN slots; returns
      smallest free n. Handles -iterN historic suffixes correctly.

  cli._cmd_brainstorm now wraps the per-seed allocation in the lock
  and writes the state YAML eagerly inside the lock as the atomic
  ID claim. The LLM call happens BEFORE the lock; the body write
  happens AFTER (with the ID already claimed).

  8 new tests at tests/phase1/test_project_id_lock.py including a
  real os.fork() concurrent-allocation test that proves two
  children racing for the lock produce DISTINCT project numbers.
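
  A minimal sketch of the allocation pattern described above, for readers unfamiliar with fcntl.flock. The two function names and the lock path come from this commit message; their bodies, the `claim_project_id` wrapper, the zero-padded ID format, and the one-line YAML are assumptions. fcntl is Unix-only.

```python
import fcntl
import re
from contextlib import contextmanager
from pathlib import Path

_PROJ_RE = re.compile(r"^PROJ-(\d+)")


@contextmanager
def project_id_lock(repo_root: Path):
    """Hold an exclusive advisory lock on state/.brainstorm.lock for the block."""
    lock_path = repo_root / "state" / ".brainstorm.lock"
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def next_available_proj_num(repo_root: Path, starting_num: int = 1) -> int:
    """Return the smallest unused PROJ number across state/projects/ and projects/."""
    used: set[int] = set()
    for sub in ("state/projects", "projects"):
        base = repo_root / sub
        if base.is_dir():
            for entry in base.iterdir():
                match = _PROJ_RE.match(entry.name)  # prefix match also covers -iterN names
                if match:
                    used.add(int(match.group(1)))
    n = starting_num
    while n in used:
        n += 1
    return n


def claim_project_id(repo_root: Path, slug: str) -> str:
    """Hypothetical wrapper: pick an ID and write the state YAML inside the lock."""
    with project_id_lock(repo_root):
        project_id = f"PROJ-{next_available_proj_num(repo_root):03d}-{slug}"
        state_dir = repo_root / "state" / "projects"
        state_dir.mkdir(parents=True, exist_ok=True)
        (state_dir / f"{project_id}.yaml").write_text(f"id: {project_id}\n", encoding="utf-8")
    return project_id
```

  Writing the state file before releasing the lock is what makes the claim atomic: the next process to take the lock sees the new file in its scan and moves on to the next free number.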

Q3A (cleanup):

  Renamed the non-carry-forward duplicates to next-free IDs:
    PROJ-261-investigating-... → PROJ-331-investigating-...
    PROJ-262-quantifying-...   → PROJ-332-quantifying-...

  Carry-forward projects (PROJ-261-evaluating-, PROJ-262-predicting-)
  keep their numbers since spec 003 + spec 004 + carry-forward
  manifests + the parent issue all reference them.

  Updated 5 file groups: project dirs, state YAMLs (id field),
  history JSONLs, web/data/projects.json, run-log JSONL entries.

Verification:
  - grep -rnE "PROJ-261-investigating|PROJ-262-quantifying" → 0 matches
  - pytest tests/phase1/ → 23/23 PASS (12.2s, no regression)
  - all 4 PROJ-26[12] / PROJ-33[12] dirs verified unique on disk

Documented at notes/2026-05-06-project-id-numbering-fix.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#46 #62)

Captures the user's design proposal from the spec 004 review:
  - Single canonical librarian agent for all literature search +
    citation verification (Constitution Principle I — single source
    of truth)
  - Multi-step expanded-term search when initial results are thin
    (<5 verified citations); brainstorm 10-20 alt terms, iterate,
    accumulate, log expanded trail to idea.md
  - Re-validate flesh_out + research_question_validator from spec 003
    once the librarian is in place

3 user stories (P1 each), ~4-5 days estimated effort. 5 open design
questions captured for the next /speckit-clarify pass.

This is a HANDOFF NOTE only; the actual spec 005 directory will be
created by /speckit-specify when the user starts the next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremymanning
Member Author

Project-ID race fixed + duplicates renamed (Q1B + Q3A complete)

After review, the duplicate PROJ-261 / PROJ-262 issue turned out to have a real concurrency root cause in cli._cmd_brainstorm. The fix is in commit 9820567:

Q1B: new src/llmxive/state/project_id_lock.py with fcntl.flock-protected atomic project-ID allocation. cli._cmd_brainstorm now wraps the per-seed allocation in the lock and writes the state YAML eagerly inside the lock as the atomic ID claim. 8 new tests including a real os.fork()-based concurrent-allocation test that proves the lock prevents the race.

Q3A: renamed the non-carry-forward duplicates to next-free IDs (PROJ-331 + PROJ-332). Carry-forward projects keep their numbers since spec 003 + spec 004 + manifests + tracker all reference them.

Spec 005 handoff note: notes/2026-05-06-spec-005-librarian-outline.md outlines the librarian agent + Phase 1 re-validation work that the user proposed during this review. It will be a separate PR / spec.

Tests: 23/23 PASS (15 prior + 8 new lock tests, 12.2s, no regression). Latest commit: f72c5e2 on this branch.

…ne_e2e.py (#46 #62)

Test expected `flesh_out_complete → project_initialized` after one step,
but spec 003 / D10 inserted research_question_validator between those
two stages. The next dispatch from FLESH_OUT_COMPLETE is now the
validator, which has four legitimate verdicts:
  - VALIDATED            (question passed)
  - VALIDATOR_REVISE     → FLESH_OUT_IN_PROGRESS
  - VALIDATOR_REJECTED   → BRAINSTORMED
  - HUMAN_INPUT_NEEDED

The synthetic smoke fixture has no real research question (just a
title + empty idea), so the validator correctly rejects it to
BRAINSTORMED — which the assertion now allows.
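
The relaxed assertion can be pictured as a membership check over the stages the validator may legitimately leave a project in. This is a sketch under assumptions: the `Stage` enum, its lowercase string values, and `assert_validator_step` are illustrative stand-ins, not the identifiers used in the test file or in graph.py.

```python
from enum import Enum


class Stage(str, Enum):
    # Stand-in enum; string values are assumed, not taken from the codebase.
    BRAINSTORMED = "brainstormed"
    FLESH_OUT_IN_PROGRESS = "flesh_out_in_progress"
    FLESH_OUT_COMPLETE = "flesh_out_complete"
    VALIDATED = "validated"
    HUMAN_INPUT_NEEDED = "human_input_needed"
    PROJECT_INITIALIZED = "project_initialized"


# Stages reachable in one step from FLESH_OUT_COMPLETE via the validator:
# pass, revise, reject, or escalate to a human.
VALIDATOR_OUTCOME_STAGES = frozenset({
    Stage.VALIDATED,
    Stage.FLESH_OUT_IN_PROGRESS,
    Stage.BRAINSTORMED,
    Stage.HUMAN_INPUT_NEEDED,
})


def assert_validator_step(stage_after: Stage) -> None:
    """Accept any legitimate validator outcome; reject anything else."""
    assert stage_after in VALIDATOR_OUTCOME_STAGES, f"unexpected stage: {stage_after}"
```

Under this check the old expectation (jumping straight to project_initialized in one step) fails, which matches the pipeline after spec 003 / D10.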

This was caught by CI on PR #109 against spec 004; fix is one
test-assertion change, no production-code shift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremymanning jeremymanning merged commit a00b01e into main May 6, 2026
1 check passed
@jeremymanning jeremymanning deleted the 008-phase2-project-bootstrap-testing branch May 6, 2026 12:21