
[Feature]: verify that [X]-marked tasks were actually implemented #1865

@davesharpe13

Description

Problem Statement

The completed task list is not cross-referenced against the actual codebase after /speckit.implement.

After completing ~770 structured tasks with spec-kit's /implement, I found two tasks marked [X] complete where the corresponding code had never been written. Each was discovered only when the feature misbehaved during testing. The point is not to review the implementation, but to verify that the tasks were ever actually done.

The two examples:

  1. A task specified "Create outcome classification types in outcomeTypes.ts." The file was never created. The types were silently folded into a different module.
  2. A task specified "Modify patternDetector.ts to filter out turns with operationalFailures." The file was never modified. The filtering was applied elsewhere but never extended to pattern detection.

I'm calling these phantom completions: the agent marks a task done because the [X] token is the statistically favored continuation in a list of completed tasks, not because it has verified the work in the filesystem.

Proposed Solution

Similar to /speckit.analyze, add an optional /speckit.verify-tasks command to run after /speckit.implement. The command would check each [X]-marked task against file existence, git diffs, and expected patterns in the codebase, flagging gaps before they reach code review or production. It would produce a verify-tasks-report.md in the spec branch directory and then iterate over the report, offering to Fix, Skip, or Investigate each unverified completed task.
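As a rough illustration of the cheapest of the proposed checks (file existence), here is a minimal sketch. It assumes a hypothetical tasks.md format where completed tasks look like `- [X] Create outcomeTypes.ts ...`; the task-line and file-path regexes, and the `verify_tasks` function name, are my own inventions, not part of spec-kit:

```python
import re
from pathlib import Path

# Hypothetical formats: "- [X] <description>" task lines, and file paths
# recognizable by a handful of common extensions in the description.
TASK_RE = re.compile(r"\[([ xX])\]\s+(.*)")
FILE_RE = re.compile(r"\b([\w./-]+\.(?:tsx?|jsx?|py|md))\b")

def verify_tasks(tasks_md: str, repo_root: str = ".") -> list[dict]:
    """Flag [X]-marked tasks that mention files absent from the repo."""
    findings = []
    for line in Path(tasks_md).read_text().splitlines():
        m = TASK_RE.search(line)
        if not m or m.group(1).lower() != "x":
            continue  # only inspect tasks marked complete
        description = m.group(2)
        for path in FILE_RE.findall(description):
            if not (Path(repo_root) / path).exists():
                findings.append({"task": description, "missing": path})
    return findings
```

This would have caught the first example above (the never-created outcomeTypes.ts) but not the second, where the file exists and was simply never modified; catching that class of phantom completion would need the git-diff and pattern checks.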

I'm building this as a spec-kit extension (https://github.com/datastone-inc/speckit-verify-tasks). A detailed writeup of the phantom completion phenomenon and its causes is here: (https://datastone.ca/blog/task-phantom-completions-ai-assisted-development/).

Alternatives Considered

No response

Component

Specify CLI (initialization, commands)

AI Agent (if applicable)

None

Use Cases

Valuable for improving quality on long-running projects where many tasks are implemented.

Acceptance Criteria

Every task marked [X] complete is verified as being backed by a real implementation in the codebase.

Additional Context

There are some other issues and extensions circling this problem, but I think this feature drills down to a specific, real gap.
