Feature request: worktree overlay indexing for branch-aware agent workflows #573

Description

@pcristin

What problem does this solve?

Agent-driven development workflows often use many short-lived Git worktrees for feature branches, PR review, verification, and parallel tasks. The current worktree/branch-root support from #401 makes the graph shape more explicit, but it still leaves a larger design question open: how should Codebase Memory MCP avoid treating mostly identical worktrees as fully independent indexes?

The remaining pain points are:

  • duplicated indexing and storage for worktrees that mostly match the base branch
  • project-list clutter from short-lived worktree paths
  • query ambiguity when an MCP client is running inside a worktree but the canonical checkout is also indexed
  • stale project entries after temporary worktrees are removed (related to #286, "Feature Request: Automatic deletion of project when underlying folder is deleted")
  • missing or stale branch-specific context if only the canonical checkout is considered
  • unnecessary churn if every transient agent edit triggers a full or near-full reindex

Related prior discussion and implementation:

Proposed solution

I would like to discuss a committed-source, file-granularity overlay model on top of the existing branch/worktree awareness:

  1. Keep one canonical/base index for the repository's base branch.
  2. For each branch/worktree context, compute the merge base against the configured base branch.
  3. Index only the files changed in the range merge-base..HEAD into an overlay context.
  4. At query time, resolve the active Git context and merge results as:
    • overlay nodes/edges first
    • overlay tombstones for deleted files or symbols
    • base index for unchanged files
  5. If a file exists in the overlay, hide that file's base graph for that context.
  6. Treat worktree path as an attachment to a branch/HEAD context, not necessarily as the graph identity.
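To make steps 4 and 5 concrete, here is a minimal sketch of file-granularity resolution at query time. The `Node` and `OverlayContext` types and the `resolve` function are hypothetical names for illustration only; they are not part of Codebase Memory MCP, and the real graph store would likely do this filtering in a query rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    qualified_name: str
    file_path: str
    source: str  # "base" or "overlay" (hypothetical provenance tag)


@dataclass
class OverlayContext:
    # Nodes extracted from files changed in merge-base..HEAD.
    nodes: list = field(default_factory=list)
    # Paths re-indexed in the overlay (added or modified files).
    changed_files: set = field(default_factory=set)
    # File-level tombstones for paths deleted on the branch.
    deleted_files: set = field(default_factory=set)


def resolve(base_nodes, overlay):
    """Return the nodes visible in the overlay's context.

    Overlay nodes come first. A base node is hidden when its file was
    re-indexed in the overlay or tombstoned as deleted.
    """
    hidden = overlay.changed_files | overlay.deleted_files
    visible_base = [n for n in base_nodes if n.file_path not in hidden]
    return list(overlay.nodes) + visible_base


if __name__ == "__main__":
    base = [
        Node("foo", "src/foo.c", "base"),
        Node("bar", "src/bar.c", "base"),
        Node("old", "src/old.c", "base"),
    ]
    overlay = OverlayContext(
        nodes=[Node("foo", "src/foo.c", "overlay")],
        changed_files={"src/foo.c"},
        deleted_files={"src/old.c"},
    )
    for node in resolve(base, overlay):
        print(node.source, node.file_path, node.qualified_name)
```

This sketch only covers node visibility. Edges from unchanged base files into replaced files would still need the resolution rule raised in the questions below.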

In other words, a repository would look conceptually like:

Project: example-repo
  Base context: main @ abc123
    full graph for committed main

  Overlay context: feature/foo @ def456, base abc123
    graph for changed committed files only
    deleted-file tombstones
    metadata: branch, worktree path, base SHA, head SHA

For a first implementation, I suggest indexing committed changes only:

overlay = git diff --name-status $(git merge-base HEAD <base-branch>)..HEAD

Dirty working-tree overlays could be considered later, but agents modify files quickly and frequently, so indexing uncommitted edits by default may add noise and churn.
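The committed-only diff above could be computed roughly as follows. This is a sketch under assumptions: `parse_name_status` and `committed_overlay_files` are hypothetical helper names, renames are treated as "delete old path, index new path", and paths are assumed to contain no tabs or characters that Git would quote (a real implementation would probably pass `-z` and split on NUL bytes instead).

```python
import subprocess


def parse_name_status(output):
    """Split `git diff --name-status` output into (changed, deleted) path sets."""
    changed, deleted = set(), set()
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]
        if status.startswith("D"):
            deleted.add(fields[1])
        elif status.startswith("R"):
            # Rename (e.g. "R100"): old path becomes a tombstone, new path is indexed.
            deleted.add(fields[1])
            changed.add(fields[2])
        elif status.startswith("C"):
            # Copy: the source path is untouched, only the new path is indexed.
            changed.add(fields[2])
        else:
            # A (added), M (modified), T (type change), and anything else.
            changed.add(fields[-1])
    return changed, deleted


def committed_overlay_files(repo, base_branch):
    """Run git in `repo` and return the overlay's changed and deleted paths."""
    def git(*args):
        result = subprocess.run(
            ["git", "-C", repo, *args],
            check=True, capture_output=True, text=True,
        )
        return result.stdout

    merge_base = git("merge-base", "HEAD", base_branch).strip()
    return parse_name_status(git("diff", "--name-status", f"{merge_base}..HEAD"))
```

Keeping the parser separate from the subprocess call makes the overlay file set easy to unit-test without a real repository.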

Alternatives considered

Keep one full index per worktree

This is simple and mostly matches the existing external project model, but it duplicates unchanged source, grows storage with every short-lived agent worktree, and keeps project selection ambiguous.

Branch roots only

#401 is a useful foundation, but branch roots alone do not fully solve deduplication or overlay query semantics. They make the relationship visible; they do not yet make unchanged base files shared.

Shared content storage with separate project views

This could reduce storage while preserving the current project model, but it may still leave project-list clutter and context ambiguity unless grouped views and context-aware query resolution are added.

On-demand diff indexing only

This is attractive for temporary review worktrees, but the first query could be slower and long-lived branches probably still need persistent overlay metadata.

Questions to discuss

  1. Should the overlay identity be branch-based, worktree-path-based, HEAD-SHA-based, or a combination?

  2. Should v1 index only committed changes, or should it also support dirty working-tree changes behind an explicit option?

  3. What should the default base branch be: detected default branch, configured project base, or caller-provided base_branch?

  4. Should file-level replacement semantics be the first milestone? For example: if src/foo.c changed in the overlay, all base graph nodes from src/foo.c are hidden for that context.

  5. How should cross-file edges be resolved when unchanged base files refer to changed overlay symbols with the same qualified name?

  6. Should existing MCP tools infer context from the caller's current working directory, or should tools accept an explicit context/project selector?

  7. How should search_graph, get_code_snippet, and trace_path expose whether a result came from the base index or an overlay?

  8. How should stale overlays be garbage-collected after a worktree is deleted or a branch is rebased/force-pushed?

  9. Should deleted files be represented as file-level tombstones only, or are node-level tombstones needed for precise symbol hiding?

  10. Is this best developed as a sequence of small PRs, e.g. context metadata first, changed-file overlay indexing second, context-aware search third, and trace-path edge rewriting later?
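As one possible starting point for question 8, stale overlays could be detected by comparing stored overlay worktree paths against the output of `git worktree list --porcelain`. The sketch below assumes overlays are keyed by worktree path, which question 1 leaves open. `parse_worktree_list` and `stale_overlays` are hypothetical names, and the check does not cover rebased or force-pushed branches, which would need a separate comparison of the stored base and head SHAs.

```python
def parse_worktree_list(porcelain):
    """Parse `git worktree list --porcelain` output into one dict per worktree.

    Entries are separated by blank lines. Each line is "key value" or a bare
    key such as "detached", which is stored as True.
    """
    entries, current = [], {}
    for line in porcelain.splitlines():
        if not line:
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value or True
    if current:
        entries.append(current)
    return entries


def stale_overlays(overlay_paths, porcelain):
    """Return overlay worktree paths that Git no longer considers live.

    A worktree marked "prunable" (reported by newer Git versions when its
    directory is gone) is treated as stale, as is any path Git does not list.
    """
    live = {
        entry["worktree"]
        for entry in parse_worktree_list(porcelain)
        if "worktree" in entry and "prunable" not in entry
    }
    return sorted(path for path in overlay_paths if path not in live)
```

A garbage-collection pass could run this check on startup or before listing projects, then drop or archive the overlays it returns.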

Confirmations

  • I searched existing issues and this is not a duplicate.
