Agent-driven development workflows often use many short-lived Git worktrees for feature branches, PR review, verification, and parallel tasks. The current worktree/branch-root support from #401 makes the graph shape more explicit, but it still leaves a larger design question open: how should Codebase Memory MCP avoid treating mostly identical worktrees as fully independent indexes?
The remaining pain points are:
- duplicated indexing and storage for worktrees that mostly match the base branch
- project-list clutter from short-lived worktree paths
- query ambiguity when an MCP client is running inside a worktree but the canonical checkout is also indexed
I would like to discuss a committed-source, file-granularity overlay model on top of the existing branch/worktree awareness:
1. Keep one canonical/base index for the repository's base branch.
2. For each branch/worktree context, compute the merge base against the configured base branch.
3. Index only files changed between merge-base..HEAD into an overlay context.
4. At query time, resolve the active Git context and merge results as:
   - overlay nodes/edges first
   - overlay tombstones for deleted files or symbols
   - base index for unchanged files
5. If a file exists in the overlay, hide that file's base graph for that context.
6. Treat worktree path as an attachment to a branch/HEAD context, not necessarily as the graph identity.
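To make the query-time merge concrete, here is a minimal sketch of file-level replacement. The `Node`, `Overlay`, and `resolve` names are hypothetical and exist only for this illustration; they are not part of the current Codebase Memory MCP code, and edges are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    qualified_name: str
    file_path: str
    source: str  # "base" or "overlay", so callers can see where a result came from


@dataclass
class Overlay:
    nodes: list[Node] = field(default_factory=list)
    changed_files: set[str] = field(default_factory=set)  # files re-indexed in the overlay
    deleted_files: set[str] = field(default_factory=set)  # file-level tombstones


def resolve(base_nodes: list[Node], overlay: Overlay) -> list[Node]:
    """Merge one overlay over the base index using file-level replacement."""
    # A base file is hidden if the overlay re-indexed it or tombstoned it.
    hidden = overlay.changed_files | overlay.deleted_files
    visible_base = [n for n in base_nodes if n.file_path not in hidden]
    # Overlay results come first, then the unchanged part of the base index.
    return overlay.nodes + visible_base
```

This hides whole files only. Symbol-level tombstones and cross-file edge rewriting would need more than this sketch covers.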
In other words, a repository would look conceptually like:
Project: example-repo
  Base context: main @ abc123
    full graph for committed main
  Overlay context: feature/foo @ def456, base abc123
    graph for changed committed files only
    deleted-file tombstones
    metadata: branch, worktree path, base SHA, head SHA
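The same shape could be captured as context metadata roughly like this. It is a sketch under assumptions: every class and field name is hypothetical, and keying overlays by branch name is only one of the identity options raised in the questions further down.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BaseContext:
    branch: str
    head_sha: str


@dataclass
class OverlayContext:
    branch: str
    head_sha: str
    base_sha: str
    # The worktree path is an attachment, not the identity, so it may be absent.
    worktree_path: Optional[str] = None
    changed_files: set[str] = field(default_factory=set)
    deleted_files: set[str] = field(default_factory=set)


@dataclass
class Project:
    name: str
    base: BaseContext
    # Keyed by branch name here (an assumption made for the sketch).
    overlays: dict[str, OverlayContext] = field(default_factory=dict)

    def context_for_path(self, path: str) -> Optional[OverlayContext]:
        """Return the overlay whose worktree contains `path`, or None for the base."""
        for overlay in self.overlays.values():
            root = overlay.worktree_path
            if root is None:
                continue
            root = root.rstrip("/")
            if path == root or path.startswith(root + "/"):
                return overlay
        return None
```

With this layout the project list would show one entry per repository, and worktrees would appear as overlays beneath it rather than as separate projects.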
For a first implementation, I suggest indexing committed changes only:
overlay = git diff --name-status $(git merge-base HEAD <base-branch>)..HEAD
Dirty working-tree overlays could be considered later, but agents modify files quickly and frequently, so indexing uncommitted edits by default may add noise and churn.
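As a rough illustration of the committed-only approach, the sketch below runs the two Git commands and splits the `--name-status` output into files to re-index and files to tombstone. The function names are hypothetical. Treating a rename as "delete old path, index new path" is an assumption of this sketch, and paths containing unusual characters would need `-z` output, which is not handled here.

```python
import subprocess


def parse_name_status(output: str) -> tuple[set[str], set[str]]:
    """Split `git diff --name-status` output into (changed, deleted) path sets."""
    changed: set[str] = set()
    deleted: set[str] = set()
    for line in output.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status = parts[0]
        if status.startswith("D"):
            deleted.add(parts[1])
        elif status.startswith("R"):
            # Rename: "R<score>\told\tnew". Tombstone the old path, index the new one.
            deleted.add(parts[1])
            changed.add(parts[2])
        elif status.startswith("C"):
            # Copy: "C<score>\tsrc\tdst". Only the destination is new.
            changed.add(parts[2])
        else:
            # A (added), M (modified), T (type change), and so on.
            changed.add(parts[1])
    return changed, deleted


def committed_changes(repo: str, base_branch: str) -> tuple[set[str], set[str]]:
    """Return (changed, deleted) files between the merge base and HEAD."""
    merge_base = subprocess.run(
        ["git", "-C", repo, "merge-base", "HEAD", base_branch],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    diff = subprocess.run(
        ["git", "-C", repo, "diff", "--name-status", f"{merge_base}..HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_name_status(diff)
```

Because both commands read only committed objects, uncommitted edits in the worktree never reach the overlay.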
Alternatives considered
Keep one full index per worktree
This is simple and mostly matches the existing external project model, but it duplicates unchanged source, grows storage with every short-lived agent worktree, and keeps project selection ambiguous.
Branch roots only
#401 is a useful foundation, but branch roots alone do not fully solve deduplication or overlay query semantics. They make the relationship visible; they do not yet make unchanged base files shared.
Shared content storage with separate project views
This could reduce storage while preserving the current project model, but it may still leave project-list clutter and context ambiguity unless grouped views and context-aware query resolution are added.
On-demand diff indexing only
This is attractive for temporary review worktrees, but the first query could be slower and long-lived branches probably still need persistent overlay metadata.
Questions to discuss
Should the overlay identity be branch-based, worktree-path-based, HEAD-SHA-based, or a combination?
Should v1 index only committed changes, or should it also support dirty working-tree changes behind an explicit option?
What should the default base branch be: detected default branch, configured project base, or caller-provided base_branch?
Should file-level replacement semantics be the first milestone? For example: if src/foo.c changed in the overlay, all base graph nodes from src/foo.c are hidden for that context.
How should cross-file edges be resolved when unchanged base files refer to changed overlay symbols with the same qualified name?
Should existing MCP tools infer context from the caller's current working directory, or should tools accept an explicit context/project selector?
How should search_graph, get_code_snippet, and trace_path expose whether a result came from the base index or an overlay?
How should stale overlays be garbage-collected after a worktree is deleted or a branch is rebased/force-pushed?
Should deleted files be represented as file-level tombstones only, or are node-level tombstones needed for precise symbol hiding?
Is this best developed as a sequence of small PRs, e.g. context metadata first, changed-file overlay indexing second, context-aware search third, and trace-path edge rewriting later?
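For the garbage-collection question, one possible starting point is to compare stored overlays against the output of `git worktree list --porcelain`. The sketch below assumes overlays are tracked as a mapping from worktree path to indexed HEAD SHA, which is a hypothetical storage shape, and it only classifies overlays; what to do with each class is still an open question.

```python
def parse_worktree_list(porcelain: str) -> dict[str, str]:
    """Map worktree path -> HEAD SHA from `git worktree list --porcelain` output."""
    live: dict[str, str] = {}
    path = None
    for line in porcelain.splitlines():
        if line.startswith("worktree "):
            path = line[len("worktree "):]
        elif line.startswith("HEAD ") and path is not None:
            live[path] = line[len("HEAD "):]
        elif not line:
            # A blank line ends the current worktree record.
            path = None
    return live


def classify_overlays(
    overlays: dict[str, str], live: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (removed, outdated) worktree paths.

    removed:  the worktree no longer exists, so the overlay is a GC candidate.
    outdated: the worktree exists but HEAD moved (new commit, rebase, force-push),
              so the overlay needs re-indexing rather than deletion.
    """
    removed = sorted(p for p in overlays if p not in live)
    outdated = sorted(p for p, sha in overlays.items() if p in live and live[p] != sha)
    return removed, outdated
```

A rebase changes the merge base as well as HEAD, so an outdated overlay would likely need its changed-file set recomputed, not just its HEAD SHA updated.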
Confirmations
I searched existing issues and this is not a duplicate.
Related prior discussion and implementation:
.worktrees/ to avoid nested duplicate indexing.
.git/info/exclude handling.