Add size limits to prevent context overflow in large repos #19

Merged
JordanCoin merged 4 commits into main from fix/context-size-limits
Feb 19, 2026

JordanCoin commented Jan 29, 2026

Summary

This PR keeps codemap hooks/MCP useful while making them safe on large repos.

Existing fix (kept)

  • Session-start hook uses adaptive depth based on repo size:
    • >5000 files -> depth 2
    • >2000 files -> depth 3
    • otherwise depth 4
  • Hook + MCP get_structure enforce 60KB max output (~15k tokens, <10% of context)
  • Output over the limit is truncated at a line boundary and ends with a message noting the truncation
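
The two safeguards above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `AdaptiveDepth`, `TruncateAtLine`, `MaxOutputBytes`, and `truncationNotice` are hypothetical, the notice wording is invented, and "60KB" is assumed to mean 60 * 1024 bytes. Only the thresholds (5000 and 2000 files) and the depths (2, 3, 4) come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// MaxOutputBytes is the 60KB output budget described in the PR.
// Assumption: "60KB" is read here as 60 * 1024 bytes.
const MaxOutputBytes = 60 * 1024

// truncationNotice is a hypothetical message; the real wording may differ.
const truncationNotice = "... (output truncated)\n"

// AdaptiveDepth maps a repo's file count to a tree depth using the
// thresholds listed in the PR description.
func AdaptiveDepth(fileCount int) int {
	switch {
	case fileCount > 5000:
		return 2
	case fileCount > 2000:
		return 3
	default:
		return 4
	}
}

// TruncateAtLine returns out unchanged if it fits in maxBytes. Otherwise it
// keeps only whole lines that fit after reserving room for the notice, then
// appends the notice. If maxBytes is smaller than the notice itself, the
// result is just the notice (and so may exceed maxBytes).
func TruncateAtLine(out string, maxBytes int) string {
	if len(out) <= maxBytes {
		return out
	}
	budget := maxBytes - len(truncationNotice)
	if budget < 0 {
		budget = 0
	}
	// Cut after the last newline inside the budget so no line is split.
	cut := strings.LastIndexByte(out[:budget], '\n')
	if cut < 0 {
		return truncationNotice
	}
	return out[:cut+1] + truncationNotice
}

func main() {
	tree := strings.Repeat("some/path/file.go\n", 10000)
	out := TruncateAtLine(tree, MaxOutputBytes)
	fmt.Println("depth for 12000 files:", AdaptiveDepth(12000))
	fmt.Println("bytes before:", len(tree), "after:", len(out))
}
```

Reserving space for the notice before cutting keeps the final output within the budget, which is the property that matters when the output lands in the model's context.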

Additional hardening in this PR

  • Added shared limits package (limits/) so hook + MCP use one source of truth for:
    • output budget (60KB)
    • adaptive depth thresholds
  • Daemon now skips dependency graph build on very large repos (>5000 files) to avoid startup CPU/memory spikes
  • Daemon now always writes .codemap/state.json (even when dep graph is unavailable), so hooks still get file_count + session event context
  • ReadState now accepts stale state if daemon PID is still alive (avoids unnecessary expensive rescans after idle periods)
  • Hook hub lookup no longer falls back to heavy fresh graph builds when daemon is already running but dep data is unavailable
  • Session-start now:
    • waits briefly for daemon state (small warmup window)
    • uses conservative depth when file count is unknown
    • uses lightweight git diff output for large/unknown repos instead of full codemap --diff
  • MCP get_structure now:
    • accepts optional depth
    • defaults to adaptive depth when omitted
    • reuses daemon hub data when available
    • skips expensive hub analysis on very large repos and prints a note
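
The stale-state rule for ReadState can be illustrated with a small sketch. It assumes a Unix-like system, where sending signal 0 checks whether a process exists without affecting it. The names `StateUsable` and `pidAlive`, the signature, and the 30-second freshness window are hypothetical and are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

// StateUsable reports whether cached daemon state may be reused.
// Fresh state is always usable. Stale state is usable only if the daemon
// that wrote it (identified by pid) is still running.
// The alive function is injected so the rule can be tested without real processes.
func StateUsable(age, maxAge time.Duration, pid int, alive func(pid int) bool) bool {
	if age <= maxAge {
		return true
	}
	return pid > 0 && alive(pid)
}

// pidAlive probes a process with signal 0 (Unix-like systems only).
// A nil error means the process exists and can be signalled by this user.
func pidAlive(pid int) bool {
	p, err := os.FindProcess(pid)
	if err != nil {
		return false
	}
	return p.Signal(syscall.Signal(0)) == nil
}

func main() {
	// Hypothetical values: state written 10 minutes ago, 30s freshness window.
	// The current process stands in for the daemon PID.
	usable := StateUsable(10*time.Minute, 30*time.Second, os.Getpid(), pidAlive)
	fmt.Println("stale state usable:", usable)
}
```

The point of the design is that a live daemon is assumed to keep its state current, so an old timestamp alone need not trigger an expensive rescan.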

Problem

Large repos (10k+ files) could produce massive startup output and expensive repeated analysis:

  • Hook output goes into Claude “Messages” context (competes directly with conversation history)
  • Large tree output could consume context immediately
  • Repeated fallback graph scans could be expensive when daemon state was missing/stale

Why this matters

Hooks need to be context-aware and resource-aware:

  • bounded output
  • graceful degradation on large repos
  • no repeated heavy computation in background hook flows

Test results

  • go test ./... passes
  • Stress-tested on synthetic ~10k-file repo:
    • session-start output remained small (no runaway output)
    • daemon state remained usable after idle
    • pre-edit hook was significantly faster with the daemon running than without it

Behavioral note

On very large repos, hub/dependency analysis is intentionally deferred in hook/session-start flows unless explicitly requested via dedicated tooling (get_hubs, etc.).

JordanCoin and others added 4 commits January 29, 2026 01:49
- Session-start hook now uses adaptive depth based on repo size:
  - >5000 files: depth 2
  - >2000 files: depth 3
  - Otherwise: depth 4
- Both hook and MCP get_structure enforce 60KB max output (~15k tokens)
- Truncates cleanly at line boundaries with helpful message
- Prevents consuming >10% of LLM context window

Fixes issue where 10k+ file repos (like Rails monoliths) would output
1.3MB+ of tree structure, overwhelming Claude Code's context.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Hook output goes directly into Claude's "Messages" context, not system
prompt. This means hook output competes with conversation history for
the ~200k token limit. A 1.3MB output (like a full tree of a 10k file
repo) equals ~500k tokens, causing instant context overflow.

The size limits (adaptive depth + 60KB cap) are critical safeguards.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
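
The token figures in these commit messages can be sanity-checked with a little arithmetic. The sketch below assumes roughly 4 bytes per token, which is the ratio implied by "60KB ≈ 15k tokens"; `EstimateTokens` and the constants are illustrative names, and decimal units (1KB = 1000 bytes) are assumed. At that ratio a 1.3MB tree comes to about 325k tokens. The ~500k figure quoted above would correspond to a denser ratio of about 2.6 bytes per token, which may be closer for tree output made of short path segments. Either estimate is well over the ~200k limit.

```go
package main

import "fmt"

// bytesPerToken is an assumption implied by the PR's "60KB ≈ 15k tokens".
const bytesPerToken = 4

// contextWindowTokens is the ~200k limit mentioned in the commit message.
const contextWindowTokens = 200_000

// EstimateTokens gives a rough token count for a payload of the given size.
func EstimateTokens(sizeBytes int) int {
	return sizeBytes / bytesPerToken
}

func main() {
	capped := EstimateTokens(60_000)      // the 60KB output cap
	fullTree := EstimateTokens(1_300_000) // the 1.3MB tree of a 10k-file repo

	fmt.Printf("60KB cap: ~%d tokens (%.1f%% of window)\n",
		capped, 100*float64(capped)/contextWindowTokens)
	fmt.Printf("1.3MB tree: ~%d tokens (%.1f%% of window)\n",
		fullTree, 100*float64(fullTree)/contextWindowTokens)
}
```
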
JordanCoin merged commit 4e18220 into main on Feb 19, 2026
12 checks passed
JordanCoin deleted the fix/context-size-limits branch on February 19, 2026 at 22:44