
feat(stablehlo): add KV cache I/O rewriting to program emitter #74

Merged
dndungu merged 2 commits into main from p3b-kvcache on Apr 4, 2026


@dndungu dndungu commented Apr 4, 2026

Phase 3b PJRT pipeline: KV cache rewriting.

dndungu added 2 commits April 2, 2026 15:57
Cache key is SHA256(stablehlo_mlir + platform_name). Serialized PJRT
executables are stored to $ZERFOO_PJRT_CACHE or ~/.cache/zerfoo/pjrt/.
LRU eviction kicks in when the total size exceeds a configurable maximum
(default 2 GB). Writes are atomic via tmp+rename; access is thread-safe
via sync.Mutex.
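
The key derivation and the tmp+rename write described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `cacheKey` and `writeAtomic` are hypothetical names, the hash input is assumed to be the plain concatenation of the two strings, and LRU eviction and the mutex are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// cacheKey hashes the StableHLO text followed by the platform name.
// Plain concatenation is an assumption taken from the commit message.
func cacheKey(stablehloMLIR, platformName string) string {
	sum := sha256.Sum256([]byte(stablehloMLIR + platformName))
	return hex.EncodeToString(sum[:])
}

// writeAtomic writes data to a temp file inside dir and then renames it
// to its final name, so readers never see a partially written entry.
func writeAtomic(dir, key string, data []byte) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, key+".tmp-*")
	if err != nil {
		return err
	}
	// Clean up the temp file on failure; after a successful rename the
	// name no longer exists and the error from Remove is ignored.
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, key))
}

func main() {
	// A throwaway directory stands in for $ZERFOO_PJRT_CACHE.
	dir, err := os.MkdirTemp("", "pjrt-cache-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	key := cacheKey("module @m {}", "cpu")
	if err := writeAtomic(dir, key, []byte("serialized executable")); err != nil {
		panic(err)
	}
	data, err := os.ReadFile(filepath.Join(dir, key))
	if err != nil {
		panic(err)
	}
	fmt.Println(key, len(data))
}
```

The temp file is created in the same directory as the final entry so that the rename stays on one filesystem, which is what makes it atomic on POSIX systems.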

Implements T64.1.1 and T64.1.2.

PJRT execution is purely functional and cannot hold mutable state. The
graph's StatefulInputNode KV cache feedback must therefore be rewritten
as explicit function I/O: KV cache tensors become both function
arguments and return values.

Add KVCacheSlot type and EmitKVCacheProgram function that:
- Adds KV cache inputs as extra function arguments
- Adds KV cache outputs as extra return values (tuple return)
- For decode programs, emits stablehlo.concatenate to append new KV
  step along the sequence axis
- For prefill programs, passes KV outputs through directly
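
The decode case in the list above can be sketched as a small text emitter. This is an illustration under stated assumptions, not the PR's implementation: `kvSlot` and `emitDecodeSketch` are hypothetical stand-ins for `KVCacheSlot` and `EmitKVCacheProgram`, whose real fields and signature are not shown here. To stay self-contained, the new KV step is passed in as an extra argument (`%newN`) rather than being computed by a model body.

```go
package main

import (
	"fmt"
	"strings"
)

// kvSlot is a hypothetical description of one KV cache tensor.
type kvSlot struct {
	Shape   []int  // cached shape, e.g. [batch, heads, seq, headDim]
	SeqAxis int    // axis along which new steps are appended
	DType   string // element type, e.g. "f32"
}

// tensorType renders a ranked tensor type such as tensor<1x8x4x64xf32>.
func tensorType(shape []int, dtype string) string {
	parts := make([]string, 0, len(shape)+1)
	for _, d := range shape {
		parts = append(parts, fmt.Sprint(d))
	}
	parts = append(parts, dtype)
	return "tensor<" + strings.Join(parts, "x") + ">"
}

// emitDecodeSketch emits a function in which every cache slot is both an
// argument and a return value. Each returned cache is the input cache
// with a one-step update concatenated along the sequence axis.
func emitDecodeSketch(name string, slots []kvSlot) string {
	var args, body, results, rets []string
	for i, s := range slots {
		step := append([]int(nil), s.Shape...)
		step[s.SeqAxis] = 1 // the update holds a single new position
		out := append([]int(nil), s.Shape...)
		out[s.SeqAxis]++ // the cache grows by one position
		cacheT := tensorType(s.Shape, s.DType)
		stepT := tensorType(step, s.DType)
		outT := tensorType(out, s.DType)
		args = append(args, fmt.Sprintf("%%kv%d: %s, %%new%d: %s", i, cacheT, i, stepT))
		body = append(body, fmt.Sprintf(
			"  %%out%d = stablehlo.concatenate %%kv%d, %%new%d, dim = %d : (%s, %s) -> %s",
			i, i, i, s.SeqAxis, cacheT, stepT, outT))
		results = append(results, fmt.Sprintf("%%out%d", i))
		rets = append(rets, outT)
	}
	header := fmt.Sprintf("func.func @%s(%s) -> (%s) {",
		name, strings.Join(args, ", "), strings.Join(rets, ", "))
	footer := fmt.Sprintf("  func.return %s : %s",
		strings.Join(results, ", "), strings.Join(rets, ", "))
	return header + "\n" + strings.Join(body, "\n") + "\n" + footer + "\n}\n"
}

func main() {
	slots := []kvSlot{{Shape: []int{1, 8, 4, 64}, SeqAxis: 2, DType: "f32"}}
	fmt.Print(emitDecodeSketch("decode", slots))
}
```

A prefill variant would skip the concatenate and return the freshly computed KV tensors as they are, matching the last bullet above.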

Implements T61.3.2 from plan-pjrt.md.
@dndungu dndungu merged commit c8db036 into main Apr 4, 2026
1 check failed
@dndungu dndungu deleted the p3b-kvcache branch April 4, 2026 01:27