[REFACTOR] Replace in-tree cache_mem with CacheSeek integration #4
Open
yJader wants to merge 5 commits into
Replace the in-tree `telefuser/cache_mem` cache with `cacheseek` as the cross-request cache middleware.
- service (`container`/`task_service`/`api_server`): build and drive (`CacheService`, `TeleFuserCacheAdapter`); per request: `build_query` -> `lookup` -> `apply_resume` -> `on_response` -> `save`
- `lingbot_world_fast`: `world_kv` hooks (`on_runtime_created` / `on_chunk_finalized`) + decode-only fast path for exact-prefix KV reuse; enable rolling KV window (`local_attn_size=7`, `sink_size=3`)
- remove legacy `telefuser/cache_mem` + `service/cache/cache_factory|cache_service` and the `cache_mem` unit tests
- pin `torch==2.7.0` + `torchvision==0.22.0`
- docs: update `latent_cache` (en/zh)
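The rolling KV window enabled in this commit (`local_attn_size=7`, `sink_size=3`) can be illustrated by which cached slots stay visible to attention. This is a minimal sketch that assumes the first `sink_size` slots are pinned as attention sinks and that the window counts the newest `local_attn_size` slots separately from them; the real semantics and units (frames or chunks) are defined by `lingbot_world_fast`, and `visible_kv_slots` is a hypothetical helper.

```python
from typing import List


def visible_kv_slots(num_cached: int, local_attn_size: int = 7, sink_size: int = 3) -> List[int]:
    """Indices of cached KV slots kept under a sink + rolling-window policy.

    Hypothetical helper: the first `sink_size` slots are always kept (attention
    sinks); of the slots after them, only the newest `local_attn_size` are kept.
    """
    if num_cached < 0 or local_attn_size < 1 or sink_size < 0:
        raise ValueError("num_cached and sink_size must be >= 0, local_attn_size >= 1")
    sinks = list(range(min(sink_size, num_cached)))
    # The rolling window never reaches back into the sink region.
    start = max(sink_size, num_cached - local_attn_size)
    window = list(range(start, num_cached))
    return sinks + window


if __name__ == "__main__":
    print(visible_kv_slots(20))  # [0, 1, 2, 13, 14, 15, 16, 17, 18, 19]
```

Under these assumptions at most 3 + 7 = 10 slots are attended, however long the rollout runs, which is what keeps per-chunk attention cost bounded.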
…arch-v2) arch-v2 retired `cacheseek.core`; `CacheConfig` is now exported from the top-level `cacheseek` package. The cache and nocache wan22 T2V service entry points still imported arch-v1's `cacheseek.core.config`, so the cacheseek approximate-reuse e2e crashed with `ModuleNotFoundError` during service startup.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
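The import breakage above suggests a defensive pattern: resolve `CacheConfig` from the top-level `cacheseek` package only when latent cache is enabled, and turn a missing or outdated install into an explicit error message rather than a bare `ModuleNotFoundError`. This is a minimal sketch; `load_cache_config_cls` and its messages are hypothetical, not TeleFuser code.

```python
import importlib
from typing import Any, Optional


def load_cache_config_cls(enabled: bool, module_name: str = "cacheseek") -> Optional[Any]:
    """Return the CacheConfig class only when latent cache is enabled (hypothetical helper)."""
    if not enabled:
        # Default path: the optional dependency is never imported.
        return None
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise unrelated failures, e.g. a dependency missing *inside* the package.
        if exc.name not in (module_name, module_name.partition(".")[0]):
            raise
        raise RuntimeError(
            f"latent cache is enabled but '{module_name}' is not installed; "
            "install CacheSeek or disable latent cache"
        ) from exc
    config_cls = getattr(module, "CacheConfig", None)
    if config_cls is None:
        # e.g. an arch-v1 layout that only exposes cacheseek.core.config
        raise RuntimeError(f"'{module_name}' does not export CacheConfig at the top level")
    return config_cls
```

Because the import lives inside the function, a run with latent cache disabled never touches CacheSeek at all.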
Co-authored-by: @yx0716
Description
This PR replaces TeleFuser's in-tree latent cache implementation with an optional CacheSeek integration path. It wires CacheSeek into the service container, task service, CLI flags, Wan2.2 service examples, and LingBot World Fast world-KV hooks while keeping latent cache disabled unless explicitly requested.
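The per-request flow that the service layer drives (`build_query` -> `lookup` -> `apply_resume` -> `on_response` -> `save`, as listed in the first commit message) can be sketched with in-memory stand-ins. `ToyCacheService`, `ToyAdapter`, `LookupResult`, `serve` and every signature here are hypothetical, not the CacheSeek or TeleFuser APIs, and keying on prompt and seed is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

Request = Dict[str, Any]


@dataclass
class LookupResult:
    skip_steps: int  # how many leading steps the cached state covers
    state: Any       # cached latent / KV payload


@dataclass
class ToyCacheService:
    """In-memory stand-in for the cross-request cache (not the CacheSeek API)."""
    store: Dict[str, LookupResult] = field(default_factory=dict)

    def lookup(self, query: str) -> Optional[LookupResult]:
        return self.store.get(query)

    def save(self, query: str, result: LookupResult) -> None:
        self.store[query] = result


class ToyAdapter:
    """Hypothetical adapter between a TeleFuser-style request and the cache."""

    def build_query(self, request: Request) -> str:
        # Assumption: exact-match key on prompt and seed.
        return f"{request['prompt']}|{request['seed']}"

    def apply_resume(self, request: Request, hit: LookupResult) -> Request:
        # Start from the cached state instead of step 0.
        return {**request, "start_step": hit.skip_steps, "init_state": hit.state}

    def on_response(self, response: Request) -> LookupResult:
        # Pick what to persist from the finished response.
        return LookupResult(skip_steps=response["steps_done"], state=response["latent"])


def serve(request: Request, cache: ToyCacheService, adapter: ToyAdapter,
          run_pipeline: Callable[[Request], Request]) -> Request:
    query = adapter.build_query(request)               # build_query
    hit = cache.lookup(query)                          # lookup
    if hit is not None:
        request = adapter.apply_resume(request, hit)   # apply_resume (hit only)
    response = run_pipeline(request)
    cache.save(query, adapter.on_response(response))  # on_response -> save
    return response
```

A repeated request resumes at the cached step, while a new prompt misses and runs from step 0. Saving unconditionally keeps the sketch short; a real adapter may choose not to re-save on a hit.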
Motivation
Cross-request latent/KV reuse is now owned by CacheSeek instead of TeleFuser-local `cache_mem` code. TeleFuser should depend on CacheSeek only when the feature is enabled, fail clearly when CacheSeek is missing, and keep the default import/runtime path lightweight when latent cache is disabled.
Type of Change
Changes Made
- Removed the `telefuser/cache_mem` implementation and related unit tests.
- Added `world_kv_binding` runtime hooks so CacheSeek exact-prefix reuse can fast-forward cached chunks.
- … https://github.com/Tele-AI/CacheSeek.
Testing
- … (`pytest tests/`)
Test commands:
Additional validation notes:
- … `lookup_hit skip_step=1` and `save_stored`.
- … `all_pass=true`.
- … `world_kv: fast-forward 1 chunks (decode-only)`.
Checklist
- … (`ruff`)
- … (`pre-commit run --all-files`)
- … (`pytest tests/`)
- … `[TYPE] Brief description`
Related Issues
N/A
Additional Notes
This PR is scoped to the TeleFuser-side CacheSeek adaptation and does not include CacheSeek repository changes. The LingBot e2e command above exercises CacheSeek as an external dependency to verify that the TeleFuser `world_kv_binding` hooks are usable end to end.
GPU Architecture Support
No kernel-specific code was changed. Real e2e validation ran on NVIDIA H100.
Performance Impact
No kernel-level performance change is intended. CacheSeek reuse can reduce repeated work when enabled. The LingBot exact-prefix smoke e2e showed functional reuse:
Timings: 7.456s, 1.519s, 1.494s, 4.628s
These are smoke e2e timings on H100 and should not be treated as a formal benchmark.
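The exact-prefix fast path behind the reuse reported above can be sketched as a prefix match over chunk keys: leading chunks whose keys match the cached run are replayed decode-only, and everything from the first mismatch onward runs in full. `ToyWorldKVBinding`, `exact_prefix_len`, `run_chunks` and the string chunk keys are hypothetical; only the hook names `on_runtime_created` and `on_chunk_finalized` come from this PR, and their signatures here are assumed.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


def exact_prefix_len(cached: Sequence[str], requested: Sequence[str]) -> int:
    """Number of leading chunk keys that match exactly."""
    n = 0
    for a, b in zip(cached, requested):
        if a != b:
            break
        n += 1
    return n


@dataclass
class ToyWorldKVBinding:
    """Hypothetical binding; hook names follow this PR, signatures are assumed."""
    cached_chunk_keys: List[str]
    finalized: List[str] = field(default_factory=list)
    fast_forward: int = 0

    def on_runtime_created(self, requested_chunk_keys: Sequence[str]) -> int:
        # Decide once, before generation, how many leading chunks are reusable.
        self.fast_forward = exact_prefix_len(self.cached_chunk_keys, requested_chunk_keys)
        self.finalized.clear()
        return self.fast_forward

    def on_chunk_finalized(self, index: int, key: str) -> None:
        # Record finished chunks in order; `index` mirrors a per-chunk hook argument.
        self.finalized.append(key)

    def is_decode_only(self, index: int) -> bool:
        # Chunks inside the matched prefix skip denoising and are only decoded.
        return index < self.fast_forward


def run_chunks(binding: ToyWorldKVBinding, requested: Sequence[str]) -> List[str]:
    binding.on_runtime_created(requested)
    modes = []
    for i, key in enumerate(requested):
        modes.append("decode-only" if binding.is_decode_only(i) else "full")
        binding.on_chunk_finalized(i, key)
    return modes
```

Matching stops at the first differing chunk because, under causal attention, each chunk's KV depends on every chunk before it, so only an exact prefix can be reused verbatim.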