[REFACTOR] Replace in-tree cache_mem with CacheSeek integration by yJader · Pull Request #4 · Tele-AI/TeleFuser

yJader · 2026-07-01T13:53:58Z

Co-authored-by: @yx0716

Description

This PR replaces TeleFuser's in-tree latent cache implementation with an optional CacheSeek integration path. It wires CacheSeek into the service container, task service, CLI flags, Wan2.2 service examples, and LingBot World Fast world-KV hooks while keeping latent cache disabled unless explicitly requested.

Motivation

Cross-request latent/KV reuse is now owned by CacheSeek instead of TeleFuser-local cache_mem code. TeleFuser should depend on CacheSeek only when the feature is enabled, fail clearly when CacheSeek is missing, and keep the default import/runtime path lightweight when latent cache is disabled.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Performance improvement
Code refactoring
Documentation update
Other (please describe):

Changes Made

Removed the in-tree telefuser/cache_mem implementation and related unit tests.
Added lazy CacheSeek service initialization in TeleFuser service container and task service code.
Added/updated latent cache CLI and server config plumbing, including direct failure when latent cache is enabled but CacheSeek is unavailable.
Updated Wan2.2 T2V service examples for CacheSeek-backed latent cache and added a nocache service example.
Added LingBot World Fast world_kv_binding runtime hooks so CacheSeek exact-prefix reuse can fast-forward cached chunks.
Updated English and Chinese latent cache docs with CacheSeek usage and the CacheSeek GitHub link: https://github.com/Tele-AI/CacheSeek.
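
The lazy initialization and fail-fast behavior listed above can be illustrated with a minimal sketch. It is not the code in this PR: the function name `load_cache_backend`, the exception `LatentCacheUnavailableError`, and the assumption that the backend is importable under the module name `cacheseek` are all illustrative.

```python
import importlib
from typing import Any, Optional


class LatentCacheUnavailableError(RuntimeError):
    """Hypothetical error: latent cache was requested but the backend is missing."""


def load_cache_backend(enabled: bool, module_name: str = "cacheseek") -> Optional[Any]:
    """Import the cache backend only when latent cache is explicitly enabled.

    `module_name` defaults to "cacheseek" as an assumption for illustration.
    """
    if not enabled:
        # Default path: no import is attempted, so startup stays lightweight.
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail fast with an actionable message instead of silently running uncached.
        raise LatentCacheUnavailableError(
            f"Latent cache was enabled but '{module_name}' could not be imported; "
            "install it or disable latent cache."
        ) from exc
```

With this shape, a disabled cache never touches the optional dependency, while an enabled cache with a missing dependency stops at startup rather than at the first request.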

Testing

Unit tests pass (pytest tests/)
Manual testing performed
Benchmarks added/updated (if applicable)

Test commands:

# TeleFuser targeted latent-cache/service tests.
python -m pytest \
  tests/unit/service/test_latent_cache_cli.py \
  tests/unit/service/test_latent_cache_task_service.py \
  tests/unit/pipelines/wan_video/test_service_examples.py \
  tests/unit/pipelines/wan_video/test_latent_data_utils.py -q
# Result: 15 passed

# Real Wan2.2 service e2e without latent cache.
# Model: Wan2.2-T2V-A14B
# Config: num_inference_steps=2, num_frames=5, resolution=480p, parallelism=1.
# Result: completed; non-empty mp4 generated.

# Real Wan2.2 service e2e with CacheSeek enabled.
# Model: Wan2.2-T2V-A14B
# Cache mode: read_write
# Result: task_status=completed; non-empty mp4 generated.
# Cache evidence: audit log contains lookup_hit skip_step=1 and save_stored.

# Real LingBot World Fast exact-prefix e2e through TeleFuser world_kv hooks.
# Note: uses CacheSeek as an external dependency; this PR only includes TeleFuser-side hooks.
export CUDA_VISIBLE_DEVICES=0,1
export LINGBOT_WORLD_CHECKPOINT_DIR=<lingbot-world-fast-checkpoint-root>
export WORLDKV_REPO_ROOTS=<telefuser-repo>:<cacheseek-repo>
export PYTHONPATH=<telefuser-repo>:<cacheseek-repo>:${PYTHONPATH:-}
cd <cacheseek-repo>
python \
  examples/exact_prefix_reuse/e2e_telefuser_lingbot.py \
  --frame-num 13 \
  --prefix-chunks 1 \
  --out-dir <output-dir> \
  --image-path <lingbot-example-image> \
  --action-path <lingbot-example-action-dir> \
  --aux-device cuda:1 \
  --no-save-videos
# Result: all_pass=true; fast_forward_k A=0, B=1, C=1, D=0.

Additional validation notes:

Wan2.2 CacheSeek audit log contained lookup_hit skip_step=1 and save_stored.
LingBot e2e manifest reported all_pass=true.
LingBot e2e log reported world_kv: fast-forward 1 chunks (decode-only).
GPUs were checked after e2e runs and had no remaining compute processes.
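
For readers unfamiliar with exact-prefix reuse, the idea behind a fast-forward count can be sketched as a longest-common-prefix match over per-chunk keys. This is an assumption-laden illustration only: the function name `count_fast_forward_chunks` and the string chunk keys are hypothetical, and how CacheSeek actually derives and compares keys is not described in this PR.

```python
from typing import Sequence


def count_fast_forward_chunks(cached_keys: Sequence[str], request_keys: Sequence[str]) -> int:
    """Return how many leading chunks of the request match the cached sequence exactly.

    Matching stops at the first mismatch, so only a contiguous prefix is reused.
    The keys are hypothetical per-chunk identifiers (for example, content hashes).
    """
    matched = 0
    for cached, requested in zip(cached_keys, request_keys):
        if cached != requested:
            break
        matched += 1
    return matched


# A request whose first chunk matches a cached one-chunk prefix can skip that chunk.
print(count_fast_forward_chunks(["chunk-0"], ["chunk-0", "chunk-1"]))
```

An empty cache or a mismatch on the first chunk yields a count of zero, meaning nothing is skipped.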

Checklist

Code follows the project's coding standards (ruff)
Pre-commit hooks pass (pre-commit run --all-files)
All tests pass (pytest tests/)
New tests added for new functionality
Documentation updated (README, CLAUDE.md, docstrings)
Commit messages are clear and descriptive
PR title follows the convention: [TYPE] Brief description

Related Issues

N/A

Additional Notes

This PR is scoped to the TeleFuser-side CacheSeek adaptation. It does not include CacheSeek repository changes. The LingBot e2e command above exercises CacheSeek as an external dependency to verify that the TeleFuser world_kv_binding hooks are usable end to end.

GPU Architecture Support

SM8x (Ampere, Ada Lovelace)
SM90 (Hopper H100)
SM100+ (Blackwell)

No kernel-specific code was changed. Real e2e validation ran on NVIDIA H100.

Performance Impact

No kernel-level performance change is intended. CacheSeek reuse can reduce repeated work when enabled. The LingBot exact-prefix smoke e2e showed functional reuse:

A cold run: 7.456s
B full hit: 1.519s
C prefix hit: 1.494s
D cold fork reference: 4.628s

These are smoke e2e timings on H100 and should not be treated as a formal benchmark.
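
As a reading aid, the ratios implied by the four timings can be computed directly. The snippet below only divides the numbers reported above by one another; it adds no new measurements, and the same caveat about smoke timings applies to the ratios.

```python
# Wall-clock timings in seconds, copied from the smoke e2e results above.
timings_s = {
    "A cold run": 7.456,
    "B full hit": 1.519,
    "C prefix hit": 1.494,
    "D cold fork reference": 4.628,
}

# Ratio of the cold run (A) to each run; values above 1 mean the run was faster than A.
baseline = timings_s["A cold run"]
ratios = {name: baseline / seconds for name, seconds in timings_s.items()}

for name, ratio in ratios.items():
    print(f"{name}: {timings_s[name]:.3f}s, A/run = {ratio:.2f}")
```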

Replace the in-tree telefuser/cache_mem cache with cacheseek as the cross-request cache middleware.

- service (container/task_service/api_server): build and drive (CacheService, TeleFuserCacheAdapter); per request build_query -> lookup -> apply_resume -> on_response -> save
- lingbot_world_fast: world_kv hooks (on_runtime_created / on_chunk_finalized) + decode-only fast path for exact-prefix KV reuse; enable rolling KV window (local_attn_size=7, sink_size=3)
- remove legacy telefuser/cache_mem + service/cache/cache_factory|cache_service and the cache_mem unit tests
- pin torch==2.7.0 + torchvision==0.22.0
- docs: update latent_cache (en/zh)
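
The per-request sequence build_query -> lookup -> apply_resume -> on_response -> save named in this commit message can be sketched with toy stand-ins. Only the five step names come from the commit message; the classes `ToyCacheService` and `ToyAdapter`, the split of methods between them, every signature, and the in-memory store are assumptions made for illustration and are not the CacheSeek or TeleFuser API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToyRequest:
    prompt: str
    start_step: int = 0  # raised by apply_resume when cached state is reused


class ToyCacheService:
    """In-memory stand-in for a cache service (hypothetical interface)."""

    def __init__(self) -> None:
        self._store: Dict[str, int] = {}

    def lookup(self, query: str) -> Optional[int]:
        return self._store.get(query)

    def save(self, query: str, entry: int) -> None:
        self._store[query] = entry


class ToyAdapter:
    """Stand-in adapter mapping requests to cache queries (hypothetical interface)."""

    def build_query(self, request: ToyRequest) -> str:
        return request.prompt.strip().lower()

    def apply_resume(self, request: ToyRequest, hit: int) -> None:
        request.start_step = hit

    def on_response(self, request: ToyRequest, response: dict) -> int:
        # Decide what to persist; here, the step at which state was captured.
        return response["resume_step"]


def handle_request(
    request: ToyRequest,
    cache: ToyCacheService,
    adapter: ToyAdapter,
    run_pipeline: Callable[[ToyRequest], dict],
) -> dict:
    query = adapter.build_query(request)       # build_query
    hit = cache.lookup(query)                  # lookup
    if hit is not None:
        adapter.apply_resume(request, hit)     # apply_resume
    response = run_pipeline(request)
    entry = adapter.on_response(request, response)  # on_response
    cache.save(query, entry)                   # save
    return response


def toy_pipeline(request: ToyRequest, total_steps: int = 2) -> dict:
    """Pretend pipeline: runs only the steps that were not skipped."""
    return {"steps_run": total_steps - request.start_step, "resume_step": 1}
```

In this toy setup, a second request with an equivalent prompt finds the saved entry and runs fewer steps than the first, which is the general shape of cross-request reuse.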

…arch-v2) arch-v2 retired cacheseek.core, and CacheConfig is now exported from the top-level `cacheseek` package. The cache and nocache wan22 T2V service entry points still imported arch-v1's cacheseek.core.config, which made the cacheseek approximate-reuse e2e crash with ModuleNotFoundError during service startup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

yx0716 and others added 5 commits June 18, 2026 15:50

fix: prepare cacheseek integration for upstream

26bb897

docs: update latent cache documentation for CacheSeek integration

252e6f1

style: apply ruff format

38379f7

yJader marked this pull request as ready for review July 1, 2026 14:03

yJader wants to merge 5 commits into Tele-AI:main from yJader:refactor/cacheseek-adapt


2 participants
