[REFACTOR] Replace in-tree cache_mem with CacheSeek integration #4

Open
yJader wants to merge 5 commits into Tele-AI:main from yJader:refactor/cacheseek-adapt

yJader (Contributor) commented Jul 1, 2026

Co-authored-by: @yx0716

Description

This PR replaces TeleFuser's in-tree latent cache implementation with an optional CacheSeek integration path. It wires CacheSeek into the service container, task service, CLI flags, Wan2.2 service examples, and LingBot World Fast world-KV hooks while keeping latent cache disabled unless explicitly requested.

Motivation

Cross-request latent/KV reuse is now owned by CacheSeek instead of TeleFuser-local cache_mem code. TeleFuser should depend on CacheSeek only when the feature is enabled, fail clearly when CacheSeek is missing, and keep the default import/runtime path lightweight when latent cache is disabled.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Performance improvement
  • Code refactoring
  • Documentation update
  • Other (please describe):

Changes Made

  • Removed the in-tree telefuser/cache_mem implementation and related unit tests.
  • Added lazy CacheSeek service initialization in TeleFuser service container and task service code.
  • Added/updated latent cache CLI and server config plumbing, including direct failure when latent cache is enabled but CacheSeek is unavailable.
  • Updated Wan2.2 T2V service examples for CacheSeek-backed latent cache and added a nocache service example.
  • Added LingBot World Fast world_kv_binding runtime hooks so CacheSeek exact-prefix reuse can fast-forward cached chunks.
  • Updated English and Chinese latent cache docs with CacheSeek usage and the CacheSeek GitHub link: https://github.com/Tele-AI/CacheSeek.
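
The "disabled by default, fail directly when CacheSeek is missing" behavior described in the list above could look roughly like the sketch below. It is not TeleFuser's actual code: the `--latent-cache-mode` flag name, the `off` value, and both helper names are assumptions made for illustration. Only the `cacheseek` package name and the `read_write` mode appear in this PR.

```python
import argparse
import importlib.util
from typing import Callable, Optional, Sequence


def cacheseek_installed() -> bool:
    """Return True if the optional `cacheseek` package is importable."""
    # find_spec only locates the package; it does not import it,
    # so the default (cache disabled) path stays lightweight.
    return importlib.util.find_spec("cacheseek") is not None


def parse_latent_cache_mode(
    argv: Optional[Sequence[str]] = None,
    is_available: Callable[[], bool] = cacheseek_installed,
) -> str:
    """Parse a hypothetical latent-cache flag; the cache is off by default."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag name and choices, used only for this sketch.
    parser.add_argument(
        "--latent-cache-mode",
        choices=("off", "read_write"),
        default="off",
    )
    mode = parser.parse_args(argv).latent_cache_mode
    # Probe for CacheSeek only when the feature was explicitly requested.
    if mode != "off" and not is_available():
        raise RuntimeError(
            "Latent cache was enabled but the 'cacheseek' package is not installed."
        )
    return mode
```

Passing the availability check in as a parameter keeps the failure path easy to unit-test without uninstalling anything.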

Testing

  • Unit tests pass (pytest tests/)
  • Manual testing performed
  • Benchmarks added/updated (if applicable)

Test commands:

# TeleFuser targeted latent-cache/service tests.
python -m pytest \
  tests/unit/service/test_latent_cache_cli.py \
  tests/unit/service/test_latent_cache_task_service.py \
  tests/unit/pipelines/wan_video/test_service_examples.py \
  tests/unit/pipelines/wan_video/test_latent_data_utils.py -q
# Result: 15 passed

# Real Wan2.2 service e2e without latent cache.
# Model: Wan2.2-T2V-A14B
# Config: num_inference_steps=2, num_frames=5, resolution=480p, parallelism=1.
# Result: completed; non-empty mp4 generated.

# Real Wan2.2 service e2e with CacheSeek enabled.
# Model: Wan2.2-T2V-A14B
# Cache mode: read_write
# Result: task_status=completed; non-empty mp4 generated.
# Cache evidence: audit log contains lookup_hit skip_step=1 and save_stored.

# Real LingBot World Fast exact-prefix e2e through TeleFuser world_kv hooks.
# Note: uses CacheSeek as an external dependency; this PR only includes TeleFuser-side hooks.
export CUDA_VISIBLE_DEVICES=0,1
export LINGBOT_WORLD_CHECKPOINT_DIR=<lingbot-world-fast-checkpoint-root>
export WORLDKV_REPO_ROOTS=<telefuser-repo>:<cacheseek-repo>
export PYTHONPATH=<telefuser-repo>:<cacheseek-repo>:${PYTHONPATH:-}
cd <cacheseek-repo>
python \
  examples/exact_prefix_reuse/e2e_telefuser_lingbot.py \
  --frame-num 13 \
  --prefix-chunks 1 \
  --out-dir <output-dir> \
  --image-path <lingbot-example-image> \
  --action-path <lingbot-example-action-dir> \
  --aux-device cuda:1 \
  --no-save-videos
# Result: all_pass=true; fast_forward_k A=0, B=1, C=1, D=0.

Additional validation notes:

  • Wan2.2 CacheSeek audit log contained lookup_hit skip_step=1 and save_stored.
  • LingBot e2e manifest reported all_pass=true.
  • LingBot e2e log reported world_kv: fast-forward 1 chunks (decode-only).
  • GPUs were checked after e2e runs and had no remaining compute processes.
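
For readers unfamiliar with exact-prefix reuse, the idea behind a `fast_forward_k` count can be sketched as follows. This is a toy model under stated assumptions, not the CacheSeek or TeleFuser implementation: `prefix_keys` and `PrefixStore` are hypothetical names, chunk inputs are plain strings, and the cached KV state is a placeholder value.

```python
import hashlib
from typing import Dict, List, Sequence


def prefix_keys(chunk_inputs: Sequence[str]) -> List[str]:
    """Return one key per chunk; key i covers every input up to chunk i."""
    keys: List[str] = []
    hasher = hashlib.sha256()
    for item in chunk_inputs:
        hasher.update(item.encode("utf-8"))
        hasher.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
        keys.append(hasher.copy().hexdigest())
    return keys


class PrefixStore:
    """Toy in-memory store keyed by cumulative chunk-prefix hashes."""

    def __init__(self) -> None:
        self._kv: Dict[str, str] = {}

    def fast_forward_k(self, chunk_inputs: Sequence[str]) -> int:
        """Count the leading chunks whose exact prefix is already cached."""
        k = 0
        for key in prefix_keys(chunk_inputs):
            if key not in self._kv:
                break
            k += 1
        return k

    def save(self, chunk_inputs: Sequence[str], max_chunks: int) -> None:
        """Store placeholder state for the first `max_chunks` chunk prefixes."""
        for key in prefix_keys(chunk_inputs)[:max_chunks]:
            self._kv[key] = "kv-state-placeholder"
```

In this toy model, a request can skip a chunk only if every input up to and including that chunk matches a stored prefix exactly. A request that differs in its first chunk therefore gets no reuse.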

Checklist

  • Code follows the project's coding standards (ruff)
  • Pre-commit hooks pass (pre-commit run --all-files)
  • All tests pass (pytest tests/)
  • New tests added for new functionality
  • Documentation updated (README, CLAUDE.md, docstrings)
  • Commit messages are clear and descriptive
  • PR title follows the convention: [TYPE] Brief description

Related Issues

N/A

Additional Notes

This PR is scoped to the TeleFuser-side CacheSeek adaptation. It does not include CacheSeek repository changes. The LingBot e2e command above exercises CacheSeek as an external dependency to verify that the TeleFuser world_kv_binding hooks are usable end to end.

GPU Architecture Support

  • SM80 (Ampere, Ada Lovelace)
  • SM90 (Hopper H100)
  • SM100+ (Blackwell)

No kernel-specific code was changed. Real e2e validation ran on NVIDIA H100.

Performance Impact

No kernel-level performance change is intended. CacheSeek reuse can reduce repeated work when enabled. The LingBot exact-prefix smoke e2e showed functional reuse:

  • A cold run: 7.456s
  • B full hit: 1.519s
  • C prefix hit: 1.494s
  • D cold fork reference: 4.628s

These are smoke e2e timings on H100 and should not be treated as a formal benchmark.

yx0716 and others added 5 commits June 18, 2026 15:50
Replace the in-tree telefuser/cache_mem cache with cacheseek as the
cross-request cache middleware.

- service (container/task_service/api_server): build and drive
  (CacheService, TeleFuserCacheAdapter); per request build_query ->
  lookup -> apply_resume -> on_response -> save
- lingbot_world_fast: world_kv hooks (on_runtime_created /
  on_chunk_finalized) + decode-only fast path for exact-prefix KV reuse;
  enable rolling KV window (local_attn_size=7, sink_size=3)
- remove legacy telefuser/cache_mem + service/cache/cache_factory|
  cache_service and the cache_mem unit tests
- pin torch==2.7.0 + torchvision==0.22.0
- docs: update latent_cache (en/zh)
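
The per-request order named in the commit message above (build_query -> lookup -> apply_resume -> on_response -> save) can be illustrated with in-memory stand-ins. Everything in this sketch is an assumption: the stub classes are not the CacheSeek `CacheService` or `TeleFuserCacheAdapter` APIs, the signatures and the split of methods between the two objects are guesses, and the "denoising step" is placeholder arithmetic.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class StubCacheService:
    """In-memory stand-in for a cache service (hypothetical interface)."""
    store: Dict[str, Any] = field(default_factory=dict)

    def lookup(self, query: str) -> Optional[Dict[str, Any]]:
        return self.store.get(query)

    def save(self, query: str, payload: Dict[str, Any]) -> None:
        self.store[query] = payload


class StubAdapter:
    """Stand-in for the pipeline-side glue (hypothetical interface)."""

    def build_query(self, request: Dict[str, Any]) -> str:
        return f"{request['prompt']}|{request['seed']}"

    def apply_resume(self, hit: Optional[Dict[str, Any]]) -> Tuple[int, int]:
        # On a miss start from step 0; on a hit resume from the cached state.
        if hit is None:
            return 0, 0
        return hit["skip_step"], hit["latents"]

    def on_response(self, snapshot: Optional[int]) -> Optional[Dict[str, Any]]:
        # Package the state captured after the first step, if there is one.
        if snapshot is None:
            return None
        return {"skip_step": 1, "latents": snapshot}


def handle_request(request, cache, adapter) -> Tuple[int, int]:
    """Run one request; return (final latents, number of skipped steps)."""
    query = adapter.build_query(request)
    hit = cache.lookup(query)
    start_step, latents = adapter.apply_resume(hit)
    snapshot = None
    for step in range(start_step, request["steps"]):
        latents = latents + (step + 1)  # placeholder for one denoising step
        if step == 0:
            snapshot = latents  # state after the first step
    payload = adapter.on_response(snapshot)
    if payload is not None:
        cache.save(query, payload)
    return latents, start_step
```

Calling `handle_request` twice with the same request runs all steps the first time and skips one step the second time, with the same final value in both cases.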
…arch-v2)

arch-v2 retired cacheseek.core, and CacheConfig is now exported from the top-level `cacheseek` package. The cache and nocache wan22 T2V service entry points still imported arch-v1's cacheseek.core.config, which made the cacheseek approximate-reuse e2e crash with ModuleNotFoundError during service startup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yJader yJader marked this pull request as ready for review July 1, 2026 14:03
