Invalidate HookRegistry child-registries cache on enable/disable cache by SuryanshSS1011 · Pull Request #14093 · huggingface/diffusers

SuryanshSS1011 · 2026-06-29T23:44:02Z

What does this PR do?

HookRegistry._get_child_registries() caches the child-module registries it finds by walking named_modules(), and never invalidates that cache. But enable_cache() / disable_cache() add and remove block-level hooks, changing which modules carry a _diffusers_hook. If cache_context() is first entered while no block hooks exist (e.g. a warmup pass with caching disabled), the parent registry caches an incomplete child list. A later enable_cache(FirstBlockCacheConfig(...)) registers block hooks, but _set_context() still iterates the stale cache, so the new block StateManagers never receive a context and the next cached forward raises:
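The failure mode above can be reproduced with a small plain-Python toy. This is a sketch under stated assumptions, not diffusers' code: Module, StateManager, and Registry are hypothetical stand-ins for torch.nn.Module, a block hook's StateManager, and HookRegistry, and the memoized walk only approximates what _get_child_registries() is described as doing.

```python
# Hypothetical plain-Python stand-ins; NOT diffusers' real HookRegistry.
class Module:
    """Stand-in for torch.nn.Module: just a tree of child modules."""

    def __init__(self, *children):
        self.children = list(children)

    def named_modules(self, prefix=""):
        # Yield (qualified_name, module) pairs, root first with name "".
        yield prefix, self
        for i, child in enumerate(self.children):
            yield from child.named_modules(f"{prefix}.{i}" if prefix else str(i))


class StateManager:
    """Stand-in for a block hook's state manager, which needs a context."""

    def __init__(self):
        self._context = None

    def set_context(self, name):
        self._context = name

    def get_state(self):
        if self._context is None:
            raise ValueError("No context is set. Please set a context before retrieving the state.")
        return self._context


class Registry:
    """Stand-in for HookRegistry with a memoized, never-invalidated child walk."""

    def __init__(self, module):
        self.module = module
        self.state = StateManager()
        self._child_registries_cache = None

    @classmethod
    def get_or_create(cls, module):
        # The registry lives on the module, like the _diffusers_hook attribute.
        if not hasattr(module, "_hook"):
            module._hook = cls(module)
        return module._hook

    def _get_child_registries(self):
        # Built once on first use, then reused forever.
        if self._child_registries_cache is None:
            self._child_registries_cache = [
                m._hook for name, m in self.module.named_modules() if name and hasattr(m, "_hook")
            ]
        return self._child_registries_cache

    def set_context(self, name):
        self.state.set_context(name)
        for child in self._get_child_registries():
            child.state.set_context(name)


block = Module()
root_registry = Registry.get_or_create(Module(block))

root_registry.set_context("cond")  # warmup: no block registry yet, so the cache becomes []
block_registry = Registry.get_or_create(block)  # like enable_cache() adding a block hook
root_registry.set_context("cond")  # iterates the stale [] cache, so the block is skipped

try:
    block_registry.state.get_state()
except ValueError as err:
    print(f"ValueError: {err}")
```

The second set_context() call never reaches the block's state manager because the cached list is reused as-is, which matches the ValueError described above.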

ValueError: No context is set. Please set a context before retrieving the state.

This adds HookRegistry.invalidate_child_registries_cache(), which clears the cached list across the module tree, and calls it from enable_cache() and disable_cache() after hooks are added/removed.
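A minimal sketch of the fix, again on hypothetical plain-Python stand-ins rather than the real HookRegistry: invalidate_child_registries_cache() walks the module tree and resets every registry's memoized list to None, and a toy enable_cache() calls it after adding block-level registries. The names Module, Registry, and enable_cache here are illustrative assumptions, not the library's code.

```python
# Hypothetical plain-Python stand-ins; not diffusers' actual implementation.
class Module:
    def __init__(self, *children):
        self.children = list(children)

    def named_modules(self, prefix=""):
        yield prefix, self
        for i, child in enumerate(self.children):
            yield from child.named_modules(f"{prefix}.{i}" if prefix else str(i))


class Registry:
    def __init__(self, module):
        self.module = module
        self.context = None
        self._child_registries_cache = None

    @classmethod
    def get_or_create(cls, module):
        if not hasattr(module, "_hook"):
            module._hook = cls(module)
        return module._hook

    def _get_child_registries(self):
        if self._child_registries_cache is None:  # None means "rebuild on next use"
            self._child_registries_cache = [
                m._hook for name, m in self.module.named_modules() if name and hasattr(m, "_hook")
            ]
        return self._child_registries_cache

    def set_context(self, name):
        self.context = name
        for child in self._get_child_registries():
            child.context = name

    def invalidate_child_registries_cache(self):
        # Reset the memoized list on every registry in the subtree, root included.
        for _, m in self.module.named_modules():
            if hasattr(m, "_hook"):
                m._hook._child_registries_cache = None


def enable_cache(root, blocks):
    """Hypothetical stand-in: add block-level hooks, then invalidate from the root."""
    for block in blocks:
        Registry.get_or_create(block)
    Registry.get_or_create(root).invalidate_child_registries_cache()


blocks = [Module(), Module()]
root = Module(*blocks)
root_registry = Registry.get_or_create(root)

root_registry.set_context("cond")  # warmup pass caches an empty child list
enable_cache(root, blocks)  # new block registries, then invalidation
root_registry.set_context("cond")  # cache is rebuilt and now includes both blocks
print([b._hook.context for b in blocks])  # ['cond', 'cond']
```

A disable_cache() stand-in would remove the block registries and make the same invalidate call, so the root stops iterating registries that no longer carry hooks.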

The staleness originates in register_hook / remove_hook, but those run on the child block registries, which can't reach the parent registry whose cache is stale. enable_cache / disable_cache operate on the root module, so invalidating there covers the reported scenario for every cache technique. Happy to move it into register_hook / remove_hook instead if you'd prefer it lower down.

The self-contained CPU reproduction from the issue passes after the fix, and a regression test is added in tests/hooks/test_hooks.py.

Before submitting

- Did you use an AI agent (Claude Code, Codex, Cursor, etc.) to help with this PR? If so:
  - Did you read the Coding with AI agents guide?
  - Did you self-review the diff against .ai/review-rules.md?
- Did you read the contributor guideline?
- Did you read our philosophy doc? (important for complex PRs)
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes?
- Did you write any new necessary tests?
- Are you the author (or part of the team) of the model/pipeline (only applicable for model/pipeline related PRs)?

Who can review?

@DN6 @sayakpaul

sayakpaul · 2026-06-30T00:12:45Z

        assert registry.get_hook("stateful_add_hook").increment == 1
        assert torch.allclose(output1, output2)

+    def test_child_registries_cache_invalidation(self):


Should there also be testing when the context path is exercised, as reported in #14037?

Yes, that is a good call!

I've added test_cache_context_after_enable_cache_with_prior_context, which reproduces the issue end-to-end: it enters cache_context() before enable_cache() on a small FluxTransformer2DModel and runs a cached forward. It raises the original ValueError: No context is set without the fix and passes with it. Also fixed the check_code_quality failure (a docstring line-length restyle).

HuggingFaceDocBuilderDev · 2026-06-30T00:24:44Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sergereview

🤗 Serge says:

Clean, well-targeted fix for #14037. The stale-cache diagnosis is accurate: _get_child_registries() memoizes the child-registry walk and enable_cache/disable_cache change which modules carry a _diffusers_hook, so invalidating after hooks are added/removed is the right correction.

Correctness

invalidate_child_registries_cache() walks the full tree and resets _child_registries_cache = None, which _get_child_registries() correctly treats as "rebuild on next use". Clearing every registry in the subtree (not just the root) is right, since a child registry can also appear in an ancestor's cache.
Invalidation is placed in enable_cache/disable_cache on the root module rather than at the true source (register_hook/remove_hook), which the author calls out. Given a child registry can't reach a stale ancestor cache, this is a reasonable and minimal choice that covers the reported path for every cache technique.
The HookRegistry import added to enable_cache/disable_cache and the top-level FirstBlockCacheConfig/FluxTransformer2DModel imports in the test are all valid exports.

Tests

Both the unit-level (test_child_registries_cache_invalidation) and end-to-end (test_cache_context_after_enable_cache_with_prior_context) tests exercise the fix and match the failure described in the issue. They use small CPU-friendly configs consistent with the rest of the file.

Matches the PR description. No blocking issues.

serge v0.1.0 · model: claude-opus-4-8 · 8 LLM turns · 9 tool calls · 36.9s · 145457 in / 2138 out tokens

sayakpaul · 2026-07-02T03:52:15Z

+    # Warmup pass inside a cache_context() while caching is disabled, then enable caching.
+    with torch.no_grad(), model.cache_context("cond"):
+        model(**inputs)
+    model.enable_cache(FirstBlockCacheConfig(threshold=0.1))


Help me understand why this would be a practical flow, though?

Like, with torch.no_grad(), model.cache_context("cond"): isn't doing anything from the perspective of caching. So, does this reflect how caching is actually used in practice?

So our flow here is enabling caching on a transformer that has already been called once inside cache_context().

The pipelines call cache_context() unconditionally around every transformer forward. For example, pipeline_qwenimage.py runs with self.transformer.cache_context("cond"): on every denoising step, regardless of whether caching is enabled. So the first cache_context() in my test mirrors a normal pipeline run. It's a no-op for caching, but it still builds HookRegistry._child_registries_cache. If the user then calls enable_cache(...) and runs again, which is a natural "run once, then turn on caching to speed up the next run" workflow, the second cache_context() hits that stale cache and the block hooks never get a context, which raises ValueError: No context is set.

To confirm it's not contrived, calling enable_cache() before any run works fine. The bug only appears when a cache_context() pass precedes enable_cache(). The test reduces that to the minimal repro, using direct model(**inputs) calls in place of a full pipeline loop so it stays CPU-only and fast, but the ordering is the same one a real pipeline produces.
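The ordering claim can be checked with a compact toy that enumerates both orderings with and without invalidation. This is a hedged sketch, not the PR's test: Module, Registry, run, and pipeline_step are hypothetical names, pipeline_step only mimics a pipeline entering cache_context("cond") on every step, and the toy clears just the root's cache because that is the only cache it consults.

```python
# Hypothetical plain-Python stand-ins; not diffusers' actual classes.
class Module:
    def __init__(self, *children):
        self.children = list(children)

    def named_modules(self, prefix=""):
        yield prefix, self
        for i, child in enumerate(self.children):
            yield from child.named_modules(f"{prefix}.{i}" if prefix else str(i))


class Registry:
    def __init__(self, module):
        self.module = module
        self.context = None
        self._cache = None  # memoized list of descendant registries

    @classmethod
    def get_or_create(cls, module):
        if not hasattr(module, "_hook"):
            module._hook = cls(module)
        return module._hook

    def set_context(self, name):
        if self._cache is None:
            self._cache = [m._hook for n, m in self.module.named_modules() if n and hasattr(m, "_hook")]
        self.context = name
        for child in self._cache:
            child.context = name


def run(warmup_first, invalidate):
    """Return True if the block registry receives the context on the run after enabling."""
    block = Module()
    root_registry = Registry.get_or_create(Module(block))

    def pipeline_step():
        # Mirrors a pipeline entering cache_context("cond") on every step.
        root_registry.set_context("cond")

    if warmup_first:
        pipeline_step()  # earlier run, caching still disabled
    Registry.get_or_create(block)  # stand-in for enable_cache(...) adding a block hook
    if invalidate:
        root_registry._cache = None  # stand-in for invalidate_child_registries_cache()
    pipeline_step()  # run with caching enabled
    return block._hook.context == "cond"


for warmup_first in (False, True):
    for invalidate in (False, True):
        ok = run(warmup_first, invalidate)
        print(f"warmup_first={warmup_first}, invalidate={invalidate}: block got context = {ok}")
```

Only the warmup-first, no-invalidation combination leaves the block without a context; enabling before any run works either way, which is the ordering dependence described above.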

We can reshape the test to go through an actual pipeline call if you'd prefer it to read more explicitly as the real flow.

Yeah let's do that

github-actions Bot added the size/S (PR with diff < 50 LOC), models, tests, hooks, and fixes-issue labels Jun 29, 2026

sayakpaul reviewed Jun 30, 2026


Invalidate HookRegistry child-registries cache on enable/disable cache

424f752

SuryanshSS1011 force-pushed the fix/14037-hook-registry-cache-invalidation branch from 42f7a74 to 424f752 Compare June 30, 2026 00:29

github-actions Bot added the size/M (PR with diff < 200 LOC) label and removed the size/S (PR with diff < 50 LOC) label Jun 30, 2026

Merge branch 'main' into fix/14037-hook-registry-cache-invalidation

e4def1b

SuryanshSS1011 requested a review from sayakpaul July 1, 2026 23:25

sergereview Bot reviewed Jul 2, 2026


sayakpaul reviewed Jul 2, 2026

Invalidate HookRegistry child-registries cache on enable/disable cache #14093

SuryanshSS1011 wants to merge 2 commits into huggingface:main from SuryanshSS1011:fix/14037-hook-registry-cache-invalidation

3 participants
