[DRAFT] Add cache manager and package cache routing.#1216
Conversation
…upport Introduces a new centralized cache management layer that replaces the ad-hoc per-directory lookups with a structured, hierarchical system. Key components: - `CacheManager` with collection-based lookup across local and remote backends - `RemotePEP503Backend` with lazy per-package fetching and session-scoped index - `StoreRouter` for directing artifacts to the correct collection - Short-circuit optimization in `_phase_prepare_source` that skips source download, build env, and build dep resolution on cache hits - `update_wheel_mirror` called after remote cache downloads so wheels are indexed for subsequent build dependency resolution via the internal server - CLI commands (`fromager cache list/stats/verify/invalidate/gc`) - `--use-cache-manager` and `--cache-wheel-server-url` bootstrap options Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
When the CacheManager is active and a non-cpu variant has top-level requirements, newly built wheels are routed to the appropriate collection directory: - Packages listed in the variant's requirements file → wheels-repo/variants/<variant>/ - Unlisted transitive dependencies → wheels-repo/downloads/ (shared default) This ensures the variant collection contains exactly the packages explicitly requested in the requirements file, while common deps discovered during dependency resolution stay in the shared collection for reuse across variants. Changes: - `_build_cache_manager` accepts `toplevel_reqs` and uses them as the variant package set for store routing - `StoreRouter` routes based on variant_packages (from requirements file), not just pre-built packages - `_phase_build` calls `store_wheel` after building (not for cached hits) - `LocalDirectoryBackend.store()` copies (not moves) to preserve the original in downloads/ for the internal wheel server Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
📝 WalkthroughWalkthroughThis PR introduces a unified artifact caching subsystem ( Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
tests/test_cache.py (1)
1045-1103: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert same-run promotion to the local backend here.
This test proves the file was downloaded, but not that the
CacheManageractually promoted it intolocal_backendfor subsequent lookups. Adding a second lookup (or a directlocal_backend.lookup(...)assertion) would catch regressions where remote hits never become local cache hits.Based on learnings and path instructions: "tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cache.py` around lines 1045 - 1103, The remote-hit test currently only verifies the download result and does not confirm same-run promotion into the local cache. Update test_remote_hit_downloads_to_local_store in CacheManager flow to perform a second lookup or directly query local_backend after the first remote hit, and assert that the item is now served from LocalDirectoryBackend as a local hit. Use the existing symbols CacheManager.lookup_wheel, local_backend, and RemotePEP503Backend to keep the test focused on promotion behavior.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fromager/bootstrapper.py`:
- Around line 1535-1541: The wheel mirror update is happening before the routed
cache entry is stored, which can leave the internal index out of sync for the
newly copied path. In `bootstrapper.py`, reorder the flow in the cache-hit
handling around `server.update_wheel_mirror`, `self.ctx.package_build_info`, and
`self.ctx.cache.store_wheel` so the wheel is stored in the routed collection
first and the mirror is updated afterward. Keep the existing routing logic
intact, just move the mirror refresh to run after `store_wheel()` completes.
In `@src/fromager/cache.py`:
- Around line 664-667: After backend.fetch in the cache hit path, register the
downloaded artifact with collection.store_backend so its index is updated as a
local artifact instead of remaining remote-only. Use the existing cache-hit flow
around backend.fetch and the collection.store_backend object to add the
downloaded local_path back into the store manager in the same way promoted
artifacts are tracked, so live cache/admin commands can see it.
- Around line 454-463: Verify and enforce the advertised sha256 when downloading
a remote wheel in fetch(). After the session.get/iter_content write completes,
compute the file’s SHA256 for the downloaded target and compare it against
info.sha256 from ArtifactInfo; if it does not match, reject the wheel by
removing the cached file and raising an error. Keep the existing download flow
in fetch() but add the hash validation before returning target so only verified
wheels are accepted.
- Around line 504-535: The _parse_project_page() helper is accepting untrusted
anchor text as ArtifactInfo.filename, which later gets joined by fetch() when
writing to dest, so reject any remote wheel name containing path components or
traversal segments. Update _parse_project_page() to validate the parsed filename
before appending ArtifactInfo, and only allow a plain basename for remote
artifacts while keeping the existing URL and sha256 parsing logic intact.
In `@src/fromager/commands/cache_cmd.py`:
- Around line 92-99: The variant cache setup in cache_cmd.py is reusing the same
prebuilt_backend instance in both the variant and default collections, which
breaks collection scoping and causes double counting/mutating shared files.
Update the CacheCollection construction logic around wkctx.variant so the
variant collection only owns variant-specific backends, while prebuilt_backend
remains attached to the default collection and is still reachable via
search_order. Keep the unique symbols CacheCollection, variant_backends,
collections[wkctx.variant], and search_order aligned so cache list/stats and
invalidate --collection <variant> operate on separate backend instances.
In `@tests/test_bootstrapper_iterative.py`:
- Around line 1319-1368: The cache hit stats test is bypassing the real lookup
path by mocking _find_cached_wheel_via_manager, so CacheManager.lookup_wheel()
never executes and the hit event cannot be verified. Update
test_cache_hit_records_stats to exercise the actual manager lookup path by only
stubbing the unpack step (for example, _unpack_metadata_from_wheel) and then
assert the relevant stats counter/event produced by CacheManager/Bootstrapper.
If this test is only duplicating phase progression, rename or remove it so it
validates the intended stats behavior.
---
Nitpick comments:
In `@tests/test_cache.py`:
- Around line 1045-1103: The remote-hit test currently only verifies the
download result and does not confirm same-run promotion into the local cache.
Update test_remote_hit_downloads_to_local_store in CacheManager flow to perform
a second lookup or directly query local_backend after the first remote hit, and
assert that the item is now served from LocalDirectoryBackend as a local hit.
Use the existing symbols CacheManager.lookup_wheel, local_backend, and
RemotePEP503Backend to keep the test focused on promotion behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a423bd0f-a4ca-4a6b-892c-f98580939f94
📒 Files selected for processing (9)
pyproject.tomlsrc/fromager/bootstrapper.pysrc/fromager/cache.pysrc/fromager/commands/bootstrap.pysrc/fromager/commands/cache_cmd.pysrc/fromager/context.pysrc/fromager/requirements_file.pytests/test_bootstrapper_iterative.pytests/test_cache.py
| server.update_wheel_mirror(self.ctx) | ||
| # Route to the correct collection (e.g., variant dir for listed packages) | ||
| pbi = self.ctx.package_build_info(item.req) | ||
| build_tag = pbi.build_tag(item.resolved_version) | ||
| self.ctx.cache.store_wheel( | ||
| item.req, item.resolved_version, build_tag, cached_wheel | ||
| ) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Update the wheel mirror after storing the routed cache hit.
store_wheel() can copy the hit into the routed collection, so updating the mirror first can leave the internal wheel index stale for that newly stored path.
Suggested fix
- server.update_wheel_mirror(self.ctx)
# Route to the correct collection (e.g., variant dir for listed packages)
pbi = self.ctx.package_build_info(item.req)
build_tag = pbi.build_tag(item.resolved_version)
- self.ctx.cache.store_wheel(
+ cached_wheel = self.ctx.cache.store_wheel(
item.req, item.resolved_version, build_tag, cached_wheel
)
+ server.update_wheel_mirror(self.ctx)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| server.update_wheel_mirror(self.ctx) | |
| # Route to the correct collection (e.g., variant dir for listed packages) | |
| pbi = self.ctx.package_build_info(item.req) | |
| build_tag = pbi.build_tag(item.resolved_version) | |
| self.ctx.cache.store_wheel( | |
| item.req, item.resolved_version, build_tag, cached_wheel | |
| ) | |
| # Route to the correct collection (e.g., variant dir for listed packages) | |
| pbi = self.ctx.package_build_info(item.req) | |
| build_tag = pbi.build_tag(item.resolved_version) | |
| cached_wheel = self.ctx.cache.store_wheel( | |
| item.req, item.resolved_version, build_tag, cached_wheel | |
| ) | |
| server.update_wheel_mirror(self.ctx) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/fromager/bootstrapper.py` around lines 1535 - 1541, The wheel mirror
update is happening before the routed cache entry is stored, which can leave the
internal index out of sync for the newly copied path. In `bootstrapper.py`,
reorder the flow in the cache-hit handling around `server.update_wheel_mirror`,
`self.ctx.package_build_info`, and `self.ctx.cache.store_wheel` so the wheel is
stored in the routed collection first and the mirror is updated afterward. Keep
the existing routing logic intact, just move the mirror refresh to run after
`store_wheel()` completes.
| url = info.url_or_path | ||
| logger.info( | ||
| "downloading cached wheel %s from %s", info.filename, self._server_url | ||
| ) | ||
| resp = session.get(url, stream=True) | ||
| resp.raise_for_status() | ||
| with open(target, "wb") as f: | ||
| for chunk in resp.iter_content(chunk_size=1024 * 1024): | ||
| f.write(chunk) | ||
| return target |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Verify the advertised sha256 before accepting a remote wheel.
You already parse the fragment hash into ArtifactInfo.sha256, but fetch() never checks it. That turns a signed/simple index hint into unused metadata and lets corrupted or tampered wheels enter the local cache silently.
Proposed fix
@@
import dataclasses
+import hashlib
import logging
@@
resp = session.get(url, stream=True)
resp.raise_for_status()
+ hasher = hashlib.sha256() if info.sha256 else None
with open(target, "wb") as f:
for chunk in resp.iter_content(chunk_size=1024 * 1024):
+ if not chunk:
+ continue
+ if hasher is not None:
+ hasher.update(chunk)
f.write(chunk)
+ if hasher is not None and hasher.hexdigest() != info.sha256:
+ target.unlink(missing_ok=True)
+ raise ValueError(
+ f"sha256 mismatch for cached wheel {info.filename}: expected {info.sha256}"
+ )
return target📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| url = info.url_or_path | |
| logger.info( | |
| "downloading cached wheel %s from %s", info.filename, self._server_url | |
| ) | |
| resp = session.get(url, stream=True) | |
| resp.raise_for_status() | |
| with open(target, "wb") as f: | |
| for chunk in resp.iter_content(chunk_size=1024 * 1024): | |
| f.write(chunk) | |
| return target | |
| import hashlib | |
| url = info.url_or_path | |
| logger.info( | |
| "downloading cached wheel %s from %s", info.filename, self._server_url | |
| ) | |
| resp = session.get(url, stream=True) | |
| resp.raise_for_status() | |
| hasher = hashlib.sha256() if info.sha256 else None | |
| with open(target, "wb") as f: | |
| for chunk in resp.iter_content(chunk_size=1024 * 1024): | |
| if not chunk: | |
| continue | |
| if hasher is not None: | |
| hasher.update(chunk) | |
| f.write(chunk) | |
| if hasher is not None and hasher.hexdigest() != info.sha256: | |
| target.unlink(missing_ok=True) | |
| raise ValueError( | |
| f"sha256 mismatch for cached wheel {info.filename}: expected {info.sha256}" | |
| ) | |
| return target |
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 459-459: File path is request-/variable-derived; validate and normalize to prevent path traversal.
Context: open(target, "wb")
Note: [CWE-22] Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal').
(open-filename-from-request)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/fromager/cache.py` around lines 454 - 463, Verify and enforce the
advertised sha256 when downloading a remote wheel in fetch(). After the
session.get/iter_content write completes, compute the file’s SHA256 for the
downloaded target and compare it against info.sha256 from ArtifactInfo; if it
does not match, reject the wheel by removing the cached file and raising an
error. Keep the existing download flow in fetch() but add the hash validation
before returning target so only verified wheels are accepted.
| def _parse_project_page(html: str, base_url: str) -> list[ArtifactInfo]: | ||
| """Extract wheel artifact info from a PEP 503 project page.""" | ||
| artifacts: list[ArtifactInfo] = [] | ||
| for match in re.finditer(r'<a\s+href="([^"]+)"[^>]*>([^<]+)</a>', html): | ||
| href = match.group(1) | ||
| filename = match.group(2).strip() | ||
| if not filename.endswith(".whl"): | ||
| continue | ||
|
|
||
| # Resolve relative URLs | ||
| if href.startswith("http://") or href.startswith("https://"): | ||
| url = href | ||
| elif href.startswith("/"): | ||
| parsed = urlparse(base_url) | ||
| url = f"{parsed.scheme}://{parsed.netloc}{href}" | ||
| else: | ||
| url = base_url.rstrip("/") + "/" + href | ||
|
|
||
| # Strip hash fragment for the URL but extract sha256 if present | ||
| sha256 = None | ||
| if "#" in url: | ||
| url_part, fragment = url.rsplit("#", 1) | ||
| if fragment.startswith("sha256="): | ||
| sha256 = fragment[7:] | ||
| url = url_part | ||
|
|
||
| artifacts.append( | ||
| ArtifactInfo( | ||
| filename=filename, | ||
| url_or_path=url, | ||
| sha256=sha256, | ||
| ) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win
Reject path components in remote artifact names.
_parse_project_page() trusts the anchor text as ArtifactInfo.filename, and fetch() later writes dest / info.filename. A malicious or compromised cache can publish ../...whl and escape the cache directory.
Proposed fix
@@
- filename = match.group(2).strip()
+ filename = match.group(2).strip()
if not filename.endswith(".whl"):
continue
+
+ safe_filename = pathlib.PurePosixPath(filename).name
+ if safe_filename != filename:
+ logger.warning("skipping remote artifact with unsafe filename %r", filename)
+ continue
@@
artifacts.append(
ArtifactInfo(
- filename=filename,
+ filename=safe_filename,
url_or_path=url,
sha256=sha256,
)
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _parse_project_page(html: str, base_url: str) -> list[ArtifactInfo]: | |
| """Extract wheel artifact info from a PEP 503 project page.""" | |
| artifacts: list[ArtifactInfo] = [] | |
| for match in re.finditer(r'<a\s+href="([^"]+)"[^>]*>([^<]+)</a>', html): | |
| href = match.group(1) | |
| filename = match.group(2).strip() | |
| if not filename.endswith(".whl"): | |
| continue | |
| # Resolve relative URLs | |
| if href.startswith("http://") or href.startswith("https://"): | |
| url = href | |
| elif href.startswith("/"): | |
| parsed = urlparse(base_url) | |
| url = f"{parsed.scheme}://{parsed.netloc}{href}" | |
| else: | |
| url = base_url.rstrip("/") + "/" + href | |
| # Strip hash fragment for the URL but extract sha256 if present | |
| sha256 = None | |
| if "#" in url: | |
| url_part, fragment = url.rsplit("#", 1) | |
| if fragment.startswith("sha256="): | |
| sha256 = fragment[7:] | |
| url = url_part | |
| artifacts.append( | |
| ArtifactInfo( | |
| filename=filename, | |
| url_or_path=url, | |
| sha256=sha256, | |
| ) | |
| def _parse_project_page(html: str, base_url: str) -> list[ArtifactInfo]: | |
| """Extract wheel artifact info from a PEP 503 project page.""" | |
| artifacts: list[ArtifactInfo] = [] | |
| for match in re.finditer(r'<a\s+href="([^"]+)"[^>]*>([^<]+)</a>', html): | |
| href = match.group(1) | |
| filename = match.group(2).strip() | |
| if not filename.endswith(".whl"): | |
| continue | |
| safe_filename = pathlib.PurePosixPath(filename).name | |
| if safe_filename != filename: | |
| logger.warning("skipping remote artifact with unsafe filename %r", filename) | |
| continue | |
| # Resolve relative URLs | |
| if href.startswith("http://") or href.startswith("https://"): | |
| url = href | |
| elif href.startswith("/"): | |
| parsed = urlparse(base_url) | |
| url = f"{parsed.scheme}://{parsed.netloc}{href}" | |
| else: | |
| url = base_url.rstrip("/") + "/" + href | |
| # Strip hash fragment for the URL but extract sha256 if present | |
| sha256 = None | |
| if "#" in url: | |
| url_part, fragment = url.rsplit("#", 1) | |
| if fragment.startswith("sha256="): | |
| sha256 = fragment[7:] | |
| url = url_part | |
| artifacts.append( | |
| ArtifactInfo( | |
| filename=safe_filename, | |
| url_or_path=url, | |
| sha256=sha256, | |
| ) |
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 513-513: Do not make http calls without encryption
Context: "http://"
Note: [CWE-319] Cleartext Transmission of Sensitive Information.
(requests-http)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/fromager/cache.py` around lines 504 - 535, The _parse_project_page()
helper is accepting untrusted anchor text as ArtifactInfo.filename, which later
gets joined by fetch() when writing to dest, so reject any remote wheel name
containing path components or traversal segments. Update _parse_project_page()
to validate the parsed filename before appending ArtifactInfo, and only allow a
plain basename for remote artifacts while keeping the existing URL and sha256
parsing logic intact.
| # Hit -- fetch the artifact to a local path | ||
| local_path = backend.fetch( | ||
| key, info, collection.store_backend.directory | ||
| ) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Register remote hits in the local store backend after download.
This downloads remote hits into collection.store_backend.directory, but it never updates collection.store_backend's index. In the same run, the promoted wheel still looks like a remote-only hit, and cache admin commands built on the live manager cannot see the promoted artifact.
Proposed fix
- local_path = backend.fetch(
- key, info, collection.store_backend.directory
- )
+ local_path = backend.fetch(
+ key, info, collection.store_backend.directory
+ )
+ if backend is not collection.store_backend:
+ collection.store_backend.store(key, local_path)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Hit -- fetch the artifact to a local path | |
| local_path = backend.fetch( | |
| key, info, collection.store_backend.directory | |
| ) | |
| # Hit -- fetch the artifact to a local path | |
| local_path = backend.fetch( | |
| key, info, collection.store_backend.directory | |
| ) | |
| if backend is not collection.store_backend: | |
| collection.store_backend.store(key, local_path) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/fromager/cache.py` around lines 664 - 667, After backend.fetch in the
cache hit path, register the downloaded artifact with collection.store_backend
so its index is updated as a local artifact instead of remaining remote-only.
Use the existing cache-hit flow around backend.fetch and the
collection.store_backend object to add the downloaded local_path back into the
store manager in the same way promoted artifacts are tracked, so live
cache/admin commands can see it.
| variant_backends: list[CacheBackend] = [variant_backend, prebuilt_backend] | ||
| variant_collection = CacheCollection( | ||
| name=wkctx.variant, | ||
| backends=variant_backends, | ||
| store_backend=variant_backend, | ||
| ) | ||
| collections[wkctx.variant] = variant_collection | ||
| search_order = [wkctx.variant, "default"] |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Do not attach the same prebuilt_backend instance to both collections.
With search_order = [variant, "default"], prebuilt wheels are already reachable through default. Reusing the same backend object in both collections breaks collection scoping: cache list/stats double-count prebuilt artifacts, and invalidate --collection <variant> can mutate files that also belong to default.
Proposed fix
- variant_backends: list[CacheBackend] = [variant_backend, prebuilt_backend]
+ variant_backends: list[CacheBackend] = [variant_backend]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| variant_backends: list[CacheBackend] = [variant_backend, prebuilt_backend] | |
| variant_collection = CacheCollection( | |
| name=wkctx.variant, | |
| backends=variant_backends, | |
| store_backend=variant_backend, | |
| ) | |
| collections[wkctx.variant] = variant_collection | |
| search_order = [wkctx.variant, "default"] | |
| variant_backends: list[CacheBackend] = [variant_backend] | |
| variant_collection = CacheCollection( | |
| name=wkctx.variant, | |
| backends=variant_backends, | |
| store_backend=variant_backend, | |
| ) | |
| collections[wkctx.variant] = variant_collection | |
| search_order = [wkctx.variant, "default"] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/fromager/commands/cache_cmd.py` around lines 92 - 99, The variant cache
setup in cache_cmd.py is reusing the same prebuilt_backend instance in both the
variant and default collections, which breaks collection scoping and causes
double counting/mutating shared files. Update the CacheCollection construction
logic around wkctx.variant so the variant collection only owns variant-specific
backends, while prebuilt_backend remains attached to the default collection and
is still reachable via search_order. Keep the unique symbols CacheCollection,
variant_backends, collections[wkctx.variant], and search_order aligned so cache
list/stats and invalidate --collection <variant> operate on separate backend
instances.
| def test_cache_hit_records_stats(self, tmp_context: WorkContext) -> None: | ||
| """Cache hit via CacheManager records a hit event in stats.""" | ||
|
|
||
| from fromager.cache import ( | ||
| CacheCollection, | ||
| CacheManager, | ||
| LocalDirectoryBackend, | ||
| StoreRouter, | ||
| ) | ||
|
|
||
| wheels_dir = tmp_context.wheels_downloads | ||
| wheel_file = wheels_dir / "testpkg-1.0-1-py3-none-any.whl" | ||
| wheel_file.write_bytes(b"fake wheel") | ||
|
|
||
| backend = LocalDirectoryBackend(wheels_dir, backend_name="local:default") | ||
| collection = CacheCollection( | ||
| name="default", backends=[backend], store_backend=backend | ||
| ) | ||
| router = StoreRouter( | ||
| overrides={}, accelerated_packages=set(), active_variant="cpu" | ||
| ) | ||
| cache_mgr = CacheManager( | ||
| collections={"default": collection}, | ||
| search_order=["default"], | ||
| store_routing=router, | ||
| ) | ||
| cache_mgr.initialize() | ||
| tmp_context.cache = cache_mgr | ||
|
|
||
| bt = bootstrapper.Bootstrapper(tmp_context) | ||
| item = _make_build_item(phase=BootstrapPhase.PREPARE_SOURCE) | ||
|
|
||
| with ( | ||
| patch.object(tmp_context.constraints, "get_constraint", return_value=None), | ||
| patch.object( | ||
| bt, | ||
| "_find_cached_wheel_via_manager", | ||
| return_value=(wheel_file, tmp_context.work_dir / "testpkg-1.0"), | ||
| ), | ||
| patch.object( | ||
| bt, | ||
| "_create_unpack_dir", | ||
| return_value=tmp_context.work_dir / "testpkg-1.0", | ||
| ), | ||
| patch("fromager.bootstrapper.server.update_wheel_mirror"), | ||
| ): | ||
| bt._phase_prepare_source(item) | ||
|
|
||
| # Stats should show the cache was consulted | ||
| assert item.phase == BootstrapPhase.PROCESS_INSTALL_DEPS |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Make this stats test exercise the real manager lookup.
This patches _find_cached_wheel_via_manager, so CacheManager.lookup_wheel() never runs and no hit stats are recorded by the code under test. Patch _unpack_metadata_from_wheel instead, then assert the exposed stats counter/event; otherwise rename/remove this duplicate phase-advancement test. As per path instructions, “Verify test actually tests the intended behavior.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_bootstrapper_iterative.py` around lines 1319 - 1368, The cache hit
stats test is bypassing the real lookup path by mocking
_find_cached_wheel_via_manager, so CacheManager.lookup_wheel() never executes
and the hit event cannot be verified. Update test_cache_hit_records_stats to
exercise the actual manager lookup path by only stubbing the unpack step (for
example, _unpack_metadata_from_wheel) and then assert the relevant stats
counter/event produced by CacheManager/Bootstrapper. If this test is only
duplicating phase progression, rename or remove it so it validates the intended
stats behavior.
Source: Path instructions
Needs human review, I'm sick today and wanted to get this up though
Don't bother reviewing till I address at least the AI comments
Pull Request Description
What
This adds two features:
Why
This solves two longstanding problems:
Separation of caches for shared deps and top level accelerated packages
Pulling from multiple remote caches
PR follows CONTRIBUTING.md guidelines
AI summaries below:
Cache Manager:
Cache Routing: