feat(resolver): wire source_resolver into resolver and download pipelines#1202
feat(resolver): wire source_resolver into resolver and download pipelines#1202smoparth wants to merge 2 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 3 minutes and 4 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR wires the source resolver model layer into runtime resolution and source downloading. It adds Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/fromager/packagesettings/_pbi.py (1)
180-186: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAdd
versionaddedmetadata to the new publicsource_resolverproperty.This accessor is a new user-facing API surface and should carry a Sphinx
.. versionadded:: 0.85directive for release-doc consistency.Suggested patch
def source_resolver(self) -> SourceResolver | None: """Effective source resolver for this package and variant. + .. versionadded:: 0.85 + Returns the variant-level ``source`` override if set, otherwise the package-level ``source``, or ``None`` when neither is configured. """As per coding guidelines, "Use Sphinx
versionadded,versionremoved,versionchangeddirectives for user-facing changes".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fromager/packagesettings/_pbi.py` around lines 180 - 186, The `source_resolver` property in the _pbi.py file is a new public API but is missing the required Sphinx version metadata. Add a `.. versionadded:: 0.85` directive to the docstring of the `source_resolver` property to document when this accessor was introduced and maintain consistency with coding guidelines for user-facing API changes.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fromager/resolver.py`:
- Around line 129-138: The code uses typing.cast to convert req_type to
RequirementType without enforcing that req_type is actually non-None at runtime.
Since source.resolver_provider() requires a non-optional RequirementType
argument, add a runtime validation check before the return statement in the
source resolver block to ensure req_type is not None. If it is None, either
raise an appropriate error or handle it appropriately so that None is never
passed to the source.resolver_provider() call.
In `@src/fromager/sources.py`:
- Around line 207-223: The `_parse_git_url` function returns clone URLs that may
still contain query parameters and fragment identifiers (such as
`#subdirectory`=...), which are invalid for git cloning. When reconstructing the
clone URL in the `_parse_git_url` function using the `_replace()` method on the
parsed URL object before calling `geturl()`, you need to also clear the query
and fragment components by setting them to empty strings in addition to updating
the path. This ensures the returned clone URL contains only the protocol, host,
and path components needed for git operations.
In `@tests/test_source_resolver_wiring.py`:
- Around line 74-77: Replace all real domain names in test URL fixtures with
their `.test` equivalents throughout the test file. Specifically, change domains
like github.com to github.test, pypi.org to pypi.test, and any other real
domains to their corresponding .test TLD versions. This applies to test URL
literals used in test methods including test_is_git_url_ssh and other test
methods that construct URLs as part of their fixtures, ensuring consistent
adherence to test isolation conventions.
---
Nitpick comments:
In `@src/fromager/packagesettings/_pbi.py`:
- Around line 180-186: The `source_resolver` property in the _pbi.py file is a
new public API but is missing the required Sphinx version metadata. Add a `..
versionadded:: 0.85` directive to the docstring of the `source_resolver`
property to document when this accessor was introduced and maintain consistency
with coding guidelines for user-facing API changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0a276ad7-135e-47ad-af4f-351f66a07123
📒 Files selected for processing (6)
src/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_pbi.pysrc/fromager/resolver.pysrc/fromager/sources.pytests/test_packagesettings.pytests/test_source_resolver_wiring.py
| def _parse_git_url(url: str) -> tuple[str, str | None]: | ||
| """Split a VCS URL into clone URL and optional ref. | ||
|
|
||
| ``git+https://host/repo@ref`` -> ``(https://host/repo, ref)`` | ||
| ``git+https://host/repo`` -> ``(https://host/repo, None)`` | ||
| """ | ||
| clone_url = url | ||
| if clone_url.startswith("git+"): | ||
| clone_url = clone_url[len("git+") :] | ||
|
|
||
| parsed = urlparse(clone_url) | ||
| ref: str | None = None | ||
| if "@" in parsed.path: | ||
| new_path, _, ref = parsed.path.rpartition("@") | ||
| clone_url = parsed._replace(path=new_path).geturl() | ||
|
|
||
| return clone_url, ref |
There was a problem hiding this comment.
Strip query/fragment from git clone URL in _parse_git_url.
Line 221 can return clone URLs that still contain query/fragment data (for example #subdirectory=...), which is not a valid clone target and can break download_git_source().
Suggested fix
def _parse_git_url(url: str) -> tuple[str, str | None]:
@@
- parsed = urlparse(clone_url)
+ parsed = urlparse(clone_url)
+ # Not part of git clone target
+ parsed = parsed._replace(params="", query="", fragment="")
ref: str | None = None
if "@" in parsed.path:
new_path, _, ref = parsed.path.rpartition("@")
clone_url = parsed._replace(path=new_path).geturl()
+ else:
+ clone_url = parsed.geturl()
return clone_url, ref🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/fromager/sources.py` around lines 207 - 223, The `_parse_git_url`
function returns clone URLs that may still contain query parameters and fragment
identifiers (such as `#subdirectory`=...), which are invalid for git cloning. When
reconstructing the clone URL in the `_parse_git_url` function using the
`_replace()` method on the parsed URL object before calling `geturl()`, you need
to also clear the query and fragment components by setting them to empty strings
in addition to updating the path. This ensures the returned clone URL contains
only the protocol, host, and path components needed for git operations.
There was a problem hiding this comment.
This is the third implementation of pip VCS url parser in Fromager. All three implementation have slightly different behavior and subtle bugs. I have created #1215 to introduce a well-defined and tested function to parse a VCS URL into clone URL and git ref.
There was a problem hiding this comment.
@tiran, that's a much better approach than patching individual implementations — consolidating into one well-tested function in #1215 fixes the root cause for all three sites at once. I'll withdraw this suggestion.
✏️ Learnings added
Learnt from: tiran
Repo: python-wheel-build/fromager PR: 1202
File: src/fromager/sources.py:207-223
Timestamp: 2026-06-24T11:25:31.914Z
Learning: In the `fromager` codebase (`python-wheel-build/fromager`), VCS (pip-style `git+https://` / `git+ssh://`) URL parsing must use the single canonical helper introduced in PR `#1215`. There were previously three separate ad-hoc implementations with divergent behavior (in `sources.py` and elsewhere); do not suggest adding new inline parsers — point to the shared utility instead.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
fac1b7f to
55aca35
Compare
|
@tiran PTAL |
|
The PR looks good to me overall. Lets see if we get review from @tiran . |
| return results[0] | ||
|
|
||
|
|
||
| def default_resolver_provider( |
There was a problem hiding this comment.
Is default_resolver_provider the right place to hook into source resolver? How are you going to implement https://fromager.readthedocs.io/en/latest/proposals/new-resolver-config.html#default-behavior-and-hooks ?
There was a problem hiding this comment.
Created a separate commit refactor(resolver): check source config before plugin lookup to move the source_resolver check before overrides.find_and_invoke in resolve(), get_source_provider(), and download_source().
When provider != "hook", YAML config is used directly, plugins are skipped for both resolver and download. When provider == "hook" or no source config, the old plugin system runs as before. Let me know what you think.
"at least one source setting must use provider: hook when a plugin exists" needs a runtime check since plugin discovery isn't available at config load time. I'm not sure what to do here, need help with this.
|
|
||
| if _is_git_url(url): | ||
| clone_url, ref = _parse_git_url(url) | ||
| download_path = ctx.work_dir / f"{req.name}-{version}" / f"{req.name}-{version}" |
There was a problem hiding this comment.
Isn't that the same path as sdists_downloads_dir?
There was a problem hiding this comment.
My understanding is sdists_downloads_dir is for downloaded archive files (.tar.gz, .zip). Git clones produce a directory, not an archive, so they go to ctx.work_dir, same path the existing req.url git clone uses.
| def _parse_git_url(url: str) -> tuple[str, str | None]: | ||
| """Split a VCS URL into clone URL and optional ref. | ||
|
|
||
| ``git+https://host/repo@ref`` -> ``(https://host/repo, ref)`` | ||
| ``git+https://host/repo`` -> ``(https://host/repo, None)`` | ||
| """ | ||
| clone_url = url | ||
| if clone_url.startswith("git+"): | ||
| clone_url = clone_url[len("git+") :] | ||
|
|
||
| parsed = urlparse(clone_url) | ||
| ref: str | None = None | ||
| if "@" in parsed.path: | ||
| new_path, _, ref = parsed.path.rpartition("@") | ||
| clone_url = parsed._replace(path=new_path).geturl() | ||
|
|
||
| return clone_url, ref |
There was a problem hiding this comment.
This is the third implementation of pip VCS url parser in Fromager. All three implementation have slightly different behavior and subtle bugs. I have created #1215 to introduce a well-defined and tested function to parse a VCS URL into clone URL and git ref.
55aca35 to
3474a69
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fromager/sources.py`:
- Around line 239-247: The git URL branch in default_download_source() is
ignoring configured submodule settings because it calls
gitutils.git_clone_fast(), which does not take submodules arguments. Update this
path to use gitutils.git_clone(..., submodules=...) with the parsed
git_options.submodules and git_options.submodule_paths, or extend
git_clone_fast() to accept and forward those options so submodules are honored.
Use the default_download_source() and
gitutils.git_clone_fast()/gitutils.git_clone symbols to locate the change.
In `@tests/test_source_resolver_wiring.py`:
- Around line 279-302: The default `remove_dot_git` tests are asserting the
opposite of the intended behavior, so update the assertions in
`test_keeps_dot_git_by_default` and the related default-check test to match the
PR’s default-`True` behavior. Use `sources.download_git_source` as the reference
point and change the expectations so the default path verifies `.git` is removed
unless `remove_dot_git=False` is explicitly passed, keeping the tests aligned
with the new contract.
- Around line 76-77: The test `test_is_git_url_ssh_not_supported` is asserting
the wrong contract for `sources._is_git_url`; update it so `git+ssh://` is
expected to be treated as a supported git URL, alongside `git+https://`, and
rename the test if needed to reflect the new behavior. Adjust the assertion in
this test module to match the runtime routing through git cloning rather than
marking SSH URLs unsupported.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ba7175d4-59ef-4683-8272-4f1fa0edcb2b
📒 Files selected for processing (6)
src/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_pbi.pysrc/fromager/resolver.pysrc/fromager/sources.pytests/test_packagesettings.pytests/test_source_resolver_wiring.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_packagesettings.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/fromager/packagesettings/_pbi.py
- src/fromager/resolver.py
…ines Activate the `source` field on `PackageSettings` and `VariantInfo` so YAML-configured providers are used by the resolver and download pipelines. This is the wiring layer for the model classes introduced in PR python-wheel-build#1052. - Uncomment `source: SourceResolver | None` in `PackageSettings` and `VariantInfo` - Add `PackageBuildInfo.source_resolver` property (variant overrides package level) - `default_resolver_provider()` checks `pbi.source_resolver` before returning the default `PyPIProvider` - `default_download_source()` detects `git+https://` URL and routes to `git_clone_fast()` with automatic submodule handling - Add `GitOptions.remove_dot_git` field (default `False`) and removes `.git` recursively including submodules - Change `resolver_provider()` in `_resolver.py` to accept `RequirementType | None` for consistency with all Provider constructors; removes need for `typing.cast` - Update existing test snapshots and add new tests Closes: python-wheel-build#1048 Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
3474a69 to
2284b46
Compare
Move the source_resolver dispatch from inside default_resolver_provider (which only runs when no plugin exists) to before overrides.find_and_invoke in resolve(), get_source_provider(), and download_source(). This ensures YAML source config takes priority over plugins for both resolver and download pipelines. When provider is "hook", the dispatch falls through to the plugin system. All other provider values bypass plugins entirely. Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
Activate the
sourcefield onPackageSettingsandVariantInfoso YAML-configured providers are used by the resolver and download pipelines. This is the wiring layer for the model classes introduced in PR #1052.source: SourceResolver | NoneinPackageSettingsandVariantInfoPackageBuildInfo.source_resolverproperty (variant overrides package level)default_resolver_provider()checkspbi.source_resolverbefore returning the defaultPyPIProviderdefault_download_source()detectsgit+https:///git+ssh://URLs and routes todownload_git_source()GitOptions.remove_dot_gitfield (defaultTrue) and wire it intodownload_git_source()Closes: #1048
Pull Request Description
What
Why