Skip to content

feat(resolver): wire source_resolver into resolver and download pipelines#1202

Open
smoparth wants to merge 2 commits into
python-wheel-build:mainfrom
smoparth:feat/wire-source-resolver
Open

feat(resolver): wire source_resolver into resolver and download pipelines#1202
smoparth wants to merge 2 commits into
python-wheel-build:mainfrom
smoparth:feat/wire-source-resolver

Conversation

@smoparth

@smoparth smoparth commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Activate the source field on PackageSettings and VariantInfo so YAML-configured providers are used by the resolver and download pipelines. This is the wiring layer for the model classes introduced in PR #1052.

  • Uncomment source: SourceResolver | None in PackageSettings and VariantInfo
  • Add PackageBuildInfo.source_resolver property (variant overrides package level)
  • default_resolver_provider() checks pbi.source_resolver before returning the default PyPIProvider
  • default_download_source() detects git+https:// / git+ssh:// URLs and routes to download_git_source()
  • Add GitOptions.remove_dot_git field (default True) and wire it into download_git_source()
  • Update existing test snapshots and add 20 new tests

Closes: #1048

Pull Request Description

What

Why

@smoparth smoparth requested a review from a team as a code owner June 22, 2026 14:00
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@smoparth, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 3 minutes and 4 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8a975682-cb10-41ae-97e2-1a2d06834cc4

📥 Commits

Reviewing files that changed from the base of the PR and between 2284b46 and 9f9f8e2.

📒 Files selected for processing (3)
  • src/fromager/resolver.py
  • src/fromager/sources.py
  • tests/test_source_resolver_wiring.py
📝 Walkthrough

Walkthrough

This PR wires the source resolver model layer into runtime resolution and source downloading. It adds source fields to VariantInfo and PackageSettings, declares GitOptions.remove_dot_git with a default of False, and adds PackageBuildInfo.source_resolver to resolve variant or package source settings. default_resolver_provider() now uses the configured source when present, and default_download_source() detects supported git URLs, clones them directly, and optionally removes .git artifacts. Tests were updated and expanded for the new defaults and flows.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning [#1048] The git download path only shows git+https handling and does not confirm the required git+ssh and full git_options wiring. Add git+ssh detection and ensure submodules, submodule_paths, and remove_dot_git are passed through the git clone path.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed Clear, specific summary of the main source-wiring change.
Description check ✅ Passed The description matches the source-wiring changes in the diff.
Out of Scope Changes check ✅ Passed The resolver signature changes and added tests support the source wiring work and do not look unrelated.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mergify mergify Bot added the ci label Jun 22, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/fromager/packagesettings/_pbi.py (1)

180-186: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add versionadded metadata to the new public source_resolver property.

This accessor is a new user-facing API surface and should carry a Sphinx .. versionadded:: 0.85 directive for release-doc consistency.

Suggested patch
     def source_resolver(self) -> SourceResolver | None:
         """Effective source resolver for this package and variant.
 
+        .. versionadded:: 0.85
+
         Returns the variant-level ``source`` override if set, otherwise
         the package-level ``source``, or ``None`` when neither is configured.
         """

As per coding guidelines, "Use Sphinx versionadded, versionremoved, versionchanged directives for user-facing changes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fromager/packagesettings/_pbi.py` around lines 180 - 186, The
`source_resolver` property in the _pbi.py file is a new public API but is
missing the required Sphinx version metadata. Add a `.. versionadded:: 0.85`
directive to the docstring of the `source_resolver` property to document when
this accessor was introduced and maintain consistency with coding guidelines for
user-facing API changes.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/resolver.py`:
- Around line 129-138: The code uses typing.cast to convert req_type to
RequirementType without enforcing that req_type is actually non-None at runtime.
Since source.resolver_provider() requires a non-optional RequirementType
argument, add a runtime validation check before the return statement in the
source resolver block to ensure req_type is not None. If it is None, either
raise an appropriate error or handle it appropriately so that None is never
passed to the source.resolver_provider() call.

In `@src/fromager/sources.py`:
- Around line 207-223: The `_parse_git_url` function returns clone URLs that may
still contain query parameters and fragment identifiers (such as
`#subdirectory`=...), which are invalid for git cloning. When reconstructing the
clone URL in the `_parse_git_url` function using the `_replace()` method on the
parsed URL object before calling `geturl()`, you need to also clear the query
and fragment components by setting them to empty strings in addition to updating
the path. This ensures the returned clone URL contains only the protocol, host,
and path components needed for git operations.

In `@tests/test_source_resolver_wiring.py`:
- Around line 74-77: Replace all real domain names in test URL fixtures with
their `.test` equivalents throughout the test file. Specifically, change domains
like github.com to github.test, pypi.org to pypi.test, and any other real
domains to their corresponding .test TLD versions. This applies to test URL
literals used in test methods including test_is_git_url_ssh and other test
methods that construct URLs as part of their fixtures, ensuring consistent
adherence to test isolation conventions.

---

Nitpick comments:
In `@src/fromager/packagesettings/_pbi.py`:
- Around line 180-186: The `source_resolver` property in the _pbi.py file is a
new public API but is missing the required Sphinx version metadata. Add a `..
versionadded:: 0.85` directive to the docstring of the `source_resolver`
property to document when this accessor was introduced and maintain consistency
with coding guidelines for user-facing API changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0a276ad7-135e-47ad-af4f-351f66a07123

📥 Commits

Reviewing files that changed from the base of the PR and between 24af91e and fac1b7f.

📒 Files selected for processing (6)
  • src/fromager/packagesettings/_models.py
  • src/fromager/packagesettings/_pbi.py
  • src/fromager/resolver.py
  • src/fromager/sources.py
  • tests/test_packagesettings.py
  • tests/test_source_resolver_wiring.py

Comment thread src/fromager/resolver.py Outdated
Comment thread src/fromager/sources.py
Comment on lines +207 to +223
def _parse_git_url(url: str) -> tuple[str, str | None]:
"""Split a VCS URL into clone URL and optional ref.

``git+https://host/repo@ref`` -> ``(https://host/repo, ref)``
``git+https://host/repo`` -> ``(https://host/repo, None)``
"""
clone_url = url
if clone_url.startswith("git+"):
clone_url = clone_url[len("git+") :]

parsed = urlparse(clone_url)
ref: str | None = None
if "@" in parsed.path:
new_path, _, ref = parsed.path.rpartition("@")
clone_url = parsed._replace(path=new_path).geturl()

return clone_url, ref

@coderabbitai coderabbitai Bot Jun 22, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strip query/fragment from git clone URL in _parse_git_url.

Line 221 can return clone URLs that still contain query/fragment data (for example #subdirectory=...), which is not a valid clone target and can break download_git_source().

Suggested fix
 def _parse_git_url(url: str) -> tuple[str, str | None]:
@@
-    parsed = urlparse(clone_url)
+    parsed = urlparse(clone_url)
+    # Not part of git clone target
+    parsed = parsed._replace(params="", query="", fragment="")
     ref: str | None = None
     if "@" in parsed.path:
         new_path, _, ref = parsed.path.rpartition("@")
         clone_url = parsed._replace(path=new_path).geturl()
+    else:
+        clone_url = parsed.geturl()
 
     return clone_url, ref
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fromager/sources.py` around lines 207 - 223, The `_parse_git_url`
function returns clone URLs that may still contain query parameters and fragment
identifiers (such as `#subdirectory`=...), which are invalid for git cloning. When
reconstructing the clone URL in the `_parse_git_url` function using the
`_replace()` method on the parsed URL object before calling `geturl()`, you need
to also clear the query and fragment components by setting them to empty strings
in addition to updating the path. This ensures the returned clone URL contains
only the protocol, host, and path components needed for git operations.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the third implementation of pip VCS url parser in Fromager. All three implementation have slightly different behavior and subtle bugs. I have created #1215 to introduce a well-defined and tested function to parse a VCS URL into clone URL and git ref.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiran, that's a much better approach than patching individual implementations — consolidating into one well-tested function in #1215 fixes the root cause for all three sites at once. I'll withdraw this suggestion.


✏️ Learnings added
Learnt from: tiran
Repo: python-wheel-build/fromager PR: 1202
File: src/fromager/sources.py:207-223
Timestamp: 2026-06-24T11:25:31.914Z
Learning: In the `fromager` codebase (`python-wheel-build/fromager`), VCS (pip-style `git+https://` / `git+ssh://`) URL parsing must use the single canonical helper introduced in PR `#1215`. There were previously three separate ad-hoc implementations with divergent behavior (in `sources.py` and elsewhere); do not suggest adding new inline parsers — point to the shared utility instead.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment thread tests/test_source_resolver_wiring.py Outdated
@LalatenduMohanty

Copy link
Copy Markdown
Member

@tiran PTAL

Comment thread src/fromager/packagesettings/_models.py
Comment thread src/fromager/packagesettings/_models.py Outdated
@LalatenduMohanty

Copy link
Copy Markdown
Member

The PR looks good to me overall. Lets see if we get review from @tiran .

Comment thread src/fromager/packagesettings/_models.py Outdated
Comment thread src/fromager/packagesettings/_models.py Outdated
Comment thread src/fromager/resolver.py Outdated
Comment thread src/fromager/sources.py Outdated
Comment thread src/fromager/resolver.py
return results[0]


def default_resolver_provider(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is default_resolver_provider the right place to hook into source resolver? How are you going to implement https://fromager.readthedocs.io/en/latest/proposals/new-resolver-config.html#default-behavior-and-hooks ?

@smoparth smoparth Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a separate commit refactor(resolver): check source config before plugin lookup to move the source_resolver check before overrides.find_and_invoke in resolve(), get_source_provider(), and download_source().

When provider != "hook", YAML config is used directly, plugins are skipped for both resolver and download. When provider == "hook" or no source config, the old plugin system runs as before. Let me know what you think.

"at least one source setting must use provider: hook when a plugin exists" needs a runtime check since plugin discovery isn't available at config load time. I'm not sure what to do here, need help with this.

Comment thread src/fromager/sources.py Outdated
Comment thread src/fromager/sources.py

if _is_git_url(url):
clone_url, ref = _parse_git_url(url)
download_path = ctx.work_dir / f"{req.name}-{version}" / f"{req.name}-{version}"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the same path as sdists_downloads_dir?

@smoparth smoparth Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is sdists_downloads_dir is for downloaded archive files (.tar.gz, .zip). Git clones produce a directory, not an archive, so they go to ctx.work_dir, same path the existing req.url git clone uses.

Comment thread src/fromager/sources.py Outdated
Comment thread src/fromager/sources.py
Comment on lines +207 to +223
def _parse_git_url(url: str) -> tuple[str, str | None]:
"""Split a VCS URL into clone URL and optional ref.

``git+https://host/repo@ref`` -> ``(https://host/repo, ref)``
``git+https://host/repo`` -> ``(https://host/repo, None)``
"""
clone_url = url
if clone_url.startswith("git+"):
clone_url = clone_url[len("git+") :]

parsed = urlparse(clone_url)
ref: str | None = None
if "@" in parsed.path:
new_path, _, ref = parsed.path.rpartition("@")
clone_url = parsed._replace(path=new_path).geturl()

return clone_url, ref

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the third implementation of pip VCS url parser in Fromager. All three implementation have slightly different behavior and subtle bugs. I have created #1215 to introduce a well-defined and tested function to parse a VCS URL into clone URL and git ref.

@smoparth smoparth force-pushed the feat/wire-source-resolver branch from 55aca35 to 3474a69 Compare June 24, 2026 15:01

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/sources.py`:
- Around line 239-247: The git URL branch in default_download_source() is
ignoring configured submodule settings because it calls
gitutils.git_clone_fast(), which does not take submodules arguments. Update this
path to use gitutils.git_clone(..., submodules=...) with the parsed
git_options.submodules and git_options.submodule_paths, or extend
git_clone_fast() to accept and forward those options so submodules are honored.
Use the default_download_source() and
gitutils.git_clone_fast()/gitutils.git_clone symbols to locate the change.

In `@tests/test_source_resolver_wiring.py`:
- Around line 279-302: The default `remove_dot_git` tests are asserting the
opposite of the intended behavior, so update the assertions in
`test_keeps_dot_git_by_default` and the related default-check test to match the
PR’s default-`True` behavior. Use `sources.download_git_source` as the reference
point and change the expectations so the default path verifies `.git` is removed
unless `remove_dot_git=False` is explicitly passed, keeping the tests aligned
with the new contract.
- Around line 76-77: The test `test_is_git_url_ssh_not_supported` is asserting
the wrong contract for `sources._is_git_url`; update it so `git+ssh://` is
expected to be treated as a supported git URL, alongside `git+https://`, and
rename the test if needed to reflect the new behavior. Adjust the assertion in
this test module to match the runtime routing through git cloning rather than
marking SSH URLs unsupported.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ba7175d4-59ef-4683-8272-4f1fa0edcb2b

📥 Commits

Reviewing files that changed from the base of the PR and between fac1b7f and 3474a69.

📒 Files selected for processing (6)
  • src/fromager/packagesettings/_models.py
  • src/fromager/packagesettings/_pbi.py
  • src/fromager/resolver.py
  • src/fromager/sources.py
  • tests/test_packagesettings.py
  • tests/test_source_resolver_wiring.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_packagesettings.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/fromager/packagesettings/_pbi.py
  • src/fromager/resolver.py

Comment thread src/fromager/sources.py
Comment thread tests/test_source_resolver_wiring.py
Comment thread tests/test_source_resolver_wiring.py
…ines

Activate the `source` field on `PackageSettings` and `VariantInfo` so
YAML-configured providers are used by the resolver and download
pipelines.  This is the wiring layer for the model classes introduced
in PR python-wheel-build#1052.

- Uncomment `source: SourceResolver | None` in `PackageSettings` and
  `VariantInfo`
- Add `PackageBuildInfo.source_resolver` property (variant overrides
  package level)
- `default_resolver_provider()` checks `pbi.source_resolver` before
  returning the default `PyPIProvider`
- `default_download_source()` detects `git+https://` URL and routes to `git_clone_fast()` with automatic submodule handling
- Add `GitOptions.remove_dot_git` field (default `False`) and removes
  `.git` recursively including submodules
- Change `resolver_provider()` in `_resolver.py` to accept
  `RequirementType | None` for consistency with all Provider
  constructors; removes need for `typing.cast`
- Update existing test snapshots and add new tests

Closes: python-wheel-build#1048
Co-Authored-By: Claude <claude@anthropic.com>
Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
@smoparth smoparth force-pushed the feat/wire-source-resolver branch from 3474a69 to 2284b46 Compare June 24, 2026 19:10
Move the source_resolver dispatch from inside default_resolver_provider
(which only runs when no plugin exists) to before
overrides.find_and_invoke in resolve(), get_source_provider(), and
download_source().

This ensures YAML source config takes priority over plugins for both
resolver and download pipelines. When provider is "hook", the dispatch
falls through to the plugin system. All other provider values bypass
plugins entirely.

Co-Authored-By: Claude <claude@anthropic.com>
Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wire source_resolver into resolver and download pipelines

3 participants