feat(sbom): use annotated data types in Pydantic config#1090
feat(sbom): use annotated data types in Pydantic config#1090
Conversation
📝 WalkthroughWalkthroughModels and typedefs for SBOM/PURL configuration were tightened: SbomSettings now uses AnyUrl for Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/test_packagesettings.py (1)
498-569: Solid coverage; one optional edge caseTests follow the existing
test_type_*pattern and exercise normalization + rejection for each typedef. Consider adding one case forSpdxActorrejecting an unknown category (e.g."Robot: Bender") to lock in the regex's closed set ofOrganization|Person|Tool:Optional addition
with pytest.raises(ValueError): ta.validate_python("Organization: ") + with pytest.raises(ValueError): + ta.validate_python("Robot: Bender") with pytest.raises(ValueError): ta.validate_python("")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_packagesettings.py` around lines 498 - 569, Add an extra assertion to test_type_spdx_actor to assert that an unknown SPDX actor category is rejected: use the existing pydantic.TypeAdapter(SpdxActor) in test_type_spdx_actor and add a with pytest.raises(ValueError): ta.validate_python("Robot: Bender") to ensure the SpdxActor regex only allows the closed set Organization|Person|Tool (reference SpdxActor and test_type_spdx_actor).src/fromager/packagesettings/_typedefs.py (2)
104-104:re.DOTALLallows newlines inside actor namesWith
re.DOTALL,\S.*$will happily match"Organization: John\nDoe". SPDX actor names shouldn't span lines. The flag looks unnecessary here — dropping it tightens validation with no downside.Proposed tweak
-_SPDX_ACTOR_RE = re.compile(r"^(Organization|Person|Tool):\s+\S.*$", flags=re.DOTALL) +_SPDX_ACTOR_RE = re.compile(r"^(Organization|Person|Tool):\s+\S.*$")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fromager/packagesettings/_typedefs.py` at line 104, The _SPDX_ACTOR_RE currently uses re.DOTALL which lets \S.* match across newlines (allowing multi-line actor names); remove the flags=re.DOTALL from the re.compile call (i.e., compile without DOTALL) so that \S and . do not match newlines and actor names are constrained to a single line; update the _SPDX_ACTOR_RE definition accordingly.
107-117: Consider Pydantic'sAnyHttpUrlfor stricter URL validation, but current implementation is sufficient.The custom
_validate_urlworks well for your use case (http/https, non-empty host), as confirmed by tests and SBOM generation. Switching to Pydantic'sAnyHttpUrlwould catch edge cases likehttps://@but isn't necessary unless stricter validation becomes a requirement. If you do migrate, useAnnotated[AnyHttpUrl, BeforeValidator(...)]returning the Url object, orAnnotated[AnyHttpUrl, AfterValidator(str)]to keep plain strings (the latter applies normalization first, then converts to string).Also applies to: 138-141, 164-167
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fromager/packagesettings/_typedefs.py` around lines 107 - 117, The custom _validate_url function enforces http/https and non-empty host but could be replaced with Pydantic's AnyHttpUrl for stricter validation; if you choose to migrate, update type annotations where _validate_url is used to Annotated[AnyHttpUrl, BeforeValidator(<normalizer>)] to return a Url object or Annotated[AnyHttpUrl, AfterValidator(str)] to keep plain strings, and remove or delegate logic from _validate_url accordingly (see references to _validate_url and the alternative BeforeValidator/AfterValidator patterns); otherwise leave _validate_url as-is since current tests and SBOM are sufficient.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fromager/packagesettings/_models.py`:
- Around line 53-68: Update the release notes to document that the SPDX-related
fields in PackageSettings (supplier, creators, namespace, repository_url) now
enforce SPDX 2.3 via strict validators and will raise ValidationError for
non-compliant values; explicitly list the breaking examples given (supplier
"John Smith" must be prefixed with Organization:/Person:/Tool: or be
NOASSERTION, repository_url must include https://, namespace must include
https://) so downstream users know configs may fail to load after the upgrade.
---
Nitpick comments:
In `@src/fromager/packagesettings/_typedefs.py`:
- Line 104: The _SPDX_ACTOR_RE currently uses re.DOTALL which lets \S.* match
across newlines (allowing multi-line actor names); remove the flags=re.DOTALL
from the re.compile call (i.e., compile without DOTALL) so that \S and . do not
match newlines and actor names are constrained to a single line; update the
_SPDX_ACTOR_RE definition accordingly.
- Around line 107-117: The custom _validate_url function enforces http/https and
non-empty host but could be replaced with Pydantic's AnyHttpUrl for stricter
validation; if you choose to migrate, update type annotations where
_validate_url is used to Annotated[AnyHttpUrl, BeforeValidator(<normalizer>)] to
return a Url object or Annotated[AnyHttpUrl, AfterValidator(str)] to keep plain
strings, and remove or delegate logic from _validate_url accordingly (see
references to _validate_url and the alternative BeforeValidator/AfterValidator
patterns); otherwise leave _validate_url as-is since current tests and SBOM are
sufficient.
In `@tests/test_packagesettings.py`:
- Around line 498-569: Add an extra assertion to test_type_spdx_actor to assert
that an unknown SPDX actor category is rejected: use the existing
pydantic.TypeAdapter(SpdxActor) in test_type_spdx_actor and add a with
pytest.raises(ValueError): ta.validate_python("Robot: Bender") to ensure the
SpdxActor regex only allows the closed set Organization|Person|Tool (reference
SpdxActor and test_type_spdx_actor).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bfa425fc-a49f-4523-ac71-4e4b267429d2
📒 Files selected for processing (4)
src/fromager/packagesettings/__init__.pysrc/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_typedefs.pytests/test_packagesettings.py
mprpic
left a comment
There was a problem hiding this comment.
This looks fine overall, but I do have to question the level of validation we're going into. If this is the intent and other parts of the Fromager code base expect this type of detailed validation, then fine with me.
tiran
left a comment
There was a problem hiding this comment.
@smoparth You can get rid of most helper functions and type checks by using Pydantic's built-in constraints, dedicated data types, and after validator instead of before validator. Pydantic has great documentation that explains its validator concept very well.
@mprpic The checks are going to detect misconfiguration at startup, e.g. in fromager lint.
a271eb0 to
523d4fd
Compare
523d4fd to
cdf090c
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fromager/packagesettings/_models.py`:
- Around line 65-66: The repository_url field currently typed as AnyUrl should
be changed to HttpUrl to restrict validation to http/https; update the type
annotation for repository_url in the model (and the other occurrence of
repository_url later in the file) from AnyUrl | None to HttpUrl | None and
add/ensure an import for HttpUrl from pydantic so the model enforces
HTTP/HTTPS-only repository URLs at validation time.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6d1ce2bf-d742-4143-b7fb-8866001b7c24
📒 Files selected for processing (5)
src/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_typedefs.pysrc/fromager/sbom.pytests/test_packagesettings.pytests/test_sbom.py
✅ Files skipped from review due to trivial changes (1)
- src/fromager/sbom.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_sbom.py
| """Verify global repository_url adds qualifier to downstream but not upstream.""" | ||
| settings = SbomSettings(repository_url="https://packages.redhat.com") | ||
| settings = SbomSettings( | ||
| repository_url="https://packages.redhat.com/simple", # type: ignore[arg-type] |
There was a problem hiding this comment.
Why switch this to add /simple at the end? The repo URL should be generic at this point and is more of a namespace that defines our packages, not an actual pinpointed location.
There was a problem hiding this comment.
@mprpic You're right. I only added /simple to work around AnyUrl normalizing bare hostnames with a trailing slash. I'll revert the URLs back to "https://packages.redhat.com" but update the assertions to expect the normalized form ("https://packages.redhat.com/")??
There was a problem hiding this comment.
Where does it normalize to add the trailing slash? If I assign https://packages.redhat.com to the repo URL variable, does pydantic overwrite that to https://packages.redhat.com/?
There was a problem hiding this comment.
Yes, when the user writes "https://packages.redhat.com" in their settings.yaml YAML, the value that ends up in the SBOM JSON is "https://packages.redhat.com/". Pydantic's AnyUrl does this normalization.
There was a problem hiding this comment.
@mprpic Here's the reference for that behavior: pydantic#7186. AnyUrl adds a trailing / to bare hostnames without a path component.
A workaround from that thread strips it via AfterValidator:
Annotated[AnyUrl, AfterValidator(lambda x: str(x).rstrip("/"))]
This preserves AnyUrl validation without altering the original URL. Should I apply this, or is the trailing slash acceptable?
There was a problem hiding this comment.
Ok, we don't want that then. The value should be left as is specified in the settings. Can we just use the config that disables this behavior: https://pydantic.dev/articles/pydantic-v2-12-release#preserve-empty-url-paths ?
There was a problem hiding this comment.
Yes updated the model config and pinned pydantic>=2.12 in pyproject.toml. URLs are now preserved as-is. PTAL.
@tiran I have added a pin pydantic>=2.12 in pyproject.toml due to this thread.
Replace generic `str` fields in SbomSettings and PurlConfig with Pydantic's built-in types annotated validators to validate input and make the schema self-documenting. Added pydantic>=2.12 minimum version for url_preserve_empty_path support. Changes: - PurlType: StringConstraints with strip, lowercase, min_length - UpstreamPurl: AfterValidator with PackageURL.from_string() - URL fields (namespace, repository_url): use pydantic.AnyUrl directly - supplier/creators: remain plain str per SPDX 2.3 JSON schema The @field_validator on PurlConfig.upstream is replaced by the UpstreamPurl annotated type, moving validation into the type itself. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Closes: #1072 Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
cdf090c to
8d9b4c0
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/test_sbom.py (1)
64-66: 💤 Low value
# type: ignore[arg-type]recurs at everyAnyUrlfield call site — consider a union type in the model.Passing
strto anAnyUrl-typed field is the standard Pydantic runtime coercion pattern, so this works correctly. However, the repeated# type: ignore[arg-type]comments suggest the field definitions in_models.pycould be tightened. Declaring those fields asstr | AnyUrl(or the richerAnnotatedform) would let mypy/pyright accept raw strings without suppressions and make the intent explicit.Also applies to: 87-87, 137-139, 164-166
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_sbom.py` around lines 64 - 66, Tests repeatedly use raw str values for AnyUrl-typed fields (e.g., the repository_url passed to SbomSettings), causing repeated "# type: ignore[arg-type]" comments; update the model types in _models.py by changing the affected fields (such as the repository_url attribute on SbomSettings and the other AnyUrl fields referenced at lines ~87, ~137-139, ~164-166) from AnyUrl to a union allowing raw strings (e.g., str | AnyUrl or typing.Union[str, AnyUrl], or the equivalent Annotated form) so static type checkers accept plain string literals and remove the need for per-call ignores. Ensure imports are added (from typing import Union or use | syntax) and update the field annotations only where the test uses string literals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_sbom.py`:
- Around line 64-66: Tests repeatedly use raw str values for AnyUrl-typed fields
(e.g., the repository_url passed to SbomSettings), causing repeated "# type:
ignore[arg-type]" comments; update the model types in _models.py by changing the
affected fields (such as the repository_url attribute on SbomSettings and the
other AnyUrl fields referenced at lines ~87, ~137-139, ~164-166) from AnyUrl to
a union allowing raw strings (e.g., str | AnyUrl or typing.Union[str, AnyUrl],
or the equivalent Annotated form) so static type checkers accept plain string
literals and remove the need for per-call ignores. Ensure imports are added
(from typing import Union or use | syntax) and update the field annotations only
where the test uses string literals.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eadd3ba9-1dc9-40e2-9356-d2be5f351a05
📒 Files selected for processing (6)
pyproject.tomlsrc/fromager/packagesettings/_models.pysrc/fromager/packagesettings/_typedefs.pysrc/fromager/sbom.pytests/test_packagesettings.pytests/test_sbom.py
✅ Files skipped from review due to trivial changes (1)
- src/fromager/sbom.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/fromager/packagesettings/_typedefs.py
- src/fromager/packagesettings/_models.py
Replace generic
strfields in SbomSettings and PurlConfig withPydantic's built-in types annotated validators to validate input and make the schema
self-documenting.
Added pydantic>=2.12 minimum version for url_preserve_empty_path support.
Changes:
The @field_validator on PurlConfig.upstream is replaced by the
UpstreamPurl annotated type, moving validation into the type itself.
Closes: #1072
Pull Request Description