Skip to content

feat(sbom): use annotated data types in Pydantic config#1090

Open
smoparth wants to merge 1 commit intomainfrom
feat/sbom-annotated-types
Open

feat(sbom): use annotated data types in Pydantic config#1090
smoparth wants to merge 1 commit intomainfrom
feat/sbom-annotated-types

Conversation

@smoparth
Copy link
Copy Markdown
Contributor

@smoparth smoparth commented Apr 24, 2026

Replace generic str fields in SbomSettings and PurlConfig with
Pydantic's built-in types annotated validators to validate input and make the schema
self-documenting.

Added pydantic>=2.12 minimum version for url_preserve_empty_path support.

Changes:

  • PurlType: StringConstraints with strip, lowercase, min_length
  • UpstreamPurl: AfterValidator with PackageURL.from_string()
  • URL fields (namespace, repository_url): use pydantic.AnyUrl directly
  • supplier/creators: remain plain str per SPDX 2.3 JSON schema

The @field_validator on PurlConfig.upstream is replaced by the
UpstreamPurl annotated type, moving validation into the type itself.

Closes: #1072

Pull Request Description

@smoparth smoparth requested a review from a team as a code owner April 24, 2026 13:20
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

Models and typedefs for SBOM/PURL configuration were tightened: SbomSettings now uses AnyUrl for namespace and repository_url, and purl_type uses PurlType. PurlConfig fields type, repository_url, and upstream now use PurlType, AnyUrl | None, and UpstreamPurl | None respectively; the per-field upstream validator was removed. New typedefs PurlType (normalizes whitespace/casing) and UpstreamPurl (validates full PURL via PackageURL parsing) were added. SBOM generation coerces URL-like values to strings. Tests and pyproject (pydantic>=2.12) were updated accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately reflects the main change: replacing generic str fields with annotated Pydantic data types in SBOM configuration models.
Linked Issues check ✅ Passed Code changes fully implement issue #1072 objectives: generic str fields are replaced with annotated types (PurlType, UpstreamPurl, AnyUrl), validation is moved into types, and the schema becomes self-documenting.
Out of Scope Changes check ✅ Passed All changes are scoped to issue #1072: type definitions, model field updates, corresponding tests, and minimal string conversions in sbom.py. Pydantic version bump to >=2.12 is required for annotated type support.
Description check ✅ Passed The PR description clearly relates to the changeset, detailing the replacement of generic str fields with Pydantic annotated types and the addition of validation constraints.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify mergify Bot added the ci label Apr 24, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/test_packagesettings.py (1)

498-569: Solid coverage; one optional edge case

Tests follow the existing test_type_* pattern and exercise normalization + rejection for each typedef. Consider adding one case for SpdxActor rejecting an unknown category (e.g. "Robot: Bender") to lock in the regex's closed set of Organization|Person|Tool:

Optional addition
     with pytest.raises(ValueError):
         ta.validate_python("Organization: ")
+    with pytest.raises(ValueError):
+        ta.validate_python("Robot: Bender")
     with pytest.raises(ValueError):
         ta.validate_python("")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_packagesettings.py` around lines 498 - 569, Add an extra assertion
to test_type_spdx_actor to assert that an unknown SPDX actor category is
rejected: use the existing pydantic.TypeAdapter(SpdxActor) in
test_type_spdx_actor and add a with pytest.raises(ValueError):
ta.validate_python("Robot: Bender") to ensure the SpdxActor regex only allows
the closed set Organization|Person|Tool (reference SpdxActor and
test_type_spdx_actor).
src/fromager/packagesettings/_typedefs.py (2)

104-104: re.DOTALL allows newlines inside actor names

With re.DOTALL, \S.*$ will happily match "Organization: John\nDoe". SPDX actor names shouldn't span lines. The flag looks unnecessary here — dropping it tightens validation with no downside.

Proposed tweak
-_SPDX_ACTOR_RE = re.compile(r"^(Organization|Person|Tool):\s+\S.*$", flags=re.DOTALL)
+_SPDX_ACTOR_RE = re.compile(r"^(Organization|Person|Tool):\s+\S.*$")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fromager/packagesettings/_typedefs.py` at line 104, The _SPDX_ACTOR_RE
currently uses re.DOTALL which lets \S.* match across newlines (allowing
multi-line actor names); remove the flags=re.DOTALL from the re.compile call
(i.e., compile without DOTALL) so that \S and . do not match newlines and actor
names are constrained to a single line; update the _SPDX_ACTOR_RE definition
accordingly.

107-117: Consider Pydantic's AnyHttpUrl for stricter URL validation, but current implementation is sufficient.

The custom _validate_url works well for your use case (http/https, non-empty host), as confirmed by tests and SBOM generation. Switching to Pydantic's AnyHttpUrl would catch edge cases like https://@ but isn't necessary unless stricter validation becomes a requirement. If you do migrate, use Annotated[AnyHttpUrl, BeforeValidator(...)] returning the Url object, or Annotated[AnyHttpUrl, AfterValidator(str)] to keep plain strings (the latter applies normalization first, then converts to string).

Also applies to: 138-141, 164-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fromager/packagesettings/_typedefs.py` around lines 107 - 117, The custom
_validate_url function enforces http/https and non-empty host but could be
replaced with Pydantic's AnyHttpUrl for stricter validation; if you choose to
migrate, update type annotations where _validate_url is used to
Annotated[AnyHttpUrl, BeforeValidator(<normalizer>)] to return a Url object or
Annotated[AnyHttpUrl, AfterValidator(str)] to keep plain strings, and remove or
delegate logic from _validate_url accordingly (see references to _validate_url
and the alternative BeforeValidator/AfterValidator patterns); otherwise leave
_validate_url as-is since current tests and SBOM are sufficient.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fromager/packagesettings/_models.py`:
- Around line 53-68: Update the release notes to document that the SPDX-related
fields in PackageSettings (supplier, creators, namespace, repository_url) now
enforce SPDX 2.3 via strict validators and will raise ValidationError for
non-compliant values; explicitly list the breaking examples given (supplier
"John Smith" must be prefixed with Organization:/Person:/Tool: or be
NOASSERTION, repository_url must include https://, namespace must include
https://) so downstream users know configs may fail to load after the upgrade.

---

Nitpick comments:
In `@src/fromager/packagesettings/_typedefs.py`:
- Line 104: The _SPDX_ACTOR_RE currently uses re.DOTALL which lets \S.* match
across newlines (allowing multi-line actor names); remove the flags=re.DOTALL
from the re.compile call (i.e., compile without DOTALL) so that \S and . do not
match newlines and actor names are constrained to a single line; update the
_SPDX_ACTOR_RE definition accordingly.
- Around line 107-117: The custom _validate_url function enforces http/https and
non-empty host but could be replaced with Pydantic's AnyHttpUrl for stricter
validation; if you choose to migrate, update type annotations where
_validate_url is used to Annotated[AnyHttpUrl, BeforeValidator(<normalizer>)] to
return a Url object or Annotated[AnyHttpUrl, AfterValidator(str)] to keep plain
strings, and remove or delegate logic from _validate_url accordingly (see
references to _validate_url and the alternative BeforeValidator/AfterValidator
patterns); otherwise leave _validate_url as-is since current tests and SBOM are
sufficient.

In `@tests/test_packagesettings.py`:
- Around line 498-569: Add an extra assertion to test_type_spdx_actor to assert
that an unknown SPDX actor category is rejected: use the existing
pydantic.TypeAdapter(SpdxActor) in test_type_spdx_actor and add a with
pytest.raises(ValueError): ta.validate_python("Robot: Bender") to ensure the
SpdxActor regex only allows the closed set Organization|Person|Tool (reference
SpdxActor and test_type_spdx_actor).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bfa425fc-a49f-4523-ac71-4e4b267429d2

📥 Commits

Reviewing files that changed from the base of the PR and between 05c65ab and a271eb0.

📒 Files selected for processing (4)
  • src/fromager/packagesettings/__init__.py
  • src/fromager/packagesettings/_models.py
  • src/fromager/packagesettings/_typedefs.py
  • tests/test_packagesettings.py

Comment thread src/fromager/packagesettings/_models.py Outdated
@smoparth smoparth requested a review from mprpic April 24, 2026 13:29
Copy link
Copy Markdown
Contributor

@mprpic mprpic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine overall, but I do have to question the level of validation we're going into. If this is the intent and other parts of the Fromager code base expect this type of detailed validation, then fine with me.

Comment thread src/fromager/packagesettings/_typedefs.py Outdated
Comment thread src/fromager/packagesettings/_typedefs.py Outdated
Comment thread src/fromager/packagesettings/_typedefs.py Outdated
Comment thread src/fromager/packagesettings/_typedefs.py
Copy link
Copy Markdown
Collaborator

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smoparth You can get rid of most helper functions and type checks by using Pydantic's built-in constraints, dedicated data types, and after validator instead of before validator. Pydantic has great documentation that explains its validator concept very well.

@mprpic The checks are going to detect misconfiguration at startup, e.g. in fromager lint.

Comment thread src/fromager/packagesettings/_typedefs.py Outdated
Comment thread src/fromager/packagesettings/_typedefs.py Outdated
Comment thread src/fromager/packagesettings/_typedefs.py Outdated
@smoparth smoparth force-pushed the feat/sbom-annotated-types branch from a271eb0 to 523d4fd Compare April 27, 2026 13:37
@smoparth smoparth requested review from mprpic and tiran April 27, 2026 13:51
Comment thread src/fromager/packagesettings/_models.py Outdated
Comment thread src/fromager/packagesettings/_models.py Outdated
Comment thread src/fromager/packagesettings/_typedefs.py Outdated
Comment thread src/fromager/packagesettings/__init__.py
Comment thread src/fromager/packagesettings/_models.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fromager/packagesettings/_models.py`:
- Around line 65-66: The repository_url field currently typed as AnyUrl should
be changed to HttpUrl to restrict validation to http/https; update the type
annotation for repository_url in the model (and the other occurrence of
repository_url later in the file) from AnyUrl | None to HttpUrl | None and
add/ensure an import for HttpUrl from pydantic so the model enforces
HTTP/HTTPS-only repository URLs at validation time.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6d1ce2bf-d742-4143-b7fb-8866001b7c24

📥 Commits

Reviewing files that changed from the base of the PR and between 523d4fd and cdf090c.

📒 Files selected for processing (5)
  • src/fromager/packagesettings/_models.py
  • src/fromager/packagesettings/_typedefs.py
  • src/fromager/sbom.py
  • tests/test_packagesettings.py
  • tests/test_sbom.py
✅ Files skipped from review due to trivial changes (1)
  • src/fromager/sbom.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_sbom.py

Comment thread src/fromager/packagesettings/_models.py
@smoparth smoparth requested a review from tiran April 30, 2026 12:01
Comment thread tests/test_sbom.py Outdated
"""Verify global repository_url adds qualifier to downstream but not upstream."""
settings = SbomSettings(repository_url="https://packages.redhat.com")
settings = SbomSettings(
repository_url="https://packages.redhat.com/simple", # type: ignore[arg-type]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why switch this to add /simple at the end? The repo URL should be generic at this point and is more of a namespace that defines our packages, not an actual pinpointed location.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mprpic You're right. I only added /simple to work around AnyUrl normalizing bare hostnames with a trailing slash. I'll revert the URLs back to "https://packages.redhat.com" but update the assertions to expect the normalized form ("https://packages.redhat.com/")??

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does it normalize to add the trailing slash? If I assign https://packages.redhat.com to the repo URL variable, does pydantic overwrite that to https://packages.redhat.com/?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when the user writes "https://packages.redhat.com" in their settings.yaml YAML, the value that ends up in the SBOM JSON is "https://packages.redhat.com/". Pydantic's AnyUrl does this normalization.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mprpic Here's the reference for that behavior: pydantic#7186. AnyUrl adds a trailing / to bare hostnames without a path component.

A workaround from that thread strips it via AfterValidator:

Annotated[AnyUrl, AfterValidator(lambda x: str(x).rstrip("/"))]
This preserves AnyUrl validation without altering the original URL. Should I apply this, or is the trailing slash acceptable?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we don't want that then. The value should be left as is specified in the settings. Can we just use the config that disables this behavior: https://pydantic.dev/articles/pydantic-v2-12-release#preserve-empty-url-paths ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes updated the model config and pinned pydantic>=2.12 in pyproject.toml. URLs are now preserved as-is. PTAL.

@tiran I have added a pin pydantic>=2.12 in pyproject.toml due to this thread.

@smoparth smoparth requested a review from mprpic April 30, 2026 13:38
Replace generic `str` fields in SbomSettings and PurlConfig with
Pydantic's built-in types annotated validators to validate input and make the schema
self-documenting.

Added pydantic>=2.12 minimum version for url_preserve_empty_path support.

Changes:
- PurlType: StringConstraints with strip, lowercase, min_length
- UpstreamPurl: AfterValidator with PackageURL.from_string()
- URL fields (namespace, repository_url): use pydantic.AnyUrl directly
- supplier/creators: remain plain str per SPDX 2.3 JSON schema

The @field_validator on PurlConfig.upstream is replaced by the
UpstreamPurl annotated type, moving validation into the type itself.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes: #1072
Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
@smoparth smoparth force-pushed the feat/sbom-annotated-types branch from cdf090c to 8d9b4c0 Compare May 4, 2026 20:20
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_sbom.py (1)

64-66: 💤 Low value

# type: ignore[arg-type] recurs at every AnyUrl field call site — consider a union type in the model.

Passing str to an AnyUrl-typed field is the standard Pydantic runtime coercion pattern, so this works correctly. However, the repeated # type: ignore[arg-type] comments suggest the field definitions in _models.py could be tightened. Declaring those fields as str | AnyUrl (or the richer Annotated form) would let mypy/pyright accept raw strings without suppressions and make the intent explicit.

Also applies to: 87-87, 137-139, 164-166

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_sbom.py` around lines 64 - 66, Tests repeatedly use raw str values
for AnyUrl-typed fields (e.g., the repository_url passed to SbomSettings),
causing repeated "# type: ignore[arg-type]" comments; update the model types in
_models.py by changing the affected fields (such as the repository_url attribute
on SbomSettings and the other AnyUrl fields referenced at lines ~87, ~137-139,
~164-166) from AnyUrl to a union allowing raw strings (e.g., str | AnyUrl or
typing.Union[str, AnyUrl], or the equivalent Annotated form) so static type
checkers accept plain string literals and remove the need for per-call ignores.
Ensure imports are added (from typing import Union or use | syntax) and update
the field annotations only where the test uses string literals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_sbom.py`:
- Around line 64-66: Tests repeatedly use raw str values for AnyUrl-typed fields
(e.g., the repository_url passed to SbomSettings), causing repeated "# type:
ignore[arg-type]" comments; update the model types in _models.py by changing the
affected fields (such as the repository_url attribute on SbomSettings and the
other AnyUrl fields referenced at lines ~87, ~137-139, ~164-166) from AnyUrl to
a union allowing raw strings (e.g., str | AnyUrl or typing.Union[str, AnyUrl],
or the equivalent Annotated form) so static type checkers accept plain string
literals and remove the need for per-call ignores. Ensure imports are added
(from typing import Union or use | syntax) and update the field annotations only
where the test uses string literals.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eadd3ba9-1dc9-40e2-9356-d2be5f351a05

📥 Commits

Reviewing files that changed from the base of the PR and between cdf090c and 8d9b4c0.

📒 Files selected for processing (6)
  • pyproject.toml
  • src/fromager/packagesettings/_models.py
  • src/fromager/packagesettings/_typedefs.py
  • src/fromager/sbom.py
  • tests/test_packagesettings.py
  • tests/test_sbom.py
✅ Files skipped from review due to trivial changes (1)
  • src/fromager/sbom.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/fromager/packagesettings/_typedefs.py
  • src/fromager/packagesettings/_models.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SBOM: use annotated data types in Pydantic config

3 participants