Skip to content

(GH-538) Define newtypes for tags field#1382

Draft
michaeltlombardi wants to merge 3 commits intoPowerShell:mainfrom
michaeltlombardi:gh-538/main/tag-newtype
Draft

(GH-538) Define newtypes for tags field#1382
michaeltlombardi wants to merge 3 commits intoPowerShell:mainfrom
michaeltlombardi:gh-538/main/tag-newtype

Conversation

@michaeltlombardi
Copy link
Collaborator

@michaeltlombardi michaeltlombardi commented Feb 9, 2026

PR Summary

This change addresses limitation in representing the tags field by defining two newtypes:

  • Tag as a wrapper around String that follows the "parse, don't validate" pattern to move pattern validation into the constructor. It also treats comparisons with tags case-insensitively. The defined JSON Schema for this type is marked as inline because it is always and only used to validate items in the tags field for various structs.
  • TagList as a transparent wrapper around HashSet<Tag>. It provides no additional functionality, it just enables us to define a JSON Schema for this reusable type. The representation here uses HashSet rather than Vec because the items in a tag list must be unique (JSON Schema keyword uniqueItems defined as true).

This change modifies the code for the ResourceManifest and ExtensionManifest structs to use TagList instead of Option<Vec<String>> and updates other code as necessary to address the change in type.

PR Context

Prior to this change, the representation for both resource and extension tags was defined as Option<Vec<String>>. With this construction, we can't define a reusable schema for tag lists (because we don't own the JsonSchema trait or the Option<T> type).

The existing implementation also doesn't adhere to the validation from the canonical JSON Schema for tags, which defines the regex pattern ^\w+$ for tags.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces two newtypes (Tag and TagList) to replace the previous Option<Vec<String>> representation for tags in resource and extension manifests. The change implements the "parse, don't validate" pattern by moving tag validation into the constructor and enables case-insensitive tag comparisons while providing canonical JSON schemas for these types.

Changes:

  • Defines Tag newtype wrapping String with regex validation (^\w+$) and case-insensitive comparison/hashing
  • Defines TagList newtype wrapping HashSet<Tag> to ensure uniqueness and provide reusable schema
  • Updates ResourceManifest and ExtensionManifest to use TagList instead of Option<Vec<String>>
  • Adds comprehensive integration tests following the repository's testing patterns
  • Updates tag filtering logic in list_resources to leverage case-insensitive comparison

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
lib/dsc-lib/src/types/tag.rs Defines Tag newtype with validation pattern ^\w+$, case-insensitive comparison/hashing, and extensive trait implementations
lib/dsc-lib/src/types/tag_list.rs Defines TagList newtype wrapping HashSet<Tag> with ergonomic trait implementations
lib/dsc-lib/src/types/mod.rs Exports new Tag and TagList types
lib/dsc-lib/src/extensions/extension_manifest.rs Changes tags field from Option<Vec<String>> to TagList with #[serde(default)]
lib/dsc-lib/src/dscresources/resource_manifest.rs Changes tags field from Option<Vec<String>> to TagList with #[serde(default)]
lib/dsc-lib/src/dscerror.rs Adds InvalidTag error variant for tag validation failures
lib/dsc-lib/locales/en-us.toml Adds localized error message keys for invalid tag errors
lib/dsc-lib/locales/schemas.definitions.yaml Adds comprehensive documentation for tag and tags schema definitions
dsc/src/subcommand.rs Updates tag filtering to use is_empty() and case-insensitive comparison via Tag type
lib/dsc-lib/tests/integration/types/tag.rs Comprehensive integration tests for Tag covering methods, schema, serde, and traits
lib/dsc-lib/tests/integration/types/tag_list.rs Comprehensive integration tests for TagList covering schema, serde, and traits
lib/dsc-lib/tests/integration/types/mod.rs Registers new test modules for tag and tag_list

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Prior to this change, the representation for both resource and
extension tags was defined as `Option<Vec<String>>`. With this
construction, we can't define a reusable schema for tag lists (because
we don't own the `JsonSchema` trait or the `Option<T>` type).

The existing implementation also doesn't adhere to the validation from
the canonical JSON Schema for tags, which defines the regex pattern
`^\w+$` for tags.

This change addresses these limitations by defining two newtypes:

- `Tag` as a wrapper around `String` that follows the "parse, don't
  validate" pattern to move pattern validation into the constructor. It
  also treats comparisons with tags case-insensitively. The defined
  JSON Schema for this type is marked as `inline` because it is always
  and only used to validate items in the `tags` field for various
  structs.
- `TagList` as a transparent wrapper around `HashSet<Tag>`. It provides
  no additional functionality, it just enables us to define a JSON
  Schema for this reusable type. The representation here uses `HashSet`
  rather than `Vec` because the items in a tag list must be unique
  (JSON Schema keyword `uniqueItems` defined as `true`).

  We can still skip serializing when the `HashSet` is empty, like we can
  skip serializing when an `Option` is `None`.
Prior to this change, the resource manifest struct defined the `tags`
field as `Option<Vec<String>>`. This change updates the field to use
the `TagList` newtype, which wraps `HashSet<Tag>` to provide better
semantics and JSON Schema for the type.

This change ensures that references to the `tags` field are updated
through the code base.
Prior to this change, the extension manifest struct defined the `tags`
field as `Option<Vec<String>>`. This change updates the field to use
the `TagList` newtype, which wraps `HashSet<Tag>` to provide better
semantics and JSON Schema for the type.

The `tags` field isn't directly accessed in any existing code.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +18 to +31
#[derive(Debug, Default, Clone, PartialEq, Eq, Serialize, Deserialize, JsonSchema, DscRepoSchema)]
#[schemars(
title = t!("schemas.definitions.tags.title"),
description = t!("schemas.definitions.tags.description"),
extend(
"markdownDescription" = t!("schemas.definitions.tags.markdownDescription"),
)
)]
#[dsc_repo_schema(
base_name = "tags",
folder_path = "definitions",
)]
pub struct TagList(HashSet<Tag>);

Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TagList derives Serialize over a HashSet<Tag>, which means JSON/YAML serialization order will be non-deterministic (HashSet iteration order varies between runs/platforms). If manifests/resources are ever serialized (e.g., CLI JSON output), this can cause unstable output and flaky tests. Consider implementing a custom Serialize for TagList that emits a sorted array (you already have Ord on Tag), or switch the backing collection to BTreeSet<Tag> for deterministic iteration/serialization.

Copilot uses AI. Check for mistakes.

impl Hash for Tag {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.0.to_lowercase().hash(state)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag hashing lowercases by allocating a new String (to_lowercase()) on every hash call. Because Tag is used in HashSet/HashMap, this can become a hot path (multiple hashes per insert/lookup). Consider hashing by iterating over self.0.chars() and feeding each character’s lowercase mapping into the hasher to avoid allocating, while keeping behavior consistent with your case-insensitive equality.

Suggested change
self.0.to_lowercase().hash(state)
for ch in self.0.chars() {
for lower in ch.to_lowercase() {
lower.hash(state);
}
}

Copilot uses AI. Check for mistakes.
#[error("{t} '{0}' - {t2}: '{1}'", t = t!("dscerror.invalidTagPrefix"), t2 = t!("dscerror.invalidTagSuffix"))]
InvalidTag(String, String),

#[error("{t} '{0}' - {t2}: '{1}'", t = t!("dscerror.invalidTypeNamePrefix"), t2 = t!("dscerror.InvalidTypeNameSuffix"))]
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The i18n key for the InvalidTypeName suffix uses dscerror.InvalidTypeNameSuffix (capital I), but the locale file defines dscerror.invalidTypeNameSuffix. This will likely cause the error message to fall back to the raw key instead of the translation. Update the key to match the locale entry (or add the missing key in the locale file).

Suggested change
#[error("{t} '{0}' - {t2}: '{1}'", t = t!("dscerror.invalidTypeNamePrefix"), t2 = t!("dscerror.InvalidTypeNameSuffix"))]
#[error("{t} '{0}' - {t2}: '{1}'", t = t!("dscerror.invalidTypeNamePrefix"), t2 = t!("dscerror.invalidTypeNameSuffix"))]

Copilot uses AI. Check for mistakes.
#[test_case("abc_123_DEF" => matches Ok(_); "mixed valid characters string is valid")]
#[test_case("a.b" => matches Err(_); "string with invalid character raises error")]
#[test_case("" => matches Err(_); "empty string raises error")]
fn str(text: &str)-> Result<Tag, DscError> {
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor formatting nit: keep spacing consistent with the rest of the tests (e.g., add a space before -> in the return type).

Suggested change
fn str(text: &str)-> Result<Tag, DscError> {
fn str(text: &str) -> Result<Tag, DscError> {

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant