Reimplement notice file generation for third-party licenses through Rust, now with CEF credits#3808
Reimplement notice file generation for third-party licenses through Rust, now with CEF credits#3808timon-schelling merged 16 commits intomasterfrom
Conversation
Summary of ChangesHello @timon-schelling, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the third-party license generation process by introducing a new, unified Rust-based tool. This change streamlines dependency management within the Nix ecosystem, improves the accuracy and reliability of license collection for both Rust and JavaScript dependencies, and integrates the process directly into the build pipeline. The overall impact is a more robust and maintainable system for tracking and presenting third-party software licenses. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-executed refactoring of the third-party license generation process. It replaces a complex Vite/Rollup plugin setup with a dedicated Rust tool, which improves modularity, clarity, and maintainability. The Nix flake configuration has also been cleaned up by removing flake-utils. My review includes a few suggestions to further improve the robustness and efficiency of the new license generation tool and its related Nix expressions.
| fn filter(licenses: &mut Vec<LicenseEntry>) { | ||
| licenses.iter_mut().for_each(|l| { | ||
| l.packages.retain(|p| !(p.authors.len() == 1 && p.authors[0].contains("contact@graphite.art"))); | ||
| }); | ||
| licenses.retain(|l| !l.packages.is_empty()); | ||
| } |
There was a problem hiding this comment.
The author string "contact@graphite.art" is used to filter out first-party crates. This is a "magic string" that could be defined as a constant for better readability and maintainability. This makes it easier to find and change if the contact email ever changes.
| fn filter(licenses: &mut Vec<LicenseEntry>) { | |
| licenses.iter_mut().for_each(|l| { | |
| l.packages.retain(|p| !(p.authors.len() == 1 && p.authors[0].contains("contact@graphite.art"))); | |
| }); | |
| licenses.retain(|l| !l.packages.is_empty()); | |
| } | |
| fn filter(licenses: &mut Vec<LicenseEntry>) { | |
| const FIRST_PARTY_AUTHOR: &str = "contact@graphite.art"; | |
| licenses.iter_mut().for_each(|l| { | |
| l.packages.retain(|p| !(p.authors.len() == 1 && p.authors[0].contains(FIRST_PARTY_AUTHOR))); | |
| }); | |
| licenses.retain(|l| !l.packages.is_empty()); | |
| } |
| fn dedup_by_licence_text(vec: Vec<LicenseEntry>) -> Vec<LicenseEntry> { | ||
| let mut map: BTreeMap<String, LicenseEntry> = BTreeMap::new(); | ||
|
|
||
| for entry in vec { | ||
| match map.entry(entry.text.clone()) { | ||
| std::collections::btree_map::Entry::Occupied(mut e) => { | ||
| e.get_mut().packages.extend(entry.packages); | ||
| } | ||
| std::collections::btree_map::Entry::Vacant(e) => { | ||
| e.insert(entry); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| map.into_values().collect() | ||
| } |
There was a problem hiding this comment.
The current implementation for deduplicating licenses clones the entire license text to use it as a key in the BTreeMap. For large license texts, this can be inefficient in terms of memory and performance.
A more efficient approach would be to use a hash of the license text as the key in a HashMap. This avoids cloning large strings and should be faster for comparisons.
| fn dedup_by_licence_text(vec: Vec<LicenseEntry>) -> Vec<LicenseEntry> { | |
| let mut map: BTreeMap<String, LicenseEntry> = BTreeMap::new(); | |
| for entry in vec { | |
| match map.entry(entry.text.clone()) { | |
| std::collections::btree_map::Entry::Occupied(mut e) => { | |
| e.get_mut().packages.extend(entry.packages); | |
| } | |
| std::collections::btree_map::Entry::Vacant(e) => { | |
| e.insert(entry); | |
| } | |
| } | |
| } | |
| map.into_values().collect() | |
| } | |
| fn dedup_by_licence_text(vec: Vec<LicenseEntry>) -> Vec<LicenseEntry> { | |
| let mut map: std::collections::HashMap<String, LicenseEntry> = std::collections::HashMap::new(); | |
| for entry in vec { | |
| let key = sha256::digest(entry.text.as_bytes()); | |
| match map.entry(key) { | |
| std::collections::hash_map::Entry::Occupied(mut e) => { | |
| e.get_mut().packages.extend(entry.packages); | |
| } | |
| std::collections::hash_map::Entry::Vacant(e) => { | |
| e.insert(entry); | |
| } | |
| } | |
| } | |
| map.into_values().collect() | |
| } |
ff06e6b to
1ed7b1b
Compare
This includes licenses from npm, cargo and newly added cef (parsed from CREDITS.html)
1ed7b1b to
712f7db
Compare
8771efb to
74122f9
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request is a significant and well-executed refactoring of the third-party license generation process. It replaces a complex system with a dedicated Rust tool, improving maintainability and robustness. The accompanying Nix expression refactoring is also a welcome cleanup. I have a few suggestions to improve the new Rust tool's code quality and fix a potential bug in the filtering logic.
|
|
||
| fn filter(licenses: &mut Vec<LicenseEntry>) { | ||
| licenses.iter_mut().for_each(|l| { | ||
| l.packages.retain(|p| !(p.authors.len() == 1 && p.authors[0].contains("contact@graphite.art"))); |
There was a problem hiding this comment.
The current filtering logic for internal packages only checks if there is exactly one author and that author's string contains contact@graphite.art. This might miss cases where an internal package has multiple authors listed. Using iter().any() would be more robust and correctly filter out packages if any of their authors match the internal contact address.
| l.packages.retain(|p| !(p.authors.len() == 1 && p.authors[0].contains("contact@graphite.art"))); | |
| l.packages.retain(|p| !p.authors.iter().any(|author| author.contains("contact@graphite.art"))); |
| .map(|pkg| match &pkg { | ||
| Package { name, authors, url: Some(url) } if !authors.is_empty() => format!("{} - [{}] - {}", name, authors.join(", "), url), | ||
| Package { name, authors: _, url: Some(url) } => format!("{} - {}", name, url), | ||
| Package { name, authors, url: None } if !authors.is_empty() => format!("{} - [{}]", name, authors.join(", ")), | ||
| _ => pkg.name.clone(), | ||
| }) |
There was a problem hiding this comment.
This match statement for formatting the package information is a bit hard to read and maintain due to multiple similar arms with subtle differences. A more imperative approach of building the string step-by-step would be more readable and easier to modify in the future.
| .map(|pkg| match &pkg { | |
| Package { name, authors, url: Some(url) } if !authors.is_empty() => format!("{} - [{}] - {}", name, authors.join(", "), url), | |
| Package { name, authors: _, url: Some(url) } => format!("{} - {}", name, url), | |
| Package { name, authors, url: None } if !authors.is_empty() => format!("{} - [{}]", name, authors.join(", ")), | |
| _ => pkg.name.clone(), | |
| }) | |
| .map(|pkg| { | |
| let mut line = pkg.name.clone(); | |
| if !pkg.authors.is_empty() { | |
| line.push_str(&format!(" - [{}]"), pkg.authors.join(", ")); | |
| } | |
| if let Some(url) = &pkg.url { | |
| line.push_str(&format!(" - {}", url)); | |
| } | |
| line | |
| }) |
| "The package{} listed here {} licensed under the terms of the {} printed beneath", | ||
| if multi { "s" } else { "" }, | ||
| if multi { "are" } else { "is" }, | ||
| if let Some(license) = license.name.as_ref() { license.to_string() } else { "license".to_string() } |
There was a problem hiding this comment.
This expression can be simplified for better readability and conciseness by using as_deref() and unwrap_or().
| if let Some(license) = license.name.as_ref() { license.to_string() } else { "license".to_string() } | |
| if let Some(license) = license.name.as_deref() { license.to_string() } else { "license".to_string() } |
There was a problem hiding this comment.
6 issues found across 40 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name=".nix/pkgs/tools/third-party-licenses.nix">
<violation number="1" location=".nix/pkgs/tools/third-party-licenses.nix:24">
P2: `cargoArtifacts` is attached as an output attribute on the derivation rather than being passed as an input to `buildPackage`. The `//` here merges onto the *result* of `buildPackage`, not its arguments. This is inconsistent with the project's own pattern in `graphite.nix` and with crane's documented best practice. The `cargoArtifacts` should be passed into `buildPackage` so it can reuse the pre-built dependency artifacts, matching the established pattern:
```nix
deps.crane.lib.buildPackage (common // {
cargoArtifacts = deps.crane.lib.buildDepsOnly common;
})
// {
inherit cargoVendorDir;
meta.mainProgram = "third-party-licenses";
}
```</violation>
</file>
<file name="tools/third-party-licenses/src/cef.rs">
<violation number="1" location="tools/third-party-licenses/src/cef.rs:75">
P2: Non-deterministic file selection: if both `credits.html` and `credits.html.xz` exist in the directory, `find()` on `read_dir()` picks whichever the OS enumerates first. Consider preferring one format explicitly (e.g., prefer the uncompressed file if it exists, falling back to `.xz`).</violation>
</file>
<file name="tools/third-party-licenses/src/main.rs">
<violation number="1" location="tools/third-party-licenses/src/main.rs:93">
P2: Multiple `std::process::exit(1)` calls bypass Rust's normal stack unwinding. Consider having `main` return `Result` and using `?` to propagate errors, which enables proper cleanup and is more idiomatic.
(Based on your team's feedback about avoiding `exit()` and preferring returning for cleanup.) [FEEDBACK_USED]</violation>
<violation number="2" location="tools/third-party-licenses/src/main.rs:144">
P1: Bug: `header.len()` returns byte count but is mixed with `.chars().count()` (character count) from package lines when computing `max_len` and padding. If the license name contains non-ASCII characters (e.g., accented letters), this will produce misaligned box-drawing output or panic from arithmetic underflow. Use `.chars().count()` consistently.</violation>
</file>
<file name=".nix/flake.nix">
<violation number="1" location=".nix/flake.nix:56">
P2: `info.cargoVendored` is defined but never used anywhere in the codebase. Either this should be consumed by downstream derivations (e.g., via `inherit (info) cargoVendored;` in `graphite.nix`) or removed to avoid dead code and unnecessary evaluation.</violation>
</file>
<file name="frontend/vite.config.ts">
<violation number="1" location="frontend/vite.config.ts:64">
P2: The `buildStart` hook fires during `vite dev` as well, so `cargo run -p third-party-licenses` will execute on every dev server startup, unnecessarily slowing down development. Add `apply: 'build'` to restrict this plugin to production builds only.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| if let Some(license) = license.name.as_ref() { license.to_string() } else { "license".to_string() } | ||
| ); | ||
|
|
||
| let max_len = std::iter::once(header.len()).chain(package_lines.iter().map(|l| l.chars().count())).max().unwrap_or(0); |
There was a problem hiding this comment.
P1: Bug: header.len() returns byte count but is mixed with .chars().count() (character count) from package lines when computing max_len and padding. If the license name contains non-ASCII characters (e.g., accented letters), this will produce misaligned box-drawing output or panic from arithmetic underflow. Use .chars().count() consistently.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tools/third-party-licenses/src/main.rs, line 144:
<comment>Bug: `header.len()` returns byte count but is mixed with `.chars().count()` (character count) from package lines when computing `max_len` and padding. If the license name contains non-ASCII characters (e.g., accented letters), this will produce misaligned box-drawing output or panic from arithmetic underflow. Use `.chars().count()` consistently.</comment>
<file context>
@@ -0,0 +1,229 @@
+ if let Some(license) = license.name.as_ref() { license.to_string() } else { "license".to_string() }
+ );
+
+ let max_len = std::iter::once(header.len()).chain(package_lines.iter().map(|l| l.chars().count())).max().unwrap_or(0);
+
+ let padded_packages: Vec<String> = package_lines
</file context>
| process::exit(1); | ||
| }) | ||
| .filter_map(|entry| entry.ok()) | ||
| .find(|entry| { |
There was a problem hiding this comment.
P2: Non-deterministic file selection: if both credits.html and credits.html.xz exist in the directory, find() on read_dir() picks whichever the OS enumerates first. Consider preferring one format explicitly (e.g., prefer the uncompressed file if it exists, falling back to .xz).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tools/third-party-licenses/src/cef.rs, line 75:
<comment>Non-deterministic file selection: if both `credits.html` and `credits.html.xz` exist in the directory, `find()` on `read_dir()` picks whichever the OS enumerates first. Consider preferring one format explicitly (e.g., prefer the uncompressed file if it exists, falling back to `.xz`).</comment>
<file context>
@@ -0,0 +1,105 @@
+ process::exit(1);
+ })
+ .filter_map(|entry| entry.ok())
+ .find(|entry| {
+ let name = entry.file_name();
+ name.eq_ignore_ascii_case("credits.html") || name.eq_ignore_ascii_case("credits.html.xz")
</file context>
| @@ -0,0 +1,229 @@ | |||
| use std::collections::HashMap; | |||
There was a problem hiding this comment.
P2: Multiple std::process::exit(1) calls bypass Rust's normal stack unwinding. Consider having main return Result and using ? to propagate errors, which enables proper cleanup and is more idiomatic.
(Based on your team's feedback about avoiding exit() and preferring returning for cleanup.)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tools/third-party-licenses/src/main.rs, line 93:
<comment>Multiple `std::process::exit(1)` calls bypass Rust's normal stack unwinding. Consider having `main` return `Result` and using `?` to propagate errors, which enables proper cleanup and is more idiomatic.
(Based on your team's feedback about avoiding `exit()` and preferring returning for cleanup.) </comment>
<file context>
@@ -0,0 +1,229 @@
+ if let Some(parent) = output_path.parent() {
+ fs::create_dir_all(parent).unwrap_or_else(|e| {
+ eprintln!("Failed to create directory {}: {e}", parent.display());
+ std::process::exit(1);
+ });
+ }
</file context>
| src = ./..; | ||
| filter = path: type: !(type == "directory" && builtins.baseNameOf path == ".nix"); | ||
| }; | ||
| cargoVendored = deps.crane.lib.vendorCargoDeps { inherit (info) src; }; |
There was a problem hiding this comment.
P2: info.cargoVendored is defined but never used anywhere in the codebase. Either this should be consumed by downstream derivations (e.g., via inherit (info) cargoVendored; in graphite.nix) or removed to avoid dead code and unnecessary evaluation.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .nix/flake.nix, line 56:
<comment>`info.cargoVendored` is defined but never used anywhere in the codebase. Either this should be consumed by downstream derivations (e.g., via `inherit (info) cargoVendored;` in `graphite.nix`) or removed to avoid dead code and unnecessary evaluation.</comment>
<file context>
@@ -27,143 +26,85 @@
+ src = ./..;
+ filter = path: type: !(type == "directory" && builtins.baseNameOf path == ".nix");
+ };
+ cargoVendored = deps.crane.lib.vendorCargoDeps { inherit (info) src; };
+ };
</file context>
| encoding: "utf8", | ||
| shell: true, | ||
| if (mode !== "native") { | ||
| plugins.push({ |
There was a problem hiding this comment.
P2: The buildStart hook fires during vite dev as well, so cargo run -p third-party-licenses will execute on every dev server startup, unnecessarily slowing down development. Add apply: 'build' to restrict this plugin to production builds only.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At frontend/vite.config.ts, line 64:
<comment>The `buildStart` hook fires during `vite dev` as well, so `cargo run -p third-party-licenses` will execute on every dev server startup, unnecessarily slowing down development. Add `apply: 'build'` to restrict this plugin to production builds only.</comment>
<file context>
@@ -58,374 +25,63 @@ export default defineConfig(({ mode }) => {
- encoding: "utf8",
- shell: true,
+ if (mode !== "native") {
+ plugins.push({
+ name: "third-party-licenses",
+ buildStart() {
</file context>
| plugins.push({ | |
| plugins.push({ | |
| apply: "build", |
Closes #3730