feat(graph): add suggest-base subcommand with collection impact analysis#1103
feat(graph): add suggest-base subcommand with collection impact analysis#1103dhellmann wants to merge 1 commit intopython-wheel-build:mainfrom
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughTwo files are modified to introduce a Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 40 minutes and 6 seconds.Comment |
|
Here's some sample output for some downstream collections. Using the command: Output (click to expand)produced this output: |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fromager/commands/graph.py`:
- Around line 1050-1082: base-only (orphan) packages from the supplied base
graph are not computed or emitted; compute orphan_packages = base_packages -
union(*collections) after base_packages is populated, then pass this
orphan_packages into the output helpers (_suggest_base_json and
_suggest_base_table) so the JSON/table include base packages that no analyzed
collection references; update the calls to _suggest_base_json and
_suggest_base_table to accept the new orphan_packages argument and update those
functions' signatures to render the orphan set (and keep existing fields like
in_base annotation intact).
- Around line 1041-1048: The collections dict is keyed by a non-unique short
name (from _get_collection_name(path)), causing collisions; instead key the dict
by a unique identifier (e.g., the collection file path or its resolved string)
while still computing and storing the short/display name from
_get_collection_name(path) for output. In the loop that iterates
collection_graphs, use a unique key like str(path.resolve()) or path.as_posix()
for collections[key] = pkgs and keep name = _get_collection_name(path) only for
logging/display, ensuring no silent overwrites of other collections.
In `@tests/test_graph_commands.py`:
- Around line 664-688: The tests test_suggest_base_too_few_graphs and
test_suggest_base_invalid_min_collections are currently catching any Exception;
change both pytest.raises calls to specifically expect click.UsageError (e.g.,
pytest.raises(click.UsageError, match="...")) so the validation path is
asserted; also ensure the test module imports click (add "import click" at top
if missing) and keep the existing match strings unchanged to verify the error
message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c3d964ed-e7b1-4950-a7d6-57e1d8d093f0
📒 Files selected for processing (2)
src/fromager/commands/graph.pytests/test_graph_commands.py
Analyzes multiple collection graph files to find packages that appear across >= N collections. These shared packages are candidates for a "base" collection built once and reused across parallel collection builds. Also computes collection impact analysis showing how the proposed base would affect each input collection: remaining package counts per collection and cross-collection counts for remaining packages, helping identify candidates for a secondary base. New public command: fromager graph suggest-base GRAPH [GRAPH ...] Options: --min-collections INT threshold (default: 50% of input graphs, rounded up) --base PATH mark packages already in an existing base graph --format table|json output format (default: table) Implementation: - _get_collection_packages(): load graph, return canonical names - _find_shared_packages(): find overlap, sort by count desc then name asc - _compute_collection_impact(): per-collection remaining package analysis - _suggest_base_table(): rich MARKDOWN table output with impact sections - _suggest_base_json(): structured JSON with metadata, candidates, and collection_impact key - _suggest_base_impl(): testable core extracted from the click command Tests: 13 new unit tests covering helpers, table/JSON output, --base flag, impact analysis, and error cases. Closes: python-wheel-build#973 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
99b5601 to
ce90870
Compare
| @click.argument("collection_graphs", nargs=-1, required=True) | ||
| @click.pass_obj | ||
| def suggest_base( | ||
| wkctx: context.WorkContext, |
There was a problem hiding this comment.
WorkContext is redundant for this command. All required data is captured by cli args and used in same and we don't pass it anywhere from here.
| @click.option( | ||
| "--base", | ||
| "base_graph", | ||
| type=str, |
There was a problem hiding this comment.
Can we use clickext.ClickPath(exists=True) instead of type=str, it gives early validation with a clean Click error message. It also returns pathlib.Path directly, which would eliminate the manual type conversion.
| default="table", | ||
| help="Output format (default: table)", | ||
| ) | ||
| @click.argument("collection_graphs", nargs=-1, required=True) |
There was a problem hiding this comment.
Same suggestion as above to use clickext.ClickPath(exists=True)
| raise click.UsageError("At least 2 collection graphs are required") | ||
| if min_collections is None: | ||
| min_collections = max(2, math.ceil(len(collection_graphs) / 2)) | ||
| elif min_collections < 2: |
There was a problem hiding this comment.
Nit: We can delegate this check to click by using type=click.IntRange(min=2)
| key = str(pathlib.Path(path).resolve()) | ||
| name = _get_collection_name(path) | ||
| pkgs = _get_collection_packages(path) | ||
| if not pkgs: |
There was a problem hiding this comment.
The len(collection_graphs) >= 2 check is done before excluding empty collections, so passing 2 graphs where one is empty leaves only 1 collection. Can we add if len(collections) < 2: raise click.UsageError after the loop too.
| console.print(base_only_table) | ||
|
|
||
|
|
||
| def _suggest_base_json( |
There was a problem hiding this comment.
Nit: Would it be useful to group these parameters into a dataclass (ex: SuggestBaseResult )? The signature is bit wide and it might make things easier to extend later and maintain contract between json output and table output. Just asking to to hear your thoughts.
|
I was planning to work this up, but held off because #1030 hasn't reached consensus yet and wanted to avoid churn. Also, I was thinking the two commands share some helpers, so I was waiting to align on the approach first. Thanks for getting this done though -- glad it's moving forward! 😄 |
Summary
graph suggest-basecommand to identify packages shared acrossmultiple collections as candidates for a base collection
after removing base candidates, and cross-collection counts to help
identify secondary base candidates
collection_impactsectionin both
Test Plan
hatch run test:test tests/test_graph_commands.pyhatch run lint:fix && hatch run mypy:check src/fromager/commands/graph.pyfromager graph suggest-base graphs/*.json 2>/dev/null | lessfromager graph suggest-base --format json graphs/*.json 2>/dev/null | python3 -m json.tool | lessCloses: #973
🤖 Generated with Claude Code