fix: treat local prefix paths as local to prevent remote routing#40
fix: treat local prefix paths as local to prevent remote routing#40pipe1os wants to merge 19 commits into
Conversation
…er, add error tests
|
Warning Review limit reached
More reviews will be available in 51 minutes and 49 seconds. Learn how PR review limits work. To continue reviewing without waiting, enable usage-based billing in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
WalkthroughAdds remote GGUF model inspection over HTTP range requests via a new ChangesRemote GGUF Support and GGUF_group Rendering
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
There was a problem hiding this comment.
Pull Request Overview
The PR successfully implements the intended local path routing logic but introduces substantial unannounced scope by adding remote GGUF support and quantization comparison tables. While the GGUF functionality is beneficial, the implementation contains critical flaws: authentication headers are missing when streaming remote GGUF files (breaking gated model access), and remote model disk sizes can be corrupted by local filesystem metadata if a local directory shares the same name as a repository ID. Additionally, the parser logic in src/modelinfo/parsers/huggingface.py is complex and currently lacks sufficient test coverage. The routing logic itself is partially incomplete, as relative local paths without a leading prefix (e.g., 'subdir/model') will still trigger unnecessary network requests to Hugging Face if the file is missing locally. These security and logic issues should be addressed before merging.
About this PR
- There is a systemic pattern of code duplication in the test mocks across test_parsers.py and test_cli.py. Extracting these into a shared mock factory would improve long-term maintainability.
- Significant scope creep: The PR introduces remote GGUF parsing and UI comparison tables which are not mentioned in the PR title or description.
1 comment outside of the diff
src/modelinfo/cli.py
line 133🟡 MEDIUM RISK
Suggestion: Theanalyze_modelfunction is becoming a 'God function' that orchestrates too many concerns. Consider refactoring the path resolution logic and the format-specific parsing into separate helper functions.
Test suggestions
- Routing: Verify paths starting with '.', '/', '~' are treated as local and do not trigger HF fetch when missing.
- Routing: Verify paths like 'org/model' still trigger Hugging Face remote fetch.
- Remote GGUF: Fetch and parse header for a single .gguf file in a remote repository.
- Remote GGUF: Handle repositories with multiple .gguf files by returning a GGUF_group format.
- Remote GGUF: Fetch specific .gguf file when provided as 'user/repo/filename.gguf'.
- UI: Render a comparison table for GGUF_group showing fits/vram across all quantizations.
- Error Handling: Verify 401 (Unauthorized) and 404 (Not Found) responses from HF are handled with descriptive errors.
- Automated test: Verify that RemoteFileStream correctly includes Hugging Face Authorization headers.
- Automated test: Verify that disk_size is not overwritten by local directory sizes when repository IDs collide with local path names.
- Improve unit test coverage for src/modelinfo/parsers/huggingface.py due to high complexity.
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Automated test: Verify that RemoteFileStream correctly includes Hugging Face Authorization headers.
2. Automated test: Verify that disk_size is not overwritten by local directory sizes when repository IDs collide with local path names.
3. Improve unit test coverage for src/modelinfo/parsers/huggingface.py due to high complexity.
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
| if os.path.exists(file_path): | ||
| disk_size = os.path.getsize(file_path) |
There was a problem hiding this comment.
🟡 MEDIUM RISK
Prevent overwriting the remote model's disk size with local filesystem metadata when the repository ID matches a local path name.
| or os.path.isabs(file_path) | ||
| ) | ||
| if not is_local_path: | ||
| if "/" in file_path or not file_path_lower.endswith((".safetensors", ".gguf", ".pt", ".bin", ".index.json")): |
There was a problem hiding this comment.
🟡 MEDIUM RISK
Suggestion: This logic still triggers a remote fetch for relative local paths containing a slash (e.g., 'folder/model.gguf') if they are missing locally. Consider treating all paths containing a slash as local if they don't match a repository ID pattern.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
tests/test_parsers.py (1)
119-131: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMake the GGUF-group stub independent of fetch order.
This fake only serves a header for
model-q4.gguf, which couples the test to the current internal choice of variant. A harmless change that inspectsmodel-q8.gguffirst would fail this test without changing user-visible behavior.Example tweak
- elif "model-q4.gguf" in url: + elif url.endswith(".gguf"): import struct header = b"GGUF" + struct.pack("<IQQ", 2, 0, 0) return header🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_parsers.py` around lines 119 - 131, The GGUF-group test stub is tied to one specific variant because fake_make_request only returns a GGUF header for model-q4.gguf. Update the stub so it serves a valid GGUF header for any requested model variant in the sibling list, or otherwise make it independent of which variant is fetched first, while keeping the behavior in fake_make_request and the parser test setup unchanged.tests/test_cli.py (1)
307-336: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMake the routing test deterministic instead of swallowing failures.
The loop at Lines 329-334 can hide unrelated regressions, and this test still depends on the real filesystem because
os.path.existsis never stubbed. Patch the downstream helpers so the remote cases can complete normally, then assert withoutexcept Exception: pass.One way to tighten it
def test_analyze_model_local_path_routing(monkeypatch): """Test that analyze_model treats paths starting with local prefix as local, raising an error instead of routing to Hugging Face.""" from modelinfo.parsers import huggingface + monkeypatch.setattr(cli.os.path, "exists", lambda _path: False) + monkeypatch.setattr(cli, "calculate_footprint", lambda *args, **kwargs: { + "total_params": 0, + "base_memory_bytes": 0.0, + "kv_cache_bytes": 0.0, + "overhead_bytes": 0.0, + "total_memory_bytes": 0.0, + "num_layers": 0, + "kv_dim": 0, + "primary_dtype": "Unknown", + "kv_is_estimate": False, + "penalty_percentage": 0.0, + "vllm_metrics": {}, + }) + monkeypatch.setattr(cli, "identify_architecture_name", lambda *_args, **_kwargs: "Mock") + hf_fetched = [] def fake_fetch(repo_id, *, fetch_tensors, timeout): hf_fetched.append(repo_id) return {}, None, "SafeTensors", 0.0 @@ - for path in remote_paths: - try: - cli.analyze_model(path, context_override=128) - except Exception: - # We don't care if calculation fails later because of empty dict from fake_fetch, - # we just care that it triggers fetch_huggingface_repo. - pass + for path in remote_paths: + cli.analyze_model(path, context_override=128)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cli.py` around lines 307 - 336, The routing test is masking failures and still depends on real filesystem behavior, so make it deterministic by stubbing the downstream helpers used by cli.analyze_model and modelinfo.parsers.huggingface.fetch_huggingface_repo instead of swallowing exceptions. Keep the local path assertions around analyze_model, but for the remote path cases patch the needed helpers so execution completes normally, then assert directly that fetch_huggingface_repo was called with the expected repo IDs rather than using a try/except pass.Source: Linters/SAST tools
src/modelinfo/parsers/huggingface.py (1)
223-223: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAddress the Ruff unused-variable warning.
shardis unpacked but never used; prefix it to keep the linter clean.- shard, shard_header = future.result() + _shard, shard_header = future.result()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modelinfo/parsers/huggingface.py` at line 223, The unpacking in the Hugging Face parser leaves an unused variable warning because shard is never referenced after future.result() returns. Update the assignment in the code path around the future.result() call in the parser logic to prefix the unused value with an underscore while keeping shard_header as-is, so Ruff no longer flags it.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/modelinfo/parsers/gguf.py`:
- Around line 58-88: The GGUF parser in the parse routine should not call
struct.unpack directly on potentially short reads from RemoteFileStream.read(),
since EOF/416 can surface as struct.error instead of a GGUF parse failure. Add a
small exact-read helper in the GGUF parser and use it for all fixed-width fields
in this block, including the version, tensor_count, kv_count, key lengths, value
types, name lengths, dimensions, tensor types, and the skipped offset bytes, so
malformed or truncated files consistently raise a deterministic invalid-GGUF
error.
In `@src/modelinfo/parsers/huggingface.py`:
- Line 105: The resolve URL construction in huggingface.py is interpolating
repo_id and filename directly, so paths with spaces or reserved characters can
break requests. Add a small helper that quotes each path component with
urllib.parse.quote(..., safe="/") and use it everywhere the /resolve/main/...
URL is built, including the request creation around the Request call and any
other resolve-path helpers in this module.
In `@src/modelinfo/ui.py`:
- Around line 111-114: The early return after the `console.print(...)` block in
`show_model_info` skips the native-context warning for grouped GGUF repos, so
keep the `GGUF_group` flow going long enough to hit the warning logic later.
Update the `show_model_info` path so grouped repos still reach the `max_context`
check and emit the native-context warning before exiting, using the existing
`GGUF_group` and warning code as the anchor points.
- Around line 59-114: The GGUF_group branch in the UI rendering path is
returning before the --vllm logic can run, so it never uses vllm_metrics and
keeps showing the static VRAM heuristic. Update the GGUF_group handling in the
same function so it respects the existing is_vllm flow used later in the method,
and only uses the grouped GGUF table when --vllm is not enabled. Make sure the
row fit/capacity labels for grouped variants are driven by the vLLM metrics path
when requested, not by the hardcoded total_vram_bytes calculation.
---
Nitpick comments:
In `@src/modelinfo/parsers/huggingface.py`:
- Line 223: The unpacking in the Hugging Face parser leaves an unused variable
warning because shard is never referenced after future.result() returns. Update
the assignment in the code path around the future.result() call in the parser
logic to prefix the unused value with an underscore while keeping shard_header
as-is, so Ruff no longer flags it.
In `@tests/test_cli.py`:
- Around line 307-336: The routing test is masking failures and still depends on
real filesystem behavior, so make it deterministic by stubbing the downstream
helpers used by cli.analyze_model and
modelinfo.parsers.huggingface.fetch_huggingface_repo instead of swallowing
exceptions. Keep the local path assertions around analyze_model, but for the
remote path cases patch the needed helpers so execution completes normally, then
assert directly that fetch_huggingface_repo was called with the expected repo
IDs rather than using a try/except pass.
In `@tests/test_parsers.py`:
- Around line 119-131: The GGUF-group test stub is tied to one specific variant
because fake_make_request only returns a GGUF header for model-q4.gguf. Update
the stub so it serves a valid GGUF header for any requested model variant in the
sibling list, or otherwise make it independent of which variant is fetched
first, while keeping the behavior in fake_make_request and the parser test setup
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: a31eb8a9-0d27-46bc-9482-f0857e2e673a
📒 Files selected for processing (7)
README.mdsrc/modelinfo/cli.pysrc/modelinfo/parsers/gguf.pysrc/modelinfo/parsers/huggingface.pysrc/modelinfo/ui.pytests/test_cli.pytests/test_parsers.py
| magic = f.read(4) | ||
| if magic != b"GGUF": | ||
| raise ValueError("Invalid GGUF file: Magic bytes missing.") | ||
|
|
||
| version = struct.unpack("<I", f.read(4))[0] | ||
| if version < 2: | ||
| raise ValueError(f"Unsupported GGUF version: {version}") | ||
|
|
||
| tensor_count = struct.unpack("<Q", f.read(8))[0] | ||
| kv_count = struct.unpack("<Q", f.read(8))[0] | ||
|
|
||
| with open(path, "rb") as f: | ||
| magic = f.read(4) | ||
| if magic != b"GGUF": | ||
| raise ValueError("Invalid GGUF file: Magic bytes missing.") | ||
|
|
||
| version = struct.unpack("<I", f.read(4))[0] | ||
| if version < 2: | ||
| raise ValueError(f"Unsupported GGUF version: {version}") | ||
|
|
||
| tensor_count = struct.unpack("<Q", f.read(8))[0] | ||
| kv_count = struct.unpack("<Q", f.read(8))[0] | ||
| metadata = {} | ||
| for _ in range(kv_count): | ||
| key_len = struct.unpack("<Q", f.read(8))[0] | ||
| key_name = f.read(key_len).decode("utf-8") | ||
| val_type = struct.unpack("<I", f.read(4))[0] | ||
| metadata[key_name] = _read_gguf_value(f, val_type) | ||
|
|
||
| metadata = {} | ||
| for _ in range(kv_count): | ||
| key_len = struct.unpack("<Q", f.read(8))[0] | ||
| key_name = f.read(key_len).decode("utf-8") | ||
| val_type = struct.unpack("<I", f.read(4))[0] | ||
| metadata[key_name] = _read_gguf_value(f, val_type) | ||
|
|
||
| tensors["__metadata__"] = metadata | ||
|
|
||
| for _ in range(tensor_count): | ||
| name_len = struct.unpack("<Q", f.read(8))[0] | ||
| name = f.read(name_len).decode("utf-8") | ||
|
|
||
| n_dims = struct.unpack("<I", f.read(4))[0] | ||
| shape = [] | ||
| for _ in range(n_dims): | ||
| shape.append(struct.unpack("<Q", f.read(8))[0]) | ||
|
|
||
| t_type = struct.unpack("<I", f.read(4))[0] | ||
| f.read(8) # skip offset bytes | ||
|
|
||
| # Strict GGUF tensor type mapping | ||
| dtype = GGML_TYPE_MAP.get(t_type, "Unknown") | ||
|
|
||
| tensors[name] = {"shape": shape, "dtype": dtype} | ||
| tensors["__metadata__"] = metadata | ||
|
|
||
| for _ in range(tensor_count): | ||
| name_len = struct.unpack("<Q", f.read(8))[0] | ||
| name = f.read(name_len).decode("utf-8") | ||
|
|
||
| n_dims = struct.unpack("<I", f.read(4))[0] | ||
| shape = [] | ||
| for _ in range(n_dims): | ||
| shape.append(struct.unpack("<Q", f.read(8))[0]) | ||
|
|
||
| t_type = struct.unpack("<I", f.read(4))[0] | ||
| f.read(8) # skip offset bytes |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Normalize short reads into GGUF parse errors.
RemoteFileStream.read() may return fewer bytes on EOF/416, so these struct.unpack(...) calls can leak struct.error instead of a deterministic invalid-GGUF error. Add a small exact-read helper and use it for fixed-width fields.
Proposed fix
+def _read_exact(f: Any, size: int, field: str) -> bytes:
+ data = f.read(size)
+ if len(data) != size:
+ raise ValueError(f"Invalid GGUF file: truncated while reading {field}.")
+ return data
+
+
def _parse_gguf_header_from_stream(f: Any) -> Dict[str, Any]:
tensors: Dict[str, Any] = {}
- magic = f.read(4)
+ magic = _read_exact(f, 4, "magic")
if magic != b"GGUF":
raise ValueError("Invalid GGUF file: Magic bytes missing.")
- version = struct.unpack("<I", f.read(4))[0]
+ version = struct.unpack("<I", _read_exact(f, 4, "version"))[0]Apply the same helper to the remaining fixed-size reads in this parser.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/modelinfo/parsers/gguf.py` around lines 58 - 88, The GGUF parser in the
parse routine should not call struct.unpack directly on potentially short reads
from RemoteFileStream.read(), since EOF/416 can surface as struct.error instead
of a GGUF parse failure. Add a small exact-read helper in the GGUF parser and
use it for all fixed-width fields in this block, including the version,
tensor_count, kv_count, key lengths, value types, name lengths, dimensions,
tensor types, and the skipped offset bytes, so malformed or truncated files
consistently raise a deterministic invalid-GGUF error.
| return json.loads(json_bytes) | ||
|
|
||
| def _get_remote_file_size_fallback(repo_id: str, filename: str, timeout: float = 10.0) -> float: | ||
| req = urllib.request.Request(f"{_get_hf_endpoint()}/{repo_id}/resolve/main/{filename}", method="HEAD") |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
git ls-files src/modelinfo/parsers/huggingface.py
ast-grep outline src/modelinfo/parsers/huggingface.py --view expanded
sed -n '1,260p' src/modelinfo/parsers/huggingface.py | cat -nRepository: pipe1os/modelinfo-cli
Length of output: 12862
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the remaining portion of the file.
sed -n '260,360p' src/modelinfo/parsers/huggingface.py | cat -n
# Search for any existing URL-encoding helper or quote usage in the parser.
rg -n "quote\(|resolve/main|_hf_resolve_url|urllib\.parse" src/modelinfo/parsersRepository: pipe1os/modelinfo-cli
Length of output: 5747
Quote Hub resolve path components. Every /resolve/main/... URL in this file interpolates repo_id/filename directly, so files with spaces or other reserved characters can fail to download. Add a small helper that urllib.parse.quote(..., safe="/")s each path component and reuse it for all resolve URLs.
🧰 Tools
🪛 Ruff (0.15.18)
[error] 105-105: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/modelinfo/parsers/huggingface.py` at line 105, The resolve URL
construction in huggingface.py is interpolating repo_id and filename directly,
so paths with spaces or reserved characters can break requests. Add a small
helper that quotes each path component with urllib.parse.quote(..., safe="/")
and use it everywhere the /resolve/main/... URL is built, including the request
creation around the Request call and any other resolve-path helpers in this
module.
| if format_name == "GGUF_group": | ||
| metadata = tensors.get("__metadata__", {}) | ||
| variants = metadata.get("gguf_variants", []) | ||
| repo_id = metadata.get("repo_id", "") | ||
|
|
||
| console.print(f"[bold]Repository:[/bold] {repo_id}") | ||
| console.print("[bold]Format:[/bold] GGUF (Multiple Quantizations)") | ||
| console.print(f"[bold]Architecture:[/bold] {arch_name}") | ||
| if max_context: | ||
| console.print(f"[bold]Context Limit:[/bold] {max_context:,} tokens") | ||
| console.print() | ||
|
|
||
| table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2)) | ||
| table.add_column("Quantization File") | ||
| table.add_column("File Size", justify="right") | ||
| table.add_column("KV Cache", justify="right") | ||
| table.add_column("Total VRAM", justify="right") | ||
|
|
||
| show_fits = gpu_name is not None | ||
| if show_fits: | ||
| table.add_column("Fits", justify="left") | ||
|
|
||
| kv_cache_bytes = footprint["kv_cache_bytes"] | ||
| penalty_percentage = footprint.get("penalty_percentage", 0.0) | ||
| cuda_overhead = 600 * 1024 * 1024 * gpu_count | ||
|
|
||
| sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True) | ||
| for var in sorted_variants: | ||
| filename = var["filename"] | ||
| size_bytes = var["size"] | ||
| variant_overhead = cuda_overhead + (size_bytes * penalty_percentage) | ||
| total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead | ||
|
|
||
| file_size_str = format_bytes(size_bytes) | ||
| kv_cache_str = format_bytes(kv_cache_bytes) | ||
|
|
||
| vram_color = get_vram_color(total_vram_bytes, max_vram_gb) | ||
| total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]" | ||
|
|
||
| row_data = [filename, file_size_str, kv_cache_str, total_vram_str] | ||
| if show_fits: | ||
| utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0 | ||
| if utilization <= gpu_util: | ||
| fit_text = "[green]✓ Yes[/green]" | ||
| elif utilization <= 0.99: | ||
| fit_text = "[yellow]⚠ Warning[/yellow]" | ||
| else: | ||
| fit_text = "[red]✗ No[/red]" | ||
| row_data.append(fit_text) | ||
|
|
||
| table.add_row(*row_data) | ||
|
|
||
| console.print(table) | ||
| console.print() | ||
| console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]") | ||
| return |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Don't silently ignore --vllm for GGUF_group.
This early return skips the is_vllm path at Lines 172-199, so GGUF_group output never uses vllm_metrics and still labels rows with the static VRAM heuristic. modelinfo <repo> --vllm can therefore show misleading fit/capacity output for grouped GGUF repos.
Possible minimal safeguard
if format_name == "GGUF_group":
+ if is_vllm:
+ console.print(
+ "[yellow]vLLM capacity planning requires a specific GGUF file. "
+ "Rerun with `modelinfo <repo>/<quant>.gguf --vllm`.[/yellow]"
+ )
+ return
+
metadata = tensors.get("__metadata__", {})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if format_name == "GGUF_group": | |
| metadata = tensors.get("__metadata__", {}) | |
| variants = metadata.get("gguf_variants", []) | |
| repo_id = metadata.get("repo_id", "") | |
| console.print(f"[bold]Repository:[/bold] {repo_id}") | |
| console.print("[bold]Format:[/bold] GGUF (Multiple Quantizations)") | |
| console.print(f"[bold]Architecture:[/bold] {arch_name}") | |
| if max_context: | |
| console.print(f"[bold]Context Limit:[/bold] {max_context:,} tokens") | |
| console.print() | |
| table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2)) | |
| table.add_column("Quantization File") | |
| table.add_column("File Size", justify="right") | |
| table.add_column("KV Cache", justify="right") | |
| table.add_column("Total VRAM", justify="right") | |
| show_fits = gpu_name is not None | |
| if show_fits: | |
| table.add_column("Fits", justify="left") | |
| kv_cache_bytes = footprint["kv_cache_bytes"] | |
| penalty_percentage = footprint.get("penalty_percentage", 0.0) | |
| cuda_overhead = 600 * 1024 * 1024 * gpu_count | |
| sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True) | |
| for var in sorted_variants: | |
| filename = var["filename"] | |
| size_bytes = var["size"] | |
| variant_overhead = cuda_overhead + (size_bytes * penalty_percentage) | |
| total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead | |
| file_size_str = format_bytes(size_bytes) | |
| kv_cache_str = format_bytes(kv_cache_bytes) | |
| vram_color = get_vram_color(total_vram_bytes, max_vram_gb) | |
| total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]" | |
| row_data = [filename, file_size_str, kv_cache_str, total_vram_str] | |
| if show_fits: | |
| utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0 | |
| if utilization <= gpu_util: | |
| fit_text = "[green]✓ Yes[/green]" | |
| elif utilization <= 0.99: | |
| fit_text = "[yellow]⚠ Warning[/yellow]" | |
| else: | |
| fit_text = "[red]✗ No[/red]" | |
| row_data.append(fit_text) | |
| table.add_row(*row_data) | |
| console.print(table) | |
| console.print() | |
| console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]") | |
| return | |
| if format_name == "GGUF_group": | |
| if is_vllm: | |
| console.print( | |
| "[yellow]vLLM capacity planning requires a specific GGUF file. " | |
| "Rerun with `modelinfo <repo>/<quant>.gguf --vllm`.[/yellow]" | |
| ) | |
| return | |
| metadata = tensors.get("__metadata__", {}) | |
| variants = metadata.get("gguf_variants", []) | |
| repo_id = metadata.get("repo_id", "") | |
| console.print(f"[bold]Repository:[/bold] {repo_id}") | |
| console.print("[bold]Format:[/bold] GGUF (Multiple Quantizations)") | |
| console.print(f"[bold]Architecture:[/bold] {arch_name}") | |
| if max_context: | |
| console.print(f"[bold]Context Limit:[/bold] {max_context:,} tokens") | |
| console.print() | |
| table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2)) | |
| table.add_column("Quantization File") | |
| table.add_column("File Size", justify="right") | |
| table.add_column("KV Cache", justify="right") | |
| table.add_column("Total VRAM", justify="right") | |
| show_fits = gpu_name is not None | |
| if show_fits: | |
| table.add_column("Fits", justify="left") | |
| kv_cache_bytes = footprint["kv_cache_bytes"] | |
| penalty_percentage = footprint.get("penalty_percentage", 0.0) | |
| cuda_overhead = 600 * 1024 * 1024 * gpu_count | |
| sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True) | |
| for var in sorted_variants: | |
| filename = var["filename"] | |
| size_bytes = var["size"] | |
| variant_overhead = cuda_overhead + (size_bytes * penalty_percentage) | |
| total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead | |
| file_size_str = format_bytes(size_bytes) | |
| kv_cache_str = format_bytes(kv_cache_bytes) | |
| vram_color = get_vram_color(total_vram_bytes, max_vram_gb) | |
| total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]" | |
| row_data = [filename, file_size_str, kv_cache_str, total_vram_str] | |
| if show_fits: | |
| utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0 | |
| if utilization <= gpu_util: | |
| fit_text = "[green]✓ Yes[/green]" | |
| elif utilization <= 0.99: | |
| fit_text = "[yellow]⚠ Warning[/yellow]" | |
| else: | |
| fit_text = "[red]✗ No[/red]" | |
| row_data.append(fit_text) | |
| table.add_row(*row_data) | |
| console.print(table) | |
| console.print() | |
| console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]") | |
| return |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/modelinfo/ui.py` around lines 59 - 114, The GGUF_group branch in the UI
rendering path is returning before the --vllm logic can run, so it never uses
vllm_metrics and keeps showing the static VRAM heuristic. Update the GGUF_group
handling in the same function so it respects the existing is_vllm flow used
later in the method, and only uses the grouped GGUF table when --vllm is not
enabled. Make sure the row fit/capacity labels for grouped variants are driven
by the vLLM metrics path when requested, not by the hardcoded total_vram_bytes
calculation.
| console.print(table) | ||
| console.print() | ||
| console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]") | ||
| return |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Keep the native-context warning on the GGUF_group path.
The return here also bypasses the warning at Lines 216-217, so grouped repos inspected with --context above max_context get no warning at all.
Small fix
console.print(table)
console.print()
+ if context_length > 0 and max_context is not None and context_length > max_context:
+ console.print(
+ f"[bold yellow]WARNING: Requested context ({context_length:,}) exceeds model's native limit ({max_context:,}).[/bold yellow]"
+ )
+ console.print()
console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
return📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| console.print(table) | |
| console.print() | |
| console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]") | |
| return | |
| console.print(table) | |
| console.print() | |
| console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]") | |
| if context_length > 0 and max_context is not None and context_length > max_context: | |
| console.print( | |
| f"[bold yellow]WARNING: Requested context ({context_length:,}) exceeds model's native limit ({max_context:,}).[/bold yellow]" | |
| ) | |
| console.print() | |
| return |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/modelinfo/ui.py` around lines 111 - 114, The early return after the
`console.print(...)` block in `show_model_info` skips the native-context warning
for grouped GGUF repos, so keep the `GGUF_group` flow going long enough to hit
the warning logic later. Update the `show_model_info` path so grouped repos
still reach the `max_context` check and emit the native-context warning before
exiting, using the existing `GGUF_group` and warning code as the anchor points.
Summary
Refines the local vs. remote path detection logic to avoid treating nonexistent local path typos (e.g. starting with
./,../,/,~) as remote Hugging Face repositories.Motivation & Context
Currently, the CLI attempts to resolve any nonexistent path that contains a slash as a remote Hugging Face repository. If a user makes a typo on a local path (such as
./typo-path.gguf), the tool attempts a remote HF hub fetch. This leads to confusing API/404/auth errors and unnecessary network requests.This change ensures that paths starting with
.,/,~or absolute paths are recognized as local files, producing a clean file-not-found error immediately instead of attempting to fetch from Hugging Face.Type of Change
How Has This Been Tested?
Added unit tests in
tests/test_cli.pyto verify thatanalyze_modelraises local file errors for missing paths starting with local prefix characters, while still successfully routing standard Hugging Face model IDs to the remote fetcher.Verified test suite passes locally.
Unit tests
Integration tests
Manual testing
Screenshots (if appropriate)
N/A
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Documentation