Skip to content

fix: treat local prefix paths as local to prevent remote routing#40

Closed
pipe1os wants to merge 19 commits into
mainfrom
advisor/007-refine-remote-detection
Closed

fix: treat local prefix paths as local to prevent remote routing#40
pipe1os wants to merge 19 commits into
mainfrom
advisor/007-refine-remote-detection

Conversation

@pipe1os

@pipe1os pipe1os commented Jun 27, 2026

Copy link
Copy Markdown
Owner

Summary

Refines the local vs. remote path detection logic to avoid treating nonexistent local path typos (e.g. starting with ./, ../, /, ~) as remote Hugging Face repositories.

Motivation & Context

Currently, the CLI attempts to resolve any nonexistent path that contains a slash as a remote Hugging Face repository. If a user makes a typo on a local path (such as ./typo-path.gguf), the tool attempts a remote HF hub fetch. This leads to confusing API/404/auth errors and unnecessary network requests.

This change ensures that paths starting with ., /, ~ or absolute paths are recognized as local files, producing a clean file-not-found error immediately instead of attempting to fetch from Hugging Face.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (no functional changes, no api changes)
  • Documentation update

How Has This Been Tested?

  • Added unit tests in tests/test_cli.py to verify that analyze_model raises local file errors for missing paths starting with local prefix characters, while still successfully routing standard Hugging Face model IDs to the remote fetcher.

  • Verified test suite passes locally.

  • Unit tests

  • Integration tests

  • Manual testing

Screenshots (if appropriate)

N/A

Checklist

  • My code follows the code style of this project.
  • My commit messages follow the Conventional Commits format, are lowercase, imperative, and specific.
  • I have updated the documentation accordingly (if applicable).
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Summary by CodeRabbit

  • New Features

    • Added support for analyzing remote GGUF models, including single files, grouped quantization variants, and explicit file targets.
    • Enhanced model info output to show GGUF variant details and GPU fit guidance when available.
  • Bug Fixes

    • Improved detection of local vs. remote paths to avoid unnecessary remote lookups.
    • Refined size and metadata reporting for remote models and GGUF headers.
  • Documentation

    • Expanded usage examples with more remote model commands, including GGUF and quantization comparisons.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@pipe1os, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 51 minutes and 49 seconds. Learn how PR review limits work.

To continue reviewing without waiting, enable usage-based billing in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1cc6f56d-2efb-4599-8c42-55e110e781c6

📥 Commits

Reviewing files that changed from the base of the PR and between 0ef126b and 65ab0c4.

📒 Files selected for processing (2)
  • src/modelinfo/cli.py
  • tests/test_cli.py

Walkthrough

Adds remote GGUF model inspection over HTTP range requests via a new RemoteFileStream. fetch_huggingface_repo is extended to detect and fetch single or grouped GGUF repos, returning a new GGUF_group format. The CLI routing, UI rendering, and README examples are updated accordingly.

Changes

Remote GGUF Support and GGUF_group Rendering

Layer / File(s) Summary
GGUF parser refactored to accept file-like objects
src/modelinfo/parsers/gguf.py
parse_gguf_header now accepts a path or open stream; core logic extracted into _parse_gguf_header_from_stream.
RemoteFileStream and remote GGUF fetchers
src/modelinfo/parsers/huggingface.py
Adds RemoteFileStream (HTTP range-backed stream, 50MB cap), _get_remote_file_size_fallback, _fetch_remote_gguf_single, _fetch_remote_gguf_group, and refactored concurrent sharded SafeTensors fetchers.
fetch_huggingface_repo orchestration
src/modelinfo/parsers/huggingface.py
Splits repo_id into real_repo_id/target_filename for .gguf paths, prefers SafeTensors when present, falls back to GGUF single or group, and adjusts 401/404 error handling.
CLI local/remote routing and disk_size fixes
src/modelinfo/cli.py
analyze_model gains an is_remote flag to avoid misclassifying local paths (./, ../, /, ~/); widens GGUF metadata extraction to include GGUF_group; simplifies disk_size to require local file existence.
GGUF_group quantization variant table in UI
src/modelinfo/ui.py
print_model_info adds a GGUF_group early-return branch rendering a Rich table of variants with VRAM coloring, optional Fits column, and a tip command; replaces hardcoded 0.90 threshold with gpu_util.
Tests and README docs
tests/test_parsers.py, tests/test_cli.py, README.md
Parser tests cover remote GGUF single/group/explicit/401/404 cases; CLI tests cover GGUF_group analysis and local/remote routing; UI tests cover table rendering with and without GPU; README adds multi-command remote usage examples.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • pipe1os/modelinfo-cli#31: Introduced _get_hf_endpoint() and HF_ENDPOINT validation that this PR builds upon for the remote GGUF fetching infrastructure in huggingface.py.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 47.62% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the primary change: keeping local-prefix paths local to avoid remote Hugging Face routing.
Description check ✅ Passed The PR description follows the template and includes the required summary, motivation, change type, testing, screenshots, and checklist sections.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch advisor/007-refine-remote-detection

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codacy-production

codacy-production Bot commented Jun 27, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.

Run reviewer

TIP This summary will be updated as you push new changes.

@codacy-production codacy-production Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

The PR successfully implements the intended local path routing logic but introduces substantial unannounced scope by adding remote GGUF support and quantization comparison tables. While the GGUF functionality is beneficial, the implementation contains critical flaws: authentication headers are missing when streaming remote GGUF files (breaking gated model access), and remote model disk sizes can be corrupted by local filesystem metadata if a local directory shares the same name as a repository ID. Additionally, the parser logic in src/modelinfo/parsers/huggingface.py is complex and currently lacks sufficient test coverage. The routing logic itself is partially incomplete, as relative local paths without a leading prefix (e.g., 'subdir/model') will still trigger unnecessary network requests to Hugging Face if the file is missing locally. These security and logic issues should be addressed before merging.

About this PR

  • There is a systemic pattern of code duplication in the test mocks across test_parsers.py and test_cli.py. Extracting these into a shared mock factory would improve long-term maintainability.
  • Significant scope creep: The PR introduces remote GGUF parsing and UI comparison tables which are not mentioned in the PR title or description.
1 comment outside of the diff
src/modelinfo/cli.py

line 133 🟡 MEDIUM RISK
Suggestion: The analyze_model function is becoming a 'God function' that orchestrates too many concerns. Consider refactoring the path resolution logic and the format-specific parsing into separate helper functions.

Test suggestions

  • Routing: Verify paths starting with '.', '/', '~' are treated as local and do not trigger HF fetch when missing.
  • Routing: Verify paths like 'org/model' still trigger Hugging Face remote fetch.
  • Remote GGUF: Fetch and parse header for a single .gguf file in a remote repository.
  • Remote GGUF: Handle repositories with multiple .gguf files by returning a GGUF_group format.
  • Remote GGUF: Fetch specific .gguf file when provided as 'user/repo/filename.gguf'.
  • UI: Render a comparison table for GGUF_group showing fits/vram across all quantizations.
  • Error Handling: Verify 401 (Unauthorized) and 404 (Not Found) responses from HF are handled with descriptive errors.
  • Automated test: Verify that RemoteFileStream correctly includes Hugging Face Authorization headers.
  • Automated test: Verify that disk_size is not overwritten by local directory sizes when repository IDs collide with local path names.
  • Improve unit test coverage for src/modelinfo/parsers/huggingface.py due to high complexity.
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Automated test: Verify that RemoteFileStream correctly includes Hugging Face Authorization headers.
2. Automated test: Verify that disk_size is not overwritten by local directory sizes when repository IDs collide with local path names.
3. Improve unit test coverage for src/modelinfo/parsers/huggingface.py due to high complexity.

TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback

Comment thread src/modelinfo/cli.py Outdated
Comment on lines +221 to +222
if os.path.exists(file_path):
disk_size = os.path.getsize(file_path)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

Prevent overwriting the remote model's disk size with local filesystem metadata when the repository ID matches a local path name.

Comment thread src/modelinfo/cli.py Outdated
or os.path.isabs(file_path)
)
if not is_local_path:
if "/" in file_path or not file_path_lower.endswith((".safetensors", ".gguf", ".pt", ".bin", ".index.json")):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

Suggestion: This logic still triggers a remote fetch for relative local paths containing a slash (e.g., 'folder/model.gguf') if they are missing locally. Consider treating all paths containing a slash as local if they don't match a repository ID pattern.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
tests/test_parsers.py (1)

119-131: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the GGUF-group stub independent of fetch order.

This fake only serves a header for model-q4.gguf, which couples the test to the current internal choice of variant. A harmless change that inspects model-q8.gguf first would fail this test without changing user-visible behavior.

Example tweak
-        elif "model-q4.gguf" in url:
+        elif url.endswith(".gguf"):
             import struct
             header = b"GGUF" + struct.pack("<IQQ", 2, 0, 0)
             return header
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_parsers.py` around lines 119 - 131, The GGUF-group test stub is
tied to one specific variant because fake_make_request only returns a GGUF
header for model-q4.gguf. Update the stub so it serves a valid GGUF header for
any requested model variant in the sibling list, or otherwise make it
independent of which variant is fetched first, while keeping the behavior in
fake_make_request and the parser test setup unchanged.
tests/test_cli.py (1)

307-336: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the routing test deterministic instead of swallowing failures.

The loop at Lines 329-334 can hide unrelated regressions, and this test still depends on the real filesystem because os.path.exists is never stubbed. Patch the downstream helpers so the remote cases can complete normally, then assert without except Exception: pass.

One way to tighten it
 def test_analyze_model_local_path_routing(monkeypatch):
     """Test that analyze_model treats paths starting with local prefix as local, raising an error instead of routing to Hugging Face."""
     from modelinfo.parsers import huggingface

+    monkeypatch.setattr(cli.os.path, "exists", lambda _path: False)
+    monkeypatch.setattr(cli, "calculate_footprint", lambda *args, **kwargs: {
+        "total_params": 0,
+        "base_memory_bytes": 0.0,
+        "kv_cache_bytes": 0.0,
+        "overhead_bytes": 0.0,
+        "total_memory_bytes": 0.0,
+        "num_layers": 0,
+        "kv_dim": 0,
+        "primary_dtype": "Unknown",
+        "kv_is_estimate": False,
+        "penalty_percentage": 0.0,
+        "vllm_metrics": {},
+    })
+    monkeypatch.setattr(cli, "identify_architecture_name", lambda *_args, **_kwargs: "Mock")
+
     hf_fetched = []
     def fake_fetch(repo_id, *, fetch_tensors, timeout):
         hf_fetched.append(repo_id)
         return {}, None, "SafeTensors", 0.0
@@
-    for path in remote_paths:
-        try:
-            cli.analyze_model(path, context_override=128)
-        except Exception:
-            # We don't care if calculation fails later because of empty dict from fake_fetch,
-            # we just care that it triggers fetch_huggingface_repo.
-            pass
+    for path in remote_paths:
+        cli.analyze_model(path, context_override=128)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cli.py` around lines 307 - 336, The routing test is masking
failures and still depends on real filesystem behavior, so make it deterministic
by stubbing the downstream helpers used by cli.analyze_model and
modelinfo.parsers.huggingface.fetch_huggingface_repo instead of swallowing
exceptions. Keep the local path assertions around analyze_model, but for the
remote path cases patch the needed helpers so execution completes normally, then
assert directly that fetch_huggingface_repo was called with the expected repo
IDs rather than using a try/except pass.

Source: Linters/SAST tools

src/modelinfo/parsers/huggingface.py (1)

223-223: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Address the Ruff unused-variable warning.

shard is unpacked but never used; prefix it to keep the linter clean.

-            shard, shard_header = future.result()
+            _shard, shard_header = future.result()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/modelinfo/parsers/huggingface.py` at line 223, The unpacking in the
Hugging Face parser leaves an unused variable warning because shard is never
referenced after future.result() returns. Update the assignment in the code path
around the future.result() call in the parser logic to prefix the unused value
with an underscore while keeping shard_header as-is, so Ruff no longer flags it.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/modelinfo/parsers/gguf.py`:
- Around line 58-88: The GGUF parser in the parse routine should not call
struct.unpack directly on potentially short reads from RemoteFileStream.read(),
since EOF/416 can surface as struct.error instead of a GGUF parse failure. Add a
small exact-read helper in the GGUF parser and use it for all fixed-width fields
in this block, including the version, tensor_count, kv_count, key lengths, value
types, name lengths, dimensions, tensor types, and the skipped offset bytes, so
malformed or truncated files consistently raise a deterministic invalid-GGUF
error.

In `@src/modelinfo/parsers/huggingface.py`:
- Line 105: The resolve URL construction in huggingface.py is interpolating
repo_id and filename directly, so paths with spaces or reserved characters can
break requests. Add a small helper that quotes each path component with
urllib.parse.quote(..., safe="/") and use it everywhere the /resolve/main/...
URL is built, including the request creation around the Request call and any
other resolve-path helpers in this module.

In `@src/modelinfo/ui.py`:
- Around line 111-114: The early return after the `console.print(...)` block in
`show_model_info` skips the native-context warning for grouped GGUF repos, so
keep the `GGUF_group` flow going long enough to hit the warning logic later.
Update the `show_model_info` path so grouped repos still reach the `max_context`
check and emit the native-context warning before exiting, using the existing
`GGUF_group` and warning code as the anchor points.
- Around line 59-114: The GGUF_group branch in the UI rendering path is
returning before the --vllm logic can run, so it never uses vllm_metrics and
keeps showing the static VRAM heuristic. Update the GGUF_group handling in the
same function so it respects the existing is_vllm flow used later in the method,
and only uses the grouped GGUF table when --vllm is not enabled. Make sure the
row fit/capacity labels for grouped variants are driven by the vLLM metrics path
when requested, not by the hardcoded total_vram_bytes calculation.

---

Nitpick comments:
In `@src/modelinfo/parsers/huggingface.py`:
- Line 223: The unpacking in the Hugging Face parser leaves an unused variable
warning because shard is never referenced after future.result() returns. Update
the assignment in the code path around the future.result() call in the parser
logic to prefix the unused value with an underscore while keeping shard_header
as-is, so Ruff no longer flags it.

In `@tests/test_cli.py`:
- Around line 307-336: The routing test is masking failures and still depends on
real filesystem behavior, so make it deterministic by stubbing the downstream
helpers used by cli.analyze_model and
modelinfo.parsers.huggingface.fetch_huggingface_repo instead of swallowing
exceptions. Keep the local path assertions around analyze_model, but for the
remote path cases patch the needed helpers so execution completes normally, then
assert directly that fetch_huggingface_repo was called with the expected repo
IDs rather than using a try/except pass.

In `@tests/test_parsers.py`:
- Around line 119-131: The GGUF-group test stub is tied to one specific variant
because fake_make_request only returns a GGUF header for model-q4.gguf. Update
the stub so it serves a valid GGUF header for any requested model variant in the
sibling list, or otherwise make it independent of which variant is fetched
first, while keeping the behavior in fake_make_request and the parser test setup
unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a31eb8a9-0d27-46bc-9482-f0857e2e673a

📥 Commits

Reviewing files that changed from the base of the PR and between 0e82634 and 0ef126b.

📒 Files selected for processing (7)
  • README.md
  • src/modelinfo/cli.py
  • src/modelinfo/parsers/gguf.py
  • src/modelinfo/parsers/huggingface.py
  • src/modelinfo/ui.py
  • tests/test_cli.py
  • tests/test_parsers.py

Comment on lines +58 to +88
magic = f.read(4)
if magic != b"GGUF":
raise ValueError("Invalid GGUF file: Magic bytes missing.")

version = struct.unpack("<I", f.read(4))[0]
if version < 2:
raise ValueError(f"Unsupported GGUF version: {version}")

tensor_count = struct.unpack("<Q", f.read(8))[0]
kv_count = struct.unpack("<Q", f.read(8))[0]

with open(path, "rb") as f:
magic = f.read(4)
if magic != b"GGUF":
raise ValueError("Invalid GGUF file: Magic bytes missing.")

version = struct.unpack("<I", f.read(4))[0]
if version < 2:
raise ValueError(f"Unsupported GGUF version: {version}")

tensor_count = struct.unpack("<Q", f.read(8))[0]
kv_count = struct.unpack("<Q", f.read(8))[0]
metadata = {}
for _ in range(kv_count):
key_len = struct.unpack("<Q", f.read(8))[0]
key_name = f.read(key_len).decode("utf-8")
val_type = struct.unpack("<I", f.read(4))[0]
metadata[key_name] = _read_gguf_value(f, val_type)

metadata = {}
for _ in range(kv_count):
key_len = struct.unpack("<Q", f.read(8))[0]
key_name = f.read(key_len).decode("utf-8")
val_type = struct.unpack("<I", f.read(4))[0]
metadata[key_name] = _read_gguf_value(f, val_type)

tensors["__metadata__"] = metadata

for _ in range(tensor_count):
name_len = struct.unpack("<Q", f.read(8))[0]
name = f.read(name_len).decode("utf-8")

n_dims = struct.unpack("<I", f.read(4))[0]
shape = []
for _ in range(n_dims):
shape.append(struct.unpack("<Q", f.read(8))[0])

t_type = struct.unpack("<I", f.read(4))[0]
f.read(8) # skip offset bytes

# Strict GGUF tensor type mapping
dtype = GGML_TYPE_MAP.get(t_type, "Unknown")

tensors[name] = {"shape": shape, "dtype": dtype}
tensors["__metadata__"] = metadata

for _ in range(tensor_count):
name_len = struct.unpack("<Q", f.read(8))[0]
name = f.read(name_len).decode("utf-8")

n_dims = struct.unpack("<I", f.read(4))[0]
shape = []
for _ in range(n_dims):
shape.append(struct.unpack("<Q", f.read(8))[0])

t_type = struct.unpack("<I", f.read(4))[0]
f.read(8) # skip offset bytes

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Normalize short reads into GGUF parse errors.

RemoteFileStream.read() may return fewer bytes on EOF/416, so these struct.unpack(...) calls can leak struct.error instead of a deterministic invalid-GGUF error. Add a small exact-read helper and use it for fixed-width fields.

Proposed fix
+def _read_exact(f: Any, size: int, field: str) -> bytes:
+    data = f.read(size)
+    if len(data) != size:
+        raise ValueError(f"Invalid GGUF file: truncated while reading {field}.")
+    return data
+
+
 def _parse_gguf_header_from_stream(f: Any) -> Dict[str, Any]:
     tensors: Dict[str, Any] = {}
-    magic = f.read(4)
+    magic = _read_exact(f, 4, "magic")
     if magic != b"GGUF":
         raise ValueError("Invalid GGUF file: Magic bytes missing.")
         
-    version = struct.unpack("<I", f.read(4))[0]
+    version = struct.unpack("<I", _read_exact(f, 4, "version"))[0]

Apply the same helper to the remaining fixed-size reads in this parser.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/modelinfo/parsers/gguf.py` around lines 58 - 88, The GGUF parser in the
parse routine should not call struct.unpack directly on potentially short reads
from RemoteFileStream.read(), since EOF/416 can surface as struct.error instead
of a GGUF parse failure. Add a small exact-read helper in the GGUF parser and
use it for all fixed-width fields in this block, including the version,
tensor_count, kv_count, key lengths, value types, name lengths, dimensions,
tensor types, and the skipped offset bytes, so malformed or truncated files
consistently raise a deterministic invalid-GGUF error.

return json.loads(json_bytes)

def _get_remote_file_size_fallback(repo_id: str, filename: str, timeout: float = 10.0) -> float:
req = urllib.request.Request(f"{_get_hf_endpoint()}/{repo_id}/resolve/main/{filename}", method="HEAD")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

git ls-files src/modelinfo/parsers/huggingface.py
ast-grep outline src/modelinfo/parsers/huggingface.py --view expanded
sed -n '1,260p' src/modelinfo/parsers/huggingface.py | cat -n

Repository: pipe1os/modelinfo-cli

Length of output: 12862


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the remaining portion of the file.
sed -n '260,360p' src/modelinfo/parsers/huggingface.py | cat -n

# Search for any existing URL-encoding helper or quote usage in the parser.
rg -n "quote\(|resolve/main|_hf_resolve_url|urllib\.parse" src/modelinfo/parsers

Repository: pipe1os/modelinfo-cli

Length of output: 5747


Quote Hub resolve path components. Every /resolve/main/... URL in this file interpolates repo_id/filename directly, so files with spaces or other reserved characters can fail to download. Add a small helper that urllib.parse.quote(..., safe="/")s each path component and reuse it for all resolve URLs.

🧰 Tools
🪛 Ruff (0.15.18)

[error] 105-105: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/modelinfo/parsers/huggingface.py` at line 105, The resolve URL
construction in huggingface.py is interpolating repo_id and filename directly,
so paths with spaces or reserved characters can break requests. Add a small
helper that quotes each path component with urllib.parse.quote(..., safe="/")
and use it everywhere the /resolve/main/... URL is built, including the request
creation around the Request call and any other resolve-path helpers in this
module.

Comment thread src/modelinfo/ui.py
Comment on lines +59 to +114
if format_name == "GGUF_group":
metadata = tensors.get("__metadata__", {})
variants = metadata.get("gguf_variants", [])
repo_id = metadata.get("repo_id", "")

console.print(f"[bold]Repository:[/bold] {repo_id}")
console.print("[bold]Format:[/bold] GGUF (Multiple Quantizations)")
console.print(f"[bold]Architecture:[/bold] {arch_name}")
if max_context:
console.print(f"[bold]Context Limit:[/bold] {max_context:,} tokens")
console.print()

table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2))
table.add_column("Quantization File")
table.add_column("File Size", justify="right")
table.add_column("KV Cache", justify="right")
table.add_column("Total VRAM", justify="right")

show_fits = gpu_name is not None
if show_fits:
table.add_column("Fits", justify="left")

kv_cache_bytes = footprint["kv_cache_bytes"]
penalty_percentage = footprint.get("penalty_percentage", 0.0)
cuda_overhead = 600 * 1024 * 1024 * gpu_count

sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True)
for var in sorted_variants:
filename = var["filename"]
size_bytes = var["size"]
variant_overhead = cuda_overhead + (size_bytes * penalty_percentage)
total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead

file_size_str = format_bytes(size_bytes)
kv_cache_str = format_bytes(kv_cache_bytes)

vram_color = get_vram_color(total_vram_bytes, max_vram_gb)
total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]"

row_data = [filename, file_size_str, kv_cache_str, total_vram_str]
if show_fits:
utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0
if utilization <= gpu_util:
fit_text = "[green]✓ Yes[/green]"
elif utilization <= 0.99:
fit_text = "[yellow]⚠ Warning[/yellow]"
else:
fit_text = "[red]✗ No[/red]"
row_data.append(fit_text)

table.add_row(*row_data)

console.print(table)
console.print()
console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't silently ignore --vllm for GGUF_group.

This early return skips the is_vllm path at Lines 172-199, so GGUF_group output never uses vllm_metrics and still labels rows with the static VRAM heuristic. modelinfo <repo> --vllm can therefore show misleading fit/capacity output for grouped GGUF repos.

Possible minimal safeguard
     if format_name == "GGUF_group":
+        if is_vllm:
+            console.print(
+                "[yellow]vLLM capacity planning requires a specific GGUF file. "
+                "Rerun with `modelinfo <repo>/<quant>.gguf --vllm`.[/yellow]"
+            )
+            return
+
         metadata = tensors.get("__metadata__", {})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if format_name == "GGUF_group":
metadata = tensors.get("__metadata__", {})
variants = metadata.get("gguf_variants", [])
repo_id = metadata.get("repo_id", "")
console.print(f"[bold]Repository:[/bold] {repo_id}")
console.print("[bold]Format:[/bold] GGUF (Multiple Quantizations)")
console.print(f"[bold]Architecture:[/bold] {arch_name}")
if max_context:
console.print(f"[bold]Context Limit:[/bold] {max_context:,} tokens")
console.print()
table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2))
table.add_column("Quantization File")
table.add_column("File Size", justify="right")
table.add_column("KV Cache", justify="right")
table.add_column("Total VRAM", justify="right")
show_fits = gpu_name is not None
if show_fits:
table.add_column("Fits", justify="left")
kv_cache_bytes = footprint["kv_cache_bytes"]
penalty_percentage = footprint.get("penalty_percentage", 0.0)
cuda_overhead = 600 * 1024 * 1024 * gpu_count
sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True)
for var in sorted_variants:
filename = var["filename"]
size_bytes = var["size"]
variant_overhead = cuda_overhead + (size_bytes * penalty_percentage)
total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead
file_size_str = format_bytes(size_bytes)
kv_cache_str = format_bytes(kv_cache_bytes)
vram_color = get_vram_color(total_vram_bytes, max_vram_gb)
total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]"
row_data = [filename, file_size_str, kv_cache_str, total_vram_str]
if show_fits:
utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0
if utilization <= gpu_util:
fit_text = "[green]✓ Yes[/green]"
elif utilization <= 0.99:
fit_text = "[yellow]⚠ Warning[/yellow]"
else:
fit_text = "[red]✗ No[/red]"
row_data.append(fit_text)
table.add_row(*row_data)
console.print(table)
console.print()
console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
return
if format_name == "GGUF_group":
if is_vllm:
console.print(
"[yellow]vLLM capacity planning requires a specific GGUF file. "
"Rerun with `modelinfo <repo>/<quant>.gguf --vllm`.[/yellow]"
)
return
metadata = tensors.get("__metadata__", {})
variants = metadata.get("gguf_variants", [])
repo_id = metadata.get("repo_id", "")
console.print(f"[bold]Repository:[/bold] {repo_id}")
console.print("[bold]Format:[/bold] GGUF (Multiple Quantizations)")
console.print(f"[bold]Architecture:[/bold] {arch_name}")
if max_context:
console.print(f"[bold]Context Limit:[/bold] {max_context:,} tokens")
console.print()
table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2))
table.add_column("Quantization File")
table.add_column("File Size", justify="right")
table.add_column("KV Cache", justify="right")
table.add_column("Total VRAM", justify="right")
show_fits = gpu_name is not None
if show_fits:
table.add_column("Fits", justify="left")
kv_cache_bytes = footprint["kv_cache_bytes"]
penalty_percentage = footprint.get("penalty_percentage", 0.0)
cuda_overhead = 600 * 1024 * 1024 * gpu_count
sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True)
for var in sorted_variants:
filename = var["filename"]
size_bytes = var["size"]
variant_overhead = cuda_overhead + (size_bytes * penalty_percentage)
total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead
file_size_str = format_bytes(size_bytes)
kv_cache_str = format_bytes(kv_cache_bytes)
vram_color = get_vram_color(total_vram_bytes, max_vram_gb)
total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]"
row_data = [filename, file_size_str, kv_cache_str, total_vram_str]
if show_fits:
utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0
if utilization <= gpu_util:
fit_text = "[green]✓ Yes[/green]"
elif utilization <= 0.99:
fit_text = "[yellow]⚠ Warning[/yellow]"
else:
fit_text = "[red]✗ No[/red]"
row_data.append(fit_text)
table.add_row(*row_data)
console.print(table)
console.print()
console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/modelinfo/ui.py` around lines 59 - 114, The GGUF_group branch in the UI
rendering path is returning before the --vllm logic can run, so it never uses
vllm_metrics and keeps showing the static VRAM heuristic. Update the GGUF_group
handling in the same function so it respects the existing is_vllm flow used
later in the method, and only uses the grouped GGUF table when --vllm is not
enabled. Make sure the row fit/capacity labels for grouped variants are driven
by the vLLM metrics path when requested, not by the hardcoded total_vram_bytes
calculation.

Comment thread src/modelinfo/ui.py
Comment on lines +111 to +114
console.print(table)
console.print()
console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Keep the native-context warning on the GGUF_group path.

The return here also bypasses the warning at Lines 216-217, so grouped repos inspected with --context above max_context get no warning at all.

Small fix
         console.print(table)
         console.print()
+        if context_length > 0 and max_context is not None and context_length > max_context:
+            console.print(
+                f"[bold yellow]WARNING: Requested context ({context_length:,}) exceeds model's native limit ({max_context:,}).[/bold yellow]"
+            )
+            console.print()
         console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
         return
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
console.print(table)
console.print()
console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
return
console.print(table)
console.print()
console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
if context_length > 0 and max_context is not None and context_length > max_context:
console.print(
f"[bold yellow]WARNING: Requested context ({context_length:,}) exceeds model's native limit ({max_context:,}).[/bold yellow]"
)
console.print()
return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/modelinfo/ui.py` around lines 111 - 114, The early return after the
`console.print(...)` block in `show_model_info` skips the native-context warning
for grouped GGUF repos, so keep the `GGUF_group` flow going long enough to hit
the warning logic later. Update the `show_model_info` path so grouped repos
still reach the `max_context` check and emit the native-context warning before
exiting, using the existing `GGUF_group` and warning code as the anchor points.

@pipe1os pipe1os closed this Jun 27, 2026
@pipe1os pipe1os deleted the advisor/007-refine-remote-detection branch June 27, 2026 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant