fix: treat local prefix paths as local to prevent remote routing by pipe1os · Pull Request #40 · pipe1os/modelinfo-cli

pipe1os · 2026-06-27T15:50:33Z

Summary

Refines the local vs. remote path detection logic to avoid treating nonexistent local path typos (e.g. starting with ./, ../, /, ~) as remote Hugging Face repositories.

Motivation & Context

Currently, the CLI attempts to resolve any nonexistent path that contains a slash as a remote Hugging Face repository. If a user makes a typo on a local path (such as ./typo-path.gguf), the tool attempts a remote HF hub fetch. This leads to confusing API/404/auth errors and unnecessary network requests.

This change ensures that paths starting with ., /, ~ or absolute paths are recognized as local files, producing a clean file-not-found error immediately instead of attempting to fetch from Hugging Face.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactoring (no functional changes, no api changes)
Documentation update

How Has This Been Tested?

Added unit tests in tests/test_cli.py to verify that analyze_model raises local file errors for missing paths starting with local prefix characters, while still successfully routing standard Hugging Face model IDs to the remote fetcher.
Verified test suite passes locally.
Unit tests
Integration tests
Manual testing

Screenshots (if appropriate)

N/A

Checklist

My code follows the code style of this project.
My commit messages follow the Conventional Commits format, are lowercase, imperative, and specific.
I have updated the documentation accordingly (if applicable).
I have added tests to cover my changes.
All new and existing tests passed.

Summary by CodeRabbit

New Features
- Added support for analyzing remote GGUF models, including single files, grouped quantization variants, and explicit file targets.
- Enhanced model info output to show GGUF variant details and GPU fit guidance when available.
Bug Fixes
- Improved detection of local vs. remote paths to avoid unnecessary remote lookups.
- Refined size and metadata reporting for remote models and GGUF headers.
Documentation
- Expanded usage examples with more remote model commands, including GGUF and quantization comparisons.

…er, add error tests

…xity

… remote routing

coderabbitai · 2026-06-27T15:50:45Z

Warning

Review limit reached

@pipe1os, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 51 minutes and 49 seconds. Learn how PR review limits work.

To continue reviewing without waiting, enable usage-based billing in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1cc6f56d-2efb-4599-8c42-55e110e781c6

📥 Commits

Reviewing files that changed from the base of the PR and between 0ef126b and 65ab0c4.

📒 Files selected for processing (2)

src/modelinfo/cli.py
tests/test_cli.py

Walkthrough

Adds remote GGUF model inspection over HTTP range requests via a new RemoteFileStream. fetch_huggingface_repo is extended to detect and fetch single or grouped GGUF repos, returning a new GGUF_group format. The CLI routing, UI rendering, and README examples are updated accordingly.

Changes

Remote GGUF Support and GGUF_group Rendering

Layer / File(s)	Summary
GGUF parser refactored to accept file-like objects `src/modelinfo/parsers/gguf.py`	`parse_gguf_header` now accepts a path or open stream; core logic extracted into `_parse_gguf_header_from_stream`.
RemoteFileStream and remote GGUF fetchers `src/modelinfo/parsers/huggingface.py`	Adds `RemoteFileStream` (HTTP range-backed stream, 50MB cap), `_get_remote_file_size_fallback`, `_fetch_remote_gguf_single`, `_fetch_remote_gguf_group`, and refactored concurrent sharded SafeTensors fetchers.
`fetch_huggingface_repo` orchestration `src/modelinfo/parsers/huggingface.py`	Splits `repo_id` into `real_repo_id`/`target_filename` for `.gguf` paths, prefers SafeTensors when present, falls back to GGUF single or group, and adjusts 401/404 error handling.
CLI local/remote routing and `disk_size` fixes `src/modelinfo/cli.py`	`analyze_model` gains an `is_remote` flag to avoid misclassifying local paths (`./`, `../`, `/`, `~/`); widens GGUF metadata extraction to include `GGUF_group`; simplifies `disk_size` to require local file existence.
`GGUF_group` quantization variant table in UI `src/modelinfo/ui.py`	`print_model_info` adds a `GGUF_group` early-return branch rendering a Rich table of variants with VRAM coloring, optional Fits column, and a tip command; replaces hardcoded `0.90` threshold with `gpu_util`.
Tests and README docs `tests/test_parsers.py`, `tests/test_cli.py`, `README.md`	Parser tests cover remote GGUF single/group/explicit/401/404 cases; CLI tests cover `GGUF_group` analysis and local/remote routing; UI tests cover table rendering with and without GPU; README adds multi-command remote usage examples.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

pipe1os/modelinfo-cli#31: Introduced _get_hf_endpoint() and HF_ENDPOINT validation that this PR builds upon for the remote GGUF fetching infrastructure in huggingface.py.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 47.62% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the primary change: keeping local-prefix paths local to avoid remote Hugging Face routing.
Description check	✅ Passed	The PR description follows the template and includes the required summary, motivation, change type, testing, screenshots, and checklist sections.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch advisor/007-refine-remote-detection

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codacy-production · 2026-06-27T15:51:29Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.

_{TIP This summary will be updated as you push new changes.}

codacy-production

Pull Request Overview

The PR successfully implements the intended local path routing logic but introduces substantial unannounced scope by adding remote GGUF support and quantization comparison tables. While the GGUF functionality is beneficial, the implementation contains critical flaws: authentication headers are missing when streaming remote GGUF files (breaking gated model access), and remote model disk sizes can be corrupted by local filesystem metadata if a local directory shares the same name as a repository ID. Additionally, the parser logic in src/modelinfo/parsers/huggingface.py is complex and currently lacks sufficient test coverage. The routing logic itself is partially incomplete, as relative local paths without a leading prefix (e.g., 'subdir/model') will still trigger unnecessary network requests to Hugging Face if the file is missing locally. These security and logic issues should be addressed before merging.

About this PR

There is a systemic pattern of code duplication in the test mocks across test_parsers.py and test_cli.py. Extracting these into a shared mock factory would improve long-term maintainability.
Significant scope creep: The PR introduces remote GGUF parsing and UI comparison tables which are not mentioned in the PR title or description.

1 comment outside of the diff

src/modelinfo/cli.py

_{line 133 🟡 MEDIUM RISK}
Suggestion: The analyze_model function is becoming a 'God function' that orchestrates too many concerns. Consider refactoring the path resolution logic and the format-specific parsing into separate helper functions.

Test suggestions

Prompt proposal for missing tests

Consider implementing these tests if applicable:
1. Automated test: Verify that RemoteFileStream correctly includes Hugging Face Authorization headers.
2. Automated test: Verify that disk_size is not overwritten by local directory sizes when repository IDs collide with local path names.
3. Improve unit test coverage for src/modelinfo/parsers/huggingface.py due to high complexity.

_{TIP Improve review quality by adding custom instructions}
_{TIP How was this review? Give us feedback}

codacy-production · 2026-06-27T15:53:21Z

+    if os.path.exists(file_path):
+        disk_size = os.path.getsize(file_path)


_{🟡 MEDIUM RISK}

Prevent overwriting the remote model's disk size with local filesystem metadata when the repository ID matches a local path name.

codacy-production · 2026-06-27T15:53:21Z

+            or os.path.isabs(file_path)
+        )
+        if not is_local_path:
+            if "/" in file_path or not file_path_lower.endswith((".safetensors", ".gguf", ".pt", ".bin", ".index.json")):


_{🟡 MEDIUM RISK}

Suggestion: This logic still triggers a remote fetch for relative local paths containing a slash (e.g., 'folder/model.gguf') if they are missing locally. Consider treating all paths containing a slash as local if they don't match a repository ID pattern.

…, and test helper

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (3)

tests/test_parsers.py (1)

119-131: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the GGUF-group stub independent of fetch order.

This fake only serves a header for model-q4.gguf, which couples the test to the current internal choice of variant. A harmless change that inspects model-q8.gguf first would fail this test without changing user-visible behavior.
Example tweak
-        elif "model-q4.gguf" in url:
+        elif url.endswith(".gguf"):
             import struct
             header = b"GGUF" + struct.pack("<IQQ", 2, 0, 0)
             return header
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_parsers.py` around lines 119 - 131, The GGUF-group test stub is
tied to one specific variant because fake_make_request only returns a GGUF
header for model-q4.gguf. Update the stub so it serves a valid GGUF header for
any requested model variant in the sibling list, or otherwise make it
independent of which variant is fetched first, while keeping the behavior in
fake_make_request and the parser test setup unchanged.

tests/test_cli.py (1)

307-336: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the routing test deterministic instead of swallowing failures.

The loop at Lines 329-334 can hide unrelated regressions, and this test still depends on the real filesystem because os.path.exists is never stubbed. Patch the downstream helpers so the remote cases can complete normally, then assert without except Exception: pass.

One way to tighten it

 def test_analyze_model_local_path_routing(monkeypatch):
     """Test that analyze_model treats paths starting with local prefix as local, raising an error instead of routing to Hugging Face."""
     from modelinfo.parsers import huggingface

+    monkeypatch.setattr(cli.os.path, "exists", lambda _path: False)
+    monkeypatch.setattr(cli, "calculate_footprint", lambda *args, **kwargs: {
+        "total_params": 0,
+        "base_memory_bytes": 0.0,
+        "kv_cache_bytes": 0.0,
+        "overhead_bytes": 0.0,
+        "total_memory_bytes": 0.0,
+        "num_layers": 0,
+        "kv_dim": 0,
+        "primary_dtype": "Unknown",
+        "kv_is_estimate": False,
+        "penalty_percentage": 0.0,
+        "vllm_metrics": {},
+    })
+    monkeypatch.setattr(cli, "identify_architecture_name", lambda *_args, **_kwargs: "Mock")
+
     hf_fetched = []
     def fake_fetch(repo_id, *, fetch_tensors, timeout):
         hf_fetched.append(repo_id)
         return {}, None, "SafeTensors", 0.0
@@
-    for path in remote_paths:
-        try:
-            cli.analyze_model(path, context_override=128)
-        except Exception:
-            # We don't care if calculation fails later because of empty dict from fake_fetch,
-            # we just care that it triggers fetch_huggingface_repo.
-            pass
+    for path in remote_paths:
+        cli.analyze_model(path, context_override=128)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cli.py` around lines 307 - 336, The routing test is masking
failures and still depends on real filesystem behavior, so make it deterministic
by stubbing the downstream helpers used by cli.analyze_model and
modelinfo.parsers.huggingface.fetch_huggingface_repo instead of swallowing
exceptions. Keep the local path assertions around analyze_model, but for the
remote path cases patch the needed helpers so execution completes normally, then
assert directly that fetch_huggingface_repo was called with the expected repo
IDs rather than using a try/except pass.

Source: Linters/SAST tools

src/modelinfo/parsers/huggingface.py (1)

223-223: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Address the Ruff unused-variable warning.

shard is unpacked but never used; prefix it to keep the linter clean.

-            shard, shard_header = future.result()
+            _shard, shard_header = future.result()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/modelinfo/parsers/huggingface.py` at line 223, The unpacking in the
Hugging Face parser leaves an unused variable warning because shard is never
referenced after future.result() returns. Update the assignment in the code path
around the future.result() call in the parser logic to prefix the unused value
with an underscore while keeping shard_header as-is, so Ruff no longer flags it.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/modelinfo/parsers/gguf.py`:
- Around line 58-88: The GGUF parser in the parse routine should not call
struct.unpack directly on potentially short reads from RemoteFileStream.read(),
since EOF/416 can surface as struct.error instead of a GGUF parse failure. Add a
small exact-read helper in the GGUF parser and use it for all fixed-width fields
in this block, including the version, tensor_count, kv_count, key lengths, value
types, name lengths, dimensions, tensor types, and the skipped offset bytes, so
malformed or truncated files consistently raise a deterministic invalid-GGUF
error.

In `@src/modelinfo/parsers/huggingface.py`:
- Line 105: The resolve URL construction in huggingface.py is interpolating
repo_id and filename directly, so paths with spaces or reserved characters can
break requests. Add a small helper that quotes each path component with
urllib.parse.quote(..., safe="/") and use it everywhere the /resolve/main/...
URL is built, including the request creation around the Request call and any
other resolve-path helpers in this module.

In `@src/modelinfo/ui.py`:
- Around line 111-114: The early return after the `console.print(...)` block in
`show_model_info` skips the native-context warning for grouped GGUF repos, so
keep the `GGUF_group` flow going long enough to hit the warning logic later.
Update the `show_model_info` path so grouped repos still reach the `max_context`
check and emit the native-context warning before exiting, using the existing
`GGUF_group` and warning code as the anchor points.
- Around line 59-114: The GGUF_group branch in the UI rendering path is
returning before the --vllm logic can run, so it never uses vllm_metrics and
keeps showing the static VRAM heuristic. Update the GGUF_group handling in the
same function so it respects the existing is_vllm flow used later in the method,
and only uses the grouped GGUF table when --vllm is not enabled. Make sure the
row fit/capacity labels for grouped variants are driven by the vLLM metrics path
when requested, not by the hardcoded total_vram_bytes calculation.

---

Nitpick comments:
In `@src/modelinfo/parsers/huggingface.py`:
- Line 223: The unpacking in the Hugging Face parser leaves an unused variable
warning because shard is never referenced after future.result() returns. Update
the assignment in the code path around the future.result() call in the parser
logic to prefix the unused value with an underscore while keeping shard_header
as-is, so Ruff no longer flags it.

In `@tests/test_cli.py`:
- Around line 307-336: The routing test is masking failures and still depends on
real filesystem behavior, so make it deterministic by stubbing the downstream
helpers used by cli.analyze_model and
modelinfo.parsers.huggingface.fetch_huggingface_repo instead of swallowing
exceptions. Keep the local path assertions around analyze_model, but for the
remote path cases patch the needed helpers so execution completes normally, then
assert directly that fetch_huggingface_repo was called with the expected repo
IDs rather than using a try/except pass.

In `@tests/test_parsers.py`:
- Around line 119-131: The GGUF-group test stub is tied to one specific variant
because fake_make_request only returns a GGUF header for model-q4.gguf. Update
the stub so it serves a valid GGUF header for any requested model variant in the
sibling list, or otherwise make it independent of which variant is fetched
first, while keeping the behavior in fake_make_request and the parser test setup
unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a31eb8a9-0d27-46bc-9482-f0857e2e673a

📥 Commits

Reviewing files that changed from the base of the PR and between 0e82634 and 0ef126b.

📒 Files selected for processing (7)

README.md
src/modelinfo/cli.py
src/modelinfo/parsers/gguf.py
src/modelinfo/parsers/huggingface.py
src/modelinfo/ui.py
tests/test_cli.py
tests/test_parsers.py

coderabbitai · 2026-06-27T15:56:44Z

+    magic = f.read(4)
+    if magic != b"GGUF":
+        raise ValueError("Invalid GGUF file: Magic bytes missing.")
+
+    version = struct.unpack("<I", f.read(4))[0]
+    if version < 2:
+        raise ValueError(f"Unsupported GGUF version: {version}")
+
+    tensor_count = struct.unpack("<Q", f.read(8))[0]
+    kv_count = struct.unpack("<Q", f.read(8))[0]

-    with open(path, "rb") as f:
-        magic = f.read(4)
-        if magic != b"GGUF":
-            raise ValueError("Invalid GGUF file: Magic bytes missing.")
-
-        version = struct.unpack("<I", f.read(4))[0]
-        if version < 2:
-            raise ValueError(f"Unsupported GGUF version: {version}")
-
-        tensor_count = struct.unpack("<Q", f.read(8))[0]
-        kv_count = struct.unpack("<Q", f.read(8))[0]
+    metadata = {}
+    for _ in range(kv_count):
+        key_len = struct.unpack("<Q", f.read(8))[0]
+        key_name = f.read(key_len).decode("utf-8")
+        val_type = struct.unpack("<I", f.read(4))[0]
+        metadata[key_name] = _read_gguf_value(f, val_type)

-        metadata = {}
-        for _ in range(kv_count):
-            key_len = struct.unpack("<Q", f.read(8))[0]
-            key_name = f.read(key_len).decode("utf-8")
-            val_type = struct.unpack("<I", f.read(4))[0]
-            metadata[key_name] = _read_gguf_value(f, val_type)
-
-        tensors["__metadata__"] = metadata
-
-        for _ in range(tensor_count):
-            name_len = struct.unpack("<Q", f.read(8))[0]
-            name = f.read(name_len).decode("utf-8")
-
-            n_dims = struct.unpack("<I", f.read(4))[0]
-            shape = []
-            for _ in range(n_dims):
-                shape.append(struct.unpack("<Q", f.read(8))[0])
-
-            t_type = struct.unpack("<I", f.read(4))[0]
-            f.read(8)  # skip offset bytes
-
-            # Strict GGUF tensor type mapping
-            dtype = GGML_TYPE_MAP.get(t_type, "Unknown")
-
-            tensors[name] = {"shape": shape, "dtype": dtype}
+    tensors["__metadata__"] = metadata
+
+    for _ in range(tensor_count):
+        name_len = struct.unpack("<Q", f.read(8))[0]
+        name = f.read(name_len).decode("utf-8")
+
+        n_dims = struct.unpack("<I", f.read(4))[0]
+        shape = []
+        for _ in range(n_dims):
+            shape.append(struct.unpack("<Q", f.read(8))[0])
+
+        t_type = struct.unpack("<I", f.read(4))[0]
+        f.read(8)  # skip offset bytes


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Normalize short reads into GGUF parse errors.

RemoteFileStream.read() may return fewer bytes on EOF/416, so these struct.unpack(...) calls can leak struct.error instead of a deterministic invalid-GGUF error. Add a small exact-read helper and use it for fixed-width fields.

Proposed fix

+def _read_exact(f: Any, size: int, field: str) -> bytes: + data = f.read(size) + if len(data) != size: + raise ValueError(f"Invalid GGUF file: truncated while reading {field}.") + return data + + def _parse_gguf_header_from_stream(f: Any) -> Dict[str, Any]: tensors: Dict[str, Any] = {} - magic = f.read(4) + magic = _read_exact(f, 4, "magic") if magic != b"GGUF": raise ValueError("Invalid GGUF file: Magic bytes missing.") - version = struct.unpack("<I", f.read(4))[0] + version = struct.unpack("<I", _read_exact(f, 4, "version"))[0]

Apply the same helper to the remaining fixed-size reads in this parser.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modelinfo/parsers/gguf.py` around lines 58 - 88, The GGUF parser in the parse routine should not call struct.unpack directly on potentially short reads from RemoteFileStream.read(), since EOF/416 can surface as struct.error instead of a GGUF parse failure. Add a small exact-read helper in the GGUF parser and use it for all fixed-width fields in this block, including the version, tensor_count, kv_count, key lengths, value types, name lengths, dimensions, tensor types, and the skipped offset bytes, so malformed or truncated files consistently raise a deterministic invalid-GGUF error.

coderabbitai · 2026-06-27T15:56:44Z

    return json.loads(json_bytes)

+def _get_remote_file_size_fallback(repo_id: str, filename: str, timeout: float = 10.0) -> float:
+    req = urllib.request.Request(f"{_get_hf_endpoint()}/{repo_id}/resolve/main/{filename}", method="HEAD")


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail git ls-files src/modelinfo/parsers/huggingface.py ast-grep outline src/modelinfo/parsers/huggingface.py --view expanded sed -n '1,260p' src/modelinfo/parsers/huggingface.py | cat -n

Repository: pipe1os/modelinfo-cli

Length of output: 12862

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the remaining portion of the file. sed -n '260,360p' src/modelinfo/parsers/huggingface.py | cat -n # Search for any existing URL-encoding helper or quote usage in the parser. rg -n "quote\(|resolve/main|_hf_resolve_url|urllib\.parse" src/modelinfo/parsers

Repository: pipe1os/modelinfo-cli

Length of output: 5747

Quote Hub resolve path components. Every /resolve/main/... URL in this file interpolates repo_id/filename directly, so files with spaces or other reserved characters can fail to download. Add a small helper that urllib.parse.quote(..., safe="/")s each path component and reuse it for all resolve URLs.

🧰 Tools

🪛 Ruff (0.15.18)

[error] 105-105: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modelinfo/parsers/huggingface.py` at line 105, The resolve URL construction in huggingface.py is interpolating repo_id and filename directly, so paths with spaces or reserved characters can break requests. Add a small helper that quotes each path component with urllib.parse.quote(..., safe="/") and use it everywhere the /resolve/main/... URL is built, including the request creation around the Request call and any other resolve-path helpers in this module.

coderabbitai · 2026-06-27T15:56:44Z

+    if format_name == "GGUF_group":
+        metadata = tensors.get("__metadata__", {})
+        variants = metadata.get("gguf_variants", [])
+        repo_id = metadata.get("repo_id", "")
+
+        console.print(f"[bold]Repository:[/bold]      {repo_id}")
+        console.print("[bold]Format:[/bold]          GGUF (Multiple Quantizations)")
+        console.print(f"[bold]Architecture:[/bold]    {arch_name}")
+        if max_context:
+            console.print(f"[bold]Context Limit:[/bold]   {max_context:,} tokens")
+        console.print()
+
+        table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2))
+        table.add_column("Quantization File")
+        table.add_column("File Size", justify="right")
+        table.add_column("KV Cache", justify="right")
+        table.add_column("Total VRAM", justify="right")
+
+        show_fits = gpu_name is not None
+        if show_fits:
+            table.add_column("Fits", justify="left")
+
+        kv_cache_bytes = footprint["kv_cache_bytes"]
+        penalty_percentage = footprint.get("penalty_percentage", 0.0)
+        cuda_overhead = 600 * 1024 * 1024 * gpu_count
+
+        sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True)
+        for var in sorted_variants:
+            filename = var["filename"]
+            size_bytes = var["size"]
+            variant_overhead = cuda_overhead + (size_bytes * penalty_percentage)
+            total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead
+
+            file_size_str = format_bytes(size_bytes)
+            kv_cache_str = format_bytes(kv_cache_bytes)
+
+            vram_color = get_vram_color(total_vram_bytes, max_vram_gb)
+            total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]"
+
+            row_data = [filename, file_size_str, kv_cache_str, total_vram_str]
+            if show_fits:
+                utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0
+                if utilization <= gpu_util:
+                    fit_text = "[green]✓ Yes[/green]"
+                elif utilization <= 0.99:
+                    fit_text = "[yellow]⚠ Warning[/yellow]"
+                else:
+                    fit_text = "[red]✗ No[/red]"
+                row_data.append(fit_text)
+
+            table.add_row(*row_data)
+
+        console.print(table)
+        console.print()
+        console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
+        return


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't silently ignore --vllm for GGUF_group.

This early return skips the is_vllm path at Lines 172-199, so GGUF_group output never uses vllm_metrics and still labels rows with the static VRAM heuristic. modelinfo <repo> --vllm can therefore show misleading fit/capacity output for grouped GGUF repos.

Possible minimal safeguard

if format_name == "GGUF_group": + if is_vllm: + console.print( + "[yellow]vLLM capacity planning requires a specific GGUF file. " + "Rerun with `modelinfo <repo>/<quant>.gguf --vllm`.[/yellow]" + ) + return + metadata = tensors.get("__metadata__", {})

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if format_name == "GGUF_group":

metadata = tensors.get("__metadata__", {})

variants = metadata.get("gguf_variants", [])

repo_id = metadata.get("repo_id", "")

console.print(f"[bold]Repository:[/bold] {repo_id}")

console.print("[bold]Format:[/bold] GGUF (Multiple Quantizations)")

console.print(f"[bold]Architecture:[/bold] {arch_name}")

if max_context:

console.print(f"[bold]Context Limit:[/bold] {max_context:,} tokens")

console.print()

table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2))

table.add_column("Quantization File")

table.add_column("File Size", justify="right")

table.add_column("KV Cache", justify="right")

table.add_column("Total VRAM", justify="right")

show_fits = gpu_name is not None

if show_fits:

table.add_column("Fits", justify="left")

kv_cache_bytes = footprint["kv_cache_bytes"]

penalty_percentage = footprint.get("penalty_percentage", 0.0)

cuda_overhead = 600 * 1024 * 1024 * gpu_count

sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True)

for var in sorted_variants:

filename = var["filename"]

size_bytes = var["size"]

variant_overhead = cuda_overhead + (size_bytes * penalty_percentage)

total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead

file_size_str = format_bytes(size_bytes)

kv_cache_str = format_bytes(kv_cache_bytes)

vram_color = get_vram_color(total_vram_bytes, max_vram_gb)

total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]"

row_data = [filename, file_size_str, kv_cache_str, total_vram_str]

if show_fits:

utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0

if utilization <= gpu_util:

fit_text = "[green]✓ Yes[/green]"

elif utilization <= 0.99:

fit_text = "[yellow]⚠ Warning[/yellow]"

else:

fit_text = "[red]✗ No[/red]"

row_data.append(fit_text)

table.add_row(*row_data)

console.print(table)

console.print()

console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")

return

if format_name == "GGUF_group":

if is_vllm:

console.print(

"[yellow]vLLM capacity planning requires a specific GGUF file. "

"Rerun with `modelinfo <repo>/<quant>.gguf --vllm`.[/yellow]"

)

return

metadata = tensors.get("__metadata__", {})

variants = metadata.get("gguf_variants", [])

repo_id = metadata.get("repo_id", "")

console.print(f"[bold]Repository:[/bold] {repo_id}")

console.print("[bold]Format:[/bold] GGUF (Multiple Quantizations)")

console.print(f"[bold]Architecture:[/bold] {arch_name}")

if max_context:

console.print(f"[bold]Context Limit:[/bold] {max_context:,} tokens")

console.print()

table = Table(box=None, show_header=True, header_style="bold", pad_edge=False, padding=(0, 2))

table.add_column("Quantization File")

table.add_column("File Size", justify="right")

table.add_column("KV Cache", justify="right")

table.add_column("Total VRAM", justify="right")

show_fits = gpu_name is not None

if show_fits:

table.add_column("Fits", justify="left")

kv_cache_bytes = footprint["kv_cache_bytes"]

penalty_percentage = footprint.get("penalty_percentage", 0.0)

cuda_overhead = 600 * 1024 * 1024 * gpu_count

sorted_variants = sorted(variants, key=lambda x: x["size"], reverse=True)

for var in sorted_variants:

filename = var["filename"]

size_bytes = var["size"]

variant_overhead = cuda_overhead + (size_bytes * penalty_percentage)

total_vram_bytes = size_bytes + kv_cache_bytes + variant_overhead

file_size_str = format_bytes(size_bytes)

kv_cache_str = format_bytes(kv_cache_bytes)

vram_color = get_vram_color(total_vram_bytes, max_vram_gb)

total_vram_str = f"[{vram_color}]~{format_bytes(total_vram_bytes)}[/{vram_color}]"

row_data = [filename, file_size_str, kv_cache_str, total_vram_str]

if show_fits:

utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0

if utilization <= gpu_util:

fit_text = "[green]✓ Yes[/green]"

elif utilization <= 0.99:

fit_text = "[yellow]⚠ Warning[/yellow]"

else:

fit_text = "[red]✗ No[/red]"

row_data.append(fit_text)

table.add_row(*row_data)

console.print(table)

console.print()

console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")

return

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modelinfo/ui.py` around lines 59 - 114, The GGUF_group branch in the UI rendering path is returning before the --vllm logic can run, so it never uses vllm_metrics and keeps showing the static VRAM heuristic. Update the GGUF_group handling in the same function so it respects the existing is_vllm flow used later in the method, and only uses the grouped GGUF table when --vllm is not enabled. Make sure the row fit/capacity labels for grouped variants are driven by the vLLM metrics path when requested, not by the hardcoded total_vram_bytes calculation.

coderabbitai · 2026-06-27T15:56:44Z

+        console.print(table)
+        console.print()
+        console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")
+        return


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Keep the native-context warning on the GGUF_group path.

The return here also bypasses the warning at Lines 216-217, so grouped repos inspected with --context above max_context get no warning at all.

Small fix

console.print(table) console.print() + if context_length > 0 and max_context is not None and context_length > max_context: + console.print( + f"[bold yellow]WARNING: Requested context ({context_length:,}) exceeds model's native limit ({max_context:,}).[/bold yellow]" + ) + console.print() console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]") return

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

console.print(table)

console.print()

console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")

return

console.print(table)

console.print()

console.print(f"[dim]Tip: To view details for a specific quantization, run: modelinfo {repo_id}/{sorted_variants[0]['filename']}[/dim]")

if context_length > 0 and max_context is not None and context_length > max_context:

console.print(

f"[bold yellow]WARNING: Requested context ({context_length:,}) exceeds model's native limit ({max_context:,}).[/bold yellow]"

)

console.print()

return

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/modelinfo/ui.py` around lines 111 - 114, The early return after the `console.print(...)` block in `show_model_info` skips the native-context warning for grouped GGUF repos, so keep the `GGUF_group` flow going long enough to hit the warning logic later. Update the `show_model_info` path so grouped repos still reach the `max_context` check and emit the native-context warning before exiting, using the existing `GGUF_group` and warning code as the anchor points.

pipe1os added 11 commits June 27, 2026 11:16

implement remote gguf inspection on hugging face

1b1a090

split print_model_info test to comply with codacy method size limit

71ef3a3

fix codacy issues: add read limit, honor gpu_util, modularize hf pars…

d0c5474

…er, add error tests

refactor: split concurrent shards fetching to lower cyclomatic comple…

bebe2c1

…xity

fix codacy issues: compute GGUF group variant overhead dynamically

6555e0e

docs: document remote gguf inspection options in README.md

357ee16

fix: strip trailing slashes from model paths at entrypoint

5a8e6e2

fix: handle reverse tensor shape ordering for gguf shape guessing

7dc8576

fix: treat paths starting with local prefix as local files to prevent…

0ef126b

… remote routing

fix: handle concurrent remote shard download failures gracefully

b0b9744

fix: resolve safetensors shard index prefix splitting

9ba40d0

pipe1os added 2 commits June 27, 2026 11:52

merge: integrate branch 004-fix-safetensors-shard-prefix

738e291

merge: integrate branch 005-graceful-shard-downloads

1b1c58f

codacy-production Bot reviewed Jun 27, 2026

View reviewed changes

pipe1os added 4 commits June 27, 2026 11:53

merge: integrate branch 006-fix-gguf-shape-guessing

56bbf66

merge: integrate branch 007-refine-remote-detection

0cfe2e5

merge: integrate branch 008-fix-comparison-trailing-slash

2deb25a

fix: address codacy review feedback on disk size, regex, path parsing…

32bf12b

…, and test helper

coderabbitai Bot reviewed Jun 27, 2026

View reviewed changes

pipe1os added 2 commits June 27, 2026 11:58

merge: sync with main and apply updates

65ab0c4

merge: sync with main

9649242

pipe1os force-pushed the main branch from 32bf12b to 71abc38 Compare June 27, 2026 18:25

pipe1os closed this Jun 27, 2026

pipe1os deleted the advisor/007-refine-remote-detection branch June 27, 2026 18:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: treat local prefix paths as local to prevent remote routing#40

fix: treat local prefix paths as local to prevent remote routing#40
pipe1os wants to merge 19 commits into
mainfrom
advisor/007-refine-remote-detection

pipe1os commented Jun 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading

Review limit reached

❌ Failed checks (1 warning)

Uh oh!

codacy-production Bot commented Jun 27, 2026 •

edited

Loading

Uh oh!

codacy-production Bot left a comment

Uh oh!

codacy-production Bot Jun 27, 2026

Uh oh!

codacy-production Bot Jun 27, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Uh oh!

coderabbitai Bot Jun 27, 2026

Uh oh!

coderabbitai Bot Jun 27, 2026

Uh oh!

coderabbitai Bot Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if os.path.exists(file_path):
		disk_size = os.path.getsize(file_path)

Conversation

pipe1os commented Jun 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation & Context

Type of Change

How Has This Been Tested?

Screenshots (if appropriate)

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

codacy-production Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Up to standards ✅

Uh oh!

codacy-production Bot left a comment

Choose a reason for hiding this comment

Pull Request Overview

About this PR

Test suggestions

Uh oh!

codacy-production Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

codacy-production Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pipe1os commented Jun 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading

codacy-production Bot commented Jun 27, 2026 •

edited

Loading