fix: handle concurrent remote shard download failures gracefully by pipe1os · Pull Request #43 · pipe1os/modelinfo-cli

pipe1os · 2026-06-27T15:50:54Z

Summary

This PR makes remote SafeTensors shard downloads resilient to individual shard network failures. Instead of raising an exception and halting the entire parse when a single shard fails to fetch, it catches exceptions per shard, tracks the missing shards, and returns the successfully retrieved tensors along with the count of missing shards.

Motivation & Context

When inspecting sharded SafeTensors models from Hugging Face remotely, the concurrent retrieval of shard headers runs in a ThreadPoolExecutor. Currently, if any single shard fails (due to network timeout, transient CDN errors, or missing files), the exception bubbles up and crashes the entire execution. By handling shard failures gracefully, we match the local sharded parser behavior where missing shards are recorded and reported without crashing the application.
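The per-shard handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/modelinfo/parsers/huggingface.py: the names `fetch_all_shards` and `fetch_one` are hypothetical, and the real implementation may narrow the caught exception types or record which shards failed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Tuple


def fetch_all_shards(
    shard_names: List[str],
    fetch_one: Callable[[str], Dict[str, Any]],  # hypothetical per-shard fetcher
    max_workers: int = 8,
) -> Tuple[Dict[str, Any], int]:
    """Fetch shard headers concurrently; return (merged tensors, missing_shards)."""
    tensors: Dict[str, Any] = {}
    missing_shards = 0
    if not shard_names:
        return tensors, missing_shards
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, name): name for name in shard_names}
        for future in as_completed(futures):
            try:
                # result() re-raises whatever the worker raised.
                tensors.update(future.result())
            except Exception:
                # A failed shard is counted instead of aborting the whole run.
                missing_shards += 1
    return tensors, missing_shards
```

Catching a broad `Exception` keeps the sketch short; a narrower set (HTTP and timeout errors) would avoid masking programming errors inside the fetcher.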

Type of Change

Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Added test_remote_shard_download_failure in tests/test_parsers.py, which simulates a remote shard download failing with an HTTPError.
Verified that the execution completes without raising exceptions, returns valid metadata from successfully downloaded shards, and reports missing_shards as 1.
Ran the test suite via .venv/bin/pytest tests/ -v.
Unit tests

Checklist

My code follows the code style of this project.
My commit messages follow the Conventional Commits format, are lowercase, imperative, and specific.
I have updated the documentation accordingly (if applicable).
I have added tests to cover my changes.
All new and existing tests passed.

coderabbitai · 2026-06-27T15:51:01Z

Warning

Review limit reached

@pipe1os, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 49 minutes and 36 seconds. Learn how PR review limits work.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: bc88f284-c028-40c2-9b7b-6808458958b5

📥 Commits

Reviewing files that changed from the base of the PR and between 0e82634 and aa331eb.

📒 Files selected for processing (2)

src/modelinfo/parsers/huggingface.py
tests/test_parsers.py

codacy-production · 2026-06-27T15:52:16Z

Up to standards ✅

🟢 Issues: 0 new issues

codacy-production

Pull Request Overview

The PR successfully implements graceful error handling for concurrent remote shard downloads, but it also introduces significant GGUF remote inspection features and repository comparison tables that exceed the scope of the 'fix' title.

While Codacy reports that the PR is up to standards, there are several points of concern. A critical logic error was found in the disk size calculation for local sharded models. Additionally, the analyze_model function in src/modelinfo/cli.py has reached a high level of cyclomatic complexity (16) due to its role as a central dispatcher for multiple formats and locations, which should be refactored to ensure long-term maintainability.

About this PR

The 'is_remote' detection logic in src/modelinfo/cli.py (lines 152-155) may trigger false positives for non-existent local files that contain a slash in the path. This could lead to unnecessary network requests and slower error reporting for local path typos.
The UI threshold for the 'Fits' status has been changed from a hardcoded 0.90 to the user-provided 'gpu_util' (defaulting to 0.90). This logic change was not documented in the PR description and affects how the UI reports model compatibility.
There is significant scope creep in this PR. It is described as a fix for SafeTensors shard download failures but includes a large feature set for GGUF remote repositories. Consider splitting major features into separate PRs to simplify review and testing.
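To illustrate the 'is_remote' concern, here is a small sketch of a detection heuristic that checks for local-looking prefixes before treating a slash as a sign of a Hugging Face repo id. The function name `looks_remote` and the prefix list are assumptions made for illustration; the actual logic in src/modelinfo/cli.py is not shown in this PR page.

```python
import os

# Assumed set of prefixes that mark a path as local; the PR's real list is not shown.
LOCAL_PREFIXES = ("./", "../", "/", "~")


def looks_remote(model_path: str) -> bool:
    """Guess whether model_path is a repo id (org/name) rather than a local path."""
    if os.path.exists(model_path):
        return False  # anything that exists on disk is local
    if model_path.startswith(LOCAL_PREFIXES) or os.path.isabs(model_path):
        return False  # explicit local prefix, even if the file is missing
    if "\\" in model_path:
        return False  # backslashes suggest a Windows-style path
    return "/" in model_path
```

A mistyped relative path without a prefix, such as models/typo.gguf, would still be classified as remote by this sketch, which is the residual false positive the review describes.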

1 comment outside of the diff

src/modelinfo/cli.py

_{line 133 🟡 MEDIUM RISK}
The analyze_model function has high cyclomatic complexity (16) because it manages all logic for remote vs local model detection, multiple file formats (GGUF, SafeTensors, PyTorch), and directory validation in a single block. Refactor this to extract model loading and context calculation into separate functions, using a strategy pattern for file extensions.
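One possible shape for the suggested refactor is a lookup table keyed by file extension, so the dispatcher itself contains no per-format branches. The loader functions below are hypothetical placeholders, not the project's parsers, and the extension list is an assumption.

```python
import os
from typing import Any, Callable, Dict


def load_gguf(path: str) -> Dict[str, Any]:
    return {"format": "GGUF", "path": path}  # placeholder for the real parser


def load_safetensors(path: str) -> Dict[str, Any]:
    return {"format": "SafeTensors", "path": path}  # placeholder


def load_pytorch(path: str) -> Dict[str, Any]:
    return {"format": "PyTorch", "path": path}  # placeholder


# Strategy table: adding a format means adding one entry, not another branch.
LOADERS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    ".gguf": load_gguf,
    ".safetensors": load_safetensors,
    ".pt": load_pytorch,
    ".bin": load_pytorch,
}


def load_model(path: str) -> Dict[str, Any]:
    """Pick a loader from the file extension and delegate to it."""
    ext = os.path.splitext(path)[1].lower()
    loader = LOADERS.get(ext)
    if loader is None:
        raise ValueError(f"unsupported model format: {ext or '(none)'}")
    return loader(path)
```

With this layout the cyclomatic complexity of the dispatcher stays constant as formats are added, since each format's logic lives in its own function.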

Test suggestions

Simulate a network failure (502 Bad Gateway) for one shard in a multi-shard SafeTensors model and verify 'missing_shards' is correctly reported.
Verify that a remote GGUF repository with multiple files displays a summary table of all quantizations.
Verify that specifying a specific GGUF file within a remote repository (repo/path/file.gguf) fetches only that file's header.
Verify behavior when unauthorized (401) or missing (404) repositories are accessed.

codacy-production · 2026-06-27T15:54:01Z

-    if format_name != "SafeTensors" or os.path.exists(file_path):
-        disk_size = os.path.getsize(file_path) if os.path.exists(file_path) else 0.0
+    if os.path.exists(file_path):
+        disk_size = os.path.getsize(file_path)


_{🟡 MEDIUM RISK}

Local disk_size calculation for sharded SafeTensors (.index.json) currently reports the size of the index JSON file itself instead of the total weight size of all shards. This creates a discrepancy between local and remote model reporting.
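A sketch of how the total could be computed for a local sharded model, assuming the index file follows the usual Hugging Face layout with a `weight_map` object that maps tensor names to shard filenames. The function name `sharded_disk_size` is hypothetical and is not taken from the PR.

```python
import json
import os


def sharded_disk_size(index_path: str) -> int:
    """Sum the on-disk sizes of the shards referenced by a SafeTensors index file."""
    with open(index_path, "r", encoding="utf-8") as fh:
        index = json.load(fh)
    base_dir = os.path.dirname(os.path.abspath(index_path))
    # Many tensors map to the same shard, so deduplicate the filenames first.
    shard_files = set(index.get("weight_map", {}).values())
    total = 0
    for name in shard_files:
        shard_path = os.path.join(base_dir, name)
        if os.path.exists(shard_path):  # missing shards contribute nothing
            total += os.path.getsize(shard_path)
    return total
```

Skipping missing shards rather than raising mirrors the PR's stated goal of reporting missing shards without crashing; the size then covers only the shards actually present.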

codacy-production · 2026-06-27T15:54:01Z

+    def fake_fetch(repo_id, *, fetch_tensors, timeout):
+        tensors = {
+            "__metadata__": {
+                "general.architecture": "llama",
+                "llama.block_count": 32,
+                "llama.attention.head_count_kv": 8,
+                "llama.attention.key_length": 128,
+                "gguf_variants": [
+                    {"filename": "model-q4.gguf", "size": 1000000000},
+                    {"filename": "model-q8.gguf", "size": 2000000000}
+                ],
+                "repo_id": "org/model-gguf"
+            }
+        }
+        return tensors, None, "GGUF_group", 0.0


_{⚪ LOW RISK}

Suggestion: The mock metadata structure is duplicated between this test and the helper function. Since you have already introduced _get_mock_gguf_group_data at line 231, you should use it here to keep the tests DRY.

Suggested change

-    def fake_fetch(repo_id, *, fetch_tensors, timeout):
-        tensors = {
-            "__metadata__": {
-                "general.architecture": "llama",
-                "llama.block_count": 32,
-                "llama.attention.head_count_kv": 8,
-                "llama.attention.key_length": 128,
-                "gguf_variants": [
-                    {"filename": "model-q4.gguf", "size": 1000000000},
-                    {"filename": "model-q8.gguf", "size": 2000000000}
-                ],
-                "repo_id": "org/model-gguf"
-            }
-        }
-        return tensors, None, "GGUF_group", 0.0
+    def fake_fetch(repo_id, *, fetch_tensors, timeout):
+        tensors, _ = _get_mock_gguf_group_data()
+        return tensors, None, "GGUF_group", 0.0

codacy-production · 2026-06-27T15:54:01Z



-def parse_gguf_header(path: str) -> Dict[str, Any]:
+def parse_gguf_header(path_or_file: str | Any) -> Dict[str, Any]:


_{⚪ LOW RISK}

Nitpick: Consider using typing.BinaryIO for the stream parameter instead of Any to improve type safety and static analysis for binary file-like objects.
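As a rough illustration of the nitpick, the signature could accept either a path or a `typing.BinaryIO` stream. The helper below is a hypothetical stand-in rather than the project's `parse_gguf_header`; it reads only the 4-byte `GGUF` magic and the little-endian 32-bit version that open a GGUF file.

```python
import struct
from typing import Any, BinaryIO, Dict, Union


def read_gguf_preamble(path_or_file: Union[str, BinaryIO]) -> Dict[str, Any]:
    """Read only the GGUF magic and version from a path or an open binary stream."""
    if isinstance(path_or_file, str):
        # Open the path and recurse with the resulting binary stream.
        with open(path_or_file, "rb") as fh:
            return read_gguf_preamble(fh)
    head = path_or_file.read(8)
    if len(head) < 8 or head[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack("<I", head[4:8])
    return {"version": version}
```

With `BinaryIO` in the annotation, a type checker can flag callers that pass a text-mode file, which `Any` would silently allow.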

pipe1os added 11 commits June 27, 2026 11:16

implement remote gguf inspection on hugging face

1b1a090

split print_model_info test to comply with codacy method size limit

71ef3a3

fix codacy issues: add read limit, honor gpu_util, modularize hf pars…

d0c5474

…er, add error tests

refactor: split concurrent shards fetching to lower cyclomatic comple…

bebe2c1

…xity

fix codacy issues: compute GGUF group variant overhead dynamically

6555e0e

docs: document remote gguf inspection options in README.md

357ee16

fix: strip trailing slashes from model paths at entrypoint

5a8e6e2

fix: handle reverse tensor shape ordering for gguf shape guessing

7dc8576

fix: treat paths starting with local prefix as local files to prevent…

0ef126b

… remote routing

fix: handle concurrent remote shard download failures gracefully

b0b9744

fix: resolve safetensors shard index prefix splitting

9ba40d0

pipe1os added 5 commits June 27, 2026 11:52

merge: integrate branch 004-fix-safetensors-shard-prefix

738e291

merge: integrate branch 005-graceful-shard-downloads

1b1c58f

merge: integrate branch 006-fix-gguf-shape-guessing

56bbf66

merge: integrate branch 007-refine-remote-detection

0cfe2e5

merge: integrate branch 008-fix-comparison-trailing-slash

2deb25a

codacy-production Bot reviewed Jun 27, 2026

pipe1os added 2 commits June 27, 2026 11:56

fix: address codacy review feedback on disk size, regex, path parsing…

32bf12b

…, and test helper

merge: sync with main and apply updates

aa331eb

pipe1os force-pushed the advisor/005-graceful-shard-downloads branch from b8cf42f to aa331eb Compare June 27, 2026 16:00

merge: sync with main

1acb0cb

pipe1os force-pushed the main branch from 32bf12b to 71abc38 Compare June 27, 2026 18:25

pipe1os closed this Jun 27, 2026

pipe1os deleted the advisor/005-graceful-shard-downloads branch June 27, 2026 18:29

pipe1os wants to merge 19 commits into main from advisor/005-graceful-shard-downloads
