Skip to content

fix: handle concurrent remote shard download failures gracefully#43

Closed
pipe1os wants to merge 19 commits into
mainfrom
advisor/005-graceful-shard-downloads
Closed

fix: handle concurrent remote shard download failures gracefully#43
pipe1os wants to merge 19 commits into
mainfrom
advisor/005-graceful-shard-downloads

Conversation

@pipe1os

@pipe1os pipe1os commented Jun 27, 2026

Copy link
Copy Markdown
Owner

Summary

This PR makes remote SafeTensors shard downloads resilient to individual shard network failures. Instead of raising an exception and halting the entire parsing process when a single shard fails to fetch, it catches exceptions gracefully on a per-shard basis, tracks missing shards, and returns the successfully retrieved tensors along with the correct count of missing shards.

Motivation & Context

When inspecting sharded SafeTensors models from Hugging Face remotely, the concurrent retrieval of shard headers runs in a ThreadPoolExecutor. Currently, if any single shard fails (due to network timeout, transient CDN errors, or missing files), the exception bubbles up and crashes the entire execution. By handling shard failures gracefully, we match the local sharded parser behavior where missing shards are recorded and reported without crashing the application.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

  • Added test_remote_shard_download_failure in tests/test_parsers.py simulating a failed remote shard download raising an HTTPError.

  • Verified that the execution completes without raising exceptions, returns valid metadata from successfully downloaded shards, and reports missing_shards as 1.

  • Ran the test suite via .venv/bin/pytest tests/ -v.

  • Unit tests

Checklist

  • My code follows the code style of this project.
  • My commit messages follow the Conventional Commits format, are lowercase, imperative, and specific.
  • I have updated the documentation accordingly (if applicable).
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@pipe1os, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 49 minutes and 36 seconds. Learn how PR review limits work.

To continue reviewing without waiting, enable usage-based billing in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: bc88f284-c028-40c2-9b7b-6808458958b5

📥 Commits

Reviewing files that changed from the base of the PR and between 0e82634 and aa331eb.

📒 Files selected for processing (2)
  • src/modelinfo/parsers/huggingface.py
  • tests/test_parsers.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch advisor/005-graceful-shard-downloads

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codacy-production

codacy-production Bot commented Jun 27, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.

Run reviewer

TIP This summary will be updated as you push new changes.

@codacy-production codacy-production Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

The PR successfully implements graceful error handling for concurrent remote shard downloads, but it also introduces significant GGUF remote inspection features and repository comparison tables that exceed the scope of the 'fix' title.

While Codacy reports that the PR is up to standards, there are several points of concern. A critical logic error was found in the disk size calculation for local sharded models. Additionally, the analyze_model function in src/modelinfo/cli.py has reached a high level of cyclomatic complexity (16) due to its role as a central dispatcher for multiple formats and locations, which should be refactored to ensure long-term maintainability.

About this PR

  • The 'is_remote' detection logic in src/modelinfo/cli.py (lines 152-155) may trigger false positives for non-existent local files that contain a slash in the path. This could lead to unnecessary network requests and slower error reporting for local path typos.
  • The UI threshold for the 'Fits' status has been changed from a hardcoded 0.90 to the user-provided 'gpu_util' (defaulting to 0.90). This logic change was not documented in the PR description and affects how the UI reports model compatibility.
  • There is significant scope creep in this PR. It is described as a fix for SafeTensors shard download failures but includes a large feature set for GGUF remote repositories. Consider splitting major features into separate PRs to simplify review and testing.
1 comment outside of the diff
src/modelinfo/cli.py

line 133 🟡 MEDIUM RISK
The analyze_model function has high cyclomatic complexity (16) because it manages all logic for remote vs local model detection, multiple file formats (GGUF, SafeTensors, PyTorch), and directory validation in a single block. Refactor this to extract model loading and context calculation into separate functions, using a strategy pattern for file extensions.

Test suggestions

  • Simulate a network failure (502 Bad Gateway) for one shard in a multi-shard SafeTensors model and verify 'missing_shards' is correctly reported.
  • Verify that a remote GGUF repository with multiple files displays a summary table of all quantizations.
  • Verify that specifying a specific GGUF file within a remote repository (repo/path/file.gguf) fetches only that file's header.
  • Verify behavior when unauthorized (401) or missing (404) repositories are accessed.

TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback

Comment thread src/modelinfo/cli.py Outdated
if format_name != "SafeTensors" or os.path.exists(file_path):
disk_size = os.path.getsize(file_path) if os.path.exists(file_path) else 0.0
if os.path.exists(file_path):
disk_size = os.path.getsize(file_path)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

Local disk_size calculation for sharded SafeTensors (.index.json) currently reports the size of the index JSON file itself instead of the total weight size of all shards. This creates a discrepancy between local and remote model reporting.

Comment thread tests/test_cli.py
Comment on lines +189 to +203
def fake_fetch(repo_id, *, fetch_tensors, timeout):
tensors = {
"__metadata__": {
"general.architecture": "llama",
"llama.block_count": 32,
"llama.attention.head_count_kv": 8,
"llama.attention.key_length": 128,
"gguf_variants": [
{"filename": "model-q4.gguf", "size": 1000000000},
{"filename": "model-q8.gguf", "size": 2000000000}
],
"repo_id": "org/model-gguf"
}
}
return tensors, None, "GGUF_group", 0.0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Suggestion: The mock metadata structure is duplicated between this test and the helper function. Since you have already introduced _get_mock_gguf_group_data at line 231, you should use it here to keep the tests DRY.

Suggested change
def fake_fetch(repo_id, *, fetch_tensors, timeout):
tensors = {
"__metadata__": {
"general.architecture": "llama",
"llama.block_count": 32,
"llama.attention.head_count_kv": 8,
"llama.attention.key_length": 128,
"gguf_variants": [
{"filename": "model-q4.gguf", "size": 1000000000},
{"filename": "model-q8.gguf", "size": 2000000000}
],
"repo_id": "org/model-gguf"
}
}
return tensors, None, "GGUF_group", 0.0
def fake_fetch(repo_id, *, fetch_tensors, timeout):
tensors, _ = _get_mock_gguf_group_data()
return tensors, None, "GGUF_group", 0.0



def parse_gguf_header(path: str) -> Dict[str, Any]:
def parse_gguf_header(path_or_file: str | Any) -> Dict[str, Any]:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Nitpick: Consider using typing.BinaryIO for the stream parameter instead of Any to improve type safety and static analysis for binary file-like objects.

@pipe1os pipe1os force-pushed the advisor/005-graceful-shard-downloads branch from b8cf42f to aa331eb Compare June 27, 2026 16:00
@pipe1os pipe1os closed this Jun 27, 2026
@pipe1os pipe1os deleted the advisor/005-graceful-shard-downloads branch June 27, 2026 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant