fix: handle concurrent remote shard download failures gracefully#43
fix: handle concurrent remote shard download failures gracefully#43pipe1os wants to merge 19 commits into
Conversation
…er, add error tests
|
Warning Review limit reached
More reviews will be available in 49 minutes and 36 seconds. Learn how PR review limits work. To continue reviewing without waiting, enable usage-based billing in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
There was a problem hiding this comment.
Pull Request Overview
The PR successfully implements graceful error handling for concurrent remote shard downloads, but it also introduces significant GGUF remote inspection features and repository comparison tables that exceed the scope of the 'fix' title.
While Codacy reports that the PR is up to standards, there are several points of concern. A critical logic error was found in the disk size calculation for local sharded models. Additionally, the analyze_model function in src/modelinfo/cli.py has reached a high level of cyclomatic complexity (16) due to its role as a central dispatcher for multiple formats and locations, which should be refactored to ensure long-term maintainability.
About this PR
- The 'is_remote' detection logic in
src/modelinfo/cli.py(lines 152-155) may trigger false positives for non-existent local files that contain a slash in the path. This could lead to unnecessary network requests and slower error reporting for local path typos. - The UI threshold for the 'Fits' status has been changed from a hardcoded 0.90 to the user-provided 'gpu_util' (defaulting to 0.90). This logic change was not documented in the PR description and affects how the UI reports model compatibility.
- There is significant scope creep in this PR. It is described as a fix for SafeTensors shard download failures but includes a large feature set for GGUF remote repositories. Consider splitting major features into separate PRs to simplify review and testing.
1 comment outside of the diff
src/modelinfo/cli.py
line 133🟡 MEDIUM RISK
Theanalyze_modelfunction has high cyclomatic complexity (16) because it manages all logic for remote vs local model detection, multiple file formats (GGUF, SafeTensors, PyTorch), and directory validation in a single block. Refactor this to extract model loading and context calculation into separate functions, using a strategy pattern for file extensions.
Test suggestions
- Simulate a network failure (502 Bad Gateway) for one shard in a multi-shard SafeTensors model and verify 'missing_shards' is correctly reported.
- Verify that a remote GGUF repository with multiple files displays a summary table of all quantizations.
- Verify that specifying a specific GGUF file within a remote repository (repo/path/file.gguf) fetches only that file's header.
- Verify behavior when unauthorized (401) or missing (404) repositories are accessed.
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
| if format_name != "SafeTensors" or os.path.exists(file_path): | ||
| disk_size = os.path.getsize(file_path) if os.path.exists(file_path) else 0.0 | ||
| if os.path.exists(file_path): | ||
| disk_size = os.path.getsize(file_path) |
There was a problem hiding this comment.
🟡 MEDIUM RISK
Local disk_size calculation for sharded SafeTensors (.index.json) currently reports the size of the index JSON file itself instead of the total weight size of all shards. This creates a discrepancy between local and remote model reporting.
| def fake_fetch(repo_id, *, fetch_tensors, timeout): | ||
| tensors = { | ||
| "__metadata__": { | ||
| "general.architecture": "llama", | ||
| "llama.block_count": 32, | ||
| "llama.attention.head_count_kv": 8, | ||
| "llama.attention.key_length": 128, | ||
| "gguf_variants": [ | ||
| {"filename": "model-q4.gguf", "size": 1000000000}, | ||
| {"filename": "model-q8.gguf", "size": 2000000000} | ||
| ], | ||
| "repo_id": "org/model-gguf" | ||
| } | ||
| } | ||
| return tensors, None, "GGUF_group", 0.0 |
There was a problem hiding this comment.
⚪ LOW RISK
Suggestion: The mock metadata structure is duplicated between this test and the helper function. Since you have already introduced _get_mock_gguf_group_data at line 231, you should use it here to keep the tests DRY.
| def fake_fetch(repo_id, *, fetch_tensors, timeout): | |
| tensors = { | |
| "__metadata__": { | |
| "general.architecture": "llama", | |
| "llama.block_count": 32, | |
| "llama.attention.head_count_kv": 8, | |
| "llama.attention.key_length": 128, | |
| "gguf_variants": [ | |
| {"filename": "model-q4.gguf", "size": 1000000000}, | |
| {"filename": "model-q8.gguf", "size": 2000000000} | |
| ], | |
| "repo_id": "org/model-gguf" | |
| } | |
| } | |
| return tensors, None, "GGUF_group", 0.0 | |
| def fake_fetch(repo_id, *, fetch_tensors, timeout): | |
| tensors, _ = _get_mock_gguf_group_data() | |
| return tensors, None, "GGUF_group", 0.0 |
|
|
||
|
|
||
| def parse_gguf_header(path: str) -> Dict[str, Any]: | ||
| def parse_gguf_header(path_or_file: str | Any) -> Dict[str, Any]: |
There was a problem hiding this comment.
⚪ LOW RISK
Nitpick: Consider using typing.BinaryIO for the stream parameter instead of Any to improve type safety and static analysis for binary file-like objects.
b8cf42f to
aa331eb
Compare
Summary
This PR makes remote SafeTensors shard downloads resilient to individual shard network failures. Instead of raising an exception and halting the entire parsing process when a single shard fails to fetch, it catches exceptions gracefully on a per-shard basis, tracks missing shards, and returns the successfully retrieved tensors along with the correct count of missing shards.
Motivation & Context
When inspecting sharded SafeTensors models from Hugging Face remotely, the concurrent retrieval of shard headers runs in a ThreadPoolExecutor. Currently, if any single shard fails (due to network timeout, transient CDN errors, or missing files), the exception bubbles up and crashes the entire execution. By handling shard failures gracefully, we match the local sharded parser behavior where missing shards are recorded and reported without crashing the application.
Type of Change
How Has This Been Tested?
Added
test_remote_shard_download_failureintests/test_parsers.pysimulating a failed remote shard download raising anHTTPError.Verified that the execution completes without raising exceptions, returns valid metadata from successfully downloaded shards, and reports
missing_shardsas 1.Ran the test suite via
.venv/bin/pytest tests/ -v.Unit tests
Checklist