fix: handle reverse tensor shape ordering for gguf shape guessing#42
fix: handle reverse tensor shape ordering for gguf shape guessing#42pipe1os wants to merge 19 commits into
Conversation
…er, add error tests
|
Warning Review limit reached
More reviews will be available in 52 minutes and 12 seconds. Learn how PR review limits work. To continue reviewing without waiting, enable usage-based billing in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
There was a problem hiding this comment.
Pull Request Overview
While this PR is technically 'up to standards' according to Codacy, it introduces substantial scope creep that is not reflected in the title. Beyond the GGUF shape-guessing fix, it implements remote GGUF fetching, a custom RemoteFileStream, and UI comparison tables.
The implementation has caused a significant spike in complexity, particularly in src/modelinfo/parsers/huggingface.py (+41), which now functions as a monolithic dispatcher. Additionally, there are minor logic issues in the CLI error handling and code duplication in the new test suite. Addressing the complexity deltas through the suggested refactors is highly recommended before merging.
About this PR
- This PR contains significant scope creep. The title and summary focus on a specific bug fix for shape guessing, but the implementation introduces major new features (remote GGUF support, UI comparison tables, and a custom RemoteFileStream) that are not detailed in the description. This makes the review process more difficult and obscures architectural changes.
2 comments outside of the diff
src/modelinfo/cli.py
line 133🟡 MEDIUM RISK
The analyze_model function exceeds the cyclomatic complexity threshold (16). It currently handles URI resolution, local path validation, and multi-format parser dispatch. It is recommended to extract the format detection and parser selection logic into a separate factory function.
line 183⚪ LOW RISK
The error message is misleading when a local file exists but does not match supported extensions. If the file exists but no extension matches, the exception should state that the format is unsupported rather than saying the file was 'not found locally'.
Test suggestions
- Verify fallback shape guessing for GGUF k_proj uses shape[-1]
- Verify fallback shape guessing for GGUF fused qkv_proj uses shape[-1] // 3
- Verify identification of GGUF architecture based on 'general.' metadata keys
- Verify remote fetching of a single GGUF file from Hugging Face
- Verify remote fetching and grouping of multiple GGUF variants from a repository
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
| return 0.0 | ||
|
|
||
|
|
||
| class RemoteFileStream: |
There was a problem hiding this comment.
🔴 HIGH RISK
Suggestion: The complexity of this file has increased significantly (+41) with the addition of remote GGUF support and the RemoteFileStream implementation. fetch_huggingface_repo has become a monolithic function handling multiple file formats and sharding strategies. Consider refactoring it into smaller, format-specific functions (e.g., _handle_gguf_repo, _handle_safetensors_repo) to improve maintainability.
| {"rfilename": "model-q4.gguf", "size": 1000000000} | ||
| ] | ||
| }).encode("utf-8") | ||
| elif "model-q4.gguf" in url: | ||
| import struct | ||
| header = b"GGUF" + struct.pack("<IQQ", 2, 0, 0) | ||
| return header | ||
| raise ValueError(f"Unexpected url: {url}") | ||
|
|
||
| monkeypatch.setattr(huggingface, "_make_request", fake_make_request) | ||
|
|
||
| tensors, config, format_name, disk_size = huggingface.fetch_huggingface_repo("org/model-gguf") | ||
|
|
||
| assert format_name == "GGUF" |
There was a problem hiding this comment.
⚪ LOW RISK
Nitpick: Detected code duplication in test mocks. The fake_make_request logic is repeated across multiple test cases. It is recommended to extract this logic into a reusable pytest fixture or a helper function configured with a URL-to-response mapping.
| raise PermissionError(f"Gated/Private Model (401 Unauthorized). Set the HF_TOKEN environment variable to access {real_repo_id}") | ||
| if e.code == 404: | ||
| raise FileNotFoundError(f"Could not find repository on Hugging Face (404 Not Found): {repo_id}") | ||
| raise FileNotFoundError(f"Could not find repository on Hugging Face (404 Not Found): {real_repo_id}") |
There was a problem hiding this comment.
⚪ LOW RISK
Nitpick: Minor indentation inconsistency: line 302 has an extra leading space before 'raise FileNotFoundError'.
Summary
Fix the fallback shape guessing logic for GGUF models to correctly handle their column-major dimension ordering (
[width, height]).Motivation & Context
GGUF models store tensors in a column-major format compared to PyTorch/SafeTensors' row-major layout. As a result, when fallback shape guessing is triggered (such as when GGUF/config metadata is missing or unrecognized), querying
shape[0]for key projection or fused projection weights accesses the width (input dimension) instead of the height (output/key-value dimension). This leads to extremely inflatedkv_dimcalculations and inaccurate memory/VRAM estimations.This change detects whether the input tensors represent a GGUF model by checking for standard GGUF metadata keys starting with
general., and then usesshape[-1](the last dimension) instead ofshape[0]for fallback dimension extraction.Type of Change
How Has This Been Tested?
Added regression tests in
tests/test_calculator.pyverifying:shape[-1]fork_projtensors.qkv_projGGUF tensors correctly uses(shape[-1] // 3).All tests pass successfully.
Screenshots (if appropriate)
N/A
Checklist