Skip to content

fix: handle reverse tensor shape ordering for gguf shape guessing#42

Closed
pipe1os wants to merge 19 commits into
mainfrom
advisor/006-fix-gguf-shape-guessing
Closed

fix: handle reverse tensor shape ordering for gguf shape guessing#42
pipe1os wants to merge 19 commits into
mainfrom
advisor/006-fix-gguf-shape-guessing

Conversation

@pipe1os

@pipe1os pipe1os commented Jun 27, 2026

Copy link
Copy Markdown
Owner

Summary

Fix the fallback shape guessing logic for GGUF models to correctly handle their column-major dimension ordering ([width, height]).

Motivation & Context

GGUF models store tensors in a column-major format compared to PyTorch/SafeTensors' row-major layout. As a result, when fallback shape guessing is triggered (such as when GGUF/config metadata is missing or unrecognized), querying shape[0] for key projection or fused projection weights accesses the width (input dimension) instead of the height (output/key-value dimension). This leads to extremely inflated kv_dim calculations and inaccurate memory/VRAM estimations.

This change detects whether the input tensors represent a GGUF model by checking for standard GGUF metadata keys starting with general., and then uses shape[-1] (the last dimension) instead of shape[0] for fallback dimension extraction.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (no functional changes, no api changes)
  • Documentation update

How Has This Been Tested?

Added regression tests in tests/test_calculator.py verifying:

  • Fallback shape guessing for GGUF models correctly uses shape[-1] for k_proj tensors.
  • Fallback shape guessing for fused qkv_proj GGUF tensors correctly uses (shape[-1] // 3).

All tests pass successfully.

  • Unit tests
  • Integration tests
  • Manual testing

Screenshots (if appropriate)

N/A

Checklist

  • My code follows the code style of this project.
  • My commit messages follow the Conventional Commits format, are lowercase, imperative, and specific.
  • I have updated the documentation accordingly (if applicable).
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@pipe1os, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 52 minutes and 12 seconds. Learn how PR review limits work.

To continue reviewing without waiting, enable usage-based billing in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: fb151d3e-a40b-4c8a-9635-93044f79a5dd

📥 Commits

Reviewing files that changed from the base of the PR and between 0e82634 and e972e57.

📒 Files selected for processing (2)
  • src/modelinfo/architecture.py
  • tests/test_calculator.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch advisor/006-fix-gguf-shape-guessing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codacy-production

codacy-production Bot commented Jun 27, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.

Run reviewer

TIP This summary will be updated as you push new changes.

@codacy-production codacy-production Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

While this PR is technically 'up to standards' according to Codacy, it introduces substantial scope creep that is not reflected in the title. Beyond the GGUF shape-guessing fix, it implements remote GGUF fetching, a custom RemoteFileStream, and UI comparison tables.

The implementation has caused a significant spike in complexity, particularly in src/modelinfo/parsers/huggingface.py (+41), which now functions as a monolithic dispatcher. Additionally, there are minor logic issues in the CLI error handling and code duplication in the new test suite. Addressing the complexity deltas through the suggested refactors is highly recommended before merging.

About this PR

  • This PR contains significant scope creep. The title and summary focus on a specific bug fix for shape guessing, but the implementation introduces major new features (remote GGUF support, UI comparison tables, and a custom RemoteFileStream) that are not detailed in the description. This makes the review process more difficult and obscures architectural changes.
2 comments outside of the diff
src/modelinfo/cli.py

line 133 🟡 MEDIUM RISK
The analyze_model function exceeds the cyclomatic complexity threshold (16). It currently handles URI resolution, local path validation, and multi-format parser dispatch. It is recommended to extract the format detection and parser selection logic into a separate factory function.

line 183 ⚪ LOW RISK
The error message is misleading when a local file exists but does not match supported extensions. If the file exists but no extension matches, the exception should state that the format is unsupported rather than saying the file was 'not found locally'.

Test suggestions

  • Verify fallback shape guessing for GGUF k_proj uses shape[-1]
  • Verify fallback shape guessing for GGUF fused qkv_proj uses shape[-1] // 3
  • Verify identification of GGUF architecture based on 'general.' metadata keys
  • Verify remote fetching of a single GGUF file from Hugging Face
  • Verify remote fetching and grouping of multiple GGUF variants from a repository

TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback

return 0.0


class RemoteFileStream:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HIGH RISK

Suggestion: The complexity of this file has increased significantly (+41) with the addition of remote GGUF support and the RemoteFileStream implementation. fetch_huggingface_repo has become a monolithic function handling multiple file formats and sharding strategies. Consider refactoring it into smaller, format-specific functions (e.g., _handle_gguf_repo, _handle_safetensors_repo) to improve maintainability.

See Complexity in Codacy

Comment thread tests/test_parsers.py
Comment on lines +96 to +109
{"rfilename": "model-q4.gguf", "size": 1000000000}
]
}).encode("utf-8")
elif "model-q4.gguf" in url:
import struct
header = b"GGUF" + struct.pack("<IQQ", 2, 0, 0)
return header
raise ValueError(f"Unexpected url: {url}")

monkeypatch.setattr(huggingface, "_make_request", fake_make_request)

tensors, config, format_name, disk_size = huggingface.fetch_huggingface_repo("org/model-gguf")

assert format_name == "GGUF"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Nitpick: Detected code duplication in test mocks. The fake_make_request logic is repeated across multiple test cases. It is recommended to extract this logic into a reusable pytest fixture or a helper function configured with a URL-to-response mapping.

raise PermissionError(f"Gated/Private Model (401 Unauthorized). Set the HF_TOKEN environment variable to access {real_repo_id}")
if e.code == 404:
raise FileNotFoundError(f"Could not find repository on Hugging Face (404 Not Found): {repo_id}")
raise FileNotFoundError(f"Could not find repository on Hugging Face (404 Not Found): {real_repo_id}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Nitpick: Minor indentation inconsistency: line 302 has an extra leading space before 'raise FileNotFoundError'.

@pipe1os pipe1os closed this Jun 27, 2026
@pipe1os pipe1os deleted the advisor/006-fix-gguf-shape-guessing branch June 27, 2026 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant