Add --precision flag for reduced-precision inference #79

Open
rioffe wants to merge 1 commit into apple:main from rioffe:main

Conversation


@rioffe rioffe commented Mar 4, 2026

Selectively casts heavy encoder/backbone modules (monodepth_model, feature_model) to bfloat16 or float16 while keeping lightweight heads in float32 for numerical stability. Achieves ~2x inference speedup on MPS with bfloat16.

Summary

  • Adds a --precision flag to sharp predict accepting float32 (default), bfloat16, and float16
  • Only the heavy encoder/backbone modules (monodepth_model, feature_model) are cast to reduced precision; lightweight heads remain in float32 for numerical stability
  • Forward hooks handle input/output casting so the rest of the pipeline is unaffected
  • Documents the new flag in README with usage example
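The PR description does not include the implementation itself, but the approach it describes (casting only the heavy modules to reduced precision, with forward hooks converting inputs and outputs so the surrounding float32 pipeline is untouched) might look roughly like this PyTorch sketch. The model and module names here are hypothetical stand-ins, not the actual `monodepth_model`/`feature_model` code:

```python
import torch
import torch.nn as nn

def cast_module_with_hooks(module: nn.Module, dtype: torch.dtype) -> None:
    """Cast one module's weights to reduced precision, and install hooks
    so callers still pass in and receive float32 tensors."""
    module.to(dtype)

    def cast_inputs(mod, args):
        # Pre-hook: convert float32 inputs to the module's reduced dtype.
        return tuple(a.to(dtype) if torch.is_tensor(a) else a for a in args)

    def cast_outputs(mod, args, output):
        # Post-hook: convert the module's output back to float32.
        return output.float() if torch.is_tensor(output) else output

    module.register_forward_pre_hook(cast_inputs)
    module.register_forward_hook(cast_outputs)

# Hypothetical pipeline: a "heavy backbone" followed by a "light head".
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
cast_module_with_hooks(model[0], torch.bfloat16)  # cast only the backbone

x = torch.randn(1, 8)   # float32 input, as before
y = model(x)            # head still runs (and returns) float32
```

Because the hooks restore float32 at the module boundary, the lightweight head and everything downstream run unchanged, which is what keeps the rest of the pipeline numerically stable.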

Test plan

  • Run sharp predict without --precision flag and verify output is unchanged (float32 default)
  • Run sharp predict --precision bfloat16 and verify Gaussians are produced without errors
  • Run sharp predict --precision float16 and verify Gaussians are produced without errors
  • Compare output quality between float32 and reduced-precision runs on a known input
  • Verify inference speed improvement on supported hardware (CUDA/MPS) with bfloat16/float16
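For the quality-comparison step, one simple way to quantify the drift between a float32 reference run and a reduced-precision run is to look at maximum absolute and relative error. This sketch uses a bfloat16 round-trip of random data as a stand-in for the two sets of predicted Gaussians; the actual comparison would load outputs from the two `sharp predict` runs:

```python
import torch

# Stand-ins: a float32 reference output and its bfloat16-degraded version.
ref = torch.randn(1000)                     # "float32 run" output
approx = ref.to(torch.bfloat16).float()     # "bfloat16 run" output

abs_err = (ref - approx).abs().max().item()
# Clamp the denominator to avoid dividing by values at exactly zero.
rel_err = ((ref - approx).abs() / ref.abs().clamp(min=1e-8)).max().item()
print(f"max abs err: {abs_err:.3e}, max rel err: {rel_err:.3e}")
```

With bfloat16's 8-bit mantissa, round-to-nearest keeps the relative error of each element below about 2^-9 (~0.2%), which gives a rough baseline for what "unchanged output quality" should look like in such a comparison.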

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>