Add --precision flag for reduced-precision inference #79

Open
rioffe wants to merge 1 commit into apple:main from rioffe:main

Conversation


@rioffe rioffe commented Mar 4, 2026

Selectively casts heavy encoder/backbone modules (monodepth_model, feature_model) to bfloat16 or float16 while keeping lightweight heads in float32 for numerical stability. Achieves ~2x inference speedup on MPS with bfloat16.

Summary

  • Adds a --precision flag to sharp predict accepting float32 (default), bfloat16, and float16
  • Only the heavy encoder/backbone modules (monodepth_model, feature_model) are cast to reduced precision; lightweight heads remain in float32 for numerical stability
  • Forward hooks handle input/output casting so the rest of the pipeline is unaffected
  • Documents the new flag in README with usage example
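The PR description does not include the implementation itself, but the approach it describes (casting only the heavy modules to reduced precision, with forward hooks converting inputs and outputs so the surrounding float32 pipeline is untouched) might look roughly like this PyTorch sketch. The model and module names here are hypothetical stand-ins, not the actual `monodepth_model`/`feature_model` code:

```python
import torch
import torch.nn as nn

def cast_module_with_hooks(module: nn.Module, dtype: torch.dtype) -> None:
    """Cast one module's weights to reduced precision, and install hooks
    so callers still pass in and receive float32 tensors."""
    module.to(dtype)

    def cast_inputs(mod, args):
        # Pre-hook: convert float32 inputs to the module's reduced dtype.
        return tuple(a.to(dtype) if torch.is_tensor(a) else a for a in args)

    def cast_outputs(mod, args, output):
        # Post-hook: convert the module's output back to float32.
        return output.float() if torch.is_tensor(output) else output

    module.register_forward_pre_hook(cast_inputs)
    module.register_forward_hook(cast_outputs)

# Hypothetical pipeline: a "heavy backbone" followed by a "light head".
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))
cast_module_with_hooks(model[0], torch.bfloat16)  # cast only the backbone

x = torch.randn(1, 8)   # float32 input, as before
y = model(x)            # head still runs (and returns) float32
```

Because the hooks restore float32 at the module boundary, the lightweight head and everything downstream run unchanged, which is what keeps the rest of the pipeline numerically stable.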

Test plan

  • Run sharp predict without --precision flag and verify output is unchanged (float32 default)
  • Run sharp predict --precision bfloat16 and verify Gaussians are produced without errors
  • Run sharp predict --precision float16 and verify Gaussians are produced without errors
  • Compare output quality between float32 and reduced-precision runs on a known input
  • Verify inference speed improvement on supported hardware (CUDA/MPS) with bfloat16/float16
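For the quality-comparison step, one simple way to quantify the drift between a float32 reference run and a reduced-precision run is to look at maximum absolute and relative error. This sketch uses a bfloat16 round-trip of random data as a stand-in for the two sets of predicted Gaussians; the actual comparison would load outputs from the two `sharp predict` runs:

```python
import torch

# Stand-ins: a float32 reference output and its bfloat16-degraded version.
ref = torch.randn(1000)                     # "float32 run" output
approx = ref.to(torch.bfloat16).float()     # "bfloat16 run" output

abs_err = (ref - approx).abs().max().item()
# Clamp the denominator to avoid dividing by values at exactly zero.
rel_err = ((ref - approx).abs() / ref.abs().clamp(min=1e-8)).max().item()
print(f"max abs err: {abs_err:.3e}, max rel err: {rel_err:.3e}")
```

With bfloat16's 8-bit mantissa, round-to-nearest keeps the relative error of each element below about 2^-9 (~0.2%), which gives a rough baseline for what "unchanged output quality" should look like in such a comparison.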

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>