fp8-quantization

Here is 1 public repository matching this topic...

ToddThomson / Mila

A C++23 module-based DNN library for GPU-first LLM inference: explicit forward passes, no hidden execution engine, and work close to the metal. Validated token-for-token on Gemma 4 Unified, Llama 3.x, and GPT-2, with compile-time FP8/FP4 weight quantization.

Updated Jun 24, 2026
C++

