Add bitnet-embeddings-0.6b model adaptation and GGUF conversion tools #558

Open
isHuangXin wants to merge 2 commits into microsoft:main from isHuangXin:dev-bitnet-embedding-0.6b
@isHuangXin

  • Add GGUF conversion tool for bitnet-embeddings-0.6b (safetensors -> F16 GGUF and I2_S GGUF)
  • Add Qwen3 architecture support in llama.cpp submodule with per-projection RMSNorm
  • Add I2_S ternary quantization (2-bit packed -1/0/+1) for lossless precision
  • Add f16 norm weight support for correct embedding inference
  • Add benchmark and accuracy verification scripts
  • Add GGUF layer inspection utilities for F16 and I2_S formats
  • Add bitnet-lut-kernels.h placeholder for standalone compilation
  • Update llama.cpp submodule to dev-bitnet-embedding-0.6b branch
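
The "2-bit packed -1/0/+1" idea in the list above can be illustrated with a minimal sketch. It assumes a simple code assignment (-1 -> 0, 0 -> 1, +1 -> 2) and four codes per byte in little-endian bit order. Both choices, along with the helper names `pack_ternary` and `unpack_ternary`, are hypothetical and are not taken from this PR; the actual I2_S byte layout and any scale factors it stores may differ. Because a ternary weight has only three possible values, a 2-bit code can represent each one exactly, so the round trip recovers the input.

```python
from typing import List

# Hypothetical 2-bit codes for illustration; the real I2_S layout may differ.
_ENCODE = {-1: 0, 0: 1, 1: 2}
_DECODE = {code: value for value, code in _ENCODE.items()}


def pack_ternary(weights: List[int]) -> bytes:
    """Pack ternary weights into bytes, four 2-bit codes per byte."""
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        if w not in _ENCODE:
            raise ValueError(f"weight at index {i} is not ternary: {w!r}")
        # Slot i % 4 occupies bits [2 * slot, 2 * slot + 1] of its byte.
        packed[i // 4] |= _ENCODE[w] << (2 * (i % 4))
    return bytes(packed)


def unpack_ternary(packed: bytes, count: int) -> List[int]:
    """Recover the first `count` ternary weights from packed bytes."""
    if count > 4 * len(packed):
        raise ValueError("count exceeds the number of packed slots")
    return [
        _DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(count)
    ]


if __name__ == "__main__":
    weights = [-1, 0, 1, 1, 0, -1, -1]
    packed = pack_ternary(weights)
    # Seven weights fit in two bytes; unpacking returns the original list.
    assert unpack_ternary(packed, len(weights)) == weights
    print(len(weights), "weights ->", len(packed), "bytes")
```

The element count is passed to `unpack_ternary` explicitly because the last byte may contain unused padding slots.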

@isHuangXin
Author

@microsoft-github-policy-service agree
