themoddedcube

Pinned

  1. turboquant-plus Public

    TurboQuant+: 3-bit KV cache value quantization and group size optimization for long-context LLM inference

    Python 1

  2. ChannelQuant Public

    Near-lossless 4× KV-cache compression for GQA models at ~4.1 bits/value — per-channel-key INT4 + static outlier ROM, with a reproducible reference model and paper.

    Python

  3. LonghornSilicon/kv-cache-engine Public

    Hardware KV cache compression engine (SystemVerilog) using TurboQuant+ — keys at 4.25 bpv, values at ~3.0 bpv for 3–5× DRAM bandwidth reduction on LLM inference. Block 2 of the LonghornSilicon acce…

    SystemVerilog 1

  4. LonghornSilicon/adaptive-precision-attention Public

    Entropy-guided mixed-precision attention: evolutionary search discovers that entropy is the optimal discriminator for per-block quantization decisions

    Python

  5. evoplace Public

    LLM-guided evolutionary VLSI placement: beating DREAMPlace with evolved objective functions

    Python 1

  6. covenant Public

    Contract-driven GPU analytical placer (DREAMPlace fork overlay) with a noise-calibrated evaluation protocol, paired multi-seed CIs, liveness gates, and calibration arms.

    Python