VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published Sep 25, 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published Oct 3, 2024