smpanaro's Collections: quant
- SqueezeLLM: Dense-and-Sparse Quantization (arXiv:2306.07629)
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models (arXiv:2309.02784)
- Extreme Compression of Large Language Models via Additive Quantization (arXiv:2401.06118)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
- OneBit: Towards Extremely Low-bit Large Language Models (arXiv:2402.11295)
- APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models (arXiv:2402.14866)
- EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs (arXiv:2403.02775)
- GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319)
- COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization (arXiv:2403.07134)
- OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models (arXiv:2306.02272)
- QuantEase: Optimization-based Quantization for Language Models (arXiv:2309.01885)
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024)
- decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points (arXiv:2404.12759)
- SpinQuant: LLM quantization with learned rotations (arXiv:2405.16406)
- Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs (arXiv:2406.01721)
- Attention-aware Post-training Quantization without Backpropagation (arXiv:2406.13474)
- Accuracy is Not All You Need (arXiv:2407.09141)
- FlatQuant: Flatness Matters for LLM Quantization (arXiv:2410.09426)