- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 605
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43
Collections
Collections including paper arxiv:2410.05265
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 48
- OneBit: Towards Extremely Low-bit Large Language Models
  Paper • 2402.11295 • Published • 23
- A Survey on Transformer Compression
  Paper • 2402.05964 • Published
- Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
  Paper • 2402.08958 • Published • 3

- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8
- PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
  Paper • 2410.05265 • Published • 30
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
  Paper • 2308.13137 • Published • 17

- PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
  Paper • 2410.05265 • Published • 30
- MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
  Paper • 2410.03450 • Published • 36
- MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
  Paper • 2410.08196 • Published • 45
- Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
  Paper • 2410.07303 • Published • 18

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 33
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 26
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 121
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 21

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 12
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 46
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28