Collections including paper arxiv:2410.05265

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 605
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 104
TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14, 2024 • 43

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Paper • 2402.04291 • Published Feb 6, 2024 • 48
OneBit: Towards Extremely Low-bit Large Language Models

Paper • 2402.11295 • Published Feb 17, 2024 • 23
A Survey on Transformer Compression

Paper • 2402.05964 • Published Feb 5, 2024
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Paper • 2402.08958 • Published Feb 14, 2024 • 3

LLM Compression

Quantization, Pruning, Distillation

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Paper • 2407.11062 • Published Jul 10, 2024 • 8
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

Paper • 2410.05265 • Published Oct 7, 2024 • 30
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

Paper • 2308.13137 • Published Aug 25, 2023 • 17

PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

Paper • 2410.05265 • Published Oct 7, 2024 • 30
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents

Paper • 2410.03450 • Published Oct 4, 2024 • 36
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Paper • 2410.08196 • Published Oct 10, 2024 • 45
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

Paper • 2410.07303 • Published Oct 9, 2024 • 18

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 33
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 26
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 121
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 21

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24, 2024 • 26
MoDE: CLIP Data Experts via Clustering

Paper • 2404.16030 • Published Apr 24, 2024 • 12
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 46
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21, 2024 • 28
