- Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability (arXiv:2411.19943, published Nov 29, 2024, 56 upvotes)
- Cautious Optimizers: Improving Training with One Line of Code (arXiv:2411.16085, published Nov 25, 2024, 15 upvotes)
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arXiv:2410.23168, published Oct 30, 2024, 24 upvotes)
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA (arXiv:2410.20672, published Oct 28, 2024, 6 upvotes)
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss (arXiv:2410.17243, published Oct 22, 2024, 89 upvotes)
- MiniPLM: Knowledge Distillation for Pre-Training Language Models (arXiv:2410.17215, published Oct 22, 2024, 14 upvotes)
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models (arXiv:2410.05229, published Oct 7, 2024, 22 upvotes)
- Eurus (collection): Advancing LLM Reasoning Generalists with Preference Trees (11 items, updated Oct 22, 2024, 24 upvotes)
- Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale (arXiv:2409.17115, published Sep 25, 2024, 61 upvotes)
- InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning (arXiv:2409.12568, published Sep 19, 2024, 48 upvotes)
- EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer (arXiv:2409.10819, published Sep 17, 2024, 18 upvotes)