-
Scaling Laws for Neural Language Models
Paper • 2001.08361 • Published • 7 -
Scaling Laws for Autoregressive Generative Modeling
Paper • 2010.14701 • Published -
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 10 -
A Survey on Data Selection for Language Models
Paper • 2402.16827 • Published • 4
Collections
Discover the best community collections!
Collections including paper arxiv:2403.08540
-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 55 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 69 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 76
-
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Paper • 2403.08763 • Published • 50 -
Language models scale reliably with over-training and on downstream tasks
Paper • 2403.08540 • Published • 15 -
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper • 2404.16994 • Published • 36
-
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Paper • 2403.07816 • Published • 40 -
microsoft/phi-1_5
Text Generation • Updated • 97.6k • 1.32k -
Language models scale reliably with over-training and on downstream tasks
Paper • 2403.08540 • Published • 15 -
Akashpb13/Swahili_xlsr
Automatic Speech Recognition • Updated • 123 • 8
-
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 50 -
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 137 -
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 19
-
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 13 -
More Agents Is All You Need
Paper • 2402.05120 • Published • 53 -
Scaling Laws for Forgetting When Fine-Tuning Large Language Models
Paper • 2401.05605 • Published -
Aligning Large Language Models with Counterfactual DPO
Paper • 2401.09566 • Published • 2