admarcosai's Collections
Efficient Training
Rethinking Optimization and Architecture for Tiny Language Models
Paper • 2402.02791 • Published • 12
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper • 2401.17574 • Published • 15
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 62
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Paper • 2312.00678 • Published • 2
TinyLlama: An Open-Source Small Language Model
Paper • 2401.02385 • Published • 90
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 41
Ziya2: Data-centric Learning is All LLMs Need
Paper • 2311.03301 • Published • 16
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 48
Towards Optimal Learning of Language Models
Paper • 2402.17759 • Published • 16
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 183
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49