DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954
Qwen Technical Report
Paper • 2309.16609
GPT-4 Technical Report
Paper • 2303.08774
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805
Collections including paper arxiv:2405.12250
Language Modeling Is Compression
Paper • 2309.10668
Small-scale proxies for large-scale Transformer training instabilities
Paper • 2309.14322
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Paper • 2309.15129
Vision Transformers Need Registers
Paper • 2309.16588