- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 22
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 82
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 145
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25
Collections
Collections including paper arxiv:2407.21783
- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 25
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 41
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 117
- Autoregressive Video Generation without Vector Quantization
  Paper • 2412.14169 • Published • 14

- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 110
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
  Paper • 2409.12191 • Published • 76
- Baichuan Alignment Technical Report
  Paper • 2410.14940 • Published • 50
- A Survey of Small Language Models
  Paper • 2410.20011 • Published • 40

- Qwen2.5-Coder Technical Report
  Paper • 2409.12186 • Published • 139
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
  Paper • 2409.12122 • Published • 3
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 14
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 76

- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 87
- Visual Instruction Tuning
  Paper • 2304.08485 • Published • 13
- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 37
- PALO: A Polyglot Large Multimodal Model for 5B People
  Paper • 2402.14818 • Published • 23

- Apple Intelligence Foundation Language Models
  Paper • 2407.21075 • Published • 4
- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 110
- Nemotron-4 340B Technical Report
  Paper • 2406.11704 • Published
- Gemma 2: Improving Open Language Models at a Practical Size
  Paper • 2408.00118 • Published • 76