Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training — arXiv:2405.15319, published May 24, 2024
Aya 23: Open Weight Releases to Further Multilingual Progress — arXiv:2405.15032, published May 23, 2024
Transformers Can Do Arithmetic with the Right Embeddings — arXiv:2405.17399, published May 27, 2024
Yuan 2.0-M32: Mixture of Experts with Attention Router — arXiv:2405.17976, published May 28, 2024
LLM Augmented LLMs: Expanding Capabilities through Composition — arXiv:2401.02412, published Jan 4, 2024
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation — arXiv:2310.03214, published Oct 5, 2023
Meta-Transformer: A Unified Framework for Multimodal Learning — arXiv:2307.10802, published Jul 20, 2023