- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427, published Feb 29, 2024)
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066, published Jan 11, 2024)
- WizardLM: Empowering Large Language Models to Follow Complex Instructions (arXiv:2304.12244, published Apr 24, 2023)