LongAlign: A Recipe for Long Context Alignment of Large Language Models Paper • 2401.18058 • Published Jan 31, 2024 • 20
Efficient Tool Use with Chain-of-Abstraction Reasoning Paper • 2401.17464 • Published Jan 30, 2024 • 17
Scavenging Hyena: Distilling Transformers into Long Convolution Models Paper • 2401.17574 • Published Jan 31, 2024 • 15
Rethinking Interpretability in the Era of Large Language Models Paper • 2402.01761 • Published Jan 30, 2024 • 22
BlackMamba: Mixture of Experts for State-Space Models Paper • 2402.01771 • Published Feb 1, 2024 • 23
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models Paper • 2402.01739 • Published Jan 29, 2024 • 26
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 76
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay Paper • 2402.04858 • Published Feb 7, 2024 • 14
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6, 2024 • 13
Direct Language Model Alignment from Online AI Feedback Paper • 2402.04792 • Published Feb 7, 2024 • 29
Premise Order Matters in Reasoning with Large Language Models Paper • 2402.08939 • Published Feb 14, 2024 • 27
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21, 2024 • 51