Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 145
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models Paper • 2404.05567 • Published Apr 8, 2024 • 9