DiLoCo: Distributed Low-Communication Training of Language Models • Paper • 2311.08105 • Published Nov 14, 2023
Asynchronous Local-SGD Training for Language Modeling • Paper • 2401.09135 • Published Jan 17, 2024