DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models Paper • 2309.14509 • Published Sep 25, 2023 • 17