A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Paper • arXiv:2102.06356 • Published
Note Optimizer-Google
Note Optimizer-lamb
Note Optimizer-adamw
Related: Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective (Stanford), https://arxiv.org/abs/2410.05192. WSD: a warmup-stable-decay learning-rate (LR) schedule.
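For reference, a WSD schedule ramps the learning rate up during warmup, holds it constant for most of training, and decays it only near the end. The sketch below is illustrative and does not come from either paper. It assumes linear warmup from 0 and linear decay to `min_lr`, although other decay shapes are also used in practice. The function name `wsd_lr` and its parameters are hypothetical.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Hypothetical warmup-stable-decay (WSD) learning-rate schedule.

    Assumptions (not taken from the linked papers): linear warmup from 0,
    a constant "stable" phase at peak_lr, then linear decay to min_lr.
    """
    # Warmup phase: rise linearly from 0 toward peak_lr.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    step -= warmup_steps

    # Stable phase: hold the learning rate at its peak.
    if step < stable_steps:
        return peak_lr
    step -= stable_steps

    # Decay phase: interpolate linearly from peak_lr down to min_lr.
    if step < decay_steps:
        frac = step / decay_steps
        return peak_lr + (min_lr - peak_lr) * frac

    # After the schedule ends, stay at min_lr.
    return min_lr


if __name__ == "__main__":
    # 10 warmup steps, 80 stable steps, 10 decay steps (100 total).
    for s in (0, 5, 10, 50, 89, 95, 100):
        print(s, wsd_lr(s, peak_lr=1.0, warmup_steps=10, stable_steps=80, decay_steps=10))
```

Because the stable phase keeps a constant rate, a single long run can be branched into decay phases at different points. That is the practical appeal of WSD over a fixed-length cosine schedule.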