SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Paper • 2501.06842 • Published 9 days ago • 15
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN Paper • 2412.13795 • Published Dec 18, 2024 • 19
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts Paper • 2407.03203 • Published Jul 3, 2024 • 12
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise Paper • 2312.14567 • Published Dec 22, 2023 • 1
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning Paper • 2403.17919 • Published Mar 26, 2024 • 16