ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 125
view article Article Comparing the Performance of LLMs: A Deep Dive into Roberta, Llama 2, and Mistral for Disaster Tweets Analysis with Lora Nov 7, 2023 • 8
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated Dec 13, 2024 • 329
The Unreasonable Ineffectiveness of the Deeper Layers Paper • 2403.17887 • Published Mar 26, 2024 • 79
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 184
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 78
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models Paper • 2311.08692 • Published Nov 15, 2023 • 12
SelfEval: Leveraging the discriminative nature of generative models for evaluation Paper • 2311.10708 • Published Nov 17, 2023 • 14