RRM: Robust Reward Model Training Mitigates Reward Hacking • Paper • 2409.13156 • Published Sep 20, 2024 • 5
CompCap: Improving Multimodal Large Language Models with Composite Captions • Paper • 2412.05243 • Published Dec 6, 2024 • 18
Divide-or-Conquer? Which Part Should You Distill Your LLM? • Paper • 2402.15000 • Published Feb 22, 2024 • 22