Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 2 days ago • 36
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published 3 days ago • 25
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 3 days ago • 53
Towards General-Purpose Model-Free Reinforcement Learning Paper • 2501.16142 • Published 4 days ago • 21
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published 8 days ago • 31
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 8 days ago • 22
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published 10 days ago • 22
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 9 days ago • 76
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published 10 days ago • 39
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 11 days ago • 88
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 10 days ago • 80
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 9 days ago • 279