Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published Dec 2024 • 16
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published Dec 2024 • 28
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Paper • 2410.07303 • Published Oct 9, 2024 • 18
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 108
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion Paper • 2410.03825 • Published Oct 4, 2024 • 19
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 5
Contrastive Localized Language-Image Pre-Training Paper • 2410.02746 • Published Oct 3, 2024 • 33
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 136
Language Models Learn to Mislead Humans via RLHF Paper • 2409.12822 • Published Sep 19, 2024 • 10
In-Context Imitation Learning via Next-Token Prediction Paper • 2408.15980 • Published Aug 28, 2024 • 9
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 69
Shape of Motion: 4D Reconstruction from a Single Video Paper • 2407.13764 • Published Jul 18, 2024 • 19
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 59
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29, 2024 • 32
Rethinking Patch Dependence for Masked Autoencoders Paper • 2401.14391 • Published Jan 25, 2024 • 23
Towards A Better Metric for Text-to-Video Generation Paper • 2401.07781 • Published Jan 15, 2024 • 14