CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 1 day ago • 31
Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published 4 days ago • 16
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 5 days ago • 20
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 11 days ago • 59
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models Paper • 2412.18605 • Published 10 days ago • 17
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models Paper • 2412.18609 • Published 10 days ago • 13
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 11 days ago • 33
Re-assessing ImageNet: How aligned is its single-label assumption with its multi-label nature? Paper • 2412.18409 • Published 11 days ago • 1
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Paper • 2412.15204 • Published 15 days ago • 32
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published 16 days ago • 23
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published 16 days ago • 47
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 22 days ago • 80
ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities Paper • 2412.06745 • Published 25 days ago • 6
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published 23 days ago • 45