Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26, 2024 • 48
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Paper • 2406.09961 • Published Jun 14, 2024 • 55
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published Jun 14, 2024 • 77
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published Jun 17, 2024 • 61
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024 • 50
Bootstrapping Language Models with DPO Implicit Rewards Paper • 2406.09760 • Published Jun 14, 2024 • 38
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation Paper • 2406.12849 • Published Jun 18, 2024 • 49
Customizing Text-to-Image Models with a Single Image Pair Paper • 2405.01536 • Published May 2, 2024 • 19
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published May 2, 2024 • 17
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published Apr 28, 2024 • 27
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1, 2024 • 30
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 75
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model Paper • 2404.19759 • Published Apr 30, 2024 • 24