ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models Paper • 2401.13311 • Published Jan 24, 2024 • 11
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion Paper • 2401.13388 • Published Jan 24, 2024 • 11
Asynchronous Local-SGD Training for Language Modeling Paper • 2401.09135 • Published Jan 17, 2024 • 11
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models Paper • 2401.09047 • Published Jan 17, 2024 • 14
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11, 2024 • 49
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers Paper • 2401.02072 • Published Jan 4, 2024 • 10
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4, 2024 • 31
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text Paper • 2311.07446 • Published Nov 13, 2023 • 29