LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Paper • 2412.01292 • Published Dec 2, 2024 • 13
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 48
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 8
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 8
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding Paper • 2311.03354 • Published Nov 6, 2023 • 5
3D-LLM: Injecting the 3D World into Large Language Models Paper • 2307.12981 • Published Jul 24, 2023 • 36