GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration Paper • 2412.04440 • Published Dec 5, 2024 • 19
GRAPE: Generalizing Robot Policy via Preference Alignment Paper • 2411.19309 • Published Nov 28, 2024 • 42
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14, 2024 • 51
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published Oct 2, 2024 • 13
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases Paper • 2407.12784 • Published Jul 17, 2024 • 48
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8, 2024 • 35
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models Paper • 2407.05131 • Published Jul 6, 2024 • 24
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5, 2024 • 53
VideoLLaMA 2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated 3 days ago • 20
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition Paper • 2402.11452 • Published Feb 18, 2024 • 1
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding Paper • 2403.00425 • Published Mar 1, 2024 • 1