TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Paper • 2412.03069 • Published Dec 4, 2024 • 30
Are Emergent Abilities of Large Language Models a Mirage? Paper • 2304.15004 • Published Apr 28, 2023 • 6
Scaling Image Tokenizers with Grouped Spherical Quantization Paper • 2412.02632 • Published Dec 3, 2024 • 10
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published Oct 17, 2024 • 32
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 105
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published Dec 5, 2024 • 14
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces Paper • 2412.14171 • Published 19 days ago • 23
Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published 19 days ago • 14