Imagine360: Immersive 360 Video Generation from Perspective Anchor Paper • 2412.03552 • Published Dec 4, 2024 • 26
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Paper • 2412.00174 • Published Nov 29, 2024 • 22
alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta Image-to-Image • Updated Oct 12, 2024 • 13.6k • 246
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13, 2024 • 54
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published Oct 13, 2024 • 54
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published Sep 27, 2024 • 27
CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis Paper • 2408.14765 • Published Aug 27, 2024 • 15
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios Paper • 2408.17267 • Published Aug 30, 2024 • 23
nvidia/segformer-b1-finetuned-cityscapes-1024-1024 Image Segmentation • Updated Aug 9, 2022 • 7.52k • 12