VLM 👁️👁️ - a thehandsomefrog4825 Collection

updated 4 days ago

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published 16 days ago • 95

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published 21 days ago • 79

VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published Dec 5, 2024 • 105

PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published Dec 4, 2024 • 124

ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published Nov 26, 2024 • 78

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Paper • 2501.04003 • Published 10 days ago • 23

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Paper • 2501.00599 • Published 17 days ago • 41

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper • 2501.06186 • Published 7 days ago • 55

DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests
Paper • 2501.04671 • Published 9 days ago