mllm applications - a plmsmile Collection

mllm applications

updated Jun 17, 2024

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8, 2024 • 83

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 37