Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 136
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 77
Contrastive Localized Language-Image Pre-Training Paper • 2410.02746 • Published Oct 3, 2024 • 33
LLaVA-Critic: Learning to Evaluate Multimodal Models Paper • 2410.02712 • Published Oct 3, 2024 • 35
LLaVA-Video Collection • Models focused on video understanding (previously known as LLaVA-NeXT-Video) • 6 items • Updated Oct 5, 2024 • 56
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 33
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models Paper • 2402.07865 • Published Feb 12, 2024 • 12
Aligning Large Multimodal Models with Factually Augmented RLHF Paper • 2309.14525 • Published Sep 25, 2023 • 30
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 33
DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations Paper • 2307.07047 • Published Jul 13, 2023 • 15