Wanrong Zhu

VegB

https://wanrong-zhu.com/

VegB's activity

upvoted a paper 26 days ago

I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token

Paper • 2412.06676 • Published 29 days ago • 9

upvoted a paper 4 months ago

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Paper • 2409.03420 • Published Sep 5, 2024 • 26

authored a paper 7 months ago

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Paper • 2406.08407 • Published Jun 12, 2024 • 24

upvoted a paper 7 months ago

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87

upvoted a paper 9 months ago

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 16

authored a paper 9 months ago

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 16

liked a dataset 10 months ago

Lin-Chen/ShareGPT4V

Viewer • Updated Jun 6, 2024 • 1.35M • 589 • 271

upvoted a paper about 1 year ago

VILA: On Pre-training for Visual Language Models

Paper • 2312.07533 • Published Dec 12, 2023 • 20

authored a paper about 1 year ago

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 13

upvoted a paper about 1 year ago

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Paper • 2311.07562 • Published Nov 13, 2023 • 13

liked a dataset over 1 year ago

allenai/dolma

Updated Apr 17, 2024 • 650 • 862

authored 3 papers over 1 year ago

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Paper • 2304.06939 • Published Apr 14, 2023

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

Paper • 2308.01390 • Published Aug 2, 2023 • 33

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

Paper • 2308.06595 • Published Aug 12, 2023 • 5

liked a model about 2 years ago

prompthero/openjourney

Text-to-Image • Updated May 15, 2023 • 17.7k • 3.11k