HAODONG DUAN

KennyUTC

https://kennymckormick.github.io

AI & ML interests

Video Understanding; Multi-Modal Learning

Recent Activity

updated a dataset 7 days ago

VLMEval/OpenVLMRecords

updated a Space 13 days ago

opencompass/open_vlm_leaderboard

updated a Space 17 days ago

opencompass/Open_LMM_Reasoning_Leaderboard

Articles

Claude-3.5 Evaluation Results on Open VLM Leaderboard

Jun 24, 2024

• 6

RealWorldQA, What's New?

Apr 25, 2024

• 5

KennyUTC's activity

upvoted a paper 17 days ago

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published 17 days ago • 91

upvoted a paper 22 days ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published 22 days ago • 92

upvoted 4 papers about 1 month ago

Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline

Paper • 2411.12814 • Published Nov 19, 2024 • 21

SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation

Paper • 2411.14525 • Published Nov 21, 2024 • 19

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Paper • 2411.14522 • Published Nov 21, 2024 • 31

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Paper • 2411.15296 • Published Nov 22, 2024 • 19

upvoted 5 papers 2 months ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 34

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 45

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published Oct 17, 2024 • 53

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published Oct 21, 2024 • 59

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 66

upvoted a paper 3 months ago

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Paper • 2410.12405 • Published Oct 16, 2024 • 13

upvoted a collection 4 months ago

VisionLM

Collection

591 items • Updated 1 day ago • 39

upvoted a paper 4 months ago

POINTS: Improving Your Vision-language Model with Affordable Strategies

Paper • 2409.04828 • Published Sep 7, 2024 • 22

upvoted a collection 5 months ago

VILA: On Pre-training for Visual Language Models

Collection

10 items • Updated Oct 31, 2024 • 47

upvoted a paper 5 months ago

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

Paper • 2408.03361 • Published Aug 6, 2024 • 85

upvoted 2 papers 6 months ago

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Paper • 2407.11691 • Published Jul 16, 2024 • 13

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3, 2024 • 93

upvoted a collection 6 months ago

InternVL2.0

Collection

Expanding Performance Boundaries of Open-Source MLLM • 15 items • Updated 7 days ago • 88

upvoted a paper 6 months ago

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

Paper • 2406.17770 • Published Jun 25, 2024 • 18