Yuhang Zang

yuhangzang

https://yuhangzang.github.io/

AI & ML interests

Open-source, CV, and NLP :-)

yuhangzang's activity

authored a paper 24 days ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published 25 days ago • 92

authored a paper about 1 month ago

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

Paper • 2412.01824 • Published Dec 2, 2024 • 65

upvoted 3 papers about 1 month ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 58

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 22

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 43

liked 2 datasets about 2 months ago

microsoft/orca-agentinstruct-1M-v1

Viewer • Updated Nov 1, 2024 • 1.05M • 4.97k • 410

HuggingFaceTB/smoltalk

Viewer • Updated Nov 26, 2024 • 2.2M • 8.42k • 264

upvoted a paper about 2 months ago

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Paper • 2411.13503 • Published Nov 20, 2024 • 30

authored a paper 2 months ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 34

upvoted 2 papers 2 months ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 34

Aligning Large Language Models via Self-Steering Optimization

Paper • 2410.17131 • Published Oct 22, 2024 • 21

authored 2 papers 3 months ago

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 45

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 66

upvoted a paper 3 months ago

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 66

authored 2 papers 3 months ago

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

Paper • 2410.06241 • Published Oct 8, 2024 • 10

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

Paper • 2410.07167 • Published Oct 9, 2024 • 37

liked a dataset 4 months ago

nvidia/ChatQA2-Long-SFT-data

Viewer • Updated Sep 9, 2024 • 117k • 107 • 20

New activity in internlm/internlm-xcomposer2d5-7b-4bit 5 months ago

Update modeling_internlm_xcomposer2.py

#4 opened 5 months ago by yuhangzang

authored a paper 6 months ago

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Paper • 2407.01523 • Published Jul 1, 2024

New activity in internlm/internlm-xcomposer2d5-7b 6 months ago

We couldn't connect to 'https://huggingface.co' to load this file

#15 opened 6 months ago by Jianyu