Jaehyun Jun

btjhjeon

https://btjhjeon.github.io/

AI & ML interests

Multimodal

Recent Activity

updated a collection 1 day ago

Multimodal LLM

updated a collection 3 days ago

Multimodal Dataset

btjhjeon's activity

upvoted a paper 1 day ago

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published 3 days ago • 16

upvoted 4 papers 3 days ago

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published 5 days ago • 31

Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Paper • 2501.09012 • Published 4 days ago • 10

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published 6 days ago • 7

MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

Paper • 2501.08828 • Published 4 days ago • 26

upvoted 2 papers 4 days ago

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

Paper • 2501.07888 • Published 5 days ago • 13

A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Paper • 2501.08187 • Published 5 days ago • 24

upvoted a paper 5 days ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published 9 days ago • 32

upvoted 2 papers 6 days ago

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published 9 days ago • 56

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Paper • 2501.05510 • Published 10 days ago • 35

upvoted 2 papers 7 days ago

On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis

Paper • 2501.04377 • Published 11 days ago • 13

An Empirical Study of Autoregressive Pre-training from Videos

Paper • 2501.05453 • Published 10 days ago • 36

upvoted a paper 10 days ago

URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics

Paper • 2501.04686 • Published 11 days ago • 49

upvoted 2 papers 11 days ago

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published 12 days ago • 48

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published 13 days ago • 40

upvoted a paper 12 days ago

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Paper • 2501.02976 • Published 13 days ago • 51

upvoted 2 papers 13 days ago

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Paper • 2501.01904 • Published 16 days ago • 31

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Paper • 2501.01957 • Published 16 days ago • 41

upvoted 2 papers 16 days ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published 20 days ago • 24

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 18 days ago • 95