Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Paper • 2501.04001 • Published • 40
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper • 2501.03895 • Published • 48
An Empirical Study of Autoregressive Pre-training from Videos
Paper • 2501.05453 • Published • 36
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
Paper • 2501.07556 • Published • 5
Oğuzhan Ercan
oguzhanercan
AI & ML interests
Computer Vision, Generative Vision, first trajectory bender
Recent Activity
Updated a collection 2 days ago: Image Editing
Updated a collection 2 days ago: Image-Video MultiModal Understanding
Updated a collection 2 days ago: Video Generation
Organizations
None yet
Collections (18)
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
Paper • 2412.20800 • Published • 10
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Paper • 2501.06751 • Published • 31
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper • 2501.09732 • Published • 52
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
Paper • 2501.09755 • Published • 27
Models
None public yet