Flo Schneider

floschne

https://www.inf.uni-hamburg.de/en/inst/ab/lt/people/florian-schneider.html

AI & ML interests

Multimodal Information Retrieval and Representation Learning

Recent Activity

updated a dataset 2 days ago

floschne/gimmick-vvqa

liked a model 3 days ago

facebook/dinov2-with-registers-giant

upvoted a paper 16 days ago

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

floschne's activity

upvoted 3 papers 16 days ago

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Paper • 2412.13871 • Published 19 days ago • 17

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 17 days ago • 335

Progressive Multimodal Reasoning via Active Retrieval

Paper • 2412.14835 • Published 18 days ago • 69

upvoted a paper 3 months ago

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8, 2024 • 107

upvoted a collection 4 months ago

LLaVA-Onevision

Collection

LLaVa_Onevision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18, 2024 • 12

upvoted an article 4 months ago

Article

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023 • 28

upvoted a paper 4 months ago

M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks

Paper • 2407.03791 • Published Jul 4, 2024 • 1

upvoted a paper 5 months ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 59

upvoted 2 papers 7 months ago

What If We Recaption Billions of Web Images with LLaMA-3?

Paper • 2406.08478 • Published Jun 12, 2024 • 39

Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 31

upvoted 2 papers 9 months ago

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 125

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 45