matlok's Collections
Papers - Image - Understanding
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 7
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 3
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Paper • 2403.12596 • Published • 9
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Paper • 2403.11703 • Published • 16
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Paper • 1902.09506 • Published • 2
MyVLM: Personalizing VLMs for User-Specific Queries
Paper • 2403.14599 • Published • 15
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling
Paper • 2403.14551 • Published • 2
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
Paper • 2309.01674 • Published • 2
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper • 2404.07973 • Published • 30
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 29
Pegasus-v1 Technical Report
Paper • 2404.14687 • Published • 30