Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions By mikelabs • Nov 19, 2024
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions Paper • 2411.09018 • Published Nov 13, 2024
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content Paper • 2410.10783 • Published Oct 14, 2024
ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation Paper • 2403.01306 • Published Mar 2, 2024
MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations Paper • 2312.03631 • Published Dec 6, 2023