General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
Xwin-LM: Strong and Scalable Alignment Practice for LLMs Paper • 2405.20335 • Published May 30, 2024 • 18
OneChart: Purify the Chart Structural Extraction via One Auxiliary Token Paper • 2404.09987 • Published Apr 15, 2024 • 2
Common 7B Language Models Already Possess Strong Math Capabilities Paper • 2403.04706 • Published Mar 7, 2024 • 17
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23, 2024 • 32
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning Paper • 2307.09474 • Published Jul 18, 2023 • 1
DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023 • 58
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Paper • 2312.06109 • Published Dec 11, 2023 • 21
Merlin:Empowering Multimodal LLMs with Foresight Minds Paper • 2312.00589 • Published Nov 30, 2023 • 25