Weiyun1025
committed on
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -16,24 +16,20 @@ tags:
---
# InternVL2-8B-MPO

-[\[GitHub\]](https://github.com/OpenGVLab/InternVL) [\[Blog\]](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) [\[Paper\]](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) [\[Documents\]](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html)

[Switch to the Chinese version](#简介)

-![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/619507e7b74b6c591f794340/

## Introduction

Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pre-training and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly their Chain-of-Thought (CoT) performance.

-To address this, we introduce a preference optimization (PO) process to enhance the multimodal reasoning capabilities of MLLMs.
-
-
-and (2) on the model side, we explore integrating PO with MLLMs, developing a simple yet effective method, termed Mixed Preference Optimization (MPO), that boosts multimodal CoT performance.

-Our approach demonstrates improved performance across multiple benchmarks, particularly in multimodal reasoning tasks.
-Notably, our model, [InternVL2-8B-MPO](https://huggingface.co/OpenGVLab/InternVL2-8B), achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10$\times$ larger InternVL2-76B.
-We hope this study can inspire further advancements in MLLMs.

## Model Details

---
# InternVL2-8B-MPO

+[\[GitHub\]](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl2.0_mpo) [\[Blog\]](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) [\[Paper\]](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) [\[Documents\]](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html)

[Switch to the Chinese version](#简介)

+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/619507e7b74b6c591f794340/sy8aVC1Y5wtAjG-OQzrDI.jpeg)

## Introduction

Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pre-training and supervised fine-tuning. However, these models suffer from distribution shifts, which limit their multimodal reasoning, particularly their Chain-of-Thought (CoT) performance.

+To address this, we introduce a preference optimization (PO) process to enhance the multimodal reasoning capabilities of MLLMs. Specifically, (1) on the data side, we design an automated preference data construction pipeline to create [MMPR](https://huggingface.co/datasets/OpenGVLab/MMPR), a high-quality, large-scale multimodal reasoning preference dataset, and (2) on the model side, we explore integrating PO with MLLMs, developing a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance.
+
+Our approach demonstrates improved performance across multiple benchmarks, particularly in multimodal reasoning tasks. Notably, our model, [InternVL2-8B-MPO](https://huggingface.co/OpenGVLab/InternVL2-8B), achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10$\times$ larger InternVL2-76B. We hope this study can inspire further advancements in MLLMs.

## Model Details