Update README.md
README.md
Inspired by LLaVA-NeXT, we adopted a data-efficient SFT strategy to train InternVL-Chat-V1.2, using approximately 1.2M visual instruction tuning samples in total, all of which are fully open-source. At a high level, we build upon [ShareGPT-4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md#prepare-images) and additionally integrate [LLaVA-ZH](https://huggingface.co/datasets/openbmb/llava_zh), [DVQA](https://github.com/kushalkafle/DVQA_dataset), [ChartQA](https://github.com/vis-nlp/ChartQA), [AI2D](https://allenai.org/data/diagrams), [DocVQA](https://www.docvqa.org/datasets), [GeoQA+](https://github.com/SCNU203/GeoQA-Plus), and [SynthDoG-EN](https://huggingface.co/datasets/naver-clova-ix/synthdog-en). Most of the data remains consistent with LLaVA-NeXT.
For more details about data preparation, please see [here](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#prepare-training-datasets).
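The linked guide is the authoritative reference for downloading the images and annotations of each dataset. As a purely illustrative sketch (the file name and field names below are hypothetical, not the repository's documented schema), the general idea is to register every SFT dataset, with its image root and annotation file, in a single meta file:

```sh
# Hypothetical sketch only — follow the linked "prepare training datasets"
# guide for the real layout. The idea: one meta file registering each SFT
# dataset by name, with its image root and annotation file.
mkdir -p playground/data
cat > playground/meta_sft_example.json <<'EOF'
{
  "sharegpt4v_instruct": {
    "root": "playground/data/sharegpt4v/images",
    "annotation": "playground/data/sharegpt4v/sharegpt4v_instruct.jsonl"
  },
  "dvqa_train": {
    "root": "playground/data/dvqa/images",
    "annotation": "playground/data/dvqa/train.jsonl"
  }
}
EOF
```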
### Performance
### Training (SFT)
We provide [slurm scripts](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh) for multi-node, multi-GPU training. You can train this model with either 32 or 64 GPUs; with 64 GPUs, training takes approximately 18 hours.
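Assuming the script follows the usual pattern of reading the slurm partition and GPU count from environment variables (an assumption — check the variables defined at the top of the script), a 64-GPU launch might look like:

```sh
# Hypothetical invocation: PARTITION and GPUS are assumed to be picked up from
# the environment by the script; adjust names and values to the actual script.
cd internvl_chat
PARTITION='your_partition' GPUS=64 \
  sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh
```

For a 32-GPU run, set `GPUS=32`; expect a correspondingly longer training time.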
For more details about training, please see [here](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#start-training).
The hyperparameters used for finetuning are listed in the following table.