kaist-ai/volcano-7b

Seongyun committed on Nov 13, 2023

Commit 15b9e6a · 1 parent: 15ffc3a

Update README.md

Files changed (1)

README.md +2 -2

@@ -5,7 +5,7 @@ Volcano employs a single LMM to generate initial responses, feedback, and revisi
 # Model details
 **Model type:**
-Volcano is a multimodal self-feedback guided revision model that was fine-tuned by mixing the visual instruction tuning dataset used in LLaVA-1.5 with multimodal feedback and revision data collected through gpt-3.5-turbo, applied to the vicuna model.
+Volcano-7b is a multimodal self-feedback guided revision model that was fine-tuned by mixing the visual instruction tuning dataset used in [LLaVA-v1.5](https://llava-vl.github.io/) with multimodal feedback and revision data collected through gpt-3.5-turbo, applied to the [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) model.
 **Model date:**
 Volcano-7b was trained in October 2023.
@@ -13,7 +13,7 @@ Volcano-7b was trained in October 2023.
 **Paper or resources for more information:**
 # Training dataset
-- 274k multimodal feedback and revision data
+- 274K multimodal feedback and revision data
 - 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
 - 158K GPT-generated multimodal instruction-following data.
 - 450K academic-task-oriented VQA data mixture.