---
license: apache-2.0
tags:
- llava
pipeline_tag: image-text-to-text
---
**Base model**: BLIP2-t5, pretrained version
**Fine-tuning data**:
* LLaVA 150k (for multi-round conversations, one instruction-answer pair is sampled per conversation)
* MiniGPT-4, 3,500 pairs
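The single-turn sampling above can be sketched as follows. This is a minimal illustration, assuming the LLaVA-style conversation format (a list of turns with `from` in `{"human", "gpt"}` and a `value` string); the record and field names are examples, not the exact training pipeline.

```python
import random

def sample_single_turn(conversations):
    """Sample one instruction-answer pair from a multi-round conversation.

    Assumes alternating human/gpt turns in the LLaVA-style format.
    """
    # Pair up consecutive (human, gpt) turns.
    pairs = [
        (conversations[i]["value"], conversations[i + 1]["value"])
        for i in range(0, len(conversations) - 1, 2)
        if conversations[i]["from"] == "human"
        and conversations[i + 1]["from"] == "gpt"
    ]
    # Keep only one randomly chosen pair for training.
    return random.choice(pairs)

# Example record in the LLaVA 150k style (contents are hypothetical).
record = {
    "image": "coco/train2017/000000033471.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is in the photo?"},
        {"from": "gpt", "value": "A bus on a city street."},
        {"from": "human", "value": "What color is the bus?"},
        {"from": "gpt", "value": "It is white and red."},
    ],
}

instruction, answer = sample_single_turn(record["conversations"])
```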
**Hyper-parameters**:
* BLIP2-flant5-xl + LLaVA (initial commits)
* **v0**:
  * lr = 2e-5, decayed to 0.0 with a cosine scheduler
  * global batch size (gbs) = 32
  * image size = 480
  * weight decay = 0.05
* **v1 (same as LLaVA)**:
  * lr = 2e-5
  * gbs = 32
  * image size = 224
  * weight decay = 0.0
* Other versions (same hyper-parameters as v1):
  * lr = 2e-5
  * gbs = 32
  * image size = 224
  * weight decay = 0.0
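The v0 cosine decay (lr = 2e-5 annealed to 0.0) can be sketched as a plain function of the training step. This is a generic cosine-annealing formula under the stated endpoints, not the exact scheduler implementation used for training; `total_steps` is an assumed parameter.

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine decay from base_lr (step 0) down to min_lr (final step)."""
    progress = step / total_steps
    # cos goes 1 -> -1 over [0, pi], so this interpolates base_lr -> min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At step 0 this returns the base lr of 2e-5, at the halfway point 1e-5, and at the final step 0.0, matching the `2e-5 --> 0.0` schedule described for v0.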