blip2-t5-llava / README.md
rulins's picture
Specify right model card metadata (#1)
85669ef verified
---
license: apache-2.0
tags:
- llava
pipeline_tag: image-text-to-text
---
**Base Model**: BLIP2-t5 pretrained version
**Finetune data**:
* LLAVA 150k (sample one pair of instruction-answer if multi-round conversations)
* MiniGPT4 3500 pairs
**Hyper-parameters**:
* BLIP2-flant5-xl + LLAVA (initial commits)
* **v0**:
* lr = 2e-5 --> 0.0 with cosine lr scheduler
* gbs = 32
* image size = 480
* weight decay = 0.05
* **v1 (same as LLAVA)**:
* lr = 2e-5
* gbs = 32
* image size = 224
* weight decay = 0.0
* Others
* lr = 2e-5
* gbs = 32
* image size = 224
* weight decay = 0.0