---
license: apache-2.0
tags:
- llava
pipeline_tag: image-text-to-text
---
**Base model**: BLIP2-t5, pretrained version
**Fine-tuning data**:
* LLaVA 150k (for multi-round conversations, one instruction-answer pair is sampled per conversation)
* MiniGPT-4, 3,500 pairs
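The single-turn sampling above can be sketched as follows. This is a minimal illustration, assuming the LLaVA-style conversation format (a list of turns with `from` in `{"human", "gpt"}` and a `value` string); the record and field names are examples, not the exact training pipeline.

```python
import random

def sample_single_turn(conversations):
    """Sample one instruction-answer pair from a multi-round conversation.

    Assumes alternating human/gpt turns in the LLaVA-style format.
    """
    # Pair up consecutive (human, gpt) turns.
    pairs = [
        (conversations[i]["value"], conversations[i + 1]["value"])
        for i in range(0, len(conversations) - 1, 2)
        if conversations[i]["from"] == "human"
        and conversations[i + 1]["from"] == "gpt"
    ]
    # Keep only one randomly chosen pair for training.
    return random.choice(pairs)

# Example record in the LLaVA 150k style (contents are hypothetical).
record = {
    "image": "coco/train2017/000000033471.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is in the photo?"},
        {"from": "gpt", "value": "A bus on a city street."},
        {"from": "human", "value": "What color is the bus?"},
        {"from": "gpt", "value": "It is white and red."},
    ],
}

instruction, answer = sample_single_turn(record["conversations"])
```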
**Hyper-parameters**:
* BLIP2-flant5-xl + LLaVA (initial commits)
* **v0**:
  * lr = 2e-5, decayed to 0.0 with a cosine scheduler
  * global batch size (gbs) = 32
  * image size = 480
  * weight decay = 0.05
* **v1 (same as LLaVA)**:
  * lr = 2e-5
  * gbs = 32
  * image size = 224
  * weight decay = 0.0
* Other versions (same hyper-parameters as v1):
  * lr = 2e-5
  * gbs = 32
  * image size = 224
  * weight decay = 0.0
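The v0 cosine decay (lr = 2e-5 annealed to 0.0) can be sketched as a plain function of the training step. This is a generic cosine-annealing formula under the stated endpoints, not the exact scheduler implementation used for training; `total_steps` is an assumed parameter.

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine decay from base_lr (step 0) down to min_lr (final step)."""
    progress = step / total_steps
    # cos goes 1 -> -1 over [0, pi], so this interpolates base_lr -> min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

At step 0 this returns the base lr of 2e-5, at the halfway point 1e-5, and at the final step 0.0, matching the `2e-5 --> 0.0` schedule described for v0.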