Model Card for Snoopy 1.0

This model aims to detect visual manipulation in bar charts.

Model Details

Model Description

  • Developed by: Arif Syraj
  • Model type: Multi-Modal LLM
  • Finetuned from model: llava-1.6-mistral-7b

How to Get Started with the Model

This model is not HuggingFace-based; refer to this Colab notebook to run inference. Inference requires a GPU.

Training Details

Finetuned with LoRA for 1 epoch on ~2700 images of misleading and non-misleading bar charts.

Training Procedure

  • learning_rate = 1e-5
  • bf16 = True
  • num_train_epochs = 1
  • optim = "adamw_torch"
  • per_device_train_batch_size = 3
  • gradient_accumulation_steps = 16
  • gradient_checkpointing = True
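
As a rough check on what these settings imply, the sketch below computes the effective batch size and an approximate number of optimizer steps for the single epoch. It assumes one GPU (the card does not state the device count) and takes the ~2700 image count at face value; the function names are illustrative and not part of any training script.

```python
import math

def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices

def optimizer_steps(num_examples, batch, epochs=1):
    """Approximate optimizer steps, counting a final partial batch as one step."""
    return math.ceil(num_examples / batch) * epochs

# Values from the training procedure; num_devices=1 is an assumption.
batch = effective_batch_size(per_device=3, grad_accum=16)
steps = optimizer_steps(num_examples=2700, batch=batch)

print(f"effective batch size: {batch}")
print(f"approximate optimizer steps: {steps}")
```

The step count is only an estimate: training frameworks may round differently, for example by dropping the last partial accumulation window.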

LoRA config:

  • rank = 32
  • lora_alpha = 32
  • rank-stabilized LoRA enabled
  • target_modules = [q_proj, out_proj, v_proj, k_proj, down_proj, up_proj, o_proj, gate_proj]
  • lora_dropout = 0.05
  • bias = "none"
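
To illustrate what rank stabilization changes, the sketch below compares the two common scaling factors applied to the LoRA update. It assumes the usual definitions, lora_alpha / rank for standard LoRA and lora_alpha / sqrt(rank) for rank-stabilized LoRA; the card does not say which implementation was used, and the helper name is hypothetical.

```python
import math

def lora_scaling(alpha, rank, rank_stabilized=False):
    """Scaling factor applied to the low-rank update B @ A."""
    if rank_stabilized:
        return alpha / math.sqrt(rank)  # rank-stabilized LoRA
    return alpha / rank                 # standard LoRA

# rank and lora_alpha are taken from the LoRA config above.
standard = lora_scaling(alpha=32, rank=32)
stabilized = lora_scaling(alpha=32, rank=32, rank_stabilized=True)

print(f"standard: {standard:.3f}, rank-stabilized: {stabilized:.3f}")
# prints "standard: 1.000, rank-stabilized: 5.657"
```

Under these definitions, with rank = 32 and lora_alpha = 32 the rank-stabilized variant scales the adapter update by about 5.7 times more than standard LoRA would.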

Training Hyperparameters

  • Training regime: bf16 non-mixed precision

Citation

BibTeX:

  • Liu, Haotian, Li, Chunyuan, Li, Yuheng, Li, Bo, Zhang, Yuanhan, Shen, Sheng, & Lee, Yong Jae. (2024, January). LLaVA-NeXT: Improved reasoning, OCR, and world knowledge. Retrieved from https://llava-vl.github.io/blog/2024-01-30-llava-next/.

  • Liu, Haotian, Li, Chunyuan, Li, Yuheng, & Lee, Yong Jae. (2023). Improved Baselines with Visual Instruction Tuning. arXiv:2310.03744.

  • Liu, Haotian, Li, Chunyuan, Wu, Qingyang, & Lee, Yong Jae. (2023). Visual Instruction Tuning. NeurIPS.


Dataset used to train chart-misinformation-detection/llava-1.6-mistral-7b-snoopy-1.0-post-finetune-full-folder