image-captioning-output

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5164
  • Rouge1: 35.5267
  • Rouge2: 12.254
  • Rougel: 32.968
  • Rougelsum: 32.9723
  • Gen Len: 12.395
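
The Rouge values above appear to be F-measures scaled to the 0-100 range, which is an assumption based on how such scores are usually reported. To illustrate what Rouge1 measures, here is a simplified sketch: unigram overlap with whitespace tokenization and no stemming. The function name `rouge1_f` and the caption pair are hypothetical, and this is not the implementation that produced the numbers in this card.

```python
from collections import Counter


def rouge1_f(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure (0-100): clipped unigram overlap, no stemming."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical caption pair, not taken from the evaluation set.
score = rouge1_f("a dog runs on the beach", "a dog is running on a beach")
print(round(score, 2))
```

Four of the six predicted tokens match the seven-token reference, so precision is 4/6 and recall is 4/7. A real evaluation would normally use a dedicated ROUGE package, which may tokenize and stem differently.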

Model description

More information needed

Intended uses & limitations

More information needed
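
Until this section is filled in, the following is a minimal sketch of how the model could presumably be used to caption an image. It makes several assumptions: the checkpoint loads as a `VisionEncoderDecoderModel` like its base model, the image processor and tokenizer of the base checkpoint are still compatible after fine-tuning, and the repository id is the one shown on the model page. The helper name `caption_image` and the generation settings (`max_length=16`, `num_beams=4`) are illustrative and are not taken from this card.

```python
MODEL_ID = "NourFakih/image-captioning-output"    # repository id as shown on the model page
BASE_ID = "nlpconnect/vit-gpt2-image-captioning"  # base checkpoint named in this card


def caption_image(image_path, model_id=MODEL_ID, base_id=BASE_ID, max_length=16):
    """Return one generated caption for the image stored at image_path."""
    # Local imports: defining the helper does not require the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    # Assumption: preprocessing and vocabulary are unchanged from the base model.
    processor = ViTImageProcessor.from_pretrained(base_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    model.eval()
    with torch.no_grad():
        output_ids = model.generate(pixel_values, max_length=max_length, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Calling `caption_image("photo.jpg")` with a local image path downloads the weights on first use and returns a single caption string.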

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
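
To make the `linear` scheduler setting concrete, the sketch below computes the learning rate it would yield at a given step. It assumes no warmup (none is listed above) and 6000 total optimizer steps (the last step reported in the training results); the function name `linear_lr` is hypothetical.

```python
BASE_LR = 5e-05      # learning_rate from the list above
TOTAL_STEPS = 6000   # assumption: final step reported in the training results
WARMUP_STEPS = 0     # assumption: no warmup is listed in this card


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 2000, 4000, 6000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate falls by one third of its initial value per epoch and reaches zero at the end of epoch 3.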

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|---------------|-------|------|-----------------|---------|---------|---------|-----------|---------|
| 0.5193        | 0.25  | 500  | 0.5171          | 33.0319 | 10.364  | 30.6939 | 30.6888   | 12.1    |
| 0.4842        | 0.5   | 1000 | 0.5102          | 33.7318 | 10.8199 | 31.1842 | 31.18     | 11.3    |
| 0.4724        | 0.75  | 1500 | 0.5028          | 34.6981 | 11.4074 | 31.9128 | 31.9158   | 12.02   |
| 0.4632        | 1.0   | 2000 | 0.5012          | 35.9443 | 12.8742 | 33.4061 | 33.377    | 11.04   |
| 0.377         | 1.25  | 2500 | 0.5026          | 35.7745 | 12.2309 | 33.3234 | 33.3353   | 11.735  |
| 0.3819        | 1.5   | 3000 | 0.5018          | 36.0145 | 13.0296 | 33.5985 | 33.6182   | 12.285  |
| 0.3788        | 1.75  | 3500 | 0.5030          | 35.9016 | 12.5276 | 33.4995 | 33.5033   | 11.305  |
| 0.3654        | 2.0   | 4000 | 0.5020          | 36.2476 | 12.945  | 33.6453 | 33.6595   | 11.9    |
| 0.3102        | 2.25  | 4500 | 0.5146          | 36.1507 | 13.0072 | 33.3889 | 33.3786   | 12.305  |
| 0.3137        | 2.5   | 5000 | 0.5166          | 35.7413 | 12.5693 | 33.2646 | 33.2508   | 12.71   |
| 0.3111        | 2.75  | 5500 | 0.5171          | 35.5658 | 12.511  | 33.0581 | 33.0518   | 12.55   |
| 0.3023        | 3.0   | 6000 | 0.5164          | 35.5267 | 12.254  | 32.968  | 32.9723   | 12.395  |
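
The evaluation results reported at the top of this card match the final row (step 6000). Validation loss rises after epoch 2 while training loss keeps falling, so an earlier checkpoint may generalize better. The sketch below picks the best rows by validation loss and by Rouge1, using values copied from the table; whether those intermediate checkpoints were saved is not stated in this card.

```python
# (step, validation_loss, rouge1) copied from the training results table.
ROWS = [
    (500, 0.5171, 33.0319),
    (1000, 0.5102, 33.7318),
    (1500, 0.5028, 34.6981),
    (2000, 0.5012, 35.9443),
    (2500, 0.5026, 35.7745),
    (3000, 0.5018, 36.0145),
    (3500, 0.5030, 35.9016),
    (4000, 0.5020, 36.2476),
    (4500, 0.5146, 36.1507),
    (5000, 0.5166, 35.7413),
    (5500, 0.5171, 35.5658),
    (6000, 0.5164, 35.5267),
]

# Lowest validation loss and highest Rouge1 across all logged evaluations.
best_loss = min(ROWS, key=lambda row: row[1])
best_rouge1 = max(ROWS, key=lambda row: row[2])

print("lowest validation loss:", best_loss)
print("highest Rouge1:", best_rouge1)
```

On these numbers the lowest validation loss occurs at step 2000 (end of epoch 1) and the highest Rouge1 at step 4000 (end of epoch 2).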

Framework versions

  • Transformers 4.40.0
  • PyTorch 2.2.1+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1

Model size: 239M params (Safetensors, tensor type F32)