mt5-large-gramatika161k-b16-lr0.001

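This model is a fine-tuned version of google/mt5-large on an unnamed dataset (the auto-generated card lists it as "None"). It achieves the following results on the evaluation set. These match the step-15000 checkpoint (epoch 1.9) in the Training results table, which has the lowest validation loss: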

  • Loss: 0.1429
  • Rouge1: 71.0622
  • Rouge2: 65.0219
  • RougeL: 70.921
  • RougeLsum: 70.9407
  • Gen Len: 18.3295 (mean length of the generated outputs, in tokens)
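The ROUGE values above are F-measures reported on a 0 to 100 scale. The sketch below gives a rough idea of what Rouge1 and Rouge2 measure. It computes ROUGE-N F1 from clipped n-gram overlap and a Gen Len-style average over generated token IDs. It is not the scorer that produced these numbers. Cards like this one usually come from the Transformers seq2seq examples, which score with the `rouge_score` package through Hugging Face `evaluate`. That scorer uses its own tokenization and also computes the LCS-based RougeL and RougeLsum, which this sketch omits. The whitespace tokenization, the function names, and the pad ID of 0 (mT5's padding token) are assumptions for illustration.

```python
from collections import Counter
from statistics import mean


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(prediction: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 on a 0..100 scale.

    Assumption: lowercase whitespace tokenization; the real rouge_score
    tokenizer differs, so scores will not match the card exactly.
    """
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


def gen_len(token_ids: list[list[int]], pad_id: int = 0) -> float:
    """Average number of non-padding tokens per generated sequence."""
    return mean(sum(1 for t in seq if t != pad_id) for seq in token_ids)


pred = "the cat sat on mat"
ref = "the cat sat on the mat"
print(round(rouge_n_f1(pred, ref, n=1), 2))
print(round(rouge_n_f1(pred, ref, n=2), 2))
print(gen_len([[5, 6, 7, 1, 0, 0], [8, 9, 1, 0, 0, 0]]))
```

Here the prediction drops one "the", so unigram precision is perfect but recall is 5/6, and the missing word also breaks two reference bigrams.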

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor
  • lr_scheduler_type: linear
  • num_epochs: 5
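With `lr_scheduler_type: linear` and no warmup listed, the learning rate presumably falls in a straight line from 0.001 at step 0 to 0 at the final optimizer step. Training may have used the Trainer's built-in `adafactor` optimizer option, though the card does not say so. In that case Adafactor runs with `relative_step=False` and `scale_parameter=False`, so the scheduled rate is the one applied. Below is a minimal sketch of the schedule's shape, assuming zero warmup by default. `linear_lr` and its parameters are hypothetical names, not a library API.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 1e-3,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay.

    The card lists no warmup, so warmup_steps defaults to 0 (an assumption).
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * (step / max(1, warmup_steps))
    # Decay linearly from base_lr to 0 over the remaining steps, never below 0.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


total = 40_000  # hypothetical total step count, for illustration only
for step in (0, 10_000, 20_000, 40_000):
    print(step, linear_lr(step, total))
```

The `total` value is a placeholder. The Step and Epoch columns of the training results suggest roughly 7,880 steps per epoch, which would put five epochs on the order of 39,400 steps.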

Training results

Training Loss  Epoch  Step   Validation Loss  Rouge1   Rouge2   RougeL   RougeLsum  Gen Len
0.3954         0.63   5000   0.1851           69.5715  62.3503  69.3784  69.3899    18.3461
0.1746         1.27   10000  0.1537           70.6244  64.1779  70.4518  70.4717    18.3410
0.123          1.9    15000  0.1429           71.0622  65.0219  70.921   70.9407    18.3295
0.0758         2.54   20000  0.1468           71.5151  65.7486  71.3742  71.3959    18.3246
0.0568         3.17   25000  0.1603           71.6869  66.1031  71.5594  71.5794    18.3302
0.0327         3.81   30000  0.1556           71.9011  66.4738  71.7817  71.8013    18.3311
0.0196         4.44   35000  0.1782           72.0041  66.6645  71.886   71.9038    18.3293
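The headline evaluation numbers at the top of this card match the step-15000 row, which has the lowest validation loss. In later rows, ROUGE keeps rising while validation loss climbs again. This is consistent with the Trainer keeping the best checkpoint by loss, for example `load_best_model_at_end=True` with the default `metric_for_best_model` (which is the loss), although the card does not say so. The sketch below copies the table rows into Python and picks the best row by each criterion. It also estimates the steps per epoch from the Step and Epoch columns, assuming each logged epoch is rounded to two decimals. The row layout and function names are illustrative assumptions.

```python
import math

# (step, epoch, validation_loss, rouge1), copied from the Training results table.
ROWS = [
    (5000, 0.63, 0.1851, 69.5715),
    (10000, 1.27, 0.1537, 70.6244),
    (15000, 1.90, 0.1429, 71.0622),
    (20000, 2.54, 0.1468, 71.5151),
    (25000, 3.17, 0.1603, 71.6869),
    (30000, 3.81, 0.1556, 71.9011),
    (35000, 4.44, 0.1782, 72.0041),
]

best_by_loss = min(ROWS, key=lambda r: r[2])
best_by_rouge1 = max(ROWS, key=lambda r: r[3])
print("lowest validation loss at step", best_by_loss[0])
print("highest Rouge1 at step", best_by_rouge1[0])


def steps_per_epoch_range(rows, half_width=0.005):
    """Integer steps-per-epoch values consistent with every (step, epoch) pair.

    Assumes each logged epoch equals step / steps_per_epoch rounded to two
    decimals, so the true epoch lies within +/- half_width of the logged value.
    """
    lo, hi = 0.0, math.inf
    for step, epoch, *_ in rows:
        lo = max(lo, step / (epoch + half_width))
        hi = min(hi, step / (epoch - half_width))
    return math.ceil(lo), math.floor(hi)


lo, hi = steps_per_epoch_range(ROWS)
print(f"steps per epoch: {lo}..{hi}")
# Assumes a single device and no gradient accumulation, so each step consumes
# 16 examples; the last batch of an epoch may be partial, hence the lower bound.
print(f"training examples per epoch: {(lo - 1) * 16 + 1:,}..{hi * 16:,}")
```

Under these assumptions, five epochs come to about 39,400 optimizer steps. The last logged evaluation, at step 35000, therefore probably falls before the end of training rather than marking it.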

Framework versions

  • Transformers 4.30.1
  • PyTorch 1.11.0a0+b6df043
  • Datasets 2.12.0
  • Tokenizers 0.13.3