---
|
tags: |
|
- generated_from_trainer |
|
metrics: |
|
- rouge |
|
model-index: |
|
- name: beto2beto_tied_ |
|
results: [] |
|
--- |
|
|
|
|
|
|
# beto2beto_tied_ |
|
|
|
This model is a fine-tuned version of [dccuchile/bert-base-spanish-wwm-cased](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased) on an unknown dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.2598

- ROUGE-1: 89.2337

- ROUGE-2: 82.4747

- ROUGE-L: 87.8665

- ROUGE-Lsum: 88.2241

- Gen Len: 44.8081
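
Since the card does not yet document usage, here is a minimal inference sketch. It assumes the checkpoint is a BERT2BERT-style encoder-decoder built from BETO (as the name `beto2beto` suggests) and saved in the standard Transformers format; `"beto2beto_tied_"` is a placeholder for the actual Hub repo id or local path.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder: replace with the actual Hub repo id or checkpoint directory.
checkpoint = "beto2beto_tied_"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# The base model is Spanish BERT (BETO), so inputs are expected to be Spanish.
inputs = tokenizer("Texto de ejemplo en español.", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```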
|
|
|
## Model description |
|
|
|
More information needed |
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-05 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 20.0 |
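
For reference, these settings map onto Hugging Face `Seq2SeqTrainingArguments` roughly as sketched below. This is an illustrative reconstruction, not the exact training script; the output directory name is hypothetical, and the Adam settings shown are the library defaults, which match the values listed above.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="beto2beto_tied_",  # hypothetical output directory
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=20.0,
    adam_beta1=0.9,                # betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    predict_with_generate=True,    # needed to compute ROUGE/Gen Len at eval time
)
```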
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step  | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:-------:|
| 4.1482        | 1.0   | 970   | 2.9847          | 43.2986 | 18.5008 | 31.2325 | 32.3987    | 39.1515 |
| 2.0469        | 2.0   | 1940  | 0.9035          | 76.4140 | 63.7537 | 73.4939 | 73.7343    | 43.6667 |
| 0.5005        | 3.0   | 2910  | 0.4559          | 84.5357 | 75.7510 | 82.5460 | 82.9252    | 45.0202 |
| 0.2288        | 4.0   | 3880  | 0.3991          | 86.1217 | 78.4480 | 84.6786 | 85.0375    | 44.6465 |
| 0.1614        | 5.0   | 4850  | 0.3616          | 87.1616 | 80.0997 | 85.8677 | 86.2772    | 43.8990 |
| 0.1203        | 6.0   | 5820  | 0.3152          | 87.6801 | 80.9719 | 85.8049 | 86.2797    | 44.2828 |
| 0.1015        | 7.0   | 6790  | 0.2730          | 89.0506 | 82.7508 | 87.8147 | 88.1609    | 44.3232 |
| 0.0845        | 8.0   | 7760  | 0.3020          | 88.0917 | 81.1925 | 86.8684 | 87.1056    | 45.0606 |
| 0.0735        | 9.0   | 8730  | 0.2817          | 88.9092 | 82.7949 | 87.8403 | 88.0972    | 44.2525 |
| 0.0639        | 10.0  | 9700  | 0.2741          | 88.9576 | 83.3882 | 88.0209 | 88.1885    | 44.2424 |
| 0.0575        | 11.0  | 10670 | 0.2676          | 88.3211 | 81.5339 | 86.8743 | 87.2136    | 44.8889 |
| 0.0510        | 12.0  | 11640 | 0.2653          | 88.3985 | 81.8103 | 87.1116 | 87.4471    | 44.7879 |
| 0.0443        | 13.0  | 12610 | 0.2802          | 88.5347 | 82.0431 | 87.0133 | 87.3694    | 45.2323 |
| 0.0403        | 14.0  | 13580 | 0.2918          | 88.5383 | 82.0573 | 87.3892 | 87.6552    | 43.9495 |
| 0.0361        | 15.0  | 14550 | 0.2715          | 89.0545 | 82.4233 | 87.6459 | 87.9061    | 45.2626 |
| 0.0323        | 16.0  | 15520 | 0.2829          | 88.6251 | 82.3048 | 87.5310 | 87.8254    | 44.5859 |
| 0.0273        | 17.0  | 16490 | 0.2689          | 89.1621 | 82.8361 | 87.6537 | 87.9574    | 44.8687 |
| 0.0254        | 18.0  | 17460 | 0.2611          | 88.9807 | 82.1980 | 87.6945 | 87.9456    | 45.2929 |
| 0.0229        | 19.0  | 18430 | 0.2701          | 89.4320 | 82.6393 | 88.0695 | 88.4232    | 44.6869 |
| 0.0213        | 20.0  | 19400 | 0.2598          | 89.2337 | 82.4747 | 87.8665 | 88.2241    | 44.8081 |
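
The ROUGE columns above are on a 0-100 scale, and Gen Len is presumably the average generated sequence length in tokens. The exact evaluation pipeline is not documented in this card, but scores of this form can be computed with the `evaluate` library, as in the sketch below (the prediction/reference strings are hypothetical).

```python
import evaluate

rouge = evaluate.load("rouge")

scores = rouge.compute(
    predictions=["resumen generado por el modelo"],  # hypothetical model output
    references=["resumen de referencia"],            # hypothetical gold target
)
# `evaluate` returns values in [0, 1]; the table above reports them scaled by 100.
print({k: round(v * 100, 4) for k, v in scores.items()})
```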
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.28.0.dev0 |
|
- Pytorch 1.13.1+cu117 |
|
- Datasets 2.9.0 |
|
- Tokenizers 0.13.2 |
|
|