---
license: apache-2.0
base_model: google/long-t5-tglobal-xl
tags:
- generated_from_trainer
datasets:
- learn3r/summ_screen_fd_bp
metrics:
- rouge
model-index:
- name: longt5_xl_sfd_bp_20
  results:
  - task:
      name: Summarization
      type: summarization
    dataset:
      name: learn3r/summ_screen_fd_bp
      type: learn3r/summ_screen_fd_bp
    metrics:
    - name: Rouge1
      type: rouge
      value: 22.11
---

# longt5_xl_sfd_bp_20

This model is a fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the learn3r/summ_screen_fd_bp dataset.
It achieves the following results on the evaluation set. These values match the row for step 57 (epoch 3.97) in the training results table, which is the row with the lowest validation loss:
- Loss: 1.5032
- Rouge1: 22.11
- Rouge2: 7.544
- Rougel: 19.7035
- Rougelsum: 20.2813
- Gen Len: 497.8783
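
The card does not include usage code, so the following is only a minimal sketch of how a LongT5 checkpoint like this one could be loaded for summarization with the Transformers `Auto*` classes. The helper name `summarize`, the `max_input_tokens` default of 4096, and the `max_new_tokens` default of 512 are assumptions for illustration. The 512 figure is inferred from the Gen Len ceiling of 511 in the results, not stated anywhere in the card. The Hub namespace of this model is also not given here, so `model_id` is left as a required argument.

```python
def summarize(text: str, model_id: str,
              max_input_tokens: int = 4096,   # assumption: not documented in the card
              max_new_tokens: int = 512) -> str:  # assumption: inferred from Gen Len
    """Generate a summary of `text` with a LongT5 seq2seq checkpoint."""
    # Imported lazily so that defining this helper does not load the library.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long transcripts to the chosen input budget.
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=max_input_tokens)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call would look like `summarize(transcript, "<namespace>/longt5_xl_sfd_bp_20")`, where `<namespace>` stands for the account that hosts the checkpoint.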

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 20.0
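
The batch-size and step figures in this card can be cross-checked with a little arithmetic. The sketch below assumes a single training device (the card does not say how many were used) and takes the pairing of step 115 with epoch 8.0 from the training results table. Under those assumptions it reproduces the listed `total_train_batch_size`, the final step count of 280, and the final logged epoch of 19.48.

```python
# Values copied from the hyperparameter list.
train_batch_size = 8              # per-device micro-batch size
gradient_accumulation_steps = 32
num_epochs = 20

# Assumption: one device, so there is no extra data-parallel factor.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# From the results table: optimizer step 115 is logged at epoch 8.0,
# which fixes the number of micro-batches in one pass over the data.
micro_batches_per_epoch = 115 * gradient_accumulation_steps // 8

# Only whole accumulation groups count as optimizer steps in each epoch.
steps_per_epoch = micro_batches_per_epoch // gradient_accumulation_steps
total_steps = num_epochs * steps_per_epoch

# Fractional epoch reached when the last optimizer step completes.
final_epoch = total_steps * gradient_accumulation_steps / micro_batches_per_epoch

print(total_train_batch_size, steps_per_epoch, total_steps, round(final_epoch, 2))
# prints: 256 14 280 19.48
```

This would explain why the last table row is logged at epoch 19.48 and not 20: each block of 14 optimizer steps covers 448 of the 460 micro-batches in a full pass.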

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len  |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:--------:|
| 2.3973        | 0.97  | 14   | 1.9074          | 10.6164 | 2.4585  | 10.4856 | 9.8193    | 511.0    |
| 1.9188        | 1.95  | 28   | 1.7082          | 17.4258 | 4.2128  | 16.5213 | 15.8377   | 511.0    |
| 1.4297        | 2.99  | 43   | 1.5073          | 18.6504 | 5.4242  | 17.2648 | 17.0203   | 506.7745 |
| 1.2759        | 3.97  | 57   | 1.5032          | 22.11   | 7.544   | 19.7035 | 20.2813   | 497.8783 |
| 1.1421        | 4.94  | 71   | 1.5462          | 20.6049 | 6.7146  | 18.5084 | 19.0876   | 503.6024 |
| 0.9605        | 5.98  | 86   | 1.6233          | 22.6777 | 7.9362  | 18.7936 | 21.41     | 510.2730 |
| 0.8082        | 6.96  | 100  | 1.7575          | 26.5338 | 9.9474  | 20.3789 | 25.0767   | 511.0    |
| 0.664         | 8.0   | 115  | 1.7702          | 35.1918 | 13.7223 | 26.1763 | 33.3997   | 329.7151 |
| 0.5471        | 8.97  | 129  | 1.9383          | 27.0414 | 10.4166 | 20.1803 | 25.6283   | 506.8279 |
| 0.4349        | 9.95  | 143  | 1.9608          | 29.5613 | 11.7633 | 22.7176 | 27.9563   | 454.7033 |
| 0.4338        | 10.99 | 158  | 2.1197          | 31.2004 | 12.8569 | 22.1282 | 29.8827   | 493.3234 |
| 0.2887        | 11.97 | 172  | 2.1205          | 34.9566 | 13.8574 | 25.1764 | 33.2914   | 381.3591 |
| 0.2753        | 12.94 | 186  | 2.4299          | 36.3877 | 13.8584 | 25.7829 | 34.8601   | 338.7240 |
| 0.2114        | 13.98 | 201  | 2.5799          | 39.7535 | 16.1209 | 27.8512 | 37.8553   | 302.4837 |
| 0.1805        | 14.96 | 215  | 2.6123          | 33.3254 | 13.0868 | 23.3214 | 31.7901   | 442.9258 |
| 0.1543        | 16.0  | 230  | 2.5635          | 31.7816 | 13.1085 | 22.9117 | 30.2286   | 463.0801 |
| 0.5166        | 16.97 | 244  | 2.5134          | 30.3969 | 12.1295 | 21.6616 | 28.7606   | 511.0    |
| 0.1117        | 17.95 | 258  | 2.8109          | 35.336  | 14.9492 | 24.1938 | 33.822    | 431.1157 |
| 0.0895        | 18.99 | 273  | 2.7577          | 41.0982 | 16.3935 | 28.1073 | 39.1641   | 240.1365 |
| 0.0779        | 19.48 | 280  | 2.8927          | 32.7788 | 13.9352 | 22.5175 | 31.548    | 488.5134 |
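
In this table the two usual checkpoint-selection criteria disagree: validation loss bottoms out early, while Rouge1 keeps climbing (with large swings) as the loss rises. The sketch below copies the Step, Validation Loss, and Rouge1 columns and picks the best row under each criterion. It is only an illustration of the comparison and does not claim to reproduce how the reported checkpoint was actually chosen.

```python
# (step, validation_loss, rouge1) copied from the training results table.
rows = [
    (14, 1.9074, 10.6164), (28, 1.7082, 17.4258), (43, 1.5073, 18.6504),
    (57, 1.5032, 22.11), (71, 1.5462, 20.6049), (86, 1.6233, 22.6777),
    (100, 1.7575, 26.5338), (115, 1.7702, 35.1918), (129, 1.9383, 27.0414),
    (143, 1.9608, 29.5613), (158, 2.1197, 31.2004), (172, 2.1205, 34.9566),
    (186, 2.4299, 36.3877), (201, 2.5799, 39.7535), (215, 2.6123, 33.3254),
    (230, 2.5635, 31.7816), (244, 2.5134, 30.3969), (258, 2.8109, 35.336),
    (273, 2.7577, 41.0982), (280, 2.8927, 32.7788),
]

# Criterion 1: lowest validation loss.
best_by_loss = min(rows, key=lambda r: r[1])
# Criterion 2: highest Rouge1.
best_by_rouge1 = max(rows, key=lambda r: r[2])

print("lowest validation loss:", best_by_loss)
print("highest Rouge1:", best_by_rouge1)
# lowest validation loss: (57, 1.5032, 22.11)
# highest Rouge1: (273, 2.7577, 41.0982)
```

The headline metrics of this card correspond to the first criterion (step 57). The rows with the highest Rouge1 (steps 273 and 201) are also the ones with the shortest Gen Len, so part of the Rouge swings may reflect output length, not summary quality alone.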

### Framework versions

- Transformers 4.34.1
- PyTorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1