Hello!
I am having a hard time achieving state-of-the-art results after fine-tuning T5-base for text summarization.
I am using the full Transformers setup (meaning the Seq2SeqTrainer class), and I am fine-tuning the model on the XSum dataset.
I have preprocessed the dataset according to the documentation:
- "summarize: " prefix, and " " token at the end of the label texts
- max_input_length == 512
I am also only using the attention_mask returned when tokenizing the input sequence.
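Here is roughly what my preprocessing looks like (a sketch only; names like preprocess_function and max_target_length are mine, not from the docs, and on older Transformers versions I would tokenize the targets inside `with tokenizer.as_target_tokenizer():` instead of passing text_target):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")

max_input_length = 512
max_target_length = 64  # assumption: a typical value for XSum's one-sentence summaries

def preprocess_function(examples):
    # "summarize: " prefix on the inputs, as in the T5 docs
    inputs = ["summarize: " + doc for doc in examples["document"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Tokenize the targets; the T5 tokenizer appends the </s> EOS token itself
    labels = tokenizer(text_target=examples["summary"],
                       max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

raw_datasets = load_dataset("xsum")
tokenized_datasets = raw_datasets.map(preprocess_function, batched=True)
```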
As for my training arguments (a rough sketch of the full setup follows this list):
- I have tried using both AdamW and Adafactor, with a learning rate of 3e-4 and weight_decay of 5e-5.
- my batch_size is currently 4, with gradient_accumulation_steps = 64, eval_accumulation_steps = 64
- I am using predict_with_generate = True, so the metrics are computed on actually generated summaries
- From what I’ve read, FP16 causes numerical problems with this model, so fp16=False.
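And this is roughly how I set up the trainer (again just a sketch: output_dir and num_train_epochs are placeholders, and I switch between the default AdamW and Adafactor via the optim flag):

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-base-xsum",        # placeholder path
    learning_rate=3e-4,
    weight_decay=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=64,
    eval_accumulation_steps=64,
    predict_with_generate=True,
    fp16=False,                        # FP16 reportedly unstable with T5
    optim="adafactor",                 # or leave the default to use AdamW
    num_train_epochs=3,                # placeholder
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,   # my ROUGE function, sketched further down
)
```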
Anyhow, I am only scoring approximately 28 ROUGE-1, whereas the model reached ~43 ROUGE-1 in the paper, even after I deliberately trained on just 32 examples for 100 epochs to force overfitting.
The summarization example notebook by Hugging Face also only reaches about 28 ROUGE-1.
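For reference, this is roughly how I compute the ROUGE scores (assuming the evaluate library's rouge metric, which returns plain floats; on older versions I'd use datasets.load_metric instead):

```python
import numpy as np
import evaluate

rouge = evaluate.load("rouge")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Replace -100 (ignored label positions) before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds,
                           references=decoded_labels,
                           use_stemmer=True)
    return {k: round(v * 100, 2) for k, v in result.items()}
```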
Are there any tips for the fine-tuning? Should I implement my own trainer with PyTorch Lightning?
I have already checked the discussion in T5 Finetuning Tips, but my results aren’t improving at all.
Thanks a lot!