# Model Card for flan-t5-finetuned-summarization
A FLAN-T5 model fine-tuned for abstractive news summarization on the CNN/DailyMail dataset, which pairs news articles with human-written summaries.
## Model Details

### Model Description
The model was fine-tuned on the CNN/DailyMail dataset, which consists of news articles paired with human-written summaries. The training process involved:
- Loading the pre-trained FLAN-T5 model
- Preprocessing the CNN/DailyMail dataset
- Fine-tuning the model using the `Seq2SeqTrainer` from Hugging Face's Transformers library
- Training parameters:
  - Learning rate: 5e-5
  - Batch size: 12
  - Number of epochs: 4
  - FP16 mixed precision
- **Developed by:** Preksha Joon
- **Model type:** Sequence-to-sequence (encoder-decoder) Transformer for text summarization
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [google/flan-t5-small](https://huggingface.co/google/flan-t5-small)
### Model Sources
- **Repository:** https://colab.research.google.com/drive/1utAHMxm1CSJIFUPZ9X4aXuIVZl3M3o6C?usp=sharing
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]
## Uses

Here's an example of how to use the model for inference:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")
tokenizer = AutoTokenizer.from_pretrained("PrekshaJoon/flan-t5-finetuned-summarization")

def generate_summary(article):
    # Prefix the article with the summarization instruction and truncate long inputs.
    inputs = tokenizer("summarize: " + article, return_tensors="pt", max_length=512, truncation=True)
    summary_ids = model.generate(inputs["input_ids"], max_length=128, num_beams=4, early_stopping=True)
    # generate() returns a batch of sequences, so decode the first (and only) one.
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary
```
### Deploy and use the model
```python
from transformers import pipeline

summarizer = pipeline("summarization", model="PrekshaJoon/flan-t5-finetuned-summarization")

article = "Write your article here..."
summary = summarizer(article, max_length=128, min_length=30, length_penalty=2.0, num_beams=4, early_stopping=True)
print(summary[0]['summary_text'])
```
### Direct Use

```python
article = "Your long article text here..."
summary = generate_summary(article)
print(summary)
```
## Bias, Risks, and Limitations

The model was trained only on English news articles from CNN/DailyMail, so it may perform poorly on other domains, writing styles, or languages. Like other abstractive summarizers, it can omit important details or generate statements not supported by the source article, and it may reproduce biases present in the news data it was trained on.
## How to Get Started with the Model

Use the inference examples in the [Uses](#uses) section above to get started with the model.
### Training Data

The model was fine-tuned on the [CNN/DailyMail dataset](https://huggingface.co/datasets/cnn_dailymail), which pairs news articles with human-written highlight summaries.
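A minimal sketch of loading the dataset with the `datasets` library; the `3.0.0` (non-anonymized) configuration is an assumption, as the exact configuration used for this fine-tune is not documented:

```python
from datasets import load_dataset

# CNN/DailyMail pairs an "article" column with a "highlights" reference summary.
# Config "3.0.0" is the standard non-anonymized version (assumption).
dataset = load_dataset("cnn_dailymail", "3.0.0")

print(dataset["train"][0]["article"][:200])
print(dataset["train"][0]["highlights"])
```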
### Training Procedure
The training process involved:
1. Loading the pre-trained FLAN-T5 model
2. Preprocessing the CNN/DailyMail dataset
3. Fine-tuning the model using the `Seq2SeqTrainer` from Hugging Face's Transformers library
#### Preprocessing
The dataset is preprocessed by tokenizing the articles and reference summaries and preparing them as inputs and labels for the FLAN-T5 model.
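A minimal sketch of this step, assuming the `summarize: ` prefix and the 512/128 token limits from the inference example above, and the `article`/`highlights` columns of CNN/DailyMail:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

def preprocess_function(examples):
    # Prefix each article with the task instruction, as in the inference example.
    inputs = ["summarize: " + article for article in examples["article"]]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True)
    # Tokenize the reference summaries as labels (assumption: 128-token targets).
    labels = tokenizer(text_target=examples["highlights"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# `dataset` as loaded in the Training Data section above.
tokenized_dataset = dataset.map(preprocess_function, batched=True)
```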
#### Training Hyperparameters
- **Learning rate:** 5e-5
- **Batch size:** 12
- **Number of epochs:** 4
- **Training regime:** fp16 mixed precision
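A sketch of how these hyperparameters might map onto `Seq2SeqTrainingArguments` and `Seq2SeqTrainer`; the output directory and the generation/evaluation settings are assumptions, not documented values:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-finetuned-summarization",  # assumption
    learning_rate=5e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    num_train_epochs=4,
    fp16=True,
    predict_with_generate=True,  # assumption: generate summaries during evaluation
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()
```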
### Testing Data, Factors & Metrics
#### Testing Data

The model was evaluated on the validation split of the CNN/DailyMail dataset.
#### Metrics

The model is evaluated with ROUGE (computed with the `rouge_score` package), which measures n-gram overlap between generated and reference summaries.
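A minimal sketch of computing ROUGE with the `evaluate` library (the example strings are placeholders):

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["the cat sat on the mat"]       # generated summaries
references = ["a cat was sitting on the mat"]  # reference highlights

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```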
### Evaluation Results

The model was evaluated using ROUGE scores. Here are the results on the validation set:

| Metric     | Score  |
|------------|--------|
| ROUGE-1    | 0.3913 |
| ROUGE-2    | 0.2889 |
| ROUGE-L    | 0.3696 |
| ROUGE-Lsum | 0.3696 |
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** A100 GPU
- **Hours used:** 7
- **Cloud Provider:** Google
- **Compute Region:** [More Information Needed]
## Technical Specifications
### Model Architecture and Objective

FLAN-T5 is an encoder-decoder (sequence-to-sequence) Transformer trained with a text-to-text objective; this checkpoint is fine-tuned for abstractive news summarization.
### Compute Infrastructure

Fine-tuning was run on an A100 GPU in Google Colab (see [Environmental Impact](#environmental-impact) above).
## Model Card Contact
[email protected]