---
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- perplexity
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: fill-mask
---
# DistilBERT Fine-Tuned on IMDB for Masked Language Modeling (Accelerate)
## Model Description
This model is a fine-tuned version of [**`distilbert-base-uncased`**](https://huggingface.co/distilbert/distilbert-base-uncased) for the masked language modeling (MLM) task. It was trained on the IMDB dataset using the Hugging Face 🤗 Accelerate library.
---
## Model Training Details
### Training Dataset
- **Dataset:** [IMDB dataset](https://huggingface.co/datasets/imdb) from Hugging Face.
- **Dataset Splits:**
  - Train: 25,000 samples
  - Test: 25,000 samples
  - Unsupervised: 50,000 samples
- **Training Strategy:**
  - Combined the train and unsupervised splits for training, giving 75,000 training examples.
  - Applied fixed random masking to the evaluation set so that perplexity scores stay comparable across epochs (a preprocessing sketch follows this list).
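The exact preprocessing script is not included in this card; the following is a minimal sketch of how the splits can be combined and the evaluation masks frozen, assuming a standard 🤗 Datasets / Transformers setup. The fixed sequence length of 128 and the helper names are illustrative assumptions, not the actual training script.

```python
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

imdb = load_dataset("stanfordnlp/imdb")

# train (25k) + unsupervised (50k) = 75k training examples; test (25k) is held out for evaluation.
train_dataset = concatenate_datasets([imdb["train"], imdb["unsupervised"]])

# Fixed-length sequences keep this sketch simple; the full pipeline may instead
# concatenate and chunk texts into fixed-size blocks before masking.
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=["text", "label"])
tokenized_eval = imdb["test"].map(tokenize_function, batched=True, remove_columns=["text", "label"])

# Mask the evaluation set once, up front, so every evaluation pass sees the same
# masked tokens and perplexity is comparable across epochs.
def insert_random_mask(batch):
    features = [dict(zip(batch, values)) for values in zip(*batch.values())]
    masked = data_collator(features)
    return {k: v.numpy() for k, v in masked.items()}

eval_dataset = tokenized_eval.map(
    insert_random_mask, batched=True, remove_columns=tokenized_eval.column_names
)
```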
---
### Training Configuration
The model was trained with the following parameters (a sketch of the corresponding Accelerate training loop follows this list):
- **Number of Training Epochs:** `10`
- **Batch Size:** `64` (per device).
- **Learning Rate:** `5e-5`
- **Weight Decay:** `0.01`
- **Evaluation Strategy:** After each epoch.
- **Early Stopping:** Enabled (Patience = `3`).
- **Metric for Best Model:** `eval_loss`
- **Direction:** Lower `eval_loss` is better (`greater_is_better = False`).
- **Learning Rate Scheduler:** Linear decay with no warmup steps.
- **Mixed Precision Training:** Enabled (FP16).
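The loop itself uses 🤗 Accelerate rather than the `Trainer`. Below is a condensed sketch of a loop matching the configuration above, continuing from the dataset-preparation sketch; the dataloader setup, checkpoint directory name, and early-stopping bookkeeping are illustrative assumptions rather than the exact training script.

```python
import math
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator
from transformers import AutoModelForMaskedLM, default_data_collator, get_scheduler

# Dataloaders built from the datasets prepared above (per-device batch size of 64).
train_dataloader = DataLoader(tokenized_train, shuffle=True, batch_size=64, collate_fn=data_collator)
eval_dataloader = DataLoader(eval_dataset, batch_size=64, collate_fn=default_data_collator)

model = AutoModelForMaskedLM.from_pretrained("distilbert/distilbert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

accelerator = Accelerator(mixed_precision="fp16")
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)

num_epochs = 10
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_epochs * len(train_dataloader),
)

best_loss, patience, epochs_without_improvement = float("inf"), 3, 0
for epoch in range(num_epochs):
    model.train()
    for batch in train_dataloader:
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    # Evaluate on the fixed-mask evaluation set after every epoch.
    model.eval()
    losses = []
    for batch in eval_dataloader:
        with torch.no_grad():
            outputs = model(**batch)
        losses.append(accelerator.gather(outputs.loss.repeat(len(batch["input_ids"]))))
    eval_loss = torch.cat(losses).mean().item()
    print(f"epoch {epoch}: eval_loss = {eval_loss:.4f}, perplexity = {math.exp(eval_loss):.4f}")

    # Keep the best checkpoint; stop if eval_loss has not improved for `patience` epochs.
    if eval_loss < best_loss:
        best_loss, epochs_without_improvement = eval_loss, 0
        accelerator.wait_for_everyone()
        accelerator.unwrap_model(model).save_pretrained(
            "distilbert-finetuned-imdb-mlm-accelerate", save_function=accelerator.save
        )
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break
```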
---
## Model Results
### Best Epoch Performance
- **Best Epoch:** `9`
- **Loss:** `2.0173`
- **Perplexity:** `7.5178`
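Perplexity is the exponential of the mean evaluation cross-entropy loss, so the two figures above are consistent:

```python
import math

eval_loss = 2.0173
print(round(math.exp(eval_loss), 4))  # ≈ 7.5178
```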
### Early Stopping
- Early stopping was never triggered: the evaluation loss kept improving, so training ran for the full `10` epochs.
---
## Model Usage
This fine-tuned model can be used for masked language modeling tasks using the `fill-mask` pipeline from Hugging Face. Below is an example:
```python
from transformers import pipeline

# Load the fine-tuned checkpoint into a fill-mask pipeline.
mask_filler = pipeline("fill-mask", model="Prikshit7766/distilbert-finetuned-imdb-mlm-accelerate")

text = "This is a great [MASK]."
predictions = mask_filler(text)

# Each prediction includes the completed sequence, its score, and the predicted token.
for pred in predictions:
    print(f">>> {pred['sequence']}")
```
**Example Output:**
```text
>>> This is a great movie.
>>> This is a great film.
>>> This is a great show.
>>> This is a great story.
>>> This is a great documentary.
```