---
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- perplexity
base_model:
- distilbert/distilbert-base-uncased
pipeline_tag: fill-mask
---

# DistilBERT Fine-Tuned on IMDB for Masked Language Modeling (Accelerate)

## Model Description

This model is a fine-tuned version of [**`distilbert-base-uncased`**](https://huggingface.co/distilbert/distilbert-base-uncased) for the masked language modeling (MLM) task. It was trained on the IMDb dataset using the Hugging Face 🤗 Accelerate library.

---

## Model Training Details

### Training Dataset

- **Dataset:** [IMDB dataset](https://huggingface.co/datasets/stanfordnlp/imdb) from Hugging Face.
- **Dataset Splits:**
  - Train: 25,000 samples
  - Test: 25,000 samples
  - Unsupervised: 50,000 samples
- **Training Strategy:**
  - Combined the train and unsupervised splits for training, resulting in 75,000 training examples.
  - Applied random masking to the evaluation set once, up front, so that every epoch is scored on the same masked tokens and perplexity values are comparable across epochs (see the sketch below).
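
The preprocessing code is not included in this repository, but the setup described above can be reproduced roughly as follows. This is a minimal sketch: the chunk size of 128 tokens, the 15% masking probability, and the `insert_random_mask` helper are assumptions, not details confirmed by this card.

```python
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
imdb = load_dataset("stanfordnlp/imdb")

# Merge the train and unsupervised splits: 25,000 + 50,000 = 75,000 training examples.
raw_train = concatenate_datasets([imdb["train"], imdb["unsupervised"]])

chunk_size = 128  # assumed sequence length

def tokenize(batch):
    return tokenizer(batch["text"])

def group_texts(batch):
    # Concatenate all token ids, then slice them into fixed-size chunks for MLM.
    concatenated = {k: sum(batch[k], []) for k in batch}
    total = (len(concatenated["input_ids"]) // chunk_size) * chunk_size
    chunks = {
        k: [seq[i : i + chunk_size] for i in range(0, total, chunk_size)]
        for k, seq in concatenated.items()
    }
    chunks["labels"] = chunks["input_ids"].copy()
    return chunks

columns = ["text", "label"]
train_ds = raw_train.map(tokenize, batched=True, remove_columns=columns).map(group_texts, batched=True)
eval_ds = imdb["test"].map(tokenize, batched=True, remove_columns=columns).map(group_texts, batched=True)

# Dynamic 15% masking for training batches.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Masking is applied ONCE to the evaluation set, so every epoch is scored on the
# same masked tokens and the reported perplexity is comparable across epochs.
def insert_random_mask(batch):
    features = [dict(zip(batch, values)) for values in zip(*batch.values())]
    masked = data_collator(features)
    return {k: v.numpy() for k, v in masked.items()}

eval_ds = eval_ds.map(insert_random_mask, batched=True, remove_columns=eval_ds.column_names)
```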

---


### Training Configuration

The model was trained with the following settings (a minimal training-loop sketch follows the list):

- **Number of Training Epochs:** `10`
- **Batch Size:** `64` (per device).
- **Learning Rate:** `5e-5`
- **Weight Decay:** `0.01`
- **Evaluation Strategy:** After each epoch.
- **Early Stopping:** Enabled (Patience = `3`).
- **Metric for Best Model:** `eval_loss`
  - **Direction:** Lower `eval_loss` is better (`greater_is_better = False`).
- **Learning Rate Scheduler:** Linear decay with no warmup steps.
- **Mixed Precision Training:** Enabled (FP16).
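
Continuing from the data-preparation sketch above, the training loop might look roughly like this with 🤗 Accelerate. It is an illustrative sketch under the settings listed here; the early-stopping bookkeeping, the output directory name, and the evaluation-loop details are assumptions, not code shipped with this model.

```python
import math
import torch
from accelerate import Accelerator
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import AutoModelForMaskedLM, default_data_collator, get_scheduler

model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# Dynamic masking for training batches, fixed (pre-masked) batches for evaluation.
train_loader = DataLoader(train_ds, shuffle=True, batch_size=64, collate_fn=data_collator)
eval_loader = DataLoader(eval_ds, batch_size=64, collate_fn=default_data_collator)

accelerator = Accelerator(mixed_precision="fp16")
model, optimizer, train_loader, eval_loader = accelerator.prepare(
    model, optimizer, train_loader, eval_loader
)

num_epochs = 10
num_training_steps = num_epochs * len(train_loader)
lr_scheduler = get_scheduler(
    "linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)

best_loss, patience, bad_epochs = float("inf"), 3, 0
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    # Evaluate on the fixed-mask evaluation set after each epoch.
    model.eval()
    losses = []
    for batch in eval_loader:
        with torch.no_grad():
            loss = model(**batch).loss
        losses.append(accelerator.gather(loss.repeat(batch["input_ids"].shape[0])))
    eval_loss = torch.cat(losses).mean().item()
    accelerator.print(f"epoch {epoch}: eval_loss={eval_loss:.4f}, perplexity={math.exp(eval_loss):.4f}")

    # Early stopping on eval_loss (lower is better) with a patience of 3 epochs.
    if eval_loss < best_loss:
        best_loss, bad_epochs = eval_loss, 0
        accelerator.wait_for_everyone()
        unwrapped = accelerator.unwrap_model(model)
        unwrapped.save_pretrained("distilbert-finetuned-imdb-mlm-accelerate", save_function=accelerator.save)
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```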

---

## Model Results

### Best Epoch Performance
- **Best Epoch:** `9`
- **Loss:** `2.0173`
- **Perplexity:** `7.5178` (the exponential of the evaluation loss; see below)
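
Perplexity here follows the usual definition for a masked language model: the exponential of the mean cross-entropy loss on the masked tokens.

```python
import math

eval_loss = 2.0173
perplexity = math.exp(eval_loss)
print(perplexity)  # ≈ 7.5178
```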

### Early Stopping
- Early stopping was not triggered: the evaluation loss kept improving, so training ran for the full `10` epochs.

---

## Model Usage

This fine-tuned model can be used for masked language modeling tasks using the `fill-mask` pipeline from Hugging Face. Below is an example:

```python
from transformers import pipeline

mask_filler = pipeline("fill-mask", model="Prikshit7766/distilbert-finetuned-imdb-mlm-accelerate")

text = "This is a great [MASK]."
predictions = mask_filler(text)

for pred in predictions:
    print(f">>> {pred['sequence']}")
```

**Example Output:**

```text
>>> This is a great movie.
>>> This is a great film.
>>> This is a great show.
>>> This is a great story.
>>> This is a great documentary.
```
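
For more control than the pipeline offers (custom top-k, batching, or raw scores), the same checkpoint can also be loaded directly with `AutoModelForMaskedLM`. A minimal sketch, not part of the original training code:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "Prikshit7766/distilbert-finetuned-imdb-mlm-accelerate"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "This is a great [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the top-5 vocabulary predictions for it.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_token_ids = logits[0, mask_positions[0]].topk(5).indices

for token_id in top_token_ids:
    print(text.replace(tokenizer.mask_token, tokenizer.decode(int(token_id))))
```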