# DistilBERT with a second step of distillation

## Model description
This model replicates the "DistilBERT (D)" model from Table 2 of the DistilBERT paper. In this approach, a DistilBERT student is fine-tuned on SQuAD v1.1, but with a BERT model (also fine-tuned on SQuAD v1.1) acting as a teacher for a second step of task-specific distillation.
In this version, the following pre-trained models were used:
- Student: `distilbert-base-uncased`
- Teacher: `lewtun/bert-base-uncased-finetuned-squad-v1`
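Both checkpoints can be loaded with the `transformers` library; a minimal sketch (the model identifiers are the ones listed above):

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Student: generic pre-trained DistilBERT checkpoint
student = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")

# Teacher: BERT already fine-tuned on SQuAD v1.1
teacher = AutoModelForQuestionAnswering.from_pretrained("lewtun/bert-base-uncased-finetuned-squad-v1")

# DistilBERT reuses BERT's uncased WordPiece vocabulary, so one tokenizer suffices
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```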
## Training data
This model was trained on the SQuAD v1.1 dataset, which can be obtained from the `datasets` library as follows:

```python
from datasets import load_dataset

squad = load_dataset('squad')
```
## Training procedure
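The exact training recipe is not reproduced here. The core idea of the second distillation step is that the student is trained on a weighted sum of the usual span cross-entropy and a soft loss that matches the teacher's start/end distributions. The snippet below is an illustrative sketch only, assuming `transformers`-style QA outputs with `start_logits`/`end_logits`; `temperature` and `alpha` are placeholder hyperparameters, not the values used for this model.

```python
import torch.nn.functional as F

def qa_distillation_loss(student_outputs, teacher_outputs,
                         start_positions, end_positions,
                         temperature=2.0, alpha=0.5):
    """Illustrative loss for task-specific QA distillation (placeholder hyperparameters)."""
    # Hard loss: standard cross-entropy against the gold answer span
    hard = 0.5 * (F.cross_entropy(student_outputs.start_logits, start_positions)
                  + F.cross_entropy(student_outputs.end_logits, end_positions))

    # Soft loss: KL divergence between temperature-softened student and
    # teacher distributions over start and end positions
    soft = 0.0
    for s, t in ((student_outputs.start_logits, teacher_outputs.start_logits),
                 (student_outputs.end_logits, teacher_outputs.end_logits)):
        soft += F.kl_div(F.log_softmax(s / temperature, dim=-1),
                         F.softmax(t / temperature, dim=-1),
                         reduction="batchmean") * temperature ** 2
    soft *= 0.5

    return alpha * hard + (1.0 - alpha) * soft
```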
## Eval results
| Model | Exact Match | F1 |
|---|---|---|
| DistilBERT paper | 79.1 | 86.9 |
| Ours | 78.4 | 86.5 |
The scores were calculated using the `squad` metric from the `datasets` library.
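As a rough sketch of how such scores are computed, the metric expects SQuAD-style prediction and reference dictionaries. The ids and answer text below are placeholder values, and the post-processing that turns model logits into answer strings is omitted; newer library versions expose the same metric through `evaluate.load("squad")`.

```python
from datasets import load_metric

squad_metric = load_metric("squad")

# Placeholder example; in practice `prediction_text` comes from decoding the
# model's start/end logits back into answer spans.
predictions = [{"id": "0", "prediction_text": "Denver Broncos"}]
references = [{"id": "0",
               "answers": {"text": ["Denver Broncos"], "answer_start": [177]}}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```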
## BibTeX entry and citation info
```bibtex
@misc{sanh2020distilbert,
      title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
      author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
      year={2020},
      eprint={1910.01108},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```