BERT-base uncased model fine-tuned on SQuAD v1

This model is block-sparse.

That means that, with the right runtime, it can run roughly 3x faster than a dense network while keeping only 25% of the original weights.

This of course has some impact on accuracy (see the results below).
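Assuming the pruned weights are simply stored as zeros in an otherwise dense checkpoint (which is why a special runtime is needed to get the speedup), you can check the density yourself. A minimal sketch; the layer-name filter is an assumption about which matrices are pruned:

```python
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained(
    "madlag/bert-base-uncased-squad-v1-sparse0.25"
)

total = nonzero = 0
for name, param in model.named_parameters():
    # Restrict to the encoder's 2-D linear weights, assumed to be the
    # block-pruned matrices.
    if "encoder" in name and name.endswith(".weight") and param.dim() == 2:
        total += param.numel()
        nonzero += int((param != 0).sum())

print(f"encoder linear-weight density: {nonzero / total:.1%}")
```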

It uses a modified version of Victor Sanh's Movement Pruning method.
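For intuition, here is a toy sketch of the core movement pruning idea from Sanh et al. (2020): each weight accumulates a score S += -dL/dW · W during fine-tuning, so weights that move away from zero score high and survive pruning. The loss, tensor shapes, and learning rate below are placeholders, and the block-sparse variant used by this model scores whole blocks rather than individual weights:

```python
import torch

# Toy movement pruning: accumulate S += -dL/dW * W while training,
# then keep only the top-scoring 25% of weights.
weight = torch.randn(8, 8, requires_grad=True)
score = torch.zeros_like(weight)

for _ in range(200):
    loss = (weight.sum() - 1.0) ** 2          # placeholder training loss
    (grad,) = torch.autograd.grad(loss, weight)
    score -= grad * weight.detach()           # movement score update
    with torch.no_grad():
        weight -= 1e-3 * grad                 # plain SGD step

# Keep the top-scoring 25% of weights, zero out the rest.
k = weight.numel() // 4
mask = torch.zeros(weight.numel())
mask[score.flatten().topk(k).indices] = 1.0
pruned = weight.detach() * mask.view_as(weight)
print(f"density after pruning: {(pruned != 0).float().mean():.0%}")
```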

This model was fine-tuned from the HuggingFace BERT base uncased checkpoint on SQuAD1.1. It is case-insensitive: it does not make a difference between english and English.

Details

| Dataset  | Split | # samples |
| -------- | ----- | --------- |
| SQuAD1.1 | train | 90.6K     |
| SQuAD1.1 | eval  | 11.1K     |
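The splits come straight from the squad dataset on the Hub. A quick way to pull them with the datasets library; note the raw example counts (87,599 train / 10,570 validation) are slightly below the figures above, presumably because long contexts are split into several features during tokenization:

```python
from datasets import load_dataset

# Load the standard SQuAD1.1 splits from the Hub.
squad = load_dataset("squad")
print(squad["train"].num_rows, squad["validation"].num_rows)
# 87599 10570
```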

Fine-tuning

  • Python: 3.8.5

  • Machine specs:

    CPU: Intel(R) Core(TM) i7-6700K

    Memory: 64 GiB

    GPU: 1× GeForce RTX 3090, with 24 GiB memory

    GPU driver: 455.23.05, CUDA: 11.1

Results

Model size: 418MB

| Metric | Value | Original (Table 2) |
| ------ | ----- | ------------------ |
| EM     | 74.82 | 80.8               |
| F1     | 83.7  | 88.5               |

Note that the above results didn't involve any hyperparameter search.
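EM and F1 are the standard SQuAD metrics. A minimal sketch of computing them with the evaluate library, not necessarily the exact evaluation script used for the numbers above; the ids and answer texts are made up for illustration:

```python
import evaluate

squad_metric = evaluate.load("squad")

# One toy prediction/reference pair in the SQuAD metric format.
predictions = [{"id": "0", "prediction_text": "a Polish composer"}]
references = [{"id": "0",
               "answers": {"text": ["a Polish composer and virtuoso pianist"],
                           "answer_start": [103]}}]

print(squad_metric.compute(predictions=predictions, references=references))
# prints the exact-match and F1 scores for the batch
```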

Example Usage

```python
from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="madlag/bert-base-uncased-squad-v1-sparse0.25",
    tokenizer="madlag/bert-base-uncased-squad-v1-sparse0.25",
)

predictions = qa_pipeline({
    'context': "Frédéric François Chopin, born Fryderyk Franciszek Chopin (1 March 1810 – 17 October 1849), was a Polish composer and virtuoso pianist of the Romantic era who wrote primarily for solo piano.",
    'question': "Who is Frederic Chopin?",
})

print(predictions)
```
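The pipeline returns a dictionary with the predicted answer span, its character offsets in the context, and a confidence score, i.e. the keys `score`, `start`, `end` and `answer`.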
