---
tags:
- distilbert
- sparsity
- pruning
- compression
language: en
datasets: sst2
---
This repo includes all the files needed to stage an Inference Endpoints API with [DeepSparse](https://github.com/neuralmagic/deepsparse), as discussed in this [blog post](https://neuralmagic.com/blog/accelerate-hugging-face-inference-endpoints-with-deepsparse/).
This DistilBERT was sparsified using the [SparseML](https://github.com/neuralmagic/sparseml) library.
# Sparse Transfer 80% VNNI Pruned DistilBERT
This model is the result of pruning DistilBERT to 80% sparsity using VNNI blocking (semi-structured), followed by fine-tuning and quantization on the SST2 dataset. Pruning is performed with the gradual magnitude pruning (GMP) algorithm on the masked language modeling task, using the BookCorpus and Wikipedia datasets. The model achieves 90.5% accuracy on the validation dataset, recovering over 99% of the baseline model's accuracy. See the included [recipe](https://sparsezoo.neuralmagic.com/models/distilbert-sst2_wikipedia_bookcorpus-pruned80.4block_quantized?comparison=distilbert-sst2_wikipedia_bookcorpus-base) for training instructions.
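
To make the pruning description more concrete, here is a minimal, standard-library-only sketch of block-wise gradual magnitude pruning. It is not the SparseML implementation or the recipe used for this model. It assumes blocks of 4 consecutive weights (inferred from "4block" in the recipe URL), a cubic sparsity schedule, and an L1 block score, and it omits the training that would normally run between pruning steps. All function names are hypothetical.

```python
import random

BLOCK = 4  # assumed block size, inferred from "4block" in the recipe URL


def cubic_sparsity(step, total_steps, final_sparsity=0.80, init_sparsity=0.0):
    """Sparsity target at `step` under an assumed cubic schedule:
    it rises quickly at first and flattens as it nears the final value."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (init_sparsity - final_sparsity) * (1.0 - progress) ** 3


def prune_blocks(weights, sparsity, block=BLOCK):
    """Zero the lowest-magnitude blocks of `block` consecutive weights."""
    if len(weights) % block != 0:
        raise ValueError("weight count must be a multiple of the block size")
    blocks = [weights[i:i + block] for i in range(0, len(weights), block)]
    # Score each block by the sum of absolute values (an illustrative choice).
    scores = [sum(abs(w) for w in b) for b in blocks]
    n_prune = int(round(sparsity * len(blocks)))
    order = sorted(range(len(blocks)), key=lambda i: scores[i])
    pruned = set(order[:n_prune])
    out = []
    for i, b in enumerate(blocks):
        out.extend([0.0] * block if i in pruned else b)
    return out


def gradual_magnitude_prune(weights, total_steps=10):
    """Raise sparsity step by step; real GMP trains between steps."""
    for step in range(1, total_steps + 1):
        weights = prune_blocks(weights, cubic_sparsity(step, total_steps))
    return weights


random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(400)]  # stand-in for one weight row
pruned_row = gradual_magnitude_prune(row)
sparsity = sum(1 for w in pruned_row if w == 0.0) / len(pruned_row)
print(f"final sparsity: {sparsity:.2f}")
```

Because whole blocks are zeroed together, the surviving weights stay grouped in runs of 4, which is the semi-structured pattern that VNNI blocking refers to.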