
BERT-tiny model finetuned with M-FAC

This model is finetuned on the SQuAD version 2 dataset with M-FAC, a state-of-the-art second-order optimizer. See the NeurIPS 2021 paper for more details on M-FAC: https://arxiv.org/pdf/2107.03356.pdf.

Finetuning setup

For a fair comparison against the default Adam baseline, we finetune the model in the same framework as described at https://github.com/huggingface/transformers/tree/master/examples/pytorch/question-answering and only swap the Adam optimizer for M-FAC. Hyperparameters used by the M-FAC optimizer:

learning rate = 1e-4
number of gradients = 1024
dampening = 1e-6
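
For intuition about the last two hyperparameters, the sketch below assumes the dampened empirical Fisher formulation used in the paper, F = damp * I + (1/m) * sum_i g_i g_i^T, where m is the number of gradients kept in the window. It is not the M-FAC algorithm itself, which avoids forming this matrix so that it can scale to real models. It is a dense, toy-sized illustration built with the Sherman-Morrison recurrence, and the function names are hypothetical.

```python
def dampened_fisher_inverse(grads, damp):
    """Dense inverse of F = damp * I + (1/m) * sum_i g_i g_i^T.

    grads: list of m gradient vectors (lists of floats), each of length d.
    damp:  dampening constant added to the diagonal.
    Illustration only: this costs O(m * d^2) and needs O(d^2) memory.
    """
    m, d = len(grads), len(grads[0])
    # Start from the inverse of the dampening term alone: (damp * I)^-1.
    inv = [[(1.0 / damp if r == c else 0.0) for c in range(d)] for r in range(d)]
    for g in grads:
        # u = inv @ g; inv stays symmetric, so g^T @ inv equals u^T.
        u = [sum(inv[r][c] * g[c] for c in range(d)) for r in range(d)]
        denom = m + sum(g[r] * u[r] for r in range(d))
        # Sherman-Morrison rank-one update for adding (1/m) * g g^T to F.
        for r in range(d):
            for c in range(d):
                inv[r][c] -= u[r] * u[c] / denom
    return inv


def preconditioned_step(inv, grad, lr):
    """Parameter update -lr * F^-1 @ grad for the current gradient."""
    d = len(grad)
    return [-lr * sum(inv[r][c] * grad[c] for c in range(d)) for r in range(d)]


# Toy example with m = 2 gradients in d = 2 dimensions.
# Here F = diag(1.0, 2.5), so its inverse is approximately diag(1.0, 0.4).
toy_inverse = dampened_fisher_inverse([[1.0, 0.0], [0.0, 2.0]], damp=0.5)
toy_step = preconditioned_step(toy_inverse, [1.0, 1.0], lr=0.1)
```

A larger dampening value pulls the update toward a plain scaled gradient step, while a larger window of gradients gives the curvature estimate more samples at a higher memory cost.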

Results

We share the best model out of 5 runs, which achieves the following scores on the SQuAD version 2 validation set:

exact_match = 50.29
f1 = 52.43
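
For readers unfamiliar with the two metrics, here is a minimal sketch of how exact_match and f1 are usually computed for a single prediction, following the normalization used by the standard SQuAD evaluation (lowercasing, stripping punctuation and English articles). It is a simplified illustration, not the evaluation code used for the numbers above. The reported scores are percentages averaged over the validation set, and the standard metric additionally takes the maximum over all reference answers for a question.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # SQuAD v2 unanswerable questions use the empty string as the answer:
        # credit is given only when prediction and reference are both empty.
        return float(pred_tokens == ref_tokens)
    common = collections.Counter(pred_tokens) & collections.Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "in Paris France" against the reference "Paris" gets an exact match of 0 but a nonzero F1, because one of its three tokens overlaps with the reference.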

Mean and standard deviation over 5 runs on the SQuAD version 2 validation set:

         Exact Match     F1
Adam     48.41 ± 0.57    49.99 ± 0.54
M-FAC    49.80 ± 0.43    52.18 ± 0.20

Results can be reproduced by adding the M-FAC optimizer code to https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_qa.py and running the following bash script:

CUDA_VISIBLE_DEVICES=0 python run_qa.py \
    --seed 42 \
    --model_name_or_path prajjwal1/bert-tiny \
    --dataset_name squad_v2 \
    --version_2_with_negative \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 12 \
    --learning_rate 1e-4 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir out_dir/ \
    --optim MFAC \
    --optim_args '{"lr": 1e-4, "num_grads": 1024, "damp": 1e-6}'
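
The --optim MFAC and --optim_args flags in the script above rely on the M-FAC code added to run_qa.py. As a rough sketch of one way such flags could be read, the snippet below parses the JSON string into keyword arguments. The function name parse_optimizer_flags and the use of a separate argparse parser are assumptions made for illustration, not the authors' integration.

```python
import argparse
import json


def parse_optimizer_flags(argv):
    """Extract the optimizer name and its keyword arguments from argv.

    Hypothetical helper: flags it does not know (such as --seed) are
    ignored here so the script's main argument parser can handle them.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--optim", default="adamw")
    # The value is a JSON object, e.g. '{"lr": 1e-4, "num_grads": 1024}'.
    parser.add_argument("--optim_args", type=json.loads, default={})
    args, _remaining = parser.parse_known_args(argv)
    return args.optim, args.optim_args


optim_name, optim_kwargs = parse_optimizer_flags([
    "--seed", "42",
    "--optim", "MFAC",
    "--optim_args", '{"lr": 1e-4, "num_grads": 1024, "damp": 1e-6}',
])
# optim_kwargs can now be passed to the optimizer constructor.
```

The resulting optimizer instance can then be handed to the Hugging Face Trainer through its optimizers argument, which accepts an (optimizer, lr_scheduler) tuple. The constructor signature of the M-FAC optimizer itself should be taken from the M-FAC repository rather than from this sketch.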

We believe these results could be improved with modest tuning of the hyperparameters per_device_train_batch_size, learning_rate, num_train_epochs, num_grads and damp. For the sake of a fair comparison and a robust default setup, we use the same hyperparameters across all models (bert-tiny, bert-mini) and all datasets (SQuAD version 2 and GLUE).

Our code for M-FAC can be found here: https://github.com/IST-DASLab/M-FAC. A step-by-step tutorial on how to integrate and use M-FAC with any repository can be found here: https://github.com/IST-DASLab/M-FAC/tree/master/tutorials.

BibTeX entry and citation info

@article{frantar2021m,
  title={M-FAC: Efficient Matrix-Free Approximations of Second-Order Information},
  author={Frantar, Elias and Kurtic, Eldar and Alistarh, Dan},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}