metadata
language:
- sv-SE
license: apache-2.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_7_0
- generated_from_trainer
- 'no'
- robust-speech-event
- model_for_talk
datasets:
- mozilla-foundation/common_voice_7_0
model-index:
- name: XLS-R-300M-LM - Norwegian
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: NPSC
type: NbAiLab/NPSC
args: sv-SE
metrics:
- name: Eval WER
type: wer
value: 21.1
- name: Eval CER
type: cer
value: 0.06
XLS-R-300M-LM - Norwegian
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - SV-SE dataset.
Scores without Language Model
Without a language model, it achieves the following results on the NPSC evaluation set:
- Loss: 0.1992
- WER: 0.2110
- CER: 0.0622
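For readers unfamiliar with the metrics, here is a minimal sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, at word and character level. This is not the evaluation script used for this model. The function names and the example sentences are illustrative assumptions, and counting spaces as characters in CER is also an assumption. A WER of 0.2110 as a fraction is the same number as the 21.1 listed in the metadata as a percentage.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction (assumption: spaces count as characters)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example pair, not taken from NPSC.
reference = "det er en fin dag"
hypothesis = "det er fin dag"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```

In this hypothetical pair one of five reference words is missing, so the WER is 0.2, close to the score reported above.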
Scores with Language Model
A 5-gram KenLM was added to boost the model's performance. After
Model description
The current version is based on checkpoint 8500 of NbAiLab/wav2vec2-xlsr-300M-NPSC-OH.
Intended uses & limitations
Demo version only. The model will be updated later this week.
Training and evaluation data
The model is trained and evaluated on NPSC. Unfortunately, there is no Norwegian test data in Common Voice, so the model is currently evaluated only on the validation set of NPSC.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 30.0 (interrupted after 8500 steps, approximately 6 epochs)
- mixed_precision_training: Native AMP
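The relationship between some of these values can be sketched in plain Python. This is an illustration under stated assumptions, not the training script. It assumes a single device, so that the total batch size is the per-device batch size times the accumulation steps. It assumes the usual linear schedule shape of linear warmup followed by linear decay to zero. It also assumes a planned total of about 42,500 steps, estimated by scaling "8500 steps, approx 6 epochs" to 30 epochs. The name `planned_total_steps` and the helper `linear_schedule_lr` are hypothetical.

```python
learning_rate = 7.5e-05
train_batch_size = 8
gradient_accumulation_steps = 4
warmup_steps = 2000

# Assumption: one device, so effective batch = batch size x accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Assumption: estimated from 8500 steps ~ 6 epochs, scaled to the planned 30 epochs.
planned_total_steps = 42_500


def linear_schedule_lr(step, base_lr=learning_rate,
                       warmup=warmup_steps, total=planned_total_steps):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


print("effective batch size:", total_train_batch_size)
for step in (0, 1000, 2000, 8500):
    print(f"step {step}: lr = {linear_schedule_lr(step):.3e}")
```

Under these assumptions, the run was interrupted at step 8500 while the learning rate was still in the early part of its decay, well above zero.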