metadata

license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - common_voice_8_0
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xls-r-1b-frisian-cv-8-large-train
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: validation
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.04206541922582488
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_8_0
          type: common_voice_8_0
          config: fy-NL
          split: test
          args: fy-NL
        metrics:
          - name: Wer
            type: wer
            value: 0.04108252637664402

wav2vec2-large-xls-r-1b-frisian-cv-8-large-train

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the common_voice_8_0 dataset. It achieves the following results on the evaluation set:

Loss: 0.0444
Wer: 0.0421

And on the test set:

Wer: 0.0411
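
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation code used for this model; the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 0.0411 means roughly four word errors per hundred reference words.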

Model description

This model was developed for my Master's thesis in "Voice Technology" at Rijksuniversiteit Groningen - Campus Fryslân. It corresponds to experiment 2, in which the training set consists of all validated data (~50 hours) except the test and evaluation sets (~4.5 hours each). This leaves about 41 hours of Frisian speech for training.

Intended uses & limitations

The model is intended for recognizing Frisian speech.

Limitations: the model's output is not rescored with a language model (LM), and the model was trained on version 8.0 of Common Voice rather than the more recent version 13.0.
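
A minimal usage sketch, assuming the checkpoint is loaded through the Transformers `pipeline` API. `MODEL_ID` is a placeholder, because this card does not state the Hub namespace that hosts the model, and the audio file name in the comment is hypothetical. Decoding here is plain CTC decoding without a language model, in line with the limitation above.

```python
# Placeholder: replace <namespace> with the Hub account hosting this checkpoint
# (the namespace is not stated in this card).
MODEL_ID = "<namespace>/wav2vec2-large-xls-r-1b-frisian-cv-8-large-train"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with plain CTC decoding (no language model)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Given a file path, the pipeline decodes the audio with ffmpeg at the
    # sampling rate expected by the model's feature extractor.
    return asr(audio_path)["text"]


# Example call (hypothetical file name):
# print(transcribe("example_frisian_clip.wav"))
```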

Training and evaluation data

The evaluation split used is the one available in the Common Voice 8.0 Frisian subset. The train split corresponds to all of the validated data except for the recordings found in the evaluation and test splits.
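
The split construction described above can be sketched as a set difference over clip identifiers. This is an illustration rather than the actual preprocessing script. It assumes the standard Common Voice release layout, where `validated.tsv`, `dev.tsv`, and `test.tsv` each carry a `path` column naming the audio clip; the helper names are hypothetical.

```python
import csv


def read_clip_paths(tsv_file):
    """Read the `path` column (clip file names) from a Common Voice TSV file."""
    with open(tsv_file, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row["path"] for row in reader]


def build_train_split(validated, evaluation, test):
    """Keep validated clips that appear in neither the evaluation nor the test split."""
    held_out = set(evaluation) | set(test)
    return [clip for clip in validated if clip not in held_out]


# Example call, assuming the TSV files sit in the current directory:
# train = build_train_split(
#     read_clip_paths("validated.tsv"),
#     read_clip_paths("dev.tsv"),
#     read_clip_paths("test.tsv"),
# )
```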

Training procedure

The script used for training this model can be found in this GitHub repository: link.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 36
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 40
mixed_precision_training: Native AMP
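
To make the `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1` settings concrete, the sketch below computes the learning rate at a given optimizer step: a linear ramp from 0 to the peak rate over the first 10% of steps, then a linear decay to 0. It is a standalone illustration of that schedule shape, not the trainer's own code. The total number of steps is not stated in this card, so `total_steps` is left as a parameter; judging by the training log it is on the order of 33,000 (about 830 steps per epoch over 40 epochs), which is an estimate.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=3e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With these settings the peak rate of 3e-05 is reached once warmup ends, after which the rate shrinks steadily until the final step.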

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
7.2522	0.48	400	3.1028	1.0
3.0052	0.97	800	2.9334	1.0
2.0865	1.45	1200	0.7288	0.6646
1.1654	1.93	1600	0.4298	0.4196
0.9665	2.41	2000	0.3134	0.3162
0.7891	2.9	2400	0.2378	0.2587
0.8366	3.38	2800	0.1896	0.2016
0.8606	3.86	3200	0.1647	0.1903
0.7536	4.34	3600	0.1486	0.1573
0.632	4.83	4000	0.1341	0.1450
0.5198	5.31	4400	0.1223	0.1415
0.4998	5.79	4800	0.1155	0.1388
0.4273	6.27	5200	0.1132	0.1302
0.3982	6.76	5600	0.1036	0.1102
0.3964	7.24	6000	0.0988	0.1209
0.3848	7.72	6400	0.0995	0.0985
0.3702	8.2	6800	0.0969	0.0945
0.3612	8.69	7200	0.0899	0.0967
0.3518	9.17	7600	0.0856	0.1061
0.3371	9.65	8000	0.0902	0.0875
0.3295	10.13	8400	0.0819	0.0914
0.3157	10.62	8800	0.0785	0.0937
0.3025	11.1	9200	0.0782	0.0804
0.3092	11.58	9600	0.0758	0.0845
0.301	12.06	10000	0.0775	0.0847
0.3016	12.55	10400	0.0730	0.0776
0.2892	13.03	10800	0.0719	0.0735
0.283	13.51	11200	0.0728	0.0727
0.2806	13.99	11600	0.0694	0.0710
0.2639	14.48	12000	0.0705	0.0703
0.2606	14.96	12400	0.0652	0.0668
0.2595	15.44	12800	0.0638	0.0691
0.2611	15.92	13200	0.0636	0.0713
0.246	16.41	13600	0.0632	0.0653
0.2544	16.89	14000	0.0605	0.0638
0.2509	17.37	14400	0.0640	0.0646
0.2381	17.85	14800	0.0604	0.0663
0.2336	18.34	15200	0.0590	0.0628
0.2285	18.82	15600	0.0580	0.0612
0.2362	19.3	16000	0.0655	0.0638
0.2279	19.78	16400	0.0611	0.0669
0.2228	20.27	16800	0.0606	0.0621
0.2242	20.75	17200	0.0560	0.0575
0.2053	21.23	17600	0.0571	0.0572
0.2097	21.71	18000	0.0557	0.0555
0.2072	22.2	18400	0.0563	0.0576
0.2076	22.68	18800	0.0532	0.0562
0.2026	23.16	19200	0.0531	0.0540
0.1941	23.64	19600	0.0535	0.0534
0.1983	24.13	20000	0.0528	0.0541
0.2075	24.61	20400	0.0536	0.0538
0.1937	25.09	20800	0.0532	0.0569
0.1943	25.57	21200	0.0511	0.0507
0.1844	26.06	21600	0.0521	0.0521
0.181	26.54	22000	0.0506	0.0507
0.1877	27.02	22400	0.0529	0.0510
0.1825	27.5	22800	0.0527	0.0498
0.1872	27.99	23200	0.0506	0.0485
0.1857	28.47	23600	0.0497	0.0492
0.1766	28.95	24000	0.0504	0.0488
0.1756	29.43	24400	0.0496	0.0482
0.1701	29.92	24800	0.0479	0.0479
0.1717	30.4	25200	0.0499	0.0468
0.1624	30.88	25600	0.0492	0.0466
0.1671	31.36	26000	0.0490	0.0461
0.1704	31.85	26400	0.0482	0.0452
0.1653	32.33	26800	0.0467	0.0446
0.158	32.81	27200	0.0465	0.0449
0.1599	33.29	27600	0.0473	0.0445
0.1558	33.78	28000	0.0475	0.0453
0.1556	34.26	28400	0.0462	0.0445
0.1591	34.74	28800	0.0464	0.0431
0.1544	35.22	29200	0.0476	0.0433
0.1576	35.71	29600	0.0466	0.0434
0.1507	36.19	30000	0.0451	0.0435
0.1501	36.67	30400	0.0453	0.0429
0.1482	37.15	30800	0.0439	0.0432
0.1518	37.64	31200	0.0446	0.0424
0.1454	38.12	31600	0.0449	0.0417
0.145	38.6	32000	0.0440	0.0421
0.147	39.08	32400	0.0441	0.0424
0.141	39.57	32800	0.0444	0.0421
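
The log above can be scanned programmatically, for example to find the logged step with the lowest validation WER. The sketch below parses whitespace-separated rows in the same column order as the table; only the last four rows are copied in to keep it short, and the helper names are illustrative.

```python
# Last four rows of the training log, copied from the table above.
LOG_TAIL = """\
0.1454\t38.12\t31600\t0.0449\t0.0417
0.145\t38.6\t32000\t0.0440\t0.0421
0.147\t39.08\t32400\t0.0441\t0.0424
0.141\t39.57\t32800\t0.0444\t0.0421
"""

COLUMNS = ("train_loss", "epoch", "step", "val_loss", "wer")


def parse_log(text):
    """Turn whitespace-separated log rows into a list of dicts keyed by COLUMNS."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(field) for field in line.split()]
        row = dict(zip(COLUMNS, values))
        row["step"] = int(row["step"])
        rows.append(row)
    return rows


def best_by_wer(rows):
    """Return the row with the lowest validation WER."""
    return min(rows, key=lambda row: row["wer"])
```

In the full log, the lowest validation WER (0.0417) is recorded at step 31600, while the last logged row (step 32800) shows the loss of 0.0444 and WER of 0.0421 quoted at the top of this card.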

Framework versions

Transformers 4.28.1
Pytorch 2.0.0+cu117
Datasets 2.11.0
Tokenizers 0.13.3