metadata
base_model: facebook/wav2vec2-xls-r-300m
datasets:
  - common_voice_11_0
license: apache-2.0
metrics:
  - wer
tags:
  - generated_from_trainer
model-index:
  - name: wav2vec2-large-xls-r-300m-Hindi
    results:
      - task:
          type: automatic-speech-recognition
          name: Automatic Speech Recognition
        dataset:
          name: common_voice_11_0
          type: common_voice_11_0
          config: hi
          split: test
          args: hi
        metrics:
          - type: wer
            value: 0.5158313579410768
            name: Wer

wav2vec2-large-xls-r-300m-Hindi

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Hindi (hi) configuration of the common_voice_11_0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.7786
  • Wer: 0.5158
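
As a minimal usage sketch, transcription with this checkpoint could look like the code below. The repo id atakarim/wav2vec2-large-xls-r-300m-Hindi, the sample file name, and the 16 kHz resampling step are assumptions for illustration, not details confirmed by this card.

```python
# Minimal inference sketch; the repo id below is assumed from this card's name.
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

MODEL_ID = "atakarim/wav2vec2-large-xls-r-300m-Hindi"  # assumed repo id

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()

# Load a local audio file and resample to 16 kHz, the rate XLS-R expects.
speech, _ = librosa.load("sample_hi.wav", sr=16_000)  # hypothetical file name

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# CTC greedy decoding: take the argmax over the vocabulary at each frame.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```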

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned on the Hindi (hi) configuration of the Common Voice 11.0 dataset; the reported WER was measured on its test split.
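
A sketch of loading that evaluation split with the datasets library is shown below. The hub id mozilla-foundation/common_voice_11_0 (a gated dataset that requires accepting its terms) is an assumption; this card only names common_voice_11_0 with config hi and split test.

```python
# Sketch for loading the Hindi test split of Common Voice 11.0.
# The hub id below is an assumption; this card only names "common_voice_11_0".
from datasets import load_dataset, Audio

cv_hi_test = load_dataset(
    "mozilla-foundation/common_voice_11_0",  # assumed hub id (gated dataset)
    "hi",
    split="test",
)

# Decode audio at 16 kHz to match the wav2vec2 feature extractor.
cv_hi_test = cv_hi_test.cast_column("audio", Audio(sampling_rate=16_000))
print(cv_hi_test[0]["sentence"])
```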

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 8
  • mixed_precision_training: Native AMP
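
For reference, these settings map onto a transformers TrainingArguments configuration roughly as sketched below. This is a reconstruction from the list above, not the author's original training script; output_dir is a placeholder.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# This reconstructs the configuration; it is not the author's original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-Hindi",  # placeholder output path
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size: 4 * 2 = 8
    seed=42,
    num_train_epochs=8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # "Native AMP" mixed-precision training
)
```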

Training results

Training Loss | Epoch  | Step | Validation Loss | Wer
------------- | ------ | ---- | --------------- | ------
6.0146        | 0.7214 |  400 | 2.8409          | 0.9970
1.2129        | 1.4427 |  800 | 1.1656          | 0.7795
0.7592        | 2.1641 | 1200 | 1.0091          | 0.7199
0.5798        | 2.8855 | 1600 | 0.9187          | 0.6614
0.4609        | 3.6069 | 2000 | 0.8386          | 0.6084
0.3828        | 4.3282 | 2400 | 0.8442          | 0.6097
0.3242        | 5.0496 | 2800 | 0.7907          | 0.5744
0.2619        | 5.7710 | 3200 | 0.7661          | 0.5485
0.2132        | 6.4923 | 3600 | 0.7943          | 0.5388
0.1911        | 7.2137 | 4000 | 0.7835          | 0.5278
0.1637        | 7.9351 | 4400 | 0.7786          | 0.5158
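
The Wer column above can be recomputed from model transcriptions with the evaluate library. The snippet below is a generic sketch; the prediction and reference strings are placeholders, not outputs from this model.

```python
# Generic WER computation sketch using the evaluate library.
# The strings below are placeholders, not actual outputs of this model.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["placeholder transcription"]
references = ["placeholder reference"]

print(wer_metric.compute(predictions=predictions, references=references))
```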

Framework versions

  • Transformers 4.42.4
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1