asr-wav2vec2-commonvoice-15-fr : LeBenchmark/wav2vec2-FR-7K-large fine-tuned on CommonVoice 15.0 French

asr-wav2vec2-commonvoice-15-fr is an Automatic Speech Recognition model fine-tuned on the CommonVoice 15.0 French dataset, with LeBenchmark/wav2vec2-FR-7K-large as the pretrained wav2vec 2.0 model.

The fine-tuned model achieves the following performance:

| Release    | Valid WER | Test WER | GPUs         | Epochs |
|------------|-----------|----------|--------------|--------|
| 2023-09-08 | 9.14      | 11.21    | 4xV100 32GB  | 30     |
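WER (word error rate) is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal illustration in Python (the `wer` helper is ours, not part of the model's tooling):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("le chat dort", "le chat dors"))  # 1 substitution / 3 words
```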

πŸ“ Model Details

The ASR system is composed of:

  • the Tokenizer (char), which transforms the input text into a sequence of characters (e.g., "cat" into ["c", "a", "t"]) and is trained on the train transcriptions (train.tsv).
  • the Acoustic model (wav2vec 2.0 + DNN + CTC greedy decoding). The pretrained wav2vec 2.0 model LeBenchmark/wav2vec2-FR-7K-large is combined with two DNN layers and fine-tuned on CommonVoice FR. The final acoustic representation is passed to the CTC greedy decoder.

We used recordings sampled at 16kHz (single channel).
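The character tokenization described above can be sketched as follows (a hypothetical helper for illustration, not the actual SpeechBrain tokenizer class):

```python
def char_tokenize(text: str) -> list[str]:
    # A char tokenizer simply maps a transcription to its character sequence
    return list(text)

print(char_tokenize("cat"))  # ['c', 'a', 't']
```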

πŸ’» How to transcribe a file with the model

Install SpeechBrain and import the ASR interface:

```shell
pip install speechbrain
```

```python
from speechbrain.inference.ASR import EncoderASR
```

Pipeline

```python
from speechbrain.inference.ASR import EncoderASR


def transcribe(audio, model):
    # Run the model on an audio file and lowercase the transcript
    return model.transcribe_file(audio).lower()


def save_transcript(transcript, audio, output_file):
    # Write "<audio path>\t<transcript>" to the output file
    with open(output_file, "w", encoding="utf-8") as file:
        file.write(f"{audio}\t{transcript}\n")


def main():
    model = EncoderASR.from_hparams("Propicto/asr-wav2vec2-commonvoice-15-fr", savedir="tmp/")
    audio = "audio.wav"  # path to a 16 kHz, single-channel audio file
    transcript = transcribe(audio, model)
    save_transcript(transcript, audio, "out.txt")


if __name__ == "__main__":
    main()
```

βš™οΈ Training Details

Training Data

We use the train / valid / test splits provided by CommonVoice, which correspond to:

|              | Train   | Valid  | Test   |
|--------------|---------|--------|--------|
| # utterances | 527,554 | 16,132 | 16,132 |
| # hours      | 756.19  | 25.84  | 26.11  |
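From the table above one can derive the average utterance length per split (our own back-of-the-envelope computation, not a figure from the original training logs):

```python
# (# utterances, # hours) per split, taken from the table above
splits = {
    "train": (527_554, 756.19),
    "valid": (16_132, 25.84),
    "test": (16_132, 26.11),
}

for name, (n_utt, hours) in splits.items():
    avg_sec = hours * 3600 / n_utt  # average utterance duration in seconds
    print(f"{name}: {avg_sec:.2f} s per utterance on average")
```

Utterances thus average roughly five to six seconds, typical of CommonVoice read speech.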

Training Procedure

We follow the training procedure of the SpeechBrain ASR-CTC recipe. The common_voice_prepare.py script handles preprocessing of the dataset.

Training Hyperparameters

Refer to the hyperparams.yaml file for the hyperparameter values.

Training time

With 4xV100 32GB GPUs, training took ~81 hours.

Libraries

SpeechBrain:

@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
  }

πŸ’‘ Information

  • Developed by: CΓ©cile Macaire
  • Funded by: GENCI-IDRIS (Grant 2023-AD011013625R1), PROPICTO (ANR-20-CE93-0005)
  • Language(s) (NLP): French
  • License: Apache-2.0
  • Finetuned from model: LeBenchmark/wav2vec2-FR-7K-large

πŸ“Œ Citation

@inproceedings{macaire24_interspeech,
  title     = {Towards Speech-to-Pictograms Translation},
  author    = {Cécile Macaire and Chloé Dion and Didier Schwab and Benjamin Lecouteux and Emmanuelle Esperança-Rodier},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {857--861},
  doi       = {10.21437/Interspeech.2024-490},
  issn      = {2958-1796},
}