Automatic Speech Recognition
Transformers
Safetensors
Portuguese
whisper
contrastive-learning
synthetic-data-filtering
Inference Endpoints
yuriyvnv committed
Commit 7351cbe · verified · 1 Parent(s): 7fb8fff

Update README.md

Files changed (1)
  1. README.md +18 -7
README.md CHANGED
@@ -1,15 +1,29 @@
 ---
 library_name: transformers
- tags: [automatic-speech-recognition, contrastive-learning, synthetic-data-filtering]
 ---

 # Model Card for Finetuned Version of Whisper-Small

 This model was trained on a subset of synthetically generated data that was subsequently filtered to improve the performance of the Whisper model.
- The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality
 For this specific model, 96.08% of the synthetic data generated by SeamlessM4T Large v2 was retained; the rest was removed by the filtering model.
 The training set also contained the Common Voice dataset, Multilingual LibriSpeech, and Bracarense (a fully Portuguese dialect).
-


 ## Model Details

@@ -127,7 +141,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Hardware Type:** NVIDIA A10G
 - **Hours used:** 15
 - **Cloud Provider:** AWS
- - **Compute Region:** US EAST
-
-
-
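The contrastive filtering approach the card describes (aligning representations of synthetic audio and their transcripts, then dropping poorly aligned pairs) can be sketched as follows. This is an illustrative assumption of how such a filter might work, not the authors' actual pipeline; the embeddings, similarity measure, and threshold are all hypothetical.

```python
# Illustrative sketch: filter synthetic samples by audio/text embedding
# alignment. The embeddings and the 0.7 threshold are assumptions; in the
# real pipeline they would come from a contrastively trained encoder.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_samples(audio_embs, text_embs, threshold=0.7):
    """Keep indices whose audio and transcript embeddings align above the threshold."""
    keep = []
    for i, (a, t) in enumerate(zip(audio_embs, text_embs)):
        if cosine_similarity(a, t) >= threshold:
            keep.append(i)
    return keep

# Toy example: the second pair is misaligned and gets filtered out.
audio = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
text  = [np.array([1.0, 0.1]), np.array([1.0, 0.0])]
print(filter_samples(audio, text))  # -> [0]
```

Under this reading, the "96.08% retained" figure would simply be the fraction of synthetic pairs whose similarity cleared the filter's threshold.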
 
 ---
 library_name: transformers
+ tags:
+ - automatic-speech-recognition
+ - contrastive-learning
+ - synthetic-data-filtering
+ license: apache-2.0
+ datasets:
+ - mozilla-foundation/common_voice_17_0
+ - facebook/multilingual_librispeech
+ language:
+ - pt
+ metrics:
+ - wer
+ - cer
+ pipeline_tag: automatic-speech-recognition
 ---

 # Model Card for Finetuned Version of Whisper-Small

 This model was trained on a subset of synthetically generated data that was subsequently filtered to improve the performance of the Whisper model.
+ The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality.
+ -------------------------------
 For this specific model, 96.08% of the synthetic data generated by SeamlessM4T Large v2 was retained; the rest was removed by the filtering model.
 The training set also contained the Common Voice dataset, Multilingual LibriSpeech, and Bracarense (a fully Portuguese dialect).
+ ------------------------------


 ## Model Details

 - **Hardware Type:** NVIDIA A10G
 - **Hours used:** 15
 - **Cloud Provider:** AWS
+ - **Compute Region:** US EAST
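The updated metadata lists `wer` and `cer` as evaluation metrics. As a reference for how word error rate is computed, here is a minimal self-contained sketch (word-level Levenshtein distance normalized by reference length); in practice a library such as `jiwer` or Hugging Face `evaluate` would be used, and the example sentences below are made up.

```python
# Minimal word error rate (WER): Levenshtein edit distance over words,
# normalized by the number of reference words. Illustrative sketch only.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word out of three reference words -> WER of 1/3.
print(round(wer("o gato dorme", "o gato come"), 4))  # -> 0.3333
```

Character error rate (CER) follows the same recipe with characters in place of words.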