Automatic Speech Recognition
Transformers
Safetensors
Portuguese
whisper
contrastive-learning
synthetic-data-filtering
Inference Endpoints
yuriyvnv committed
Commit 7351cbe · verified · 1 Parent(s): 7fb8fff

Update README.md

Files changed (1)
  1. README.md +18 -7
README.md CHANGED
@@ -1,15 +1,29 @@
 ---
 library_name: transformers
- tags: [automatic-speech-recognition, contrastive-learning, synthetic-data-filtering]
 ---

 # Model Card for Finetuned Version of Whisper-Small

 This model was trained on a subset of synthetically generated data that was subsequently filtered to improve the performance of the Whisper model.
- The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality
 For this specific model, 96.08% of the synthetic data generated by SeamlessM4T Large v2 was retained; the rest was removed by the filtering model.
 The training set also contained the Common Voice dataset, Multilingual LibriSpeech, and Bracarense (a fully Portuguese dialect).
-


 ## Model Details

@@ -127,7 +141,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Hardware Type:** NVIDIA A10G
 - **Hours used:** 15
 - **Cloud Provider:** AWS
- - **Compute Region:** US EAST
-
-
-
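The contrastive filtering approach the card describes (aligning representations of synthetic audio and their transcripts, then dropping poorly aligned pairs) can be sketched as follows. This is an illustrative assumption of how such a filter might work, not the authors' actual pipeline; the embeddings, similarity measure, and threshold are all hypothetical.

```python
# Illustrative sketch: filter synthetic samples by audio/text embedding
# alignment. The embeddings and the 0.7 threshold are assumptions; in the
# real pipeline they would come from a contrastively trained encoder.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_samples(audio_embs, text_embs, threshold=0.7):
    """Keep indices whose audio and transcript embeddings align above the threshold."""
    keep = []
    for i, (a, t) in enumerate(zip(audio_embs, text_embs)):
        if cosine_similarity(a, t) >= threshold:
            keep.append(i)
    return keep

# Toy example: the second pair is misaligned and gets filtered out.
audio = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
text  = [np.array([1.0, 0.1]), np.array([1.0, 0.0])]
print(filter_samples(audio, text))  # -> [0]
```

Under this reading, the "96.08% retained" figure would simply be the fraction of synthetic pairs whose similarity cleared the filter's threshold.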
 
 ---
 library_name: transformers
+ tags:
+ - automatic-speech-recognition
+ - contrastive-learning
+ - synthetic-data-filtering
+ license: apache-2.0
+ datasets:
+ - mozilla-foundation/common_voice_17_0
+ - facebook/multilingual_librispeech
+ language:
+ - pt
+ metrics:
+ - wer
+ - cer
+ pipeline_tag: automatic-speech-recognition
 ---

 # Model Card for Finetuned Version of Whisper-Small

 This model was trained on a subset of synthetically generated data that was subsequently filtered to improve the performance of the Whisper model.
+ The approach involves aligning representations of synthetic audio and corresponding text transcripts to identify and remove low-quality samples, improving the overall training data quality.
+ -------------------------------
 For this specific model, 96.08% of the synthetic data generated by SeamlessM4T Large v2 was retained; the rest was removed by the filtering model.
 The training set also contained the Common Voice dataset, Multilingual LibriSpeech, and Bracarense (a fully Portuguese dialect).
+ ------------------------------


 ## Model Details

 - **Hardware Type:** NVIDIA A10G
 - **Hours used:** 15
 - **Cloud Provider:** AWS
+ - **Compute Region:** US EAST
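The updated metadata lists `wer` and `cer` as evaluation metrics. As a reference for how word error rate is computed, here is a minimal self-contained sketch (word-level Levenshtein distance normalized by reference length); in practice a library such as `jiwer` or Hugging Face `evaluate` would be used, and the example sentences below are made up.

```python
# Minimal word error rate (WER): Levenshtein edit distance over words,
# normalized by the number of reference words. Illustrative sketch only.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word out of three reference words -> WER of 1/3.
print(round(wer("o gato dorme", "o gato come"), 4))  # -> 0.3333
```

Character error rate (CER) follows the same recipe with characters in place of words.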