neethuvm committed on
Commit 12878e1 · verified · 1 Parent(s): 576a3af

Update README.md

Files changed (1)
  1. README.md +51 -5
README.md CHANGED
@@ -31,27 +31,60 @@ pipeline_tag: automatic-speech-recognition
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # Whisper Small Ar - Neethu VM

- This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 11.0 dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.3402
  - Wer: 44.8627

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -67,6 +100,8 @@ The following hyperparameters were used during training:

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Wer |
  |:-------------:|:------:|:----:|:---------------:|:-------:|
  | 0.3059 | 0.4156 | 1000 | 0.4141 | 49.8008 |
@@ -74,6 +109,17 @@ The following hyperparameters were used during training:
  | 0.1908 | 1.2469 | 3000 | 0.3519 | 46.4806 |
  | 0.1699 | 1.6625 | 4000 | 0.3402 | 44.8627 |

  ### Framework versions

@@ -31,27 +31,60 @@ pipeline_tag: automatic-speech-recognition
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ # Whisper Small Arabic - Neethu VM

+ This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Arabic subset of the Common Voice 11.0 dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.3402
  - Wer: 44.8627

  ## Model description

+ This model adapts openai/whisper-small to Arabic speech recognition. It was fine-tuned on the Arabic subset of the Common Voice 11.0 dataset, a large-scale, open-source collection of transcribed speech released by the Mozilla Foundation.

  ## Intended uses & limitations

+ Speech-to-text conversion: The model transcribes spoken Arabic into written text and is suited to applications that convert Arabic audio to text.
+
+ Voice-activated interfaces: It can add Arabic voice recognition to applications and devices, so that users can interact with them by speaking Arabic.
+
+ Accessibility tools: It can help make audio content accessible to people with hearing impairments, or in environments where audio cannot be played.
+
+ Content creation and archiving: It can streamline transcription for content creators, journalists, and researchers working with Arabic audio.
+
+ Limitations: With a WER of 44.86% on the evaluation set, the model makes roughly 45 word errors (substitutions, deletions, or insertions) per 100 reference words, so its transcripts should be reviewed before being used where accuracy matters.

  ## Training and evaluation data

+ Training dataset: The model was fine-tuned on the Arabic subset of the Common Voice 11.0 dataset, a large-scale, open-source dataset created by Mozilla.
+
+ Data characteristics: Common Voice is a diverse collection of recordings contributed by volunteers worldwide, covering a wide range of speakers, accents, and recording environments. The Arabic subset includes various dialects and speaking styles, which should help the model generalize across Arabic-speaking regions.

+ Preprocessing: The audio was converted to a common format and resampled to 16 kHz, the sampling rate Whisper's input pipeline expects.
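As a rough sketch of the resampling step, the helper below converts a mono signal to 16 kHz by linear interpolation, for example from the 48 kHz typical of Common Voice clips. It illustrates the idea under stated assumptions and is not the author's preprocessing: the name `resample_linear` is hypothetical, and linear interpolation applies no anti-aliasing filter, so a real pipeline should use a band-limited resampler, such as the one applied when a `datasets` audio column is cast with `Audio(sampling_rate=16000)`.

```python
from typing import List, Sequence

WHISPER_SAMPLE_RATE = 16_000  # Whisper's feature extractor expects 16 kHz input


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = WHISPER_SAMPLE_RATE) -> List[float]:
    """Resample a mono signal by linear interpolation (hypothetical helper).

    Simplified on purpose: there is no anti-aliasing low-pass filter, so
    content above the new Nyquist frequency (8 kHz at 16 kHz) can alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, last)  # clamp at the final input sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For one second of 48 kHz audio (48,000 samples), `resample_linear(samples, 48_000)` returns 16,000 samples.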
+
+ Evaluation dataset: Evaluation used a designated test split of the Common Voice Arabic data. Because this split is held out from training, the reported metrics reflect the model's ability to generalize to new data.
+
+ Metrics: The primary evaluation metric is Word Error Rate (WER): the number of word substitutions, deletions, and insertions needed to turn the predicted transcript into the reference transcript, divided by the number of words in the reference. Lower is better; this card reports WER as a percentage.
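WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a prediction into the reference, divided by the number of reference words N, so WER = (S + D + I) / N. The sketch below computes it with a word-level edit distance; the function name is illustrative, and in practice libraries such as `jiwer` or Hugging Face `evaluate` (`evaluate.load("wer")`) are commonly used. Multiply the result by 100 for the percentage form used in this card.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row i of the edit-distance table holds distances between ref[:i] and hyp[:j];
    # only the previous row is kept in memory.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("ذهب الولد إلى المدرسة", "ذهب الولد الى المدرسة")` returns `0.25`: the only difference is the spelling of one word (إلى vs. الى), which counts as one substitution out of four reference words. This is also why transcripts are often normalized before scoring.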
  ## Training procedure
+
+ ### Steps involved
+
+ Data preparation:
+
+ - Data collection: The Arabic subset of the Common Voice 11.0 dataset was gathered.
+ - Preprocessing: The audio was standardized to a common sampling rate and format. Transcriptions were cleaned and aligned with their audio files to produce accurate training pairs.
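The card does not say how transcriptions were cleaned. As one common approach for Arabic, assumed here rather than taken from the training code, the hypothetical `clean_transcript` below folds Unicode presentation forms, strips diacritics (tanween, harakat, shadda, sukun, and superscript alef) and the tatweel character, unifies hamzated alef forms to bare alef, and replaces punctuation with spaces. Unifying alef forms is a design choice: it removes a common source of orthographic variation in crowd-sourced transcripts, but it also discards spelling information.

```python
import re
import unicodedata

# Hypothetical cleaning rules for illustration; not confirmed to match the author's pipeline.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")   # tanween, harakat, shadda, sukun, superscript alef
TATWEEL = "\u0640"                                   # kashida used only for elongation
HAMZATED_ALEF = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda, hamza above, hamza below
BARE_ALEF = "\u0627"


def clean_transcript(text: str) -> str:
    """Normalize an Arabic transcript before training or scoring."""
    text = unicodedata.normalize("NFKC", text)  # fold presentation forms to base letters
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = HAMZATED_ALEF.sub(BARE_ALEF, text)
    # Replace punctuation (Unicode categories P*) with spaces, e.g. "،" and "؟"
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)
    return " ".join(text.split())  # collapse runs of whitespace
```

For example, `clean_transcript("ذَهَبَ الوَلَدُ إلى المَدْرَسَةِ.")` returns `"ذهب الولد الى المدرسة"`.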
+
+ Model setup:
+
+ - Base model: openai/whisper-small was used as the base model because it already handles a wide range of speech recognition tasks.
+ - Environment: Training ran on a machine with a GPU capable of handling the model's computational requirements.
+
+ Fine-tuning:
+
+ - Hyperparameters: The learning rate, batch size, and other training hyperparameters (listed under Training hyperparameters) were chosen to balance model quality against training time.
+ - Training process: Checkpoints were saved at regular intervals and evaluated on the validation set; the Training results table shows evaluations at 1,000-step intervals, ending at step 4,000 (about 1.66 epochs).
+ - Loss function: Cross-entropy loss between the model's predicted tokens and the ground-truth transcription tokens was minimized.
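Token-level cross-entropy is the mean of -log p(target token) over the transcription tokens, where p is the softmax of the decoder's logits. The plain-Python sketch below shows the computation; the function name is hypothetical. It skips positions labelled `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, which Whisper fine-tuning scripts commonly use to mask padding in the labels.

```python
import math
from typing import Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def token_cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of target token ids under softmax(logits)."""
    if len(logits) != len(targets):
        raise ValueError("need exactly one row of logits per target position")
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == IGNORE_INDEX:
            continue  # padded position: excluded from the loss
        m = max(row)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[target]  # equals -log(softmax(row)[target])
        count += 1
    if count == 0:
        raise ValueError("all target positions are ignored")
    return total / count
```

With uniform logits over a 4-token vocabulary the loss is log 4 (about 1.386), the cost of a uniform guess; `token_cross_entropy([[0.0, 0.0, 0.0, 0.0], [5.0, 1.0, 0.0, 0.0]], [2, -100])` returns exactly that, because the second position is ignored.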
+
+ Evaluation:

+ - Validation set: Part of the dataset was reserved for validation, to monitor performance and catch overfitting.
+ - Metrics: WER and validation loss were the primary measures of accuracy and generalization.
+
+ Optimization:
+
+ - Early stopping: Training was configured to stop once the validation loss ceased to improve significantly, to prevent overfitting.
+ - Tuning: Hyperparameters and learning strategies were adjusted based on validation performance to improve accuracy.
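Patience-based early stopping, as described above, can be sketched as follows. This is a generic illustration, not the author's configuration: the function name and the `patience` and `min_delta` defaults are hypothetical. Hugging Face transformers provides a comparable `EarlyStoppingCallback` with `early_stopping_patience` and `early_stopping_threshold` arguments; the card does not say which mechanism or values were used.

```python
from typing import Optional, Sequence


def early_stop_index(val_losses: Sequence[float], patience: int = 2,
                     min_delta: float = 0.0) -> Optional[int]:
    """Return the evaluation index at which training would stop, or None.

    An evaluation counts as an improvement only if it beats the best loss
    so far by more than min_delta; training stops after `patience`
    consecutive evaluations without improvement.
    """
    best = float("inf")
    bad_evals = 0
    for i, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i
    return None
```

Applied to the three validation losses visible under Training results (0.4141, 0.3519, 0.3402; the step-2,000 row is not shown in this diff), `early_stop_index([0.4141, 0.3519, 0.3402], patience=2)` returns `None`, because every logged evaluation improved on the previous best.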
  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -67,6 +100,8 @@ The following hyperparameters were used during training:

  ### Training results

+ The table below shows training and validation progress over 4,000 training steps (about 1.66 epochs). Training loss, validation loss, and Word Error Rate (WER) all decrease as training proceeds.
+
  | Training Loss | Epoch | Step | Validation Loss | Wer |
  |:-------------:|:------:|:----:|:---------------:|:-------:|
  | 0.3059 | 0.4156 | 1000 | 0.4141 | 49.8008 |
@@ -74,6 +109,17 @@ The following hyperparameters were used during training:
  | 0.1908 | 1.2469 | 3000 | 0.3519 | 46.4806 |
  | 0.1699 | 1.6625 | 4000 | 0.3402 | 44.8627 |
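The Epoch and Step columns are related by epoch = step / steps_per_epoch, assuming the Trainer logs fractional epochs this way. Dividing each logged step by its epoch value shows that the table implies about 2,406 optimizer steps per epoch. The second helper is hypothetical: it needs the effective batch size (per-device batch size × gradient accumulation steps × number of devices), which would come from the hyperparameter list elided in this diff, and its result is approximate because the last batch of an epoch may be partial.

```python
from typing import List, Tuple

# (step, epoch) pairs copied from the Training results table
LOGGED: List[Tuple[int, float]] = [(1000, 0.4156), (3000, 1.2469), (4000, 1.6625)]


def steps_per_epoch(step: int, epoch: float) -> float:
    """Optimizer steps per epoch implied by one logged (step, epoch) pair."""
    return step / epoch


def approx_training_examples(steps: float, effective_batch_size: int) -> int:
    """Hypothetical helper: training-set size implied by steps per epoch and batch size."""
    return round(steps * effective_batch_size)


estimates = [steps_per_epoch(s, e) for s, e in LOGGED]
print([round(x, 1) for x in estimates])
# prints: [2406.2, 2406.0, 2406.0]
```

The small spread between the three estimates comes from the epoch column being rounded to four decimal places.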

+ Analysis:
+
+ Training loss: This measures how well the model fits the training data. It falls from 0.3059 at step 1,000 to 0.1699 at step 4,000, showing that the model fits the training data increasingly well.
+
+ Validation loss: This indicates how well the model generalizes to unseen data. Its steady decrease, from 0.4141 to 0.3402, suggests improving generalization.
+
+ Word Error Rate (WER): This is the key metric for transcription accuracy. WER drops from 49.80% at step 1,000 to 44.86% at step 4,000, an absolute reduction of 4.94 percentage points and a clear gain in the model's ability to convert Arabic speech to text.
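A WER change can be read in absolute terms (percentage points) or relative to the starting WER. The short calculation below uses the first and last WER values visible in the Training results table and shows that the 4.94-point absolute drop is roughly a 9.9% relative reduction. The function name is illustrative.

```python
from typing import Tuple

# (step, WER %) rows visible in the Training results table
WER_BY_STEP = [(1000, 49.8008), (3000, 46.4806), (4000, 44.8627)]


def wer_reduction(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = absolute / before * 100
    return absolute, relative


abs_pts, rel_pct = wer_reduction(WER_BY_STEP[0][1], WER_BY_STEP[-1][1])
print(f"{abs_pts:.2f} points absolute, {rel_pct:.1f}% relative")
# prints: 4.94 points absolute, 9.9% relative
```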
+
+ Together, these results trace the model's learning curve and show that its accuracy kept improving with further training, which helps set expectations for its performance in practical applications.
+

  ### Framework versions