The model used is **Whisper Large V3**, fine-tuned for the **audio classification** task:
- **Model**: [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3)
- **Output**: Emotion labels (`'Angry', 'Disgust', 'Fearful', 'Happy', 'Neutral', 'Sad', 'Surprised'`)
I map the emotion labels to numeric IDs and use them for model training and evaluation.
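That mapping can be sketched as follows (a minimal illustration; the label order and helper names are assumptions, not the repo's actual code):

```python
# Hypothetical label <-> ID mapping for the seven emotion classes above.
LABELS = ["Angry", "Disgust", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

def encode(labels):
    """Convert a list of string emotion labels to numeric IDs."""
    return [label2id[l] for l in labels]

print(encode(["Happy", "Sad"]))  # → [3, 5]
```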
The model is trained with the following parameters:

- **Warmup Ratio for LR Scheduler**: `0.1`
- **Number of Epochs**: `25`
- **Mixed Precision Training**: Native AMP (Automatic Mixed Precision)
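
As a concrete illustration of the warmup ratio, a linear warmup-then-decay schedule can be sketched as below (an assumption about the schedule shape; the actual Trainer scheduler may differ):

```python
# Sketch: with warmup_ratio=0.1, the LR ramps up over the first 10% of
# steps, then decays linearly to zero over the remaining steps.
def lr_at_step(step, total_steps, base_lr, warmup_ratio=0.1):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Linear decay from base_lr down to zero after warmup.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```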
These parameters ensure efficient model training and stability, especially when dealing with large datasets and deep models like **Whisper**.
The training utilizes **Wandb** for experiment tracking and monitoring.
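
Put together, the configuration might look like the following sketch using Hugging Face `TrainingArguments` (illustrative only; the output directory and any parameters not listed above are assumptions, not the repo's code):

```python
from transformers import TrainingArguments

# Illustrative configuration matching the parameters listed above.
training_args = TrainingArguments(
    output_dir="whisper-emotion",  # hypothetical output directory
    num_train_epochs=25,
    warmup_ratio=0.1,
    fp16=True,                     # native AMP mixed-precision training
    report_to="wandb",             # log metrics to Weights & Biases
)
```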
The following evaluation metrics were obtained after training the model:

- **Precision**: `0.9230`
- **Recall**: `0.9199`
- **F1 Score**: `0.9198`
These metrics demonstrate strong performance on the speech emotion recognition task: the high accuracy, precision, recall, and F1 scores indicate that the model effectively identifies emotional states from speech.
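
For reference, one common way to compute macro-averaged precision, recall, and F1 is sketched below (illustrative only; the averaging method used for the reported numbers is not stated here):

```python
def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1 over the given classes."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```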