Model Card for uvegesistvan/wildmann_german_proposal_2b_GER_ENG_POL
Model Overview
This model is a multi-class emotion classifier trained on German text that was first machine-translated into English as an intermediary language and then into Polish. It assigns each text one of nine labels: eight emotions plus a "No emotion" class. The training data was built this way to explore how multi-step machine translation affects emotion classification.
Emotion Classes
The model classifies the following emotional states:
- Anger (0)
- Fear (1)
- Disgust (2)
- Sadness (3)
- Joy (4)
- Enthusiasm (5)
- Hope (6)
- Pride (7)
- No emotion (8)
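The id-to-name mapping above can be kept as a small lookup table. The sketch below is illustrative only: `ID2LABEL`, `LABEL2ID`, and `decode_label` are hypothetical helper names, and the handling of generic `LABEL_<n>` strings assumes the model may report labels in that form, which this card does not confirm.

```python
import re

# Class ids and names exactly as listed in this model card.
ID2LABEL = {
    0: "Anger",
    1: "Fear",
    2: "Disgust",
    3: "Sadness",
    4: "Joy",
    5: "Enthusiasm",
    6: "Hope",
    7: "Pride",
    8: "No emotion",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_label(raw):
    """Map an int id, a digit string, or a 'LABEL_<n>' string to a class name.

    The 'LABEL_<n>' form is an assumption about how the model may name
    its outputs; adjust if the actual label strings differ.
    """
    if isinstance(raw, int):
        return ID2LABEL[raw]
    match = re.fullmatch(r"(?:LABEL_)?(\d+)", str(raw).strip())
    if match is None:
        raise ValueError(f"Unrecognized label: {raw!r}")
    return ID2LABEL[int(match.group(1))]
```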
Dataset and Preprocessing
The dataset was created using a two-step machine translation chain across three languages: German → English → Polish. Emotional annotations were applied after the final translation to ensure consistency. Preprocessing steps included:
- Balancing the dataset through undersampling overrepresented classes like "No emotion" and "Anger."
- Normalizing text to mitigate noise introduced by multi-step translations.
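The card does not describe the exact balancing procedure. As a minimal sketch of one common approach, random undersampling with a per-class cap could look like the following. The function name `undersample`, the `max_per_class` parameter, and the toy data are all illustrative assumptions, not the author's pipeline.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, max_per_class, seed=0):
    """Randomly keep at most `max_per_class` (text, label) rows per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    balanced = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) > max_per_class:
            # Drop surplus examples of overrepresented classes at random.
            group = rng.sample(group, max_per_class)
        balanced.extend(group)
    rng.shuffle(balanced)
    return balanced


# Toy data: "No emotion" (8) and "Anger" (0) are overrepresented, "Hope" (6) is not.
rows = [("text", 8)] * 50 + [("text", 0)] * 30 + [("text", 6)] * 10
balanced = undersample(rows, max_per_class=10)
counts = Counter(label for _, label in balanced)
```

After this call every class in the toy data has at most 10 examples; classes already at or below the cap are left unchanged.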
Evaluation Metrics
The model's performance was evaluated using standard classification metrics. Results are detailed below:
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Anger (0) | 0.53 | 0.57 | 0.55 | 777 |
Fear (1) | 0.80 | 0.78 | 0.79 | 776 |
Disgust (2) | 0.91 | 0.95 | 0.93 | 776 |
Sadness (3) | 0.80 | 0.86 | 0.83 | 775 |
Joy (4) | 0.75 | 0.85 | 0.80 | 777 |
Enthusiasm (5) | 0.72 | 0.52 | 0.60 | 776 |
Hope (6) | 0.46 | 0.63 | 0.53 | 777 |
Pride (7) | 0.71 | 0.80 | 0.75 | 776 |
No emotion (8) | 0.64 | 0.48 | 0.55 | 1553 |
Overall Metrics
- Accuracy: 0.69
- Macro Average: Precision = 0.70, Recall = 0.71, F1-Score = 0.70
- Weighted Average: Precision = 0.70, Recall = 0.69, F1-Score = 0.69
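To show how the macro and weighted averages relate to the per-class table, the sketch below recomputes them from the rounded values reported above. Because the inputs are already rounded to two decimals, the recomputed figures can differ from the reported ones by about 0.01. The variable names are illustrative.

```python
# (class, precision, recall, f1, support) copied from the table in this card.
rows = [
    ("Anger", 0.53, 0.57, 0.55, 777),
    ("Fear", 0.80, 0.78, 0.79, 776),
    ("Disgust", 0.91, 0.95, 0.93, 776),
    ("Sadness", 0.80, 0.86, 0.83, 775),
    ("Joy", 0.75, 0.85, 0.80, 777),
    ("Enthusiasm", 0.72, 0.52, 0.60, 776),
    ("Hope", 0.46, 0.63, 0.53, 777),
    ("Pride", 0.71, 0.80, 0.75, 776),
    ("No emotion", 0.64, 0.48, 0.55, 1553),
]

n_classes = len(rows)
total_support = sum(r[4] for r in rows)

# Macro average: every class counts equally.
macro = tuple(sum(r[i] for r in rows) / n_classes for i in (1, 2, 3))

# Weighted average: every class counts in proportion to its support.
weighted = tuple(sum(r[i] * r[4] for r in rows) / total_support for i in (1, 2, 3))

print("support:", total_support)
print("macro    P/R/F1:", [round(v, 2) for v in macro])
print("weighted P/R/F1:", [round(v, 2) for v in weighted])
```

The support-weighted recall equals the overall accuracy by definition, which is why both are reported as 0.69.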
Performance Insights
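The model performs best on "Disgust" (F1 0.93) and "Sadness" (F1 0.83), with "Joy" and "Fear" close behind (F1 0.80 and 0.79). The weakest classes are "Hope" (F1 0.53), "Anger" (0.55), "No emotion" (0.55), and "Enthusiasm" (0.60). "No emotion" and "Enthusiasm" suffer mainly from low recall (0.48 and 0.52), while "Hope" has low precision (0.46). These weaknesses are possibly due to complexities introduced by the multi-step translation process. Overall accuracy is 0.69.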
Model Usage
Applications
- Emotion analysis of German texts via machine-translated Polish representations.
- Sentiment analysis for Polish-language datasets derived from multilingual sources.
- Research on the effects of multi-step machine translation in emotion classification.
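A minimal usage sketch for the applications above, assuming the checkpoint loads through the Hugging Face `transformers` text-classification pipeline (this card does not state the model architecture, so treat that as an assumption). `load_classifier` and `classify` are hypothetical helper names. Input text should be Polish, since that is the language the model was trained on.

```python
from typing import Callable, Optional

MODEL_ID = "uvegesistvan/wildmann_german_proposal_2b_GER_ENG_POL"


def load_classifier():
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def classify(text: str, classifier: Optional[Callable] = None) -> dict:
    """Return the top prediction for `text` as {'label': str, 'score': float}."""
    clf = classifier if classifier is not None else load_classifier()
    top = clf(text)[0]  # the pipeline returns a list with one dict per input
    return {"label": top["label"], "score": float(top["score"])}


# Example call (commented out because it downloads the model):
# print(classify("Jestem bardzo dumny z naszego zespołu."))
```

The format of the returned label string depends on the model's configuration. If it is a generic id such as `LABEL_4`, the number corresponds to the index in the Emotion Classes list.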
Limitations
- The multi-step translation process introduces additional noise, potentially impacting classification accuracy for subtle or ambiguous emotions.
- Emotional nuances and cultural context might be lost during translation.
Ethical Considerations
The reliance on multi-step machine translation can amplify biases or inaccuracies introduced at each stage. Careful validation is recommended before applying the model in sensitive areas such as mental health, social research, or customer feedback analysis.
Citation
For further information, visit: uvegesistvan/wildmann_german_proposal_2b_GER_ENG_POL