---
license: mit
datasets:
- winvoker/turkish-sentiment-analysis-dataset
language:
- tr
base_model:
- answerdotai/ModernBERT-large
---

# Turkish Sentiment ModernBERT-large

This is a fine-tuned **ModernBERT-large** model for **Turkish Sentiment Analysis**. The model was trained on the `winvoker/turkish-sentiment-analysis-dataset` and classifies Turkish text into three sentiment categories: positive, negative, and neutral.

## Model Overview

- **Model Type**: ModernBERT (BERT variant)
- **Task**: Sentiment Analysis
- **Languages**: Turkish
- **Dataset**: [winvoker/turkish-sentiment-analysis-dataset](https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset)
- **Labels**: Positive, Negative, Neutral
- **Fine-Tuning**: Fine-tuned for sentiment classification.

## Performance Metrics

The model was trained for **4 epochs** with the following results:

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score |
|-------|---------------|-----------------|----------|----------|
| 1     | 0.2884        | 0.1133          | 95.72%   | 92.18%   |
| 2     | 0.1759        | 0.1050          | 96.24%   | 93.33%   |
| 3     | 0.0633        | 0.1233          | 96.14%   | 93.19%   |
| 4     | 0.0623        | 0.1213          | 96.14%   | 93.19%   |

- **Training Loss**: Measures how well the model fits the training data.
- **Validation Loss**: Measures how well the model generalizes to unseen data.
- **Accuracy**: Percentage of correct predictions over all examples.
- **F1 Score**: The harmonic mean of precision and recall, accounting for both false positives and false negatives.

## Model Inference Example

You can use this model for sentiment analysis of Turkish text.
Here's an example of how to use it:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer
model_name = "bayrameker/Turkish-sentiment-ModernBERT-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example texts for prediction
texts = ["bu ürün çok iyi", "bu ürün berbat"]

# Tokenize the inputs
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Make predictions without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Get the predicted sentiment labels
predictions = torch.argmax(logits, dim=-1)

# Adjust based on your label mapping (check model.config.id2label)
labels = ["Negative", "Neutral", "Positive"]
for text, pred in zip(texts, predictions):
    print(f"Text: {text} -> Sentiment: {labels[pred.item()]}")
```

### Example Output:

```
Text: bu ürün çok iyi -> Sentiment: Positive
Text: bu ürün berbat -> Sentiment: Negative
```

## Installation

To use this model, install the following dependencies:

```bash
pip install transformers
pip install torch
pip install datasets
```

ModernBERT support was added in `transformers` 4.48.0, so make sure the installed version is at least that recent.

## Model Card

- **Model Name**: Turkish-sentiment-ModernBERT-large
- **Hugging Face Repo**: [bayrameker/Turkish-sentiment-ModernBERT-large](https://huggingface.co/bayrameker/Turkish-sentiment-ModernBERT-large)
- **License**: MIT
- **Author**: Bayram Eker
- **Date**: 2024-12-21

## Training Details

- **Model**: ModernBERT-large
- **Framework**: PyTorch
- **Training Time**: Approximately 50 minutes (4 epochs)
- **Batch Size**: 64
- **Learning Rate**: 8e-5
- **Optimizer**: AdamW
- **Mixed Precision**: bf16 (on an A100 GPU)

## Acknowledgments

- The model was trained on the `winvoker/turkish-sentiment-analysis-dataset` dataset.
- Special thanks to the Hugging Face community and the contributors to the transformers library.
- Thanks to all contributors of the dataset and pretrained models.
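
## Interpreting Logits as Confidence Scores

The inference example takes the `argmax` of the logits, which yields a label but no indication of how confident the model is. The following is a minimal sketch, in plain Python, of turning one row of logits into probabilities with a softmax. The helper names `softmax` and `top_label` and the logit values are hypothetical and invented for illustration. The label order is copied from the inference example and is itself an assumption to be checked against `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, labels):
    """Return the most likely label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Assumed label order, copied from the inference example
labels = ["Negative", "Neutral", "Positive"]

# Hypothetical logits for a single sentence, not real model output
example_logits = [-1.2, 0.3, 2.9]

label, confidence = top_label(example_logits, labels)
print(f"{label} ({confidence:.2%})")
```

With real model output, `logits[i].tolist()` from the inference example could be passed in place of `example_logits`.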
## Future Work

- Extend the model to finer-grained sentiment labels (e.g., additional sentiment classes, aspect-based sentiment analysis).
- Fine-tune the model on a larger, more diverse dataset for better generalization across various domains.

## License

This model is licensed under the MIT License. See the LICENSE file for more details.
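
## Appendix: How Accuracy and F1 Are Computed

The Performance Metrics section reports accuracy and F1 without stating how F1 is averaged across the three classes. As an illustration only, the sketch below computes accuracy and a macro-averaged F1 in plain Python. Macro averaging is an assumption here and may not be the scheme used for the reported numbers. The function names and the toy labels are hypothetical and are not drawn from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only
y_true = ["Positive", "Negative", "Neutral", "Positive", "Negative", "Positive"]
y_pred = ["Positive", "Negative", "Positive", "Positive", "Neutral", "Positive"]

print(f"accuracy={accuracy(y_true, y_pred):.4f}, macro_f1={macro_f1(y_true, y_pred):.4f}")
```

On imbalanced data, macro F1 can sit well below accuracy because every class counts equally regardless of its frequency.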