---
library_name: transformers
license: apache-2.0
base_model: tashrifmahmud/sentiment_analysis_model
tags:
- generated_from_trainer
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: sentiment_analysis_model_v2
  results: []
datasets:
- stanfordnlp/imdb
- cornell-movie-review-data/rotten_tomatoes
language:
- en
---

# sentiment_analysis_model_v2

This model is a fine-tuned version of [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) and the second iteration of [tashrifmahmud/sentiment_analysis_model](https://huggingface.co/tashrifmahmud/sentiment_analysis_model), trained on the [IMDB](https://huggingface.co/datasets/stanfordnlp/imdb) and [Rotten Tomatoes](https://huggingface.co/datasets/cornell-movie-review-data/rotten_tomatoes) datasets.
It achieves the following results on the evaluation set (the published model is the epoch 1 checkpoint):

- Loss: 0.3682
- Accuracy: 0.8396
- Precision: 0.8267
- Recall: 0.8593
- F1: 0.8427

## Model description

This model is a fine-tuned DistilBERT transformer for sentiment analysis. It was first trained on the IMDB dataset for binary classification, distinguishing positive from negative sentiment in movie reviews. It was then further fine-tuned on the Rotten Tomatoes dataset to improve its generalization and performance on movie-related text.

- **Architecture:** DistilBERT (a distilled version of BERT for faster inference).
- **Task:** Sentiment analysis (binary classification: positive or negative sentiment).
- **Pre-training:** DistilBERT was pre-trained by knowledge distillation from BERT on the same corpus BERT used (BookCorpus and English Wikipedia).
- **Fine-tuning:** Fine-tuned on both the IMDB and Rotten Tomatoes datasets.

## Intended uses & limitations

**Intended uses:**

This model is suitable for classifying the sentiment of text, particularly movie reviews. It can be used in applications such as:

- Sentiment analysis for social media posts, customer reviews, or product feedback.
- Analyzing movie reviews, comments, or related textual data.
- Serving as a component of a sentiment-aware recommendation system, content moderation tool, or market research pipeline.

**Limitations:**

- The model is tuned specifically for movie-related sentiment analysis. Its performance on other text (e.g., general product reviews, news articles) may not be optimal.
- The model may not perform well on texts with highly domain-specific terminology outside of movie-related contexts.
- The model may struggle with sarcasm, irony, and nuanced expressions of sentiment, as is typical of sentiment analysis models.

## Training and evaluation data

**Training data:**

- **IMDB dataset:** The model was initially trained on the IMDB movie reviews dataset, whose training split consists of 25,000 reviews labeled as positive or negative.
- **Rotten Tomatoes dataset:** To improve performance and generalization, the model was further fine-tuned on the Rotten Tomatoes dataset, which contains short movie-review texts labeled as positive or negative.

**Evaluation data:**

- **Test data from Rotten Tomatoes:** The model was evaluated on the test set of the Rotten Tomatoes dataset to assess its ability to generalize to unseen movie reviews.

**Improvement metrics after fine-tuning on Rotten Tomatoes:**

- Accuracy increased from 82.1% to 84.3%.
- Precision improved from 81.61% to 84.81%.
- F1 score rose from 83.62% to 85.37%.
- Loss decreased from 0.4268 to 0.3621.
- Runtime was reduced from 17.7 seconds to 15.61 seconds.
- Throughput improved: samples per second rose from 56.49 to 64.05, and steps per second from 7.06 to 8.01.

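
To put the improvement figures above on a common footing, the following sketch computes the absolute and relative change for each reported metric. The dictionary keys and the `summarize` helper are illustrative names, not part of the training code; the numbers are copied from the list above.

```python
# Before/after values copied from the improvement list above.
# Percent metrics stay in percentage points; loss is unitless.
reported = {
    "accuracy_pct": (82.1, 84.3),
    "precision_pct": (81.61, 84.81),
    "f1_pct": (83.62, 85.37),
    "loss": (0.4268, 0.3621),
    "runtime_s": (17.7, 15.61),
    "samples_per_second": (56.49, 64.05),
    "steps_per_second": (7.06, 8.01),
}


def summarize(before, after):
    """Return (absolute change, relative change in percent) for one metric."""
    delta = after - before
    return round(delta, 4), round(100.0 * delta / before, 2)


# Hypothetical helper output: metric name -> (absolute, relative %) change.
changes = {name: summarize(b, a) for name, (b, a) in reported.items()}

for name, (delta, rel) in changes.items():
    print(f"{name:>20}: {delta:+.4f} ({rel:+.2f}%)")
```

Read this way, accuracy gains 2.2 percentage points and F1 gains 1.75, while loss falls by roughly 15% relative to its earlier value.
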
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3

### Training results

After epoch 1, validation loss rises while training loss keeps falling, which indicates overfitting. The epoch 1 checkpoint (checkpoint-534) was therefore selected as the best model and pushed to the Hub.

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 0.365         | 1.0   | 534  | 0.3682          | 0.8396   | 0.8267    | 0.8593 | 0.8427 |
| 0.2804        | 2.0   | 1068 | 0.3892          | 0.8452   | 0.8525    | 0.8349 | 0.8436 |
| 0.2301        | 3.0   | 1602 | 0.4342          | 0.8443   | 0.8404    | 0.8499 | 0.8451 |

### Framework versions

- Transformers 4.46.2
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
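
The checkpoint choice described under "Training results" can be reproduced from the table alone. The sketch below is an illustration, not the original training script: it recomputes F1 as the harmonic mean of the reported precision and recall, then picks the row with the lowest validation loss. The tuple layout and the names `rows`, `f1_score`, and `best_checkpoint` are assumptions made for this example.

```python
# Rows copied from the training results table:
# (epoch, step, validation_loss, precision, recall, reported_f1)
rows = [
    (1, 534, 0.3682, 0.8267, 0.8593, 0.8427),
    (2, 1068, 0.3892, 0.8525, 0.8349, 0.8436),
    (3, 1602, 0.4342, 0.8404, 0.8499, 0.8451),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 per epoch to check it against the reported column.
recomputed = [round(f1_score(p, r), 4) for _, _, _, p, r, _ in rows]

# Select the checkpoint with the lowest validation loss.
best = min(rows, key=lambda row: row[2])
best_checkpoint = f"checkpoint-{best[1]}"

print("recomputed F1:", recomputed)
print("lowest validation loss:", best_checkpoint)
```

The recomputed F1 values agree with the reported column to four decimals. Selecting on validation loss picks the epoch 1 checkpoint even though accuracy and F1 are marginally higher at later epochs, which matches the choice described in the card.
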