---
language:
- en
metrics:
- accuracy
tags:
- bert
- sentiment
- emotion
- feeling
- label
license: mit
---
|
|
|
### Model Description

This model, "bert-43-multilabel-emotion-detection", is a fine-tuned version of "bert-base-uncased" that classifies English sentences into one of 43 emotion categories. It was trained on a combination of datasets including tweet_emotions, GoEmotions, and synthetic data, amounting to approximately 271,000 records with around 6,306 records per label.
|
|
|
### Intended Use

This model is intended for applications that need to understand or categorize the emotional content of English text, such as sentiment analysis, social media monitoring, and customer feedback analysis.
|
|
|
### Training Data

The training data comprises the following datasets:

- Tweet Emotions
- GoEmotions
- Synthetic data
|
|
|
### Training Procedure

The model was trained for 20 epochs, taking about 6 hours on a Google Colab V100 GPU with 16 GB of RAM.
|
|
|
The following settings were used:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='results',
    optim="adamw_torch",
    learning_rate=2e-5,               # learning rate
    num_train_epochs=20,              # total number of training epochs
    per_device_train_batch_size=128,  # batch size per device during training
    per_device_eval_batch_size=128,   # batch size per device during evaluation
    warmup_steps=500,                 # number of warmup steps for the learning rate scheduler
    weight_decay=0.01,                # strength of weight decay
    logging_dir='./logs',             # directory for storing logs
    logging_steps=100,                # log every 100 steps
)
```
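To illustrate what `warmup_steps=500` does here: the `Trainer`'s default learning-rate schedule is linear warmup followed by linear decay. The pure-Python sketch below is ours, not part of the training code; the total-step count of 42,400 is an illustrative estimate (roughly 271,000 records / batch size 128 × 20 epochs), and the exact number depends on the train/validation split.

```python
def linear_warmup_lr(step, base_lr=2e-5, warmup_steps=500, total_steps=42400):
    """Linear warmup then linear decay, mirroring the default Trainer schedule."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # ramp up from 0 to base_lr
    # decay linearly from base_lr down to 0 over the remaining steps
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / (total_steps - warmup_steps))

# Learning rate at a few points during training
print(linear_warmup_lr(250))    # halfway through warmup -> 1e-05
print(linear_warmup_lr(500))    # warmup complete -> 2e-05
print(linear_warmup_lr(42400))  # end of training -> 0.0
```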
|
### Performance

The model achieved the following performance metrics on the validation set:

- Accuracy: 92.02%
- Weighted F1-Score: 91.93%
- Weighted Precision: 91.88%
- Weighted Recall: 92.02%

Per-label precision, recall, and F1 scores for all 43 labels are listed in the Accuracy Report below.
|
|
|
### Labels Mapping

| Label ID | Emotion |
|----------|---------|
| 0 | admiration |
| 1 | amusement |
| 2 | anger |
| 3 | annoyance |
| 4 | approval |
| 5 | caring |
| 6 | confusion |
| 7 | curiosity |
| 8 | desire |
| 9 | disappointment |
| 10 | disapproval |
| 11 | disgust |
| 12 | embarrassment |
| 13 | excitement |
| 14 | fear |
| 15 | gratitude |
| 16 | grief |
| 17 | joy |
| 18 | love |
| 19 | nervousness |
| 20 | optimism |
| 21 | pride |
| 22 | realization |
| 23 | relief |
| 24 | remorse |
| 25 | sadness |
| 26 | surprise |
| 27 | neutral |
| 28 | worry |
| 29 | happiness |
| 30 | fun |
| 31 | hate |
| 32 | autonomy |
| 33 | safety |
| 34 | understanding |
| 35 | empty |
| 36 | enthusiasm |
| 37 | recreation |
| 38 | sense of belonging |
| 39 | meaning |
| 40 | sustenance |
| 41 | creativity |
| 42 | boredom |
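For convenience, the table above can be written as a plain Python dictionary for decoding raw model outputs (e.g. an argmax over the 43 logits); the name `ID2EMOTION` is ours, not part of the model's config:

```python
# Label ID -> emotion mapping, taken from the Labels Mapping table
ID2EMOTION = {
    0: "admiration", 1: "amusement", 2: "anger", 3: "annoyance",
    4: "approval", 5: "caring", 6: "confusion", 7: "curiosity",
    8: "desire", 9: "disappointment", 10: "disapproval", 11: "disgust",
    12: "embarrassment", 13: "excitement", 14: "fear", 15: "gratitude",
    16: "grief", 17: "joy", 18: "love", 19: "nervousness",
    20: "optimism", 21: "pride", 22: "realization", 23: "relief",
    24: "remorse", 25: "sadness", 26: "surprise", 27: "neutral",
    28: "worry", 29: "happiness", 30: "fun", 31: "hate",
    32: "autonomy", 33: "safety", 34: "understanding", 35: "empty",
    36: "enthusiasm", 37: "recreation", 38: "sense of belonging",
    39: "meaning", 40: "sustenance", 41: "creativity", 42: "boredom",
}

print(ID2EMOTION[17])  # -> joy
```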
|
|
|
|
|
### Accuracy Report

| Label | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| 0 | 0.8625 | 0.7969 | 0.8284 |
| 1 | 0.9128 | 0.9558 | 0.9338 |
| 2 | 0.9028 | 0.8749 | 0.8886 |
| 3 | 0.8570 | 0.8639 | 0.8605 |
| 4 | 0.8584 | 0.8449 | 0.8516 |
| 5 | 0.9343 | 0.9667 | 0.9502 |
| 6 | 0.9492 | 0.9696 | 0.9593 |
| 7 | 0.9234 | 0.9462 | 0.9347 |
| 8 | 0.9644 | 0.9924 | 0.9782 |
| 9 | 0.9481 | 0.9377 | 0.9428 |
| 10 | 0.9250 | 0.9267 | 0.9259 |
| 11 | 0.9653 | 0.9914 | 0.9782 |
| 12 | 0.9948 | 0.9976 | 0.9962 |
| 13 | 0.9474 | 0.9676 | 0.9574 |
| 14 | 0.8926 | 0.8853 | 0.8889 |
| 15 | 0.9501 | 0.9515 | 0.9508 |
| 16 | 0.9976 | 0.9990 | 0.9983 |
| 17 | 0.9114 | 0.8716 | 0.8911 |
| 18 | 0.7825 | 0.7821 | 0.7823 |
| 19 | 0.9962 | 0.9990 | 0.9976 |
| 20 | 0.9516 | 0.9638 | 0.9577 |
| 21 | 0.9953 | 0.9995 | 0.9974 |
| 22 | 0.9630 | 0.9791 | 0.9710 |
| 23 | 0.9134 | 0.9134 | 0.9134 |
| 24 | 0.9753 | 0.9948 | 0.9849 |
| 25 | 0.7374 | 0.7469 | 0.7421 |
| 26 | 0.7864 | 0.7583 | 0.7721 |
| 27 | 0.6000 | 0.5666 | 0.5828 |
| 28 | 0.7369 | 0.6836 | 0.7093 |
| 29 | 0.8066 | 0.7222 | 0.7620 |
| 30 | 0.9116 | 0.9225 | 0.9170 |
| 31 | 0.9108 | 0.9524 | 0.9312 |
| 32 | 0.9611 | 0.9634 | 0.9622 |
| 33 | 0.9592 | 0.9724 | 0.9657 |
| 34 | 0.9700 | 0.9686 | 0.9693 |
| 35 | 0.9459 | 0.9734 | 0.9594 |
| 36 | 0.9359 | 0.9857 | 0.9601 |
| 37 | 0.9986 | 0.9986 | 0.9986 |
| 38 | 0.9943 | 0.9990 | 0.9967 |
| 39 | 0.9990 | 1.0000 | 0.9995 |
| 40 | 0.9905 | 0.9914 | 0.9910 |
| 41 | 0.9981 | 0.9948 | 0.9964 |
| 42 | 0.9929 | 0.9986 | 0.9957 |
| weighted avg | 0.9188 | 0.9202 | 0.9193 |
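Each F1 value in the report is the harmonic mean of the corresponding precision and recall, as a quick sanity check shows for label 0 (admiration):

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Label 0 (admiration): precision 0.8625, recall 0.7969
print(round(f1(0.8625, 0.7969), 4))  # -> 0.8284, matching the table
```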
|
|
|
### How to Use

```python
from transformers import pipeline

# Reference the fine-tuned model and its tokenizer by repository ID
model = 'borisn70/bert-43-multilabel-emotion-detection'
tokenizer = 'borisn70/bert-43-multilabel-emotion-detection'

# Create a text-classification pipeline
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

# Classify a sentence
result = nlp("I feel great about this!")

# Print the predicted label and score
print(result)
```
|
|
|
### Limitations and Biases

- The model's performance varies significantly across emotion categories, especially those with less representation in the training data; for example, neutral (F1 0.5828), worry (0.7093), and sadness (0.7421) score well below the weighted average.
- Users should be cautious about potential biases in the training data, which may be reflected in the model's predictions.
|
|
|
### Contact

If you have any questions, feedback, or issues regarding the model, please feel free to reach out.

- **Email:** [[email protected]](mailto:[email protected])
|
- **LinkedIn:** [Boris Atayan](https://www.linkedin.com/in/borisatayan/) |