Model Card for AngryBERT
This model is a fine-tuned version of pdelobelle/robbert-v2-dutch-base for the classification of text as angry or non-angry.
Model Details
Model Description
This model is a fine-tuned version of pdelobelle/robbert-v2-dutch-base, trained on a selection of paragraphs mined from the Dutch novel "Ik ga leven" by Lale Gül (Lale Gül, Ik ga leven. 2021. Amsterdam: Prometheus. ISBN 978-9044646870. An English translation of the novel exists: Lale Gül, I Will Live. 2023. London: Little, Brown Book Group. ISBN 978-1408716809). The model is intended to classify sentences and paragraphs of the book as angry or non-angry. A selection of 55 paragraphs was annotated for angriness by two individual annotators (Cohen's kappa of 0.48).
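The reported inter-annotator agreement is a standard Cohen's kappa. A minimal sketch of the computation, using made-up annotations rather than the actual ones from the project:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators assigning binary labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items on which the annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    p_e = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations for 8 paragraphs (1 = angry, 0 = non-angry).
rater_1 = [1, 1, 0, 0, 1, 0, 0, 1]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(rater_1, rater_2), 2))  # => 0.5
```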
- Developed by: Joris J. van Zundert and Julia Neugarten
- Funded by [optional]: Huygens Institute
- Shared by [optional]: [More Information Needed]
- Model type: text classification
- Language(s) (NLP): Dutch
- License: MIT
- Finetuned from model [optional]: robbert-v2-dutch-base
Uses
This model should only be used in the context of research on the full text of the Dutch version of Lale Gül's "Ik ga leven". Any other application is discouraged, as the model has been fine-tuned exclusively on this specific novel. Results obtained with this model in any other context should be treated with the greatest care and skepticism.
Bias, Risks, and Limitations
The model is biased towards the language of Lale Gül in her novel "Ik ga leven". This may include skew towards explicit and aggressive language.
Recommendations
This model should only be used in the context of research on the full text of the Dutch version of Lale Gül's "Ik ga leven". Any other application is discouraged, as the model has been fine-tuned exclusively on this specific novel. Results obtained with this model in any other context should be treated with the greatest care and skepticism.
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
from transformers import TextClassificationPipeline

model = RobertaForSequenceClassification.from_pretrained("./model/angryBERT-v1")
tokenizer = RobertaTokenizer.from_pretrained("./model/angryBERT-v1")

# Just checking that the model works.
# LABEL_1 means angry
# LABEL_0 means non-angry
input_text = "Ik was kwaad."  # en.: "I was angry."
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
pipe(input_text)
# =>
# [[{'label': 'LABEL_0', 'score': 0.026506226509809494},
#   {'label': 'LABEL_1', 'score': 0.9734938144683838}]]
```
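Since the pipeline returns raw label IDs, a small convenience helper (hypothetical, not part of the model's code) can map the nested output shown above to a human-readable prediction:

```python
# Hypothetical helper: map the pipeline's raw output to a readable label.
# By the model's convention, LABEL_1 = angry and LABEL_0 = non-angry.
LABEL_NAMES = {"LABEL_0": "non-angry", "LABEL_1": "angry"}

def readable_prediction(pipeline_output):
    """Return the best label name and its score from the nested
    score list the pipeline produces for a single input text."""
    scores = pipeline_output[0]  # scores for the first (only) input
    best = max(scores, key=lambda d: d["score"])
    return LABEL_NAMES[best["label"]], best["score"]

example_output = [[{"label": "LABEL_0", "score": 0.0265},
                   {"label": "LABEL_1", "score": 0.9735}]]
print(readable_prediction(example_output))  # => ('angry', 0.9735)
```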
Training Details
Training Data
All paragraphs of the Dutch version of Lale Gül's novel Ik ga leven. Paratext (copyright, title page, etc.) was removed, as was the section of poems at the back of the book.
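A minimal sketch of how paragraphs might be extracted from a plain-text export of the novel; the blank-line convention and the sample text are assumptions, not the authors' actual preprocessing:

```python
def split_paragraphs(text):
    """Split a plain-text document into paragraphs on blank lines,
    dropping empty fragments and surrounding whitespace."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

# Tiny stand-in for the novel's plain text.
sample = "Eerste alinea.\n\nTweede alinea.\n\n\n\nDerde alinea."
print(split_paragraphs(sample))
# => ['Eerste alinea.', 'Tweede alinea.', 'Derde alinea.']
```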
Training Procedure
The model was trained on 55 paragraphs, each labeled as either angry (1) or non-angry (0).
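A hedged sketch of how such a fine-tuning setup could look with the Hugging Face Trainer API; the hyperparameters, variable names, and dataset handling are illustrative assumptions, not the authors' actual training script:

```python
def make_examples(paragraphs, labels):
    """Pair paragraphs with integer labels (1 = angry, 0 = non-angry)."""
    assert len(paragraphs) == len(labels)
    return [{"text": t, "label": int(l)} for t, l in zip(paragraphs, labels)]

def finetune(paragraphs, labels):
    """Illustrative fine-tuning routine (defined but not executed here)."""
    from datasets import Dataset
    from transformers import (RobertaTokenizer, RobertaForSequenceClassification,
                              Trainer, TrainingArguments)

    tokenizer = RobertaTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")
    model = RobertaForSequenceClassification.from_pretrained(
        "pdelobelle/robbert-v2-dutch-base", num_labels=2)

    # `paragraphs` and `labels` stand in for the 55 annotated paragraphs.
    ds = Dataset.from_list(make_examples(paragraphs, labels))
    ds = ds.map(lambda e: tokenizer(e["text"], truncation=True, padding="max_length"),
                batched=True)

    args = TrainingArguments(output_dir="./model/angryBERT-v1",
                             num_train_epochs=4,            # assumed value
                             per_device_train_batch_size=8)  # assumed value
    trainer = Trainer(model=model, args=args, train_dataset=ds)
    trainer.train()
    trainer.save_model("./model/angryBERT-v1")
```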
Model Card Authors [optional]
Joris J. van Zundert, Julia Neugarten
Model Card Contact
Model tree for jorisvanzundert/AngryBERT
Base model
pdelobelle/robbert-v2-dutch-base