Model Card for AngryBERT
This model is a fine-tuned version of pdelobelle/robbert-v2-dutch-base for the classification of text as angry or non-angry.
Model Details
Model Description
This model is a fine-tuned version of pdelobelle/robbert-v2-dutch-base, trained on a selection of paragraphs mined from the Dutch novel "Ik ga leven" by Lale Gül (Lale Gül, Ik ga leven. 2021. Amsterdam: Prometheus. ISBN 978-9044646870. An English translation of the novel exists: Lale Gül, I Will Live. 2023. London: Little, Brown Book Group. ISBN 978-1408716809). The model is intended to classify sentences and paragraphs of the book as angry or non-angry. A selection of 55 paragraphs was annotated for angriness by two individual annotators (Cohen's kappa of 0.48).
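The reported inter-annotator agreement is a standard Cohen's kappa. A minimal sketch of the computation, using made-up annotations rather than the actual ones from the project:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators assigning binary labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items on which the annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    p_e = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations for 8 paragraphs (1 = angry, 0 = non-angry).
rater_1 = [1, 1, 0, 0, 1, 0, 0, 1]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(rater_1, rater_2), 2))  # => 0.5
```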
- Developed by: Joris J. van Zundert and Julia Neugarten
- Funded by [optional]: Huygens Institute
- Shared by [optional]: [More Information Needed]
- Model type: text classification
- Language(s) (NLP): Dutch
- License: MIT
- Finetuned from model [optional]: robbert-v2-dutch-base
Uses
This model should only be used in the context of research on the full text of the Dutch version of Lale Gül's "Ik ga leven". Any other application is discouraged, as the model has been fine-tuned exclusively on this specific novel. Results obtained with this model in any other context should be treated with the greatest care and skepticism.
Bias, Risks, and Limitations
The model is biased towards the language of Lale Gül in her novel "Ik ga leven". This may include skew towards explicit and aggressive language.
Recommendations
This model should only be used in the context of research on the full text of the Dutch version of Lale Gül's "Ik ga leven". Any other application is discouraged, as the model has been fine-tuned exclusively on this specific novel. Results obtained with this model in any other context should be treated with the greatest care and skepticism.
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
from transformers import TextClassificationPipeline

model = RobertaForSequenceClassification.from_pretrained("./model/angryBERT-v1")
tokenizer = RobertaTokenizer.from_pretrained("./model/angryBERT-v1")

# Just checking that the model works.
# LABEL_1 means angry
# LABEL_0 means non-angry
input_text = "Ik was kwaad."  # en.: "I was angry."
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
pipe(input_text)
# =>
# [[{'label': 'LABEL_0', 'score': 0.026506226509809494},
#   {'label': 'LABEL_1', 'score': 0.9734938144683838}]]
```
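Since the pipeline returns raw label IDs, a small convenience helper (hypothetical, not part of the model's code) can map the nested output shown above to a human-readable prediction:

```python
# Hypothetical helper: map the pipeline's raw output to a readable label.
# By the model's convention, LABEL_1 = angry and LABEL_0 = non-angry.
LABEL_NAMES = {"LABEL_0": "non-angry", "LABEL_1": "angry"}

def readable_prediction(pipeline_output):
    """Return the best label name and its score from the nested
    score list the pipeline produces for a single input text."""
    scores = pipeline_output[0]  # scores for the first (only) input
    best = max(scores, key=lambda d: d["score"])
    return LABEL_NAMES[best["label"]], best["score"]

example_output = [[{"label": "LABEL_0", "score": 0.0265},
                   {"label": "LABEL_1", "score": 0.9735}]]
print(readable_prediction(example_output))  # => ('angry', 0.9735)
```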
Training Details
Training Data
All paragraphs of the Dutch version of Lale Gül's novel Ik ga leven. Paratext (copyright, title page, etc.) was removed, as was the section of poems at the back of the book.
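A minimal sketch of how paragraphs might be extracted from a plain-text export of the novel; the blank-line convention and the sample text are assumptions, not the authors' actual preprocessing:

```python
def split_paragraphs(text):
    """Split a plain-text document into paragraphs on blank lines,
    dropping empty fragments and surrounding whitespace."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

# Tiny stand-in for the novel's plain text.
sample = "Eerste alinea.\n\nTweede alinea.\n\n\n\nDerde alinea."
print(split_paragraphs(sample))
# => ['Eerste alinea.', 'Tweede alinea.', 'Derde alinea.']
```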
Training Procedure
The model was trained on 55 paragraphs, each labeled as either angry (1) or non-angry (0).
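A hedged sketch of how such a fine-tuning setup could look with the Hugging Face Trainer API; the hyperparameters, variable names, and dataset handling are illustrative assumptions, not the authors' actual training script:

```python
def make_examples(paragraphs, labels):
    """Pair paragraphs with integer labels (1 = angry, 0 = non-angry)."""
    assert len(paragraphs) == len(labels)
    return [{"text": t, "label": int(l)} for t, l in zip(paragraphs, labels)]

def finetune(paragraphs, labels):
    """Illustrative fine-tuning routine (defined but not executed here)."""
    from datasets import Dataset
    from transformers import (RobertaTokenizer, RobertaForSequenceClassification,
                              Trainer, TrainingArguments)

    tokenizer = RobertaTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")
    model = RobertaForSequenceClassification.from_pretrained(
        "pdelobelle/robbert-v2-dutch-base", num_labels=2)

    # `paragraphs` and `labels` stand in for the 55 annotated paragraphs.
    ds = Dataset.from_list(make_examples(paragraphs, labels))
    ds = ds.map(lambda e: tokenizer(e["text"], truncation=True, padding="max_length"),
                batched=True)

    args = TrainingArguments(output_dir="./model/angryBERT-v1",
                             num_train_epochs=4,            # assumed value
                             per_device_train_batch_size=8)  # assumed value
    trainer = Trainer(model=model, args=args, train_dataset=ds)
    trainer.train()
    trainer.save_model("./model/angryBERT-v1")
```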
Model Card Authors [optional]
Joris J. van Zundert, Julia Neugarten
Model Card Contact
Model tree for jorisvanzundert/AngryBERT
Base model
pdelobelle/robbert-v2-dutch-base