Ezi committed on
Commit
8aeb7aa
·
1 Parent(s): a1323e5

Model Card

Hi! This PR has a preliminary model card, based on the format we are using as part of our effort to standardize model cards at Hugging Face. Feel free to merge if you are ok with the changes! (cc @Marissa @Meg @Nazneen)

Files changed (1)
  1. README.md +74 -11
README.md CHANGED
@@ -5,15 +5,40 @@ datasets:
5
  - hatexplain
6
  ---
7
 
8
- The model is used for classifying a text as Abusive (Hatespeech and Offensive) or Normal. The model is trained using data from Gab and Twitter and Human Rationales were included as part of the training data to boost the performance. The model also has a rationale predictor head that can predict the rationales given an abusive sentence.
9
-
10
- The dataset and models are available here: https://github.com/punyajoy/HateXplain
11
 
12
  **Details of usage**
13
 
14
  Please use the **Model_Rational_Label** class inside [models.py](models.py) to load the models. The default prediction in this hosted inference API may be wrong due to the use of different class initialisations.
15
 
16
- ~~~
17
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
18
  ### from models.py
19
  from models import *
@@ -21,15 +46,54 @@ tokenizer = AutoTokenizer.from_pretrained("Hate-speech-CNERG/bert-base-uncased-h
21
  model = Model_Rational_Label.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")
22
  inputs = tokenizer("He is a great guy", return_tensors="pt")
23
  prediction_logits, _ = model(input_ids=inputs['input_ids'],attention_mask=inputs['attention_mask'])
24
- ~~~
25
 
26
- **For more details about our paper**
27
 
28
- Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee. "[HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection](https://arxiv.org/abs/2012.10289)". Accepted at AAAI 2021.
29
 
30
- ***Please cite our paper in any published work that uses any of these resources.***
31
- ~~~
32
 
33
  @article{mathew2020hatexplain,
34
  title={HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection},
35
  author={Mathew, Binny and Saha, Punyajoy and Yimam, Seid Muhie and Biemann, Chris and Goyal, Pawan and Mukherjee, Animesh},
@@ -37,8 +101,7 @@ Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and A
37
  year={2020}
38
 
39
  }
40
-
41
- ~~~
42
 
43
 
44
 
 
5
  - hatexplain
6
  ---
7
 
8
+
9
+ ## Table of Contents
10
+ - [Model Details](#model-details)
11
+ - [How to Get Started with the Model](#how-to-get-started-with-the-model)
12
+ - [Uses](#uses)
13
+ - [Risks, Limitations and Biases](#risks-limitations-and-biases)
14
+ - [Training](#training-procedure)
15
+ - [Evaluation](#evaluation)
16
+ - [Technical Specifications](#technical-specifications)
17
+ - [Citation Information](#citation-information)
18
+
19
+ ## Model Details
20
+ **Model Description:**
21
+ The model classifies a text as Abusive (Hatespeech and Offensive) or Normal. It is trained on data from Gab and Twitter, and human rationales were included in the training data to boost performance. The model also has a rationale predictor head that can predict the rationales for an abusive sentence.
22
+
23
+
24
+ - **Developed by:** Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee
25
+ - **Model Type:** Text Classification
26
+ - **Language(s):** English
27
+ - **License:** Apache-2.0
28
+ - **Parent Model:** See the [BERT base uncased model](https://huggingface.co/bert-base-uncased) for more information about the BERT base model.
29
+ - **Resources for more information:**
30
+ - [Research Paper](https://arxiv.org/abs/2012.10289) Accepted at AAAI 2021.
31
+ - [GitHub Repo with datasets and models](https://github.com/punyajoy/HateXplain)
32
+
33
+
34
+
35
+ ## How to Get Started with the Model
36
 
37
  **Details of usage**
38
 
39
  Please use the **Model_Rational_Label** class inside [models.py](models.py) to load the models. The default prediction in this hosted inference API may be wrong due to the use of different class initialisations.
40
 
41
+ ```python
42
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
43
  ### from models.py
44
  from models import *
 
46
  model = Model_Rational_Label.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")
47
  inputs = tokenizer("He is a great guy", return_tensors="pt")
48
  prediction_logits, _ = model(input_ids=inputs['input_ids'],attention_mask=inputs['attention_mask'])
49
+ ```
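The snippet above stops at raw logits. A minimal sketch of turning them into a label and a probability follows, assuming a two-class output ordered as Normal then Abusive. That order is an assumption made for illustration, not something this PR states; the model's own configuration (for example its `id2label` mapping, if set) is the place to confirm it. `decode_prediction` and `ASSUMED_LABELS` are hypothetical names, and the logits in the example are made up.

```python
import math

# Assumed label order for illustration only; confirm against the model config.
ASSUMED_LABELS = ["Normal", "Abusive"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, labels=ASSUMED_LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits; with the real model, something like
# prediction_logits[0].tolist() would be passed instead.
label, prob = decode_prediction([2.0, -1.0])
print(label, round(prob, 3))
```

Keeping the decoding step in plain Python makes the label mapping explicit, which matters here because the card warns that the hosted inference API may report wrong defaults.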
50
+
51
+ ## Uses
52
+
53
+ #### Direct Use
54
+
55
+ This model can be used for text classification.
56
+
57
+
58
+ #### Downstream Use
59
+
60
+ [More information needed]
61
+
62
+ #### Misuse and Out-of-scope Use
63
+
64
+ The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using the model to generate such content is out of scope for its abilities.
65
 
66
+ ## Risks, Limitations and Biases
67
 
68
+ **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
69
 
70
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
 
71
 
72
+ (and if you can generate an example of a biased prediction, also something like this):
73
+
74
+ Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example: ![dataset example](https://github.com/hate-alert/HateXplain/blob/master/Figures/dataset_example.png)
75
+
76
+ The model authors also note in their HateXplain paper that they
77
+ > *have not considered any external context such as profile bio, user gender, history of posts etc., which might be helpful in the classification task. Also, in this work we have focused on the English language. It does not consider multilingual hate speech into account.*
78
+
79
+
80
+ #### Training Procedure
81
+
82
+ ##### Preprocessing
83
+
84
+ The authors detail their preprocessing procedure in the [GitHub repository](https://github.com/hate-alert/HateXplain/tree/master/Preprocess).
85
+
86
+
87
+ ## Evaluation
88
+ The model authors detail the hidden layer size and attention for the HateXplain fine-tuned models in the [associated paper](https://arxiv.org/pdf/2012.10289.pdf).
89
+
90
+ #### Results
91
+
92
+ The model authors, both in their paper and in the git repository, provide illustrative output of BERT-HateXplain in comparison to BERT and other HateXplain fine-tuned [models](https://github.com/hate-alert/HateXplain/blob/master/Figures/bias-subgroup.pdf).
93
+
94
+ ## Citation Information
95
+
96
+ ```bibtex
97
  @article{mathew2020hatexplain,
98
  title={HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection},
99
  author={Mathew, Binny and Saha, Punyajoy and Yimam, Seid Muhie and Biemann, Chris and Goyal, Pawan and Mukherjee, Animesh},
 
101
  year={2020}
102
 
103
  }
104
+ ```
 
105
 
106
 
107