samidh committed
Commit 9545645 · verified · 1 Parent(s): 189c02e

Update app.py

Files changed (1)
  1. app.py +9 -6
app.py CHANGED
@@ -58,13 +58,13 @@ This policy is designed to determine whether or not content is hate speech.
 
  ## Definition of Terms
 
- - Protected Classes: Individuals or groups distinguished by race (e.g. Black, White), ethnicity (e.g. Hispanic), nationality (e.g. Canadian, Argentinian), caste (e.g. Brahmins, Kshatriyas, Vaishyas), immigration status (e.g. Immigrants), religion (e.g. Catholics, Jews), disability (e.g. blind people, deaf people), disease status (e.g. people with AIDS), sexual orientation (e.g. gays, bisexuals), transgender status (e.g. trans people, cis people), gender identity (e.g. women, men), gender expression (e.g. masculine, feminine), or sex (e.g. male, female).
  - Derogatory Slur: Any term or phrase used pejoratively or contemptuously towards a member or members of a protected class, including those historically reclaimed but used in a derogatory context.
- - Non-Protected Classes: Categories defined by, political party membership (e.g. Democrats, Tories), political beliefs (e.g. fascists, communists), hobbies (e.g. games, knitters), criminal history (e.g. criminals, convicts), or profession (e.g. doctors, farmers).
 
  ## Interpretation of Language:
 
- - Terminology or phrasing that can be construed in various ways IS hate speech UNLESS context clearly indicates it unambiguously IS NOT hate speech. So, for example, attacks on a religion as an institution (e.g. "Hinduism is a garbage religion") SHOULD be understood as attacks on the followers of that religion.
 
  ## Definition of Labels
 
@@ -85,9 +85,12 @@ This policy is designed to determine whether or not content is hate speech.
 
  - Attacks on Non-Protected Classes: Content that attacks or criticizes individuals or groups identified by their membership in a Non-Protected Class, EVEN if that attack is violent, threatening, or otherwise hateful (e.g. "Criminals should all be rounded up and shot!").
  - Criticism of Beliefs and Institutions: Constructive critique or discussion of political ideologies, religious doctrines, or institutions without resorting to hate speech or targeting individuals or groups identified by their membership in a protected class.
  - Neutrally Reporting Historical Events: Neutrally and descriptively reporting or discussion of factual events in the past that could be construed as negative about individuals or groups identified by their membership in a protected class.
  - Pushing Back on Hateful Language: Content where the writer pushes back on, condemns, questions, criticizes, or mocks a different person's hateful language or ideas.
- - Artistic and Educational Content: Content with legitimate artistic, educational, or documentary value that discusses or portrays issues related to hate speech in a context clearly aimed at enlightening or informing without promoting hate.
  """
 
  DEFAULT_CONTENT = "Put your content sample here."
@@ -113,9 +116,9 @@ def predict(content, policy):
      decoded_output = tokenizer.decode([predicted_token_id])
 
      if decoded_output == '1':
-         return f'True (i.e., VIOLATING)'
      else:
-         return f'False (i.e., NON-violating)'
 
 
  # Create the interface
 
 
  ## Definition of Terms
 
+ - Protected Classes: Individuals or groups distinguished by race (e.g. Black, White), ethnicity (e.g. Hispanic, Jewish), nationality (e.g. Canadian, Argentinian), caste (e.g. Brahmins, Kshatriyas, Vaishyas), immigration status (e.g. Immigrants), religion (e.g. Catholics, Jews), disability (e.g. blind people, deaf people), disease status (e.g. people with AIDS), sexual orientation (e.g. gays, bisexuals), transgender status (e.g. trans people, cis people), gender identity (e.g. women, men), gender expression (e.g. masculine, feminine), or sex (e.g. male, female).
  - Derogatory Slur: Any term or phrase used pejoratively or contemptuously towards a member or members of a protected class, including those historically reclaimed but used in a derogatory context.
+ - Non-Protected Classes: Categories defined by, political party membership (e.g. Democrats, Tories), political beliefs (e.g. fascists, communists), hobbies (e.g. games, knitters), criminal history (e.g. criminals, convicts), or profession (e.g. doctors, farmers), or weight or body size (e.g. fat people, thin people).
 
  ## Interpretation of Language:
 
+ - In interpreting this policy, ambiguous phrases should be understood based on their most reasonable contextual meaning. For instance, criticisms directed at a religious system (e.g. "Buddhism is a flawed ideology") should NOT be construed as attacks against the adherents of that faith.
 
  ## Definition of Labels
 
 
 
  - Attacks on Non-Protected Classes: Content that attacks or criticizes individuals or groups identified by their membership in a Non-Protected Class, EVEN if that attack is violent, threatening, or otherwise hateful (e.g. "Criminals should all be rounded up and shot!").
  - Criticism of Beliefs and Institutions: Constructive critique or discussion of political ideologies, religious doctrines, or institutions without resorting to hate speech or targeting individuals or groups identified by their membership in a protected class.
+ - Attacking Leaders: Content that critiques, mocks, or insults the leaders of religions, leaders of religious institutions, or religious prophets or deities, BUT does not contain the singular or plural noun for followers of that religion.
+ - Condemning Violent Extremism: Content that condemns, mocks, insults, dehumanizes, or calls for violence against terrorist organizations and violent hate groups, or their members.
  - Neutrally Reporting Historical Events: Neutrally and descriptively reporting or discussion of factual events in the past that could be construed as negative about individuals or groups identified by their membership in a protected class.
  - Pushing Back on Hateful Language: Content where the writer pushes back on, condemns, questions, criticizes, or mocks a different person's hateful language or ideas.
+ - Disease Discussion: Content in which the author discusses diseases without direct references to people with the disease.
+ - Quoting Hateful Language: Content in which the author quotes someone else's hateful language or ideas while discussing, explaining, or neutrally factually presenting those ideas.
  """
 
  DEFAULT_CONTENT = "Put your content sample here."
 
      decoded_output = tokenizer.decode([predicted_token_id])
 
      if decoded_output == '1':
+         return f'True (i.e., Meets Label Criteria)'
      else:
+         return f'False (i.e., Does NOT Meet Label Criteria)'
 
 
  # Create the interface
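
For reference, the relabeled return values in the last hunk of `predict` can be sketched in isolation. This is a minimal illustration, not the code from `app.py`: the helper name `label_from_token` is hypothetical, and it assumes, as the diff suggests, that the decoded token is compared against the exact string `'1'` with no whitespace stripping.

```python
def label_from_token(decoded_output: str) -> str:
    """Map a decoded model token to the display string used after this commit.

    Hypothetical helper: app.py does this inline inside predict().
    """
    # Only the exact string '1' counts as a positive; anything else
    # (including '0', ' 1', or an empty string) falls through to False.
    if decoded_output == '1':
        return 'True (i.e., Meets Label Criteria)'
    return 'False (i.e., Does NOT Meet Label Criteria)'


if __name__ == "__main__":
    for token in ('1', '0', ' 1'):
        print(repr(token), '->', label_from_token(token))
```

Because the comparison is exact, a token decoded with surrounding whitespace would be reported as not meeting the label criteria. Whether that can occur depends on the tokenizer, which this diff does not show.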