Commit 4785c60 by samidh · verified · 1 Parent(s): 323a0f2

Update app.py

Files changed (1)
  1. app.py +30 -0
app.py CHANGED
@@ -50,21 +50,32 @@ ANSWER
  """

  DEFAULT_POLICY = """# Policy
  ## Overview
  This policy guides the classification of content related to expressions that may be sensitive or controversial, aiming to foster a community where open dialogue is encouraged while maintaining respect for all individuals.
  ## Definition of Terms
  - Protected Classes: Individuals or groups of individuals defined by sharing a race, ethnicity, religion, disability, sex, gender identity, or sexual orientation. The scope is narrowed to focus on characteristics commonly recognized in anti-discrimination laws. Religions themselves, as concepts ARE NOT protected - only their members are.
  - Non-Protected Classes: Groups defined by other characteristics, such as age, socioeconomic status, and education, which are typically excluded partially or entirely from anti-discrimination laws.
  ## Interpretation of Language
  ## Definition of Labels
  ### (HS): Hate Speech
  #### Includes
  - Explicit Violence Advocacy: Direct and clear threats of or praise for violence or harm against individuals or groups based on their membership in a particular protected class. Stating an intention to defend against aggression DOES NOT qualify
  - Intentional Dehumanization: Statements that clearly depict individuals or groups as inherently ""other"", alien, animalistic, unintelligent, immoral, unclean, or less-than-fully-human based on their membership in a particular protected class in a way that justifies harm or discrimination.
  - Targeted Use of Derogatory Slurs: Targeting another person or group of people using a one-word name for a particular protected class that has an inherent negative connotation (e.g. Nigger, Kike, Cunt, Retard). Multi-word terms are never slurs.
  - Explicit Discrimination Advocacy: Direct and clear calls for exclusion, segregation, or discrimination against individuals or groups based on their membership in a particular protected class, with a clear intent to promote inequality.
  - Direct Hateful Insults: Content that directly addresses another person or group of people the second person (e.g. ""You over there"") and insults them based on their membership in a particular protected class
  #### Excludes
  - Artistic and Educational Content: Expressions intended for artistic, educational, or documentary purposes that discuss sensitive topics but do not advocate for violence or discrimination against individuals or groups based on their membership in a particular protected class.
  - Political and Social Commentary: Commentary on political issues, social issues, and political ideologies that does not directly incite violence or discrimination against individuals or groups based on their membership in a particular protected class.
  - Rebutting Hateful Language: Content that rebuts, condemns, questions, criticizes, or mocks a different person's hateful language or ideas OR that insults the person advocating those hateful
@@ -93,6 +104,25 @@ iface = gr.Interface(
  outputs="label",
  title="CoPE Alpha Preview",
  description="See if the given content violates your given policy."
  )

  # Launch the app
 
  """

  DEFAULT_POLICY = """# Policy
+
  ## Overview
+
  This policy guides the classification of content related to expressions that may be sensitive or controversial, aiming to foster a community where open dialogue is encouraged while maintaining respect for all individuals.
+
  ## Definition of Terms
+
  - Protected Classes: Individuals or groups of individuals defined by sharing a race, ethnicity, religion, disability, sex, gender identity, or sexual orientation. The scope is narrowed to focus on characteristics commonly recognized in anti-discrimination laws. Religions themselves, as concepts ARE NOT protected - only their members are.
  - Non-Protected Classes: Groups defined by other characteristics, such as age, socioeconomic status, and education, which are typically excluded partially or entirely from anti-discrimination laws.
+
  ## Interpretation of Language
+
  ## Definition of Labels
+
  ### (HS): Hate Speech
+
  #### Includes
+
  - Explicit Violence Advocacy: Direct and clear threats of or praise for violence or harm against individuals or groups based on their membership in a particular protected class. Stating an intention to defend against aggression DOES NOT qualify
  - Intentional Dehumanization: Statements that clearly depict individuals or groups as inherently ""other"", alien, animalistic, unintelligent, immoral, unclean, or less-than-fully-human based on their membership in a particular protected class in a way that justifies harm or discrimination.
  - Targeted Use of Derogatory Slurs: Targeting another person or group of people using a one-word name for a particular protected class that has an inherent negative connotation (e.g. Nigger, Kike, Cunt, Retard). Multi-word terms are never slurs.
  - Explicit Discrimination Advocacy: Direct and clear calls for exclusion, segregation, or discrimination against individuals or groups based on their membership in a particular protected class, with a clear intent to promote inequality.
  - Direct Hateful Insults: Content that directly addresses another person or group of people the second person (e.g. ""You over there"") and insults them based on their membership in a particular protected class
+
  #### Excludes
+
  - Artistic and Educational Content: Expressions intended for artistic, educational, or documentary purposes that discuss sensitive topics but do not advocate for violence or discrimination against individuals or groups based on their membership in a particular protected class.
  - Political and Social Commentary: Commentary on political issues, social issues, and political ideologies that does not directly incite violence or discrimination against individuals or groups based on their membership in a particular protected class.
  - Rebutting Hateful Language: Content that rebuts, condemns, questions, criticizes, or mocks a different person's hateful language or ideas OR that insults the person advocating those hateful
 
  outputs="label",
  title="CoPE Alpha Preview",
  description="See if the given content violates your given policy."
+ article="""
+ ## About CoPE
+
+ CoPE (the COntent Policy Evaluation engine) is a small language model capable of accurate content policy labeling. This is a *preview* of our alpha release and is strictly for *research* purposes. This should *NOT* be used for any production use cases.
+
+ ### How to Use:
+
+ 1. Enter your content in the "Content" box.
+ 2. Specify your policy in the "Policy" box.
+ 3. Click "Submit" to see the results.
+
+ *Note*: Inference times are *very slow* (30-45 seconds) since this is built on dev infra and not yet optimized for live systems. Please be patient!
+
+ ### Tips:
+
+ - [Give us feedback](https://example.com) to help us improve
+ - Read our FAQ to learn more about CoPE
+ - Join our mailing list to keep in touch
+ """
  )

  # Launch the app
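
Review note, not part of this commit: the second hunk passes a long Markdown string inline as the `article` argument of `gr.Interface`. A minimal sketch of one alternative is to keep that text in a module-level constant and collect the display arguments in a helper. The names `ABOUT_ARTICLE` and `build_interface_kwargs` are hypothetical, the article text is abbreviated for illustration, and the only Gradio behaviour assumed is that `gr.Interface` accepts `title`, `description`, and `article` keyword arguments. The sketch deliberately avoids importing Gradio so it runs on its own.

```python
import textwrap

# Hypothetical constant: an abbreviated copy of the Markdown shown under the demo.
ABOUT_ARTICLE = textwrap.dedent(
    """
    ## About CoPE

    CoPE (the COntent Policy Evaluation engine) is a small language model
    capable of accurate content policy labeling.

    ### How to Use:

    1. Enter your content in the "Content" box.
    2. Specify your policy in the "Policy" box.
    3. Click "Submit" to see the results.
    """
).strip()


def build_interface_kwargs(article: str = ABOUT_ARTICLE) -> dict:
    """Collect the display-related keyword arguments seen in the diff."""
    return {
        "outputs": "label",
        "title": "CoPE Alpha Preview",
        "description": "See if the given content violates your given policy.",
        "article": article,
    }


if __name__ == "__main__":
    kwargs = build_interface_kwargs()
    # In app.py this would be used roughly as (fn and inputs are not shown
    # in the diff, so they are left elided here):
    #   iface = gr.Interface(fn=..., inputs=..., **kwargs)
    print(sorted(kwargs))
```

Keeping the text in a constant also makes it straightforward to check the Markdown in isolation, for example that the numbered steps survive dedenting.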