Spaces: zentropi-ai/cope-demo

samidh committed on Jul 24, 2024

Commit 4785c60 (verified) · 1 Parent(s): 323a0f2

Update app.py

Files changed (1)

app.py +30 -0

@@ -50,21 +50,32 @@ ANSWER
 """
 DEFAULT_POLICY = """# Policy
 ## Overview
 This policy guides the classification of content related to expressions that may be sensitive or controversial, aiming to foster a community where open dialogue is encouraged while maintaining respect for all individuals.
 ## Definition of Terms
 - Protected Classes: Individuals or groups of individuals defined by sharing a race, ethnicity, religion, disability, sex, gender identity, or sexual orientation. The scope is narrowed to focus on characteristics commonly recognized in anti-discrimination laws. Religions themselves, as concepts ARE NOT protected - only their members are.
 - Non-Protected Classes: Groups defined by other characteristics, such as age, socioeconomic status, and education, which are typically excluded partially or entirely from anti-discrimination laws.
 ## Interpretation of Language
 ## Definition of Labels
 ### (HS): Hate Speech
 #### Includes
 - Explicit Violence Advocacy: Direct and clear threats of or praise for violence or harm against individuals or groups based on their membership in a particular protected class. Stating an intention to defend against aggression DOES NOT qualify
 - Intentional Dehumanization: Statements that clearly depict individuals or groups as inherently "other", alien, animalistic, unintelligent, immoral, unclean, or less-than-fully-human based on their membership in a particular protected class in a way that justifies harm or discrimination.
 - Targeted Use of Derogatory Slurs: Targeting another person or group of people using a one-word name for a particular protected class that has an inherent negative connotation (e.g. Nigger, Kike, Cunt, Retard). Multi-word terms are never slurs.
 - Explicit Discrimination Advocacy: Direct and clear calls for exclusion, segregation, or discrimination against individuals or groups based on their membership in a particular protected class, with a clear intent to promote inequality.
 - Direct Hateful Insults: Content that directly addresses another person or group of people in the second person (e.g. "You over there") and insults them based on their membership in a particular protected class
 #### Excludes
 - Artistic and Educational Content: Expressions intended for artistic, educational, or documentary purposes that discuss sensitive topics but do not advocate for violence or discrimination against individuals or groups based on their membership in a particular protected class.
 - Political and Social Commentary: Commentary on political issues, social issues, and political ideologies that does not directly incite violence or discrimination against individuals or groups based on their membership in a particular protected class.
 - Rebutting Hateful Language: Content that rebuts, condemns, questions, criticizes, or mocks a different person's hateful language or ideas OR that insults the person advocating those hateful
@@ -93,6 +104,25 @@ iface = gr.Interface(
     outputs="label",
     title="CoPE Alpha Preview",
     description="See if the given content violates your given policy.",
+    article="""
+    ## About CoPE
+    CoPE (the COntent Policy Evaluation engine) is a small language model capable of accurate content policy labeling. This is a *preview* of our alpha release and is strictly for *research* purposes. This should *NOT* be used for any production use cases.
+    ### How to Use:
+    1. Enter your content in the "Content" box.
+    2. Specify your policy in the "Policy" box.
+    3. Click "Submit" to see the results.
+    *Note*: Inference times are *very slow* (30-45 seconds) since this is built on dev infra and not yet optimized for live systems. Please be patient!
+    ### Tips:
+    - [Give us feedback](https://example.com) to help us improve
+    - Read our FAQ to learn more about CoPE
+    - Join our mailing list to keep in touch
+    """
 )
 # Launch the app
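
The second hunk adds an `article` argument to the `gr.Interface(...)` call, which Gradio renders as Markdown below the interface. A minimal sketch of how the pieces shown in the diff might fit together follows. It assumes the two inputs are text boxes labelled "Content" and "Policy" (as the usage steps suggest); `predict`, `interface_kwargs`, and `launch` are hypothetical names that do not come from the real `app.py`, the article text is abridged, and the model call is replaced by a placeholder.

```python
# Sketch only: `predict`, `interface_kwargs`, and `launch` are hypothetical
# names for illustration; they are not taken from the real app.py.

# Abridged copy of the Markdown passed as `article` in the diff.
ARTICLE = """
## About CoPE
This is a *preview* of our alpha release and is strictly for *research* purposes.
### How to Use:
1. Enter your content in the "Content" box.
2. Specify your policy in the "Policy" box.
3. Click "Submit" to see the results.
"""


def predict(content: str, policy: str) -> str:
    """Hypothetical stand-in for the model call; returns a fixed label."""
    return "placeholder"


def interface_kwargs() -> dict:
    """Collect the keyword arguments shown in the diff, including `article`."""
    return {
        "fn": predict,
        "outputs": "label",
        "title": "CoPE Alpha Preview",
        "description": "See if the given content violates your given policy.",
        "article": ARTICLE,
    }


def launch() -> None:
    """Build and launch the UI; gradio is imported lazily so this module loads without it."""
    import gradio as gr

    iface = gr.Interface(
        # Assumed input components: two text boxes matching the usage steps.
        inputs=[gr.Textbox(label="Content"), gr.Textbox(label="Policy")],
        **interface_kwargs(),
    )
    iface.launch()
```

Calling `launch()` in an environment with Gradio installed should start the demo. Keeping the keyword arguments in a plain dict is only a convenience here, so the configuration can be inspected without starting a server.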