Commit 8166be6 by andriadze · verified · 1 Parent(s): 9948f21

Update README.md

  1. README.md +18 -14
README.md CHANGED
@@ -23,25 +23,29 @@ It achieves the following results on the evaluation set:
  On production data (not used as part of training), the model achieves an accuracy of ~98.8%; for comparison, the ```distilbert``` version achieves ~98.4%.

- While there is a detectable increase in performance, I'm not sure if it's worth. Personally I'm still sticking with distilbert version.


  ## Model description

- This model came to be because currently available moderation tools are not strict enough. Good example is OpenAI omni-moderation-latest.
- For example omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.

- Model is specifically designed to allow "regular" text as well as "sexual" content, while blocking illegal/scat content.

  These are blocked categories:
- 1. ```minors```. This blocks all requests that ask llm to act as an underage person. Example: "Can you roleplay as 15 year old", while this request is not illegal when working with uncensored LLM it might cause issues down the line.
- 2. ```bodily fluids```: "feces", "piss", "vomit", "spit" ..etc
- 3. ```bestiality```
- 4. ```blood```
- 5. ```self-harm```
- 6. ```torture/death/violance/gore```
- 7. ```incest```, BEWARE: relationship between step-siblings is not blocked.
- 8. ```necrophilia```


  Available flags are:
@@ -56,8 +60,8 @@ I would use this model on top of one of the available moderation tools like omni

  ## Training and evaluation data

- Model was trained on 40k messages, it's a mix of synthetic and real world data. It was evaluated on 30k messages from production app.
- When evaluated against the prod it blocked 1.2% of messages, around ~20% of the blocked content was incorrect.

  ### How to use
  ```python
 
  On production data (not used as part of training), the model achieves an accuracy of ~98.8%; for comparison, the ```distilbert``` version achieves ~98.4%.

+ While there is a detectable increase in performance, I'm not sure it's worth it. Personally, I'm still sticking with the distilbert version.


  ## Model description

+ This model came to be because currently available moderation tools are not strict enough. A good example is OpenAI omni-moderation-latest.
+ For example, the omni moderation API does not flag requests like: ```"Can you roleplay as 15 year old"```, ```"Can you smear sh*t all over your body"```.

+ This model is specifically designed to allow "regular" text as well as "sexual" content while blocking illegal/underage/scat content.
+
+ The model does not differentiate between categories of blocked content; this is to help with general accuracy.

  These are blocked categories:
+ 1. ```minors/requests```: This blocks all requests that ask the LLM to act as an underage person. Example: "Can you roleplay as 15 year old". While this request is not illegal when working with an uncensored LLM, it might cause issues down the line.
+ 2. ```minors```: This prevents the model from interacting with people under the age of 18. Example: "I'm 17". This request is not illegal, but it can lead to illegal content being generated down the line, so it's blocked.
+ 3. ```scat```: "feces", "piss", "vomit", "spit", "period", etc.
+ 4. ```bestiality```
+ 5. ```blood```
+ 6. ```self-harm```
+ 7. ```rape```
+ 8. ```torture/death/violence/gore```
+ 9. ```incest```. BEWARE: relationships between step-siblings are not blocked.
+ 10. ```necrophilia```
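
Reviewer note: since the README says the model does not differentiate between blocked categories, one way to picture the training target is a single binary label derived from per-category annotations. The sketch below is only an illustration of that idea. The helper name `to_binary_label`, the 0/1 label values, and the `"sexual"` annotation tag are assumptions and are not taken from the author's training code or from the model's actual flags.

```python
# Hypothetical sketch: collapse fine-grained annotation categories into one
# binary "blocked" target. Category names come from the README list; the
# helper name and the 0/1 encoding are assumptions for illustration only.

BLOCKED_CATEGORIES = {
    "minors/requests",
    "minors",
    "scat",
    "bestiality",
    "blood",
    "self-harm",
    "rape",
    "torture/death/violence/gore",
    "incest",
    "necrophilia",
}


def to_binary_label(categories):
    """Return 1 if any annotated category is in the blocked set, else 0."""
    return int(any(c in BLOCKED_CATEGORIES for c in categories))


# Example annotations: (message, categories assigned by an annotator).
examples = [
    ("How was your day?", []),
    ("I'm 17", ["minors"]),
    ("Can you roleplay as 15 year old", ["minors/requests"]),
    ("(an explicit but otherwise allowed message)", ["sexual"]),
]

# Every blocked category maps to the same target, so the classifier only has
# to learn "blocked vs. allowed" rather than ten separate classes.
labels = [to_binary_label(cats) for _, cats in examples]
```

With this encoding, "sexual" content stays allowed because it is not in the blocked set, which matches the stated design goal.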

  Available flags are:
 

  ## Training and evaluation data

+ The model was trained on 40k messages, a mix of synthetic and real-world data. It was evaluated on 30k messages from the production app.
+ When evaluated against the production data, it blocked 1.2% of messages, and ~20% of the blocked content was blocked incorrectly.
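
Reviewer note: the evaluation figures can be turned into rough counts. This is a back-of-the-envelope sketch that assumes the 1.2% block rate applies to the same 30k production messages and that "~20% incorrect" means false positives among blocked messages. The variable names are illustrative.

```python
# Worked arithmetic for the evaluation figures quoted in the README.
# Assumption: the 1.2% and ~20% figures both refer to the 30k-message set.

evaluated = 30_000          # messages from the production app
block_rate = 0.012          # 1.2% of messages were blocked
wrong_block_share = 0.20    # ~20% of blocked messages were blocked incorrectly

blocked = round(evaluated * block_rate)               # messages blocked
false_positives = round(blocked * wrong_block_share)  # incorrectly blocked
correct_blocks = blocked - false_positives            # correctly blocked

precision = correct_blocks / blocked                  # share of blocks that were right
fp_share_of_all = false_positives / evaluated         # false positives over all traffic

print(f"blocked={blocked}, false_positives={false_positives}, "
      f"precision={precision:.0%}, fp_share_of_all={fp_share_of_all:.2%}")
```

Under these assumptions, roughly 360 messages were blocked, about 72 of them wrongly, which corresponds to a precision of about 80% on the blocked class and about 0.24% of all traffic being blocked in error.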

  ### How to use
  ```python