---
license: mit
language:
- en
widget:
- text: "You have the right to use CommunityConnect for its intended purpose of connecting with others, sharing content responsibly, and engaging in constructive dialogue. You are responsible for the content you post and must respect the rights and privacy of others."
  example_title: "Fair Clause"
- text: "We reserve the right to suspend, terminate, or restrict your access to the platform at any time and for any reason, without prior notice or explanation. This includes but is not limited to violations of our community guidelines or terms of service, as determined solely by ConnectWorld."
  example_title: "Unfair Clause"
metrics:
- accuracy
- precision
- f1
- recall
library_name: transformers
pipeline_tag: text-classification
---

# Tos-Roberta: Terms of Service Fairness Classifier

## Model Description

Tos-Roberta is a fine-tuned RoBERTa-large model that classifies clauses in Terms of Service (ToS) documents by fairness level. Each clause is assigned to one of three classes: clearly fair, potentially unfair, or clearly unfair.

## Key Features

- Based on the RoBERTa-large architecture
- Fine-tuned on a specialized dataset of ToS clauses
- Achieves high accuracy in distinguishing between fair and unfair clauses
- Suitable for legal text analysis and consumer rights applications

## Performance

The model demonstrates strong performance on the task of ToS clause classification:

- Validation Accuracy: 89.64%
- Test Accuracy: 85.84%

Detailed performance metrics per epoch:

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score | Precision | Recall |
|-------|---------------|-----------------|----------|----------|-----------|--------|
| 1 | 0.443500 | 0.398950 | 0.874699 | 0.858838 | 0.862516 | 0.874699 |
| 2 | 0.416400 | 0.438409 | 0.853012 | 0.847317 | 0.849916 | 0.853012 |
| 3 | 0.227700 | 0.505879 | 0.896386 | 0.893325 | 0.891521 | 0.896386 |
| 4 | 0.052600 | 0.667532 | 0.891566 | 0.893167 | 0.895115 | 0.891566 |
| 5 | 0.124200 | 0.747090 | 0.884337 | 0.887412 | 0.891807 | 0.884337 |

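For reference, per-class F1 is the harmonic mean of precision and recall. Note that the table reports weighted averages over the three classes, so its F1 column is not obtained by applying this formula to the aggregated precision and recall directly. A minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """Per-class F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# When precision equals recall, F1 equals both
print(f1_score(0.9, 0.9))  # 0.9
```
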
## Training Details

- **Base Model**: RoBERTa-large
- **Dataset**: CodeHima/TOS_DatasetV2
- **Training Time**: 3310.09 seconds
- **Epochs**: 5
- **Batch Size**: 8
- **Learning Rate**: Started at 2e-5 with a warmup period and decay
- **Optimizer**: AdamW
- **Loss Function**: Cross-entropy
- **Training Strategy**: Mixed precision training (FP16)

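The learning-rate behavior described above (2e-5 with warmup and decay) can be sketched as a schedule function. The linear shape and the warmup fraction below are assumptions; the card does not specify either:

```python
PEAK_LR = 2e-5  # peak learning rate from the training details above

def lr_at_step(step: int, total_steps: int, warmup_steps: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Example: 1000 optimizer steps with a 10% warmup
print(lr_at_step(50, 1000, 100))    # halfway through warmup: 1e-05
print(lr_at_step(1000, 1000, 100))  # fully decayed: 0.0
```
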
## Usage

To use this model for inference:

```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = RobertaForSequenceClassification.from_pretrained("CodeHima/Tos-Roberta")
tokenizer = RobertaTokenizer.from_pretrained("CodeHima/Tos-Roberta")

# Tokenize the input clause
text = "Your Terms of Service clause here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

# Run inference without tracking gradients
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

probabilities = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()

# Map the predicted class id to its fairness label
label_map = {0: "clearly_fair", 1: "potentially_unfair", 2: "clearly_unfair"}
predicted_label = label_map[predicted_class]

print(f"Predicted class: {predicted_label}")
print(f"Probabilities: {probabilities[0].tolist()}")
```
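
Downstream of the snippet above, predictions are often used to triage clauses for human review. A small hypothetical helper (`flag_for_review` is not part of the model or the library) might look like:

```python
def flag_for_review(clauses, predicted_labels):
    """Pair each clause with its label and keep those predicted unfair.

    Label names follow the label_map used in the usage example above.
    """
    unfair = {"potentially_unfair", "clearly_unfair"}
    return [(clause, label)
            for clause, label in zip(clauses, predicted_labels)
            if label in unfair]

clauses = [
    "You may export your data at any time.",
    "We may change these terms at any time without notice.",
]
labels = ["clearly_fair", "potentially_unfair"]
print(flag_for_review(clauses, labels))
# [('We may change these terms at any time without notice.', 'potentially_unfair')]
```
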

## Limitations and Bias

- The model's performance may vary depending on the legal jurisdiction and specific domain of the ToS.
- It may not capture nuanced legal interpretations that require human expertise.
- The training data may contain biases present in existing ToS documents.

## Ethical Considerations

While this model can assist in identifying potentially unfair clauses in ToS documents, it should not be used as a substitute for professional legal advice. The model's predictions should be reviewed by qualified legal professionals before making any decisions based on its output.

## Citation

If you use this model in your research or application, please cite it as:

```bibtex
@misc{Tos-Roberta,
  author       = {Himanshu Mohanty},
  title        = {Tos-Roberta: RoBERTa-large model for Terms of Service Fairness Classification},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/CodeHima/Tos-Roberta}}
}
```

## Contact

For questions, feedback, or collaborations, please open an issue on the model's Hugging Face repository or contact [Your Contact Information].