cybersectony committed on
Commit 3a0bb7a · verified · 1 parent: 8ef7589

Update README.md

Files changed (1): README.md (+95 -1)
README.md CHANGED
@@ -7,4 +7,98 @@ language:
base_model:
- distilbert/distilbert-base-uncased
library_name: transformers
---

# A DistilBERT-Based Phishing Email Detection Model

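## Model Overview
This model is DistilBERT (`distilbert/distilbert-base-uncased`) fine-tuned to classify emails and URLs as legitimate or potentially phishing. For each input it returns a probability for each of four labels: `legitimate_email`, `phishing_url`, `legitimate_url`, and `phishing_url_alt`.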

## Key Specifications
- __Base Architecture:__ DistilBERT
- __Task:__ Multi-class classification (four labels)
- __Fine-tuning Framework:__ Hugging Face Trainer API
- __Training Duration:__ 3 epochs

## Performance Metrics
- __F1-score:__ 97.717%
- __Accuracy:__ 97.716%
- __Precision:__ 97.736%
- __Recall:__ 97.717%

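The card doesn't say which averaging produced these scores. The sketch below assumes support-weighted averaging, where each class's score is weighted by how many true examples it has. That is a common choice for multi-class reports. It shows how the four numbers can be computed from true and predicted labels. The `weighted_scores` name and the toy labels are illustrative only.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1, as percentages.

    Assumption: weighted averaging; the model card does not state the averaging used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # weight each class by its share of true examples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    scores = {"f1": f1, "accuracy": accuracy, "precision": precision, "recall": recall}
    return {name: round(100 * value, 3) for name, value in scores.items()}

# Toy example with made-up labels, not the model's evaluation data
y_true = ["legitimate_email", "phishing_url", "legitimate_url", "phishing_url", "legitimate_email"]
y_pred = ["legitimate_email", "phishing_url", "phishing_url", "phishing_url", "legitimate_email"]
print(weighted_scores(y_true, y_pred))
```
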
## Dataset Details

The model was trained on a custom dataset of emails and URLs labeled as legitimate or phishing. The dataset is available on the Hugging Face Hub as [`cybersectony/PhishingEmailDetectionv2.0`](https://huggingface.co/datasets/cybersectony/PhishingEmailDetectionv2.0).

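The `datasets` library can pull the training data from the Hub if you want to inspect it yourself. It is installed separately with `pip install datasets` and is not part of this card's installation steps. Here is a minimal sketch. The card does not list the splits or column names, so print the result rather than assuming them. `load_phishing_dataset` is a hypothetical helper name.

```python
def load_phishing_dataset():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so this module can be loaded even without `datasets` installed
    from datasets import load_dataset
    return load_dataset("cybersectony/PhishingEmailDetectionv2.0")
```

For example, `ds = load_phishing_dataset()` followed by `print(ds)` shows the available splits, their columns and row counts.
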
## Usage Guide

### Installation

```bash
pip install transformers
pip install torch
```
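
You can confirm that both packages installed correctly using only the standard library. `check_install` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def check_install(packages=("transformers", "torch")):
    """Return {package: installed version, or None if it is missing}."""
    status = {}
    for name in packages:
        try:
            status[name] = version(name)
        except PackageNotFoundError:
            status[name] = None
    return status

print(check_install())
```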

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
# "your-username/model-name" is a placeholder: replace it with this model's Hub ID
MODEL_ID = "your-username/model-name"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()  # inference mode (disables dropout)

def predict_email(email_text):
    # Preprocess and tokenize; inputs longer than 512 tokens are truncated
    inputs = tokenizer(
        email_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )

    # Get prediction without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Get probabilities for each class (batch of one, so take row 0)
    probs = predictions[0].tolist()

    # Create labels dictionary (index order of the model's four outputs)
    labels = {
        "legitimate_email": probs[0],
        "phishing_url": probs[1],
        "legitimate_url": probs[2],
        "phishing_url_alt": probs[3]
    }

    # Determine the most likely classification
    max_label = max(labels.items(), key=lambda x: x[1])

    return {
        "prediction": max_label[0],
        "confidence": max_label[1],
        "all_probabilities": labels
    }
```
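
The post-processing in `predict_email` is a softmax over the model's four logits followed by an arg-max. Below is the same step in plain Python, which can help you check the label order without loading the model. `postprocess` is a hypothetical name. The logit values are invented for illustration and are not real model output.

```python
import math

# Label order as used in predict_email (index i -> probs[i])
LABELS = ["legitimate_email", "phishing_url", "legitimate_url", "phishing_url_alt"]

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits):
    """Map four raw logits to the same result dict that predict_email returns."""
    probs = softmax(logits)
    labels = dict(zip(LABELS, probs))
    prediction, confidence = max(labels.items(), key=lambda item: item[1])
    return {"prediction": prediction, "confidence": confidence, "all_probabilities": labels}

# Hypothetical logits for a single input
result = postprocess([0.2, 3.1, -1.0, 0.5])
print(result["prediction"], f"{result['confidence']:.2%}")
```

Subtracting the largest logit does not change the probabilities, but it avoids overflow in `math.exp` for large logits.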

### Example Usage

```python
# Example usage (assumes predict_email from the Quick Start has been defined)
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""

result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
    print(f"{label}: {prob:.2%}")
```
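
If a single phishing/legitimate verdict is enough, one option is to sum the probabilities of the two phishing labels. This helper is hypothetical and not part of the model card. `to_binary_verdict`, the prefix-based grouping and the 0.5 default threshold are all assumptions you would want to tune on your own data.

```python
def to_binary_verdict(all_probabilities, threshold=0.5):
    """Collapse the four-label output of predict_email into phishing / legitimate.

    Assumption: every label whose name starts with "phishing" counts as phishing.
    """
    phishing_score = sum(
        prob for label, prob in all_probabilities.items() if label.startswith("phishing")
    )
    verdict = "phishing" if phishing_score >= threshold else "legitimate"
    return verdict, phishing_score

# Hand-written dict shaped like result['all_probabilities'], not real model output
example = {
    "legitimate_email": 0.05,
    "phishing_url": 0.80,
    "legitimate_url": 0.03,
    "phishing_url_alt": 0.12,
}
verdict, score = to_binary_verdict(example)
print(verdict, f"{score:.2%}")
```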