cybersectony/phishing-email-detection-distilbert_v2.4.1

Text Classification


cybersectony committed on Oct 27, 2024

Commit 3a0bb7a (verified) · 1 parent: 8ef7589

Update README.md

Files changed (1)

README.md +95 -1

@@ -7,4 +7,98 @@ language:
 base_model:
 - distilbert/distilbert-base-uncased
 library_name: transformers
----
+---
+# A DistilBERT-Based Phishing Email Detection Model
+## Model Overview
+This model is based on DistilBERT and has been fine-tuned for multiclass classification of emails and URLs as safe or potentially phishing.
+## Key Specifications
+- __Base Architecture:__ DistilBERT
+- __Task:__ Multiclass Classification
+- __Fine-tuning Framework:__ Hugging Face Trainer API
+- __Training Duration:__ 3 epochs
+## Performance Metrics
+- __F1-score:__ 97.717%
+- __Accuracy:__ 97.716%
+- __Precision:__ 97.736%
+- __Recall:__ 97.717%
+## Dataset Details
+The model was trained on a custom dataset of emails and URLs labeled as legitimate or phishing. The dataset is available at [`cybersectony/PhishingEmailDetectionv2.0`](https://huggingface.co/datasets/cybersectony/PhishingEmailDetectionv2.0) on the Hugging Face Hub.
+## Usage Guide
+## Installation
+```bash
+pip install transformers
+pip install torch
+```
+## Quick Start
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+# Load model and tokenizer (repository ID taken from this model's Hub page)
+model_id = "cybersectony/phishing-email-detection-distilbert_v2.4.1"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+def predict_email(email_text):
+    # Preprocess and tokenize
+    inputs = tokenizer(
+        email_text,
+        return_tensors="pt",
+        truncation=True,
+        max_length=512
+    )
+    # Get prediction
+    with torch.no_grad():
+        outputs = model(**inputs)
+        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    # Get probabilities for each class
+    probs = predictions[0].tolist()
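+    # Map class indices to label names (order as given in this README;
+    # compare with model.config.id2label before relying on it)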
+    labels = {
+        "legitimate_email": probs[0],
+        "phishing_url": probs[1],
+        "legitimate_url": probs[2],
+        "phishing_url_alt": probs[3]
+    }
+    # Determine the most likely classification
+    max_label = max(labels.items(), key=lambda x: x[1])
+    return {
+        "prediction": max_label[0],
+        "confidence": max_label[1],
+        "all_probabilities": labels
+    }
+```
+## Example Usage
+```python
+# Example usage
+email = """
+Dear User,
+Your account security needs immediate attention. Please verify your credentials.
+Click here: http://suspicious-link.com
+"""
+result = predict_email(email)
+print(f"Prediction: {result['prediction']}")
+print(f"Confidence: {result['confidence']:.2%}")
+print("\nAll probabilities:")
+for label, prob in result['all_probabilities'].items():
+    print(f"{label}: {prob:.2%}")
+```
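
The example above prints four per-class probabilities, but a mail filter usually needs a single yes/no decision. The following is a minimal sketch, not part of the commit, of one way to collapse the output of `predict_email` into a phishing score. It assumes that the two labels whose names begin with `phishing` are the phishing classes. The helper name `phishing_verdict` and the `threshold` default of 0.5 are hypothetical and should be tuned on validation data.

```python
# Labels treated as phishing; names copied from the Quick Start dictionary.
PHISHING_LABELS = frozenset({"phishing_url", "phishing_url_alt"})


def phishing_verdict(result, threshold=0.5):
    """Collapse per-class probabilities into one phishing score and a flag.

    `result` is a dict shaped like the return value of predict_email.
    `threshold` is a hypothetical cutoff, not a value from the model card.
    """
    probs = result["all_probabilities"]
    # Sum the probability mass assigned to the phishing classes.
    score = sum(p for label, p in probs.items() if label in PHISHING_LABELS)
    return {"phishing_score": score, "is_phishing": score >= threshold}


# Hand-written probabilities for illustration only; not real model output.
example = {
    "prediction": "phishing_url",
    "confidence": 0.5,
    "all_probabilities": {
        "legitimate_email": 0.125,
        "phishing_url": 0.5,
        "legitimate_url": 0.125,
        "phishing_url_alt": 0.25,
    },
}
verdict = phishing_verdict(example)
print(verdict)
```

Summing the phishing classes instead of taking only the top label keeps an input flagged when the probability mass is split between the two phishing labels. In practice `phishing_verdict(predict_email(email))` would replace the hand-written `example`.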