This is a very small BERT model that classifies URLs as phishing or non-phishing.
It is an updated, lighter version of the previous URL classification model.
Old version: https://huggingface.co/CrabInHoney/urlbert-tiny-v1-phishing-classifier
Validation score: 0.9622
Model size: 3.7M params
Tensor type: F32
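As a rough illustration of what these two figures imply, the raw weight size can be estimated by multiplying the parameter count by the bytes per parameter. This is only a back-of-the-envelope sketch: the parameter count is the rounded value from this card, the helper name `weights_size_mb` is hypothetical, and the actual file on disk will differ slightly because of metadata and rounding.

```python
# Rough size estimate from the figures on this card.
PARAMS = 3_700_000      # "3.7M params" (rounded value from the card)
BYTES_PER_PARAM = 4     # F32 = 32-bit float = 4 bytes


def weights_size_mb(params: int = PARAMS, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return params * bytes_per_param / 1e6


print(f"~{weights_size_mb():.1f} MB in F32")                                    # ~14.8 MB in F32
print(f"~{weights_size_mb(bytes_per_param=2):.1f} MB if cast to 16-bit floats")  # ~7.4 MB
```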
Dataset (urls.json only)
Example:
from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device in use: {device}")

# Local directory containing the downloaded model files
model_path = "./urlbert-tiny-v2-phishing-classifier"
tokenizer = BertTokenizerFast.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)
model.to(device)

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
    # Return the score of every label, not just the top one
    # (newer transformers releases deprecate this in favor of top_k=None)
    return_all_scores=True
)

test_urls = [
    "huggingface.co/",
    "p64.hu991ngface.co.com.ru/"
]

for url in test_urls:
    results = classifier(url)
    print(f"\nURL: {url}")
    # results[0] holds one dict per label for this URL
    for result in results[0]:
        label = result['label']
        score = result['score']
        print(f"Class: {label}, probability: {score:.4f}")
Output:
Device in use: cuda

URL: huggingface.co/
Class: good, probability: 0.8515
Class: phish, probability: 0.1485

URL: p64.hu991ngface.co.com.ru/
Class: good, probability: 0.0289
Class: phish, probability: 0.9711
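The per-label scores above can be reduced to a single verdict with a small helper. The sketch below is illustrative only and not part of the model: the function names `strip_scheme` and `verdict` and the 0.5 threshold are hypothetical choices. It also assumes the model expects URLs without `http://` or `https://`, which is inferred only from the two test URLs in the example.

```python
from typing import Dict, List


def strip_scheme(url: str) -> str:
    """Drop a leading http:// or https:// so the input resembles the card's examples.

    Assumption: the model was shown scheme-less URLs; the card does not state this.
    """
    for prefix in ("https://", "http://"):
        if url.lower().startswith(prefix):
            return url[len(prefix):]
    return url


def verdict(scores: List[Dict], threshold: float = 0.5) -> str:
    """Return 'phish' if the phish score reaches the threshold, else 'good'.

    `scores` is the per-URL list of {'label', 'score'} dicts from the pipeline.
    """
    by_label = {item["label"]: item["score"] for item in scores}
    return "phish" if by_label["phish"] >= threshold else "good"


# Scores copied from the example output for the second test URL.
example = [{"label": "good", "score": 0.0289}, {"label": "phish", "score": 0.9711}]
print(strip_scheme("https://huggingface.co/"))  # huggingface.co/
print(verdict(example))                         # phish
```

Raising `threshold` trades missed phishing links for fewer false alarms. A suitable value depends on the application and should be chosen on held-out data.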
Base model: CrabInHoney/urlbert-tiny-base-v2