anvilogic-admin
committed on
Update README.md
README.md
CHANGED
@@ -12,24 +12,23 @@ pinned: false
 Welcome to the official Hugging Face organization for [Anvilogic's](https://www.anvilogic.com/) advanced cybersecurity AI models!
 Founded in 2019, [Anvilogic](https://www.anvilogic.com/) specializes in AI-driven threat detection and automation, enhancing Security Operations Center (SOC) capabilities with scalable, data-driven solutions.

-## Typosquatting
-
-This collection aims at detecting typosquatted domains by identifying and flagging such domains :
-It is comprised of the following:
+## Typosquatting Collection
+Typosquatting is a form of cyber attack where malicious actors create fake domain names that are visually or phonetically similar to legitimate domains, intending to deceive users into visiting these sites. This collection aims to detect typosquatted domains by identifying and flagging them. It is comprised of the following:

 ### Models

-- **
-- **Cross-
-- **T5
+- **Embedder**: This model provides a representation for domain names and is used to mine similar domains. It is available in both a RoBERTa-based version (with BPE tokenization) and a CANINE-c version (with character-level encoding).
+- **Cross-Encoder**: This model can compare two domain names and determine if one domain is a typosquat of another. It is available in both a RoBERTa-based version (with BPE tokenization) and a CANINE-c version (with character-level encoding).
+- **T5 Typosquat Detection**: This model is a derived version of T5 trained on a new task, with the prefix "Is the first domain a typosquat of the second:" to which we append *TYPOSQUAT_DOMAIN* and *LEGITIMATE_DOMAIN*.

 ### Datasets

-- **Embedder
-- **Cross-Encoder
-- **T5
+- **Embedder Training Dataset**: A dataset formatted to train the embedding model, containing pairs of (Anchor,Positive) domain examples.
+- **Cross-Encoder Training Dataset**: A dataset formatted to train the Cross-Encoder model with (Anchor,Positive,label) samples.
+- **T5 Training Dataset**: A dataset formatted to train the T5 model with (prompt,response) pairs.

 ### Spaces
-
-- **
-- **
+
+- **Embedder Typosquat Detect**: Allows users to retrieve the most similar domains from a pool of 4,000 of the most common domains.
+- **Cross-Encoder (CE) Typosquat Detect**: Allows users to compare two domains using the Cross-Encoder. The model outputs a probability of typosquatting.
+- **T5 Typosquat Detect**: Allows users to compare two domains using the T5 model. The model outputs a boolean value indicating whether the domain is a typosquat.
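
The updated README describes three record shapes, (Anchor,Positive), (Anchor,Positive,label) and (prompt,response), plus a fixed T5 task prefix. The sketch below shows one way those shapes could look in Python. It is illustrative only: the class and function names are hypothetical, the single-space separator between the prefix and the two domains is an assumption, and so are the meaning of `label` and the response string. The README confirms only the prefix text and the order of the two domains.

```python
from typing import NamedTuple

# Task prefix quoted verbatim from the README entry for the T5 model.
T5_PREFIX = "Is the first domain a typosquat of the second:"


class EmbedderPair(NamedTuple):
    """(Anchor, Positive) pair, the shape named for the Embedder dataset."""
    anchor: str
    positive: str


class CrossEncoderSample(NamedTuple):
    """(Anchor, Positive, label) sample, the shape named for the Cross-Encoder dataset."""
    anchor: str
    positive: str
    label: int  # assumption: 1 means typosquat, 0 means not


class T5Example(NamedTuple):
    """(prompt, response) pair, the shape named for the T5 dataset."""
    prompt: str
    response: str


def build_t5_prompt(typosquat_domain: str, legitimate_domain: str) -> str:
    # Assumption: the two domains are appended after the prefix,
    # separated by single spaces. The README does not state the separator.
    return f"{T5_PREFIX} {typosquat_domain} {legitimate_domain}"


# Hypothetical records built from reserved example domains.
pair = EmbedderPair(anchor="example.com", positive="examp1e.com")
sample = CrossEncoderSample(anchor="example.com", positive="examp1e.com", label=1)
t5_example = T5Example(
    prompt=build_t5_prompt("examp1e.com", "example.com"),
    response="true",  # assumption: the actual response vocabulary is not documented here
)

print(t5_example.prompt)
```

Keeping the prefix in one constant means the prompt used at inference time matches the one used to build training pairs, which matters for a T5 model fine-tuned on a prefixed task.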