MostafaAhmed98 committed: Update README.md
README.md
@@ -19,4 +19,86 @@ metrics:
- precision
- accuracy
- recall
---

---

# Model Card for Arabic Named Entity Recognition with AraBERT

## Model Details

**Model Name:** AraBERT-NER

**Model Type:** AraBERT (pre-trained on Arabic text and fine-tuned for Arabic named entity recognition)

**Language:** Arabic

**License:** MIT

**Model Creator:** Mostafa Ahmed

**Contact Information:** [email protected]

**Model Version:** 1.0

## Overview

AraBERT-NER is a fine-tuned version of AraBERT for Named Entity Recognition (NER) in Arabic. It identifies and classifies named entities in Arabic text as persons, organizations, locations, or miscellaneous (MISC) entities, which makes it suitable for applications such as information extraction, document categorization, and data annotation.

## Intended Use

The model is intended for use in:

- Named Entity Recognition systems for Arabic
- Information extraction from Arabic text
- Document categorization and annotation
- Arabic language processing research

## Training Data

The model was fine-tuned on the CoNLL++-NER-AR dataset.

**Data Sources:**

- [CoNLL++-NER-AR](https://huggingface.co/datasets/e-hossam96/conllpp-ner-ar): A dataset for named entity recognition in Arabic.

## Training Procedure

The model was trained using the Hugging Face `transformers` library. The training process involved:

- Preprocessing the CoNLL++-NER-AR dataset to format the text and entity annotations for NER.
- Fine-tuning the pre-trained AraBERT model on the Arabic NER dataset.
- Evaluating the model's performance using standard NER metrics (e.g., precision, recall, and F1 score).
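
The preprocessing step above usually has to reconcile word-level entity tags with the subword pieces the tokenizer produces. The following is a minimal sketch of that alignment, not the author's actual preprocessing code. It assumes a `word_ids` list of the kind returned by a fast tokenizer's `word_ids()` method (hard-coded here so the example runs without downloading anything); the function name `align_labels` and the label IDs are hypothetical.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level label IDs to subword tokens.

    Only the first subword of each word keeps the word's label; special
    tokens (word id None) and continuation subwords get IGNORE_INDEX.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


# Hypothetical example: 3 words, the second split into two subwords,
# wrapped in special tokens at both ends.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 1, 2]  # illustrative label IDs, e.g. O, B-PER, I-PER
aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 0, 1, -100, 2, -100]
```

Labeling only the first subword is one common convention; another is to copy the word's label to every subword. The source does not say which was used.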

## Evaluation Results

The model was evaluated on a held-out test set from the CoNLL++-NER-AR dataset. The key performance metrics are:

- **Precision:** 0.8547
- **Recall:** 0.8633
- **F1 Score:** 0.8590
- **Accuracy:** 0.9542

Precision and recall are closely balanced, so the model neither over-predicts nor under-predicts entities by a wide margin.
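
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below assumes the reported precision, recall, and F1 were averaged the same way (for example, micro-averaged over entities), which this card does not state; `f1_score` is a hypothetical helper, not part of any library used here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the evaluation results
reported_f1 = f1_score(0.8547, 0.8633)
print(f"{reported_f1:.4f}")  # 0.8590
```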

## How to Use

You can load and use the model with the Hugging Face `transformers` library as follows:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# "your-username/AraBERT-NER" is a placeholder; replace it with the model's Hub repository ID
tokenizer = AutoTokenizer.from_pretrained("your-username/AraBERT-NER")
model = AutoModelForTokenClassification.from_pretrained("your-username/AraBERT-NER")

# Create a NER pipeline
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example usage ("Muhammad Ali was born in Cairo and worked at Microsoft.")
text = "ولد محمد علي في القاهرة وعمل في شركة مايكروسوفت."
ner_results = ner_pipeline(text)

# Each result is a token-level prediction with its label and confidence score
for entity in ner_results:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Confidence: {entity['score']:.2f}")
```
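
The loop above prints one line per token, so a multi-token name appears as several separate predictions. The sketch below shows one way to merge them into whole entities. It assumes the model emits BIO-style labels such as `B-PER` and `I-PER` (the label set is not stated in this card) and that each prediction carries `entity`, `start`, and `end` keys, as the `transformers` token-classification pipeline produces with a fast tokenizer. The `group_entities` and `make_prediction` helpers and the sample predictions are hypothetical.

```python
from typing import Dict, List


def group_entities(text: str, predictions: List[Dict]) -> List[Dict]:
    """Merge token-level BIO predictions into entity spans.

    A span starts at a B- tag and grows while adjacent I- tags of the
    same type follow; anything else closes the open span.
    """
    spans: List[Dict] = []
    current = None
    for pred in predictions:
        prefix, _, label = pred["entity"].partition("-")
        if (
            prefix == "I"
            and current is not None
            and current["label"] == label
            and pred["start"] <= current["end"] + 1  # same word or next word
        ):
            current["end"] = pred["end"]  # extend the open span
            continue
        if current is not None:
            spans.append(current)
        if prefix in ("B", "I"):
            current = {"label": label, "start": pred["start"], "end": pred["end"]}
        else:
            current = None
    if current is not None:
        spans.append(current)
    # Recover each entity's surface text from the character offsets
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


text = "ولد محمد علي في القاهرة"


def make_prediction(word: str, tag: str) -> Dict:
    """Build a mock pipeline prediction for the first occurrence of `word`."""
    start = text.index(word)
    return {"entity": tag, "start": start, "end": start + len(word)}


# Mock token-level output; a real run would use ner_pipeline(text) instead
predictions = [
    make_prediction("محمد", "B-PER"),
    make_prediction("علي", "I-PER"),
    make_prediction("القاهرة", "B-LOC"),
]
entities = group_entities(text, predictions)
for entity in entities:
    print(entity["label"], entity["text"])
```

As an alternative to hand-written grouping, the pipeline accepts an `aggregation_strategy` argument (for example `aggregation_strategy="simple"`) that returns grouped entities with an `entity_group` key.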