MostafaAhmed98 committed · Commit 666c26a (verified) · Parent(s): 80d89ca

Update README.md

Files changed (1): README.md (+83 -1)
README.md CHANGED
metrics:
  - precision
  - accuracy
  - recall
---

# Model Card for Arabic Named Entity Recognition with AraBERT

## Model Details

**Model Name:** AraBERT-NER

**Model Type:** AraBERT (pre-trained on Arabic text and fine-tuned for the Arabic Named Entity Recognition task)

**Language:** Arabic

**License:** MIT

**Model Creator:** Mostafa Ahmed

**Contact Information:** [email protected]

**Model Version:** 1.0

## Overview

AraBERT-NER is a fine-tuned version of the AraBERT model designed specifically for Named Entity Recognition (NER) in Arabic. The model has been trained to identify and classify named entities such as persons, organizations, locations, and miscellaneous entities (MISC) within Arabic text. This makes it suitable for applications such as information extraction, document categorization, and data annotation in Arabic.
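This card does not list the exact tag set. Assuming the usual CoNLL-style BIO scheme over `PER`, `ORG`, `LOC`, and `MISC`, you can confirm the labels a given checkpoint actually uses by reading its config; the repo id below is the same placeholder used in the "How to Use" section:

```python
from transformers import AutoConfig

# Placeholder repo id; replace with the actual model repository on the Hub.
config = AutoConfig.from_pretrained("your-username/AraBERT-NER")

# For CoNLL-style NER this is typically a BIO mapping such as
# {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", ...} -- an assumption,
# not a guarantee; print it to see the real tag set.
print(config.id2label)
```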
## Intended Use

The model is intended for use in:

- Named Entity Recognition systems for Arabic
- Information extraction from Arabic text
- Document categorization and annotation
- Arabic language processing research
## Training Data

The model was fine-tuned on the CoNLL-NER-AR dataset.

**Data Sources:**

- [CoNLL-NER-AR](https://huggingface.co/datasets/e-hossam96/conllpp-ner-ar): A dataset for named entity recognition in Arabic.
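As a quick way to inspect the data, the dataset linked above can be loaded with the Hugging Face `datasets` library. The split name and the `tokens`/`ner_tags` column names below follow the usual CoNLL-style layout and are assumptions, not details taken from this card:

```python
from datasets import load_dataset

# Dataset referenced in the Data Sources list above.
dataset = load_dataset("e-hossam96/conllpp-ner-ar")

print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # one example, expected to contain "tokens" and "ner_tags"
```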
## Training Procedure

The model was trained using the Hugging Face `transformers` library. The training process involved:

- Preprocessing the CoNLL-NER-AR dataset to format the text and entity annotations for NER.
- Fine-tuning the pre-trained AraBERT model on the Arabic NER dataset.
- Evaluating the model's performance using standard NER metrics (Precision, Recall, F1 score). A sketch of this workflow is shown below.
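The card does not specify which AraBERT checkpoint or hyperparameters were used, so the following is only a minimal sketch of such a fine-tuning run. It assumes the `aubmindlab/bert-base-arabertv02` base model, CoNLL-style `tokens`/`ner_tags` columns, a `validation` split, and illustrative hyperparameter values:

```python
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)

# Assumed base checkpoint and dataset layout (not stated in this card).
base_checkpoint = "aubmindlab/bert-base-arabertv02"
dataset = load_dataset("e-hossam96/conllpp-ner-ar")
label_names = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    base_checkpoint,
    num_labels=len(label_names),
    id2label=dict(enumerate(label_names)),
    label2id={name: i for i, name in enumerate(label_names)},
)

def tokenize_and_align_labels(examples):
    """Tokenize pre-split words and re-align word-level tags to sub-word tokens."""
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        labels, previous_word = [], None
        for word_id in word_ids:
            # Special tokens and sub-word continuations are ignored by the loss (-100).
            labels.append(-100 if word_id is None or word_id == previous_word else tags[word_id])
            previous_word = word_id
        all_labels.append(labels)
    tokenized["labels"] = all_labels
    return tokenized

tokenized_dataset = dataset.map(
    tokenize_and_align_labels, batched=True,
    remove_columns=dataset["train"].column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="arabert-ner", learning_rate=2e-5,
                           per_device_train_batch_size=16, num_train_epochs=3),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer=tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```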
## Evaluation Results

The model was evaluated on a held-out test set from the CoNLL++-NER-AR dataset. Here are the key performance metrics:

- **Precision:** 0.8547
- **Recall:** 0.8633
- **F1 Score:** 0.8590
- **Accuracy:** 0.9542

These metrics indicate the model's effectiveness in accurately identifying and classifying named entities in Arabic text.
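The card does not state how these numbers were produced. For reference, entity-level precision/recall/F1 and token accuracy for this kind of model are usually computed with `seqeval`; the sketch below shows one common way to do that with the `evaluate` library, and it can be plugged into a `Trainer` as `compute_metrics` with `label_names` bound (e.g. via `functools.partial`):

```python
import numpy as np
import evaluate

seqeval = evaluate.load("seqeval")

def compute_metrics(eval_pred, label_names):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Positions labelled -100 (special tokens / sub-word continuations) are skipped.
    references = [[label_names[l] for l in row if l != -100] for row in labels]
    predicted = [
        [label_names[p] for p, l in zip(pred_row, label_row) if l != -100]
        for pred_row, label_row in zip(predictions, labels)
    ]
    results = seqeval.compute(predictions=predicted, references=references)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```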
## How to Use

You can load and use the model with the Hugging Face `transformers` library as follows:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("your-username/AraBERT-NER")
model = AutoModelForTokenClassification.from_pretrained("your-username/AraBERT-NER")

# Create a NER pipeline
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

# Example usage: "Mohamed Ali was born in Cairo and worked at Microsoft."
text = "ولد محمد علي في القاهرة وعمل في شركة مايكروسوفت."
ner_results = ner_pipeline(text)

for entity in ner_results:
    print(f"Entity: {entity['word']}, Label: {entity['entity']}, Confidence: {entity['score']:.2f}")
```
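The pipeline above returns one prediction per sub-word token with raw `B-`/`I-` tags. If you prefer whole entity spans, the token-classification pipeline also accepts an `aggregation_strategy` argument (again using the placeholder repo id from this card):

```python
from transformers import pipeline

# "simple" merges consecutive sub-word tokens of the same entity into one span.
ner_pipeline = pipeline(
    "ner",
    model="your-username/AraBERT-NER",
    aggregation_strategy="simple",
)

# Aggregated results expose the merged span under "entity_group".
for entity in ner_pipeline("ولد محمد علي في القاهرة وعمل في شركة مايكروسوفت."):
    print(entity["word"], entity["entity_group"], round(entity["score"], 2))
```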