Omartificial-Intelligence-Space committed
Commit cf78b51 · verified · 1 Parent(s): 470556d

Update readme.md

  1. README.md +43 -0
README.md CHANGED
@@ -13,3 +13,46 @@ tags:
  - arabic
  ---
 
+
+ # ModernBERT Arabic Model Card
+
+ ## Overview
+ This is an Arabic version of ModernBERT, a modernized bidirectional, encoder-only Transformer model (BERT-style). The base ModernBERT was pre-trained on 2 trillion tokens of English and code data, with a native context length of up to 8,192 tokens. More about the base model: [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).
+
+ For this proof of concept, a tokenizer trained on Arabic Wikipedia was used:
+ - **Dataset:** Arabic Wikipedia
+ - **Size:** 1.8 GB
+ - **Tokens:** 228,788,529
+
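The card does not say how the Arabic tokenizer was trained. One common approach, sketched here purely as an assumption, is to retrain the base tokenizer's algorithm on the new corpus with `train_new_from_iterator` from `transformers`. The helper names `batch_iterator` and `train_arabic_tokenizer`, the batch size, and the vocabulary size are hypothetical and chosen only for illustration.

```python
def batch_iterator(texts, batch_size=1000):
    """Yield successive slices of a list of strings for tokenizer training."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def train_arabic_tokenizer(texts, vocab_size=32_000):
    """Retrain the ModernBERT-base tokenizer on `texts` (hypothetical helper).

    `vocab_size` is an arbitrary illustrative value, not the one used for this model.
    """
    # Imported here so the batching helper above works without `transformers`.
    from transformers import AutoTokenizer

    base = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
    # Keeps the base tokenizer's algorithm and special tokens, learns a new vocabulary.
    return base.train_new_from_iterator(batch_iterator(texts), vocab_size=vocab_size)
```

Calling `train_arabic_tokenizer(articles)` with a list of Arabic Wikipedia article strings would return a new fast tokenizer, which can be stored with `save_pretrained`.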
+ This model demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.
+
+ ## Model Details
+ - **Epochs:** 3
+ - **Evaluation Metrics:**
+   - **F1 Score:** 0.9587811491105839
+   - **Loss:** 0.19986020028591156
+   - **Runtime:** 46.4942 seconds
+   - **Samples per second:** 305.006
+   - **Steps per second:** 38.134
+ - **Training Step:** 47,862
+
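The throughput figures above can be cross-checked with a little arithmetic. The sketch below assumes that the runtime, samples per second, and steps per second all describe the same evaluation pass, which the card does not state explicitly. The resulting batch size is an inference from those numbers, not a reported value.

```python
# Evaluation figures copied from the model card.
eval_runtime_s = 46.4942
samples_per_second = 305.006
steps_per_second = 38.134

# Throughput multiplied by runtime recovers the totals for the evaluation pass.
eval_samples = round(samples_per_second * eval_runtime_s)
eval_steps = round(steps_per_second * eval_runtime_s)

# Samples per step approximates the evaluation batch size (inferred, not reported).
approx_batch_size = round(eval_samples / eval_steps)

print(eval_samples, eval_steps, approx_batch_size)
# 14181 1773 8
```

Under that assumption, the evaluation set holds roughly 14,181 samples processed in 1,773 steps, which is consistent with a batch size of 8.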
+ ## How to Use
+ The model can be used for text classification with the `transformers` library. Below is an example:
+
+ ```python
+ from transformers import pipeline
+
+ # Load the fine-tuned classifier. The path below is the training checkpoint
+ # directory; substitute the Hub repository ID when loading from huggingface.co.
+ classifier = pipeline(
+     task="text-classification",
+     model="ModernBERT-domain-classifier/checkpoint-47862",
+ )
+
+ # Arabic sample text, kept exactly as in the original example.
+ sample = '''
+ اسلام عددا من الوافدين الى الممكلة العربية السعوديه
+ '''
+
+ # Returns a list with one dict holding the predicted label and its score.
+ classifier(sample)
+ # [{'label': 'health', 'score': 0.6779336333274841}]
+ ```