Omartificial-Intelligence-Space committed on
Commit cd49b52 · verified · 1 parent: 2aeb68d

Update readme.md

Files changed (1)
  1. README.md +12 -11
README.md CHANGED
@@ -13,30 +13,29 @@ tags:
  - arabic
  ---
 
-
  # ModernBERT Arabic Model Card
 
  ## Overview
- This is an Arabic version of ModernBERT, a modernized bidirectional encoder-only Transformer model (BERT-style). ModernBERT was pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. You can find more about the base ModernBERT model here: [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).
-
- For this proof of concept, a tokenizer trained on Arabic Wikipedia was utilized:
+ This is an Arabic version of [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base),trained ONLY on Topic Classification Task using the base model of original modernbert with a custom Arabic trained tokenizer with the following details:
  - **Dataset:** Arabic Wikipedia
  - **Size:** 1.8 GB
  - **Tokens:** 228,788,529 tokens
 
  This model demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.
 
- ## Model Details
+ ## Model Eval Details
  - **Epochs:** 3
  - **Evaluation Metrics:**
- - **F1 Score:** 0.9587811491105839
- - **Loss:** 0.19986020028591156
- - **Runtime:** 46.4942 seconds
- - **Samples per second:** 305.006
- - **Steps per second:** 38.134
+ - **F1 Score:** 0.95
+ - **Loss:** 0.1998
  - **Training Step:** 47,862
 
+ ## Dataset Used For Training:
+
+ - [SANAD DATSET](https://huggingface.co/datasets/arbml/SANAD) was used for training and testing which contains 7 different topics such as Politics, Finance, Medical, Culture, Sport , Tech and Religion.
+
  ## How to Use
+
  The model can be used for text classification using the `transformers` library. Below is an example:
 
  ```python
@@ -45,7 +44,7 @@ from transformers import pipeline
  # Load model from huggingface.co/models using our repository ID
  classifier = pipeline(
  task="text-classification",
- model="ModernBERT-domain-classifier/checkpoint-47862",
+ model="Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier",
  )
 
  sample = '''
@@ -53,6 +52,8 @@ PUT SOME TEXT HERE TO CLASSIFY ITS TOPIC
  '''
 
  classifier(sample)
+
  # [{'label': 'health', 'score': 0.6779336333274841}]
+ ```
 
 
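The runtime and throughput figures that this commit removes from the README appear to be mutually consistent. Assuming they follow the common convention (used, for example, by the Hugging Face `Trainer` speed metrics) that "samples per second" and "steps per second" are counts divided by runtime, multiplying back gives the approximate number of evaluation examples and evaluation steps. A quick back-of-the-envelope check:

```python
# Evaluation figures as listed in the pre-commit README (lines removed by this commit).
runtime_s = 46.4942      # "Runtime" in seconds
samples_per_s = 305.006  # "Samples per second"
steps_per_s = 38.134     # "Steps per second"

# Assumption: throughput = count / runtime, so count ~ throughput * runtime.
eval_samples = round(samples_per_s * runtime_s)
eval_steps = round(steps_per_s * runtime_s)

# Average number of examples handled per evaluation step (batch).
examples_per_step = eval_samples / eval_steps

print(f"~{eval_samples} examples in ~{eval_steps} steps, ~{examples_per_step:.2f} per step")
# prints: ~14181 examples in ~1773 steps, ~8.00 per step
```

About 14,181 examples over 1,773 steps is roughly 8 examples per step, which would fit an evaluation batch size of 8 (the `Trainer` default `per_device_eval_batch_size`, on one device) with a partial final batch. This is an inference from the reported numbers only; the commit does not state the batch size or the size of the evaluation split.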
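The updated "How to Use" snippet is split across diff hunks: the `from transformers import pipeline` import and the placeholder sample text appear only in the hunk headers. Assembled into a single file, a minimal sketch might look like the following. It assumes `transformers` with a PyTorch backend is installed (ModernBERT support arrived in `transformers` 4.48.0, to our knowledge) and that the Hub model `Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier` can be downloaded; the helper names `load_topic_classifier` and `classify_topic` are hypothetical and not part of the model card.

```python
# Minimal sketch of the updated "How to Use" example as a single file.
# Assumptions: `transformers` (with PyTorch) is installed and the model below
# can be downloaded from the Hugging Face Hub. Helper names are hypothetical.
MODEL_ID = "Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier"


def load_topic_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline for the topic model."""
    # Deferred import: defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline(task="text-classification", model=model_id)


def classify_topic(classifier, text: str):
    """Run the classifier on one text and return its predictions.

    For a single string, the pipeline returns a list such as
    [{'label': ..., 'score': ...}].
    """
    # Drop the leading/trailing newlines that a triple-quoted sample carries.
    return classifier(text.strip())
```

Usage, with the same placeholder as the README: `classifier = load_topic_classifier()` followed by `classify_topic(classifier, "PUT SOME TEXT HERE TO CLASSIFY ITS TOPIC")`. Building the pipeline once and reusing it avoids reloading the model for every call.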