NAMAA-Space/AraModernBert-Topic-Classifier

@@ -13,30 +13,29 @@ tags:
 - arabic
 ---
 # ModernBERT Arabic Model Card
 ## Overview
-This is an Arabic version of ModernBERT, a modernized bidirectional encoder-only Transformer model (BERT-style). ModernBERT was pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. You can find more about the base ModernBERT model here: [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).
-For this proof of concept, a tokenizer trained on Arabic Wikipedia was utilized:
 - **Dataset:** Arabic Wikipedia
 - **Size:** 1.8 GB
 - **Tokens:** 228,788,529 tokens
 This model demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.
-## Model Details
 - **Epochs:** 3
 - **Evaluation Metrics:**
-  - **F1 Score:** 0.9587811491105839
-  - **Loss:** 0.19986020028591156
-  - **Runtime:** 46.4942 seconds
-  - **Samples per second:** 305.006
-  - **Steps per second:** 38.134
 - **Training Step:** 47,862
 ## How to Use
 The model can be used for text classification using the `transformers` library. Below is an example:
 ```python
@@ -45,7 +44,7 @@ from transformers import pipeline
 # Load model from huggingface.co/models using our repository ID
 classifier = pipeline(
     task="text-classification",
-    model="ModernBERT-domain-classifier/checkpoint-47862",
 )
 sample = '''
@@ -53,6 +52,8 @@ PUT SOME TEXT HERE TO CLASSIFY ITS TOPIC
 '''
 classifier(sample)
 # [{'label': 'health', 'score': 0.6779336333274841}]

 - arabic
 ---
 # ModernBERT Arabic Model Card
 ## Overview
+This is an Arabic adaptation of [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), trained only on a topic classification task. It starts from the original ModernBERT base model and uses a custom tokenizer trained on Arabic text, with the following details:
 - **Dataset:** Arabic Wikipedia
 - **Size:** 1.8 GB
 - **Tokens:** 228,788,529 tokens
 This model demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.
+## Model Evaluation Details
 - **Epochs:** 3
 - **Evaluation Metrics:**
+  - **F1 Score:** 0.95
+  - **Loss:** 0.1998
 - **Training Step:** 47,862
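
The card reports a single F1 score but does not say how it was averaged across topics. As an illustration only, the sketch below computes a macro-averaged F1 (the unweighted mean of per-class F1) in plain Python. The averaging choice, the helper name `macro_f1`, and the toy labels are assumptions made for this example, not details of the original evaluation.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted topics, for illustration only.
y_true = ["Politics", "Sport", "Sport", "Tech", "Medical", "Medical"]
y_pred = ["Politics", "Sport", "Tech", "Tech", "Medical", "Politics"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.6667
```

A weighted average would instead scale each class score by its support, so the two can differ when topics are unevenly represented.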
+## Dataset Used for Training
+- The [SANAD dataset](https://huggingface.co/datasets/arbml/SANAD) was used for training and testing. It covers seven topics: Politics, Finance, Medical, Culture, Sport, Tech, and Religion.
 ## How to Use
 The model can be used for text classification using the `transformers` library. Below is an example:
 ```python
 from transformers import pipeline

 # Load model from huggingface.co/models using our repository ID
 classifier = pipeline(
     task="text-classification",
+    model="Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier",
 )
 sample = '''
 PUT SOME TEXT HERE TO CLASSIFY ITS TOPIC
 '''
 classifier(sample)
 # [{'label': 'health', 'score': 0.6779336333274841}]
+```
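
The classifier returns a list of dictionaries with `label` and `score` keys, as in the sample output above. A score such as 0.68 is not especially high, so downstream code may want to ignore low-confidence predictions. The following is a minimal sketch of that idea; the helper name `confident_label` and the 0.5 default cutoff are hypothetical choices for illustration and are not part of the model or the `transformers` API.

```python
from typing import Dict, List, Optional


def confident_label(predictions: List[Dict], threshold: float = 0.5) -> Optional[str]:
    """Return the top-scoring label, or None if its score is below the threshold."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else None


# Output copied from the example above, so no model download is needed here.
output = [{"label": "health", "score": 0.6779336333274841}]
print(confident_label(output))       # health
print(confident_label(output, 0.9))  # None
```

In practice the threshold would be tuned on held-out data rather than fixed in advance.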