---
license: apache-2.0
datasets:
- arbml/SANAD
language:
- ar
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: transformers
tags:
- modernbert
- arabic
---

# ModernBERT Arabic Model Card

## Overview

> [!NOTE]
> This is an experimental Arabic model that demonstrates how ModernBERT can be adapted to Arabic for tasks like topic classification.

This is an experimental Arabic version of [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), trained only on a topic classification task. It uses the original ModernBERT base model with a custom Arabic-trained tokenizer with the following details:

- **Dataset:** Arabic Wikipedia
- **Size:** 1.8 GB
- **Tokens:** 228,788,529 tokens

## Model Eval Details

- **Epochs:** 3
- **Evaluation Metrics:**
  - **F1 Score:** 0.95
  - **Loss:** 0.1998
- **Training Step:** 47,862

## Dataset Used For Training

- The [SANAD dataset](https://huggingface.co/datasets/arbml/SANAD) was used for training and testing. It contains 7 topics: Politics, Finance, Medical, Culture, Sport, Tech, and Religion.

## How to Use

The model can be used for text classification with the `transformers` library. Below is an example:

```python
from transformers import pipeline

# Load the model from huggingface.co/models using our repository ID
classifier = pipeline(
    task="text-classification",
    model="Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier",
)

# Replace the placeholder with the Arabic text you want to classify
sample = '''
PUT SOME TEXT HERE TO CLASSIFY ITS TOPIC
'''

classifier(sample)

# Example output:
# [{'label': 'health', 'score': 0.6779336333274841}]
```
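
Building on the pipeline call above, the following is a minimal sketch of classifying several texts at once and inspecting the score of every topic rather than only the best one. It assumes the standard `top_k=None` option of the `transformers` text-classification pipeline. The helper names `best_label` and `classify_batch`, the `"uncertain"` fallback, and the `min_score` threshold of 0.5 are hypothetical choices for illustration, not part of this model's release.

```python
from typing import Dict, List

MODEL_ID = "Omartificial-Intelligence-Space/AraModernBert-Topic-Classifier"


def best_label(scores: List[Dict], min_score: float = 0.5) -> str:
    """Return the highest-scoring label, or "uncertain" if it is below min_score.

    `scores` is a list of {"label": ..., "score": ...} dicts, the shape the
    pipeline returns for one text when called with top_k=None.
    The 0.5 default threshold is an illustrative assumption.
    """
    top = max(scores, key=lambda item: item["score"])
    return top["label"] if top["score"] >= min_score else "uncertain"


def classify_batch(texts: List[str], min_score: float = 0.5) -> List[str]:
    """Classify a list of texts and return one label (or "uncertain") per text."""
    # Imported here so best_label() stays usable without transformers installed.
    from transformers import pipeline

    classifier = pipeline(task="text-classification", model=MODEL_ID)
    # top_k=None asks for the scores of all labels, not just the top one.
    all_scores = classifier(texts, top_k=None)
    return [best_label(scores, min_score) for scores in all_scores]
```

Calling `classify_batch(["...", "..."])` with your own Arabic texts downloads the model on first use. A threshold like `min_score` can be useful when the input may fall outside the topics covered by the training data, but the right value should be tuned on held-out examples.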