---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
- text-classification
- news-classification
- english
- modernbert
metrics:
- f1
model-index:
- name: ModernBERT-NewsClassifier-EN-small
results: []
---
# ModernBERT-NewsClassifier-EN-small
This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an English **News Category** dataset covering 15 distinct topics (e.g., **Politics**, **Sports**, **Business**). It achieves the following results on the evaluation set:
- **Validation Loss**: `3.1201`
- **Weighted F1 Score**: `0.5475`
---
## Model Description
**Architecture**: This model is based on [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), an encoder-only Transformer featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long context window (up to 8,192 tokens). For the classification task, a linear classification head is added on top of the encoder outputs.
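Because the base model supports sequences of up to 8,192 tokens, full article bodies can be classified as well as headlines. A minimal sketch (the model was fine-tuned on short texts, so accuracy on long inputs is untested; the article text is a placeholder):
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# A full article rather than a headline; with ModernBERT's native context
# window, truncation only applies past 8,192 tokens.
article = "The central bank announced a new round of measures ..."  # placeholder
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    pred_id = model(**inputs).logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])
```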
**Task**: **Multi-class News Classification**
- The model classifies English news headlines or short texts into one of 15 categories.
**Use Cases**:
- Automatically tagging news headlines with appropriate categories in editorial pipelines (see the batch-tagging sketch after this list).
- Classifying short text blurbs for social media or aggregator systems.
- Building a quick filter for content-based recommendation engines.
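For the editorial-pipeline use case, the `pipeline` API accepts a list of texts and batches the forward passes internally. A minimal sketch (the headlines and batch size are illustrative):
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small",
)

headlines = [
    "Stocks rally as central bank signals rate pause",
    "Star striker sidelined for six weeks with ankle injury",
    "Five easy weeknight dinners under 30 minutes",
]
# Passing a list lets the pipeline batch inputs instead of looping one by one.
for headline, result in zip(headlines, classifier(headlines, batch_size=8)):
    print(f"{result['label']:>15}  {headline}")
```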
---
## Intended Uses & Limitations
- **Intended for**: Users who need to categorize short English news texts into broad topics.
- **Language**: Trained primarily on **English** texts. Performance on non-English text is not guaranteed.
- **Limitations**:
- Certain categories (e.g., `BLACK VOICES`, `QUEER VOICES`) may contain nuanced language that could lead to misclassification if context is limited or if the text is ambiguous.
---
## Training and Evaluation Data
- **Dataset**: Curated from an English news-category dataset with 15 labels (e.g., `POLITICS`, `ENTERTAINMENT`, `SPORTS`, `BUSINESS`, etc.).
- **Data Size**: ~30,000 samples in total, balanced at 2,000 samples per category.
- **Split**: 90% training (27,000 samples) and 10% testing (3,000 samples); a sketch for reproducing this split follows.
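A stratified 90/10 split like the one above can be reproduced with the `datasets` library. A sketch, assuming the subset is available as a JSON-lines file with `text` and `label` columns (the file name and column names are assumptions):
```python
from datasets import load_dataset, ClassLabel

# Hypothetical loading step; the exact source file is not part of this card.
ds = load_dataset("json", data_files="news_subset.jsonl", split="train")

# stratify_by_column requires the label column to be a ClassLabel feature.
names = sorted(ds.unique("label"))
ds = ds.cast_column("label", ClassLabel(names=names))

split = ds.train_test_split(test_size=0.1, seed=42, stratify_by_column="label")
print(len(split["train"]), len(split["test"]))  # ~27,000 / ~3,000
```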
### Categories
1. POLITICS
2. WELLNESS
3. ENTERTAINMENT
4. TRAVEL
5. STYLE & BEAUTY
6. PARENTING
7. HEALTHY LIVING
8. QUEER VOICES
9. FOOD & DRINK
10. BUSINESS
11. COMEDY
12. SPORTS
13. BLACK VOICES
14. HOME & LIVING
15. PARENTS
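The integer id assigned to each category is stored in the model config and may not follow the numbering above, so inspect the mapping rather than assuming an order:
```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Sengil/ModernBERT-NewsClassifier-EN-small")
# id2label maps class indices to the category strings listed above.
for idx, label in sorted(config.id2label.items()):
    print(idx, label)
```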
---
## Training Procedure
### Hyperparameters
| Hyperparameter | Value |
|------------------------------:|:-----------------------|
| **learning_rate** | 5e-05 |
| **train_batch_size** | 8 |
| **eval_batch_size** | 4 |
| **seed** | 42 |
| **gradient_accumulation_steps** | 2 |
| **total_train_batch_size** | 16 (8 x 2) |
| **optimizer** | `adamw_torch_fused` (betas=(0.9,0.999), epsilon=1e-08) |
| **lr_scheduler_type** | linear |
| **lr_scheduler_warmup_steps**| 100 |
| **num_epochs** | 5 |
**Optimizer**: Used `AdamW` with fused kernels (`adamw_torch_fused`) for efficiency.
**Loss Function**: Cross-entropy; weighted F1 is reported as the evaluation metric.
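The table above maps onto `TrainingArguments` roughly as follows. This is a reconstruction from the listed values, not the original training script; the output path and the metric wiring are assumptions:
```python
import numpy as np
from sklearn.metrics import f1_score
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-news-classifier",  # assumed path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size of 16
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
    seed=42,
)

def compute_metrics(eval_pred):
    """Weighted F1, matching the metric reported on this card."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds, average="weighted")}
```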
---
## Training Results
| Training Loss | Epoch | Step | Validation Loss | F1 (Weighted) |
|:-------------:|:------:|:----:|:---------------:|:-------------:|
| 2.6251 | 1.0 | 1688 | 1.3810 | 0.5543 |
| 1.9267 | 2.0 | 3376 | 1.4378 | 0.5588 |
| 0.6349 | 3.0 | 5064 | 2.1705 | 0.5415 |
| 0.1273 | 4.0 | 6752 | 2.9007 | 0.5402 |
| 0.0288 | 4.9973 | 8435 | 3.1201 | 0.5475 |
- The **best weighted F1** (0.5588) was reached at epoch 2; validation loss rises steadily in later epochs, a sign of overfitting, and the final checkpoint scores **0.5475**.
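Since validation loss climbs after epoch 2, a reproduction may prefer to keep the best checkpoint rather than the last; whether the original run did so is not stated. The standard `Trainer` flags for this, extending the sketch above:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-news-classifier",  # assumed path
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,   # restore the best checkpoint after training
    metric_for_best_model="f1",    # key returned by compute_metrics above
    greater_is_better=True,
)
```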
---
## Inference Example
Below are two ways to use this model: via a **pipeline** and by using the **model & tokenizer** directly.
### 1) Quick Start with `pipeline`
```python
from transformers import pipeline
# Instantiate the pipeline
classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small",
)
# Sample text
text = "The President pledges new infrastructure initiatives amid economic concerns."
outputs = classifier(text)
# Example output: [{'label': 'POLITICS', 'score': 0.95}]
print(outputs)
```
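By default the pipeline returns only the top label. Continuing the snippet above, passing `top_k=None` returns scores for all 15 categories:
```python
# Scores for every category, sorted from most to least likely.
all_scores = classifier(text, top_k=None)
for entry in all_scores[:3]:
    print(entry["label"], round(entry["score"], 4))
```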
### 2) Direct Model Usage
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"
# Load model & tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
sample_text = "Local authorities call for better healthcare policies."
inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
# Convert logits to probabilities
probs = F.softmax(logits, dim=1)[0]
predicted_label_id = torch.argmax(probs).item()
# Get the label string
id2label = model.config.id2label
predicted_label = id2label[predicted_label_id]
confidence_score = probs[predicted_label_id].item()
print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")
```
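For more than a handful of texts, batching with padding is faster than one call per example. Continuing with the same `model` and `tokenizer`:
```python
texts = [
    "Markets slide after weak earnings reports",
    "New study links sleep quality to heart health",
]
batch = tokenizer(texts, return_tensors="pt", padding=True,
                  truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**batch).logits
for text, pred_id in zip(texts, logits.argmax(dim=-1)):
    print(f"{model.config.id2label[pred_id.item()]:>15}  {text}")
```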
---
## Additional Information
- **Framework Versions**:
- **Transformers**: 4.49.0.dev0
- **PyTorch**: 2.5.1+cu121
- **Datasets**: 3.2.0
- **Tokenizers**: 0.21.0
- **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Intellectual Property**: The original ModernBERT base model is provided by [answerdotai](https://huggingface.co/answerdotai). This fine-tuned checkpoint inherits the same license.
---
**Citation**: If you use or extend this model in your research or applications, please consider citing it:
```bibtex
@misc{ModernBERTNewsClassifierENsmall,
  title={ModernBERT-NewsClassifier-EN-small},
  author={Mert Sengil},
  year={2025},
  howpublished={\url{https://huggingface.co/Sengil/ModernBERT-NewsClassifier-EN-small}},
}
``` |