---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
- text-classification
- news-classification
- english
- modernbert
metrics:
- f1
model-index:
- name: ModernBERT-NewsClassifier-EN-small
  results: []
---

# ModernBERT-NewsClassifier-EN-small


This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an English **News Category** dataset covering 15 distinct topics (e.g., **Politics**, **Sports**, **Business**, etc.). It achieves the following results on the evaluation set:

- **Validation Loss**: `3.1201`  
- **Weighted F1 Score**: `0.5475`

---

## Model Description

**Architecture**: This model is based on [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), an advanced Transformer architecture featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long context window (up to 8,192 tokens). For the classification task, a linear classification head is added on top of the BERT encoder outputs.
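
A quick way to confirm the classifier setup is to inspect the published config. This is a minimal sketch; `max_position_embeddings` is assumed to be the config field that carries the native context length:

```python
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the classification setup
config = AutoConfig.from_pretrained("Sengil/ModernBERT-NewsClassifier-EN-small")

print(config.num_labels)   # expected: 15
print(config.id2label)     # index -> category-name mapping used by the classification head

# Native context window; ModernBERT-base reports 8192 here
print(getattr(config, "max_position_embeddings", "n/a"))
```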

**Task**: **Multi-class News Classification**  
- The model classifies English news headlines or short texts into one of 15 categories.

**Use Cases**:
- Automatically tagging news headlines with appropriate categories in editorial pipelines.
- Classifying short text blurbs for social media or aggregator systems.
- Building a quick filter for content-based recommendation engines.

---

## Intended Uses & Limitations

- **Intended for**: Users who need to categorize short English news texts into broad topics.  
- **Language**: Trained primarily on **English** texts. Performance on non-English text is not guaranteed.  
- **Limitations**:
  - Certain categories (e.g., `BLACK VOICES`, `QUEER VOICES`) may contain nuanced language that could lead to misclassification if context is limited or if the text is ambiguous.

---

## Training and Evaluation Data

- **Dataset**: Curated from an English news-category dataset with 15 labels (e.g., `POLITICS`, `ENTERTAINMENT`, `SPORTS`, `BUSINESS`, etc.).  
- **Data Size**: ~30,000 samples in total, balanced at 2,000 samples per category.  
- **Split**: 90% training (27,000 samples) and 10% testing (3,000 samples).  
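
The preprocessing script is not published with this card, but a balanced per-category sample and 90/10 split along these lines could be reproduced with the `datasets` library; the dataset identifier and the `category` column name below are placeholders, not the actual source:

```python
from datasets import concatenate_datasets, load_dataset

# Placeholder dataset id and column name; the actual source is not published with this card
raw = load_dataset("your-news-category-dataset", split="train")

# Assume the source is already restricted to the 15 categories listed in the next subsection
labels = sorted(set(raw["category"]))

# Keep 2,000 examples per category for a balanced corpus (~30,000 rows total)
per_label = [
    raw.filter(lambda ex, lab=lab: ex["category"] == lab).shuffle(seed=42).select(range(2000))
    for lab in labels
]
balanced = concatenate_datasets(per_label)

# 90% train / 10% test, as described above
splits = balanced.train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```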

### Categories

1. POLITICS  
2. WELLNESS  
3. ENTERTAINMENT  
4. TRAVEL  
5. STYLE & BEAUTY  
6. PARENTING  
7. HEALTHY LIVING  
8. QUEER VOICES  
9. FOOD & DRINK  
10. BUSINESS  
11. COMEDY  
12. SPORTS  
13. BLACK VOICES  
14. HOME & LIVING  
15. PARENTS  

---

## Training Procedure

### Hyperparameters

| Hyperparameter                | Value                  |
|------------------------------:|:-----------------------|
| **learning_rate**            | 5e-05                  |
| **train_batch_size**         | 8                      |
| **eval_batch_size**          | 4                      |
| **seed**                     | 42                     |
| **gradient_accumulation_steps** | 2                  |
| **total_train_batch_size**   | 16 (8 x 2)             |
| **optimizer**                | `adamw_torch_fused` (betas=(0.9,0.999), epsilon=1e-08) |
| **lr_scheduler_type**        | linear                 |
| **lr_scheduler_warmup_steps**| 100                    |
| **num_epochs**               | 5                      |

**Optimizer**: `AdamW` with fused kernels (`adamw_torch_fused`) for efficiency.  
**Loss Function**: Cross-entropy (with weighted F1 as metric).
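
The training script itself is not included here, but a `Trainer` configuration along these lines reproduces the hyperparameters above; `train_tokenized` and `test_tokenized` are placeholders for the tokenized splits:

```python
import numpy as np
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

# Weighted F1, as reported in the tables above
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds, average="weighted")}

model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base", num_labels=15
)

training_args = TrainingArguments(
    output_dir="modernbert-news-classifier",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
    seed=42,
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_tokenized,  # placeholder: tokenized 90% split
    eval_dataset=test_tokenized,    # placeholder: tokenized 10% split
    compute_metrics=compute_metrics,
)
trainer.train()
```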

---

## Training Results

| Training Loss | Epoch  | Step | Validation Loss | F1 (Weighted) |
|:-------------:|:------:|:----:|:---------------:|:-------------:|
| 2.6251        | 1.0    | 1688 | 1.3810          | 0.5543        |
| 1.9267        | 2.0    | 3376 | 1.4378          | 0.5588        |
| 0.6349        | 3.0    | 5064 | 2.1705          | 0.5415        |
| 0.1273        | 4.0    | 6752 | 2.9007          | 0.5402        |
| 0.0288        | 4.9973 | 8435 | 3.1201          | 0.5475        |

- The **best weighted F1** on the validation set (**0.5588**) was reached at epoch 2; the final checkpoint scores **0.5475**. Validation loss rises steadily after epoch 1, indicating some overfitting in the later epochs.

---

## Inference Example

Below are two ways to use this model: via a **pipeline** and by using the **model & tokenizer** directly.

### 1) Quick Start with `pipeline`

```python
from transformers import pipeline

# Instantiate the pipeline
classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small"
)

# Sample text
text = "The President pledges new infrastructure initiatives amid economic concerns."
outputs = classifier(text)

# Example output (score is illustrative): [{'label': 'POLITICS', 'score': 0.95}, ...]
print(outputs)
```
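
For the editorial-pipeline use case mentioned earlier, the same `classifier` object accepts a list of headlines and returns one prediction per input; the headlines below are made-up examples:

```python
headlines = [
    "Stocks rally as tech earnings beat expectations",
    "Star striker sidelined for six weeks with ankle injury",
    "Five easy weeknight dinners for busy parents",
]

# One {'label', 'score'} dict is returned per headline
for headline, result in zip(headlines, classifier(headlines)):
    print(f"{result['label']:>15}  {result['score']:.3f}  {headline}")
```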

### 2) Direct Model Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"

# Load model & tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

sample_text = "Local authorities call for better healthcare policies."
inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities
probs = F.softmax(logits, dim=1)[0]
predicted_label_id = torch.argmax(probs).item()

# Get the label string
id2label = model.config.id2label
predicted_label = id2label[predicted_label_id]
confidence_score = probs[predicted_label_id].item()

print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")
```

---

## Additional Information

- **Framework Versions**:
  - **Transformers**: 4.49.0.dev0
  - **PyTorch**: 2.5.1+cu121
  - **Datasets**: 3.2.0
  - **Tokenizers**: 0.21.0

- **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)  
- **Intellectual Property**: The original ModernBERT base model is provided by [answerdotai](https://huggingface.co/answerdotai). This fine-tuned checkpoint inherits the same license.

---

**Citation**: If you use or extend this model in your research or applications, please consider citing it:
```
@misc{ModernBERTNewsClassifierENsmall,
  title={ModernBERT-NewsClassifier-EN-small},
  author={Mert Sengil},
  year={2025},
  howpublished={\url{https://huggingface.co/Sengil/ModernBERT-NewsClassifier-EN-small}},
}
```