---
library_name: transformers
license: apache-2.0
base_model: answerdotai/ModernBERT-base
tags:
- generated_from_trainer
- text-classification
- news-classification
- english
- modernbert
metrics:
- f1
model-index:
- name: ModernBERT-NewsClassifier-EN-small
  results: []
---

# ModernBERT-NewsClassifier-EN-small

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an English **News Category** dataset covering 15 distinct topics (e.g., **Politics**, **Sports**, **Business**). It achieves the following results on the evaluation set:

- **Validation Loss**: `3.1201`
- **Weighted F1 Score**: `0.5475`

---

## Model Description

**Architecture**: This model is based on [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base), an advanced Transformer architecture featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long context window of up to 8,192 tokens. For the classification task, a linear classification head is added on top of the encoder outputs.
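
The label mapping and context window can be read straight from the checkpoint's configuration; a minimal sketch using the standard `transformers` config interface (the attribute names are the usual ones, not specific to this card):

```python
from transformers import AutoConfig

# Inspect the classification setup shipped with the checkpoint
config = AutoConfig.from_pretrained("Sengil/ModernBERT-NewsClassifier-EN-small")

print(config.num_labels)               # 15 news categories
print(config.id2label)                 # mapping from class id to category name
print(config.max_position_embeddings)  # context window inherited from ModernBERT-base
```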

**Task**: **Multi-class News Classification**
- The model classifies English news headlines or short texts into one of 15 categories.

**Use Cases**:
- Automatically tagging news headlines with appropriate categories in editorial pipelines.
- Classifying short text blurbs for social media or aggregator systems.
- Building a quick filter for content-based recommendation engines.

---

## Intended Uses & Limitations

- **Intended for**: Users who need to categorize short English news texts into broad topics.
- **Language**: Trained primarily on **English** texts. Performance on non-English text is not guaranteed.
- **Limitations**:
  - Certain categories (e.g., `BLACK VOICES`, `QUEER VOICES`) contain nuanced language and may be misclassified when the input is short, ambiguous, or lacking context.

---

## Training and Evaluation Data

- **Dataset**: Curated from an English news-category dataset with 15 labels (e.g., `POLITICS`, `ENTERTAINMENT`, `SPORTS`, `BUSINESS`).
- **Data Size**: ~30,000 samples in total, balanced at 2,000 samples per category.
- **Split**: 90% training (27,000 samples) and 10% testing (3,000 samples); a sketch of how such a balanced split can be built follows the category list below.

### Categories

1. POLITICS
2. WELLNESS
3. ENTERTAINMENT
4. TRAVEL
5. STYLE & BEAUTY
6. PARENTING
7. HEALTHY LIVING
8. QUEER VOICES
9. FOOD & DRINK
10. BUSINESS
11. COMEDY
12. SPORTS
13. BLACK VOICES
14. HOME & LIVING
15. PARENTS
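
A minimal sketch of how such a balanced, 2,000-samples-per-category subset and 90/10 split can be rebuilt with the `datasets` library; the dataset identifier and the `category` column name below are placeholders/assumptions, not the exact source used for this checkpoint:

```python
from datasets import load_dataset, concatenate_datasets

# NOTE: placeholder id; substitute an English News Category dataset with a `category` column
raw = load_dataset("<english-news-category-dataset>", split="train")

categories = [
    "POLITICS", "WELLNESS", "ENTERTAINMENT", "TRAVEL", "STYLE & BEAUTY",
    "PARENTING", "HEALTHY LIVING", "QUEER VOICES", "FOOD & DRINK", "BUSINESS",
    "COMEDY", "SPORTS", "BLACK VOICES", "HOME & LIVING", "PARENTS",
]

# Take 2,000 shuffled samples from each of the 15 categories (~30,000 total)
per_category = [
    raw.filter(lambda ex, c=cat: ex["category"] == c).shuffle(seed=42).select(range(2000))
    for cat in categories
]
balanced = concatenate_datasets(per_category).shuffle(seed=42)

# 90% train / 10% test, matching the split described above
split = balanced.train_test_split(test_size=0.1, seed=42)
print(split["train"].num_rows, split["test"].num_rows)  # 27,000 / 3,000
```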

---

## Training Procedure

### Hyperparameters

| Hyperparameter | Value |
|------------------------------:|:-----------------------|
| **learning_rate** | 5e-05 |
| **train_batch_size** | 8 |
| **eval_batch_size** | 4 |
| **seed** | 42 |
| **gradient_accumulation_steps** | 2 |
| **total_train_batch_size** | 16 (8 x 2) |
| **optimizer** | `adamw_torch_fused` (betas=(0.9, 0.999), epsilon=1e-08) |
| **lr_scheduler_type** | linear |
| **lr_scheduler_warmup_steps** | 100 |
| **num_epochs** | 5 |

**Optimizer**: `AdamW` with fused kernels (`adamw_torch_fused`) for efficiency.

**Loss Function**: Cross-entropy; weighted F1 is used as the evaluation metric.
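
For orientation, a hedged sketch of `TrainingArguments` that mirrors the table above; the output directory, evaluation schedule, and metric wiring are illustrative assumptions, not a record of the original training script:

```python
import numpy as np
from sklearn.metrics import f1_score
from transformers import TrainingArguments

# Mirrors the hyperparameter table above; paths and schedules are illustrative.
training_args = TrainingArguments(
    output_dir="modernbert-news-classifier",   # assumption
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,             # effective batch size 16
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
    seed=42,
    eval_strategy="epoch",                     # assumption: evaluate once per epoch
)

def compute_metrics(eval_pred):
    """Weighted F1, the metric reported in this card."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds, average="weighted")}
```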

---

## Training Results

| Training Loss | Epoch | Step | Validation Loss | F1 (Weighted) |
|:-------------:|:------:|:----:|:---------------:|:-------------:|
| 2.6251 | 1.0 | 1688 | 1.3810 | 0.5543 |
| 1.9267 | 2.0 | 3376 | 1.4378 | 0.5588 |
| 0.6349 | 3.0 | 5064 | 2.1705 | 0.5415 |
| 0.1273 | 4.0 | 6752 | 2.9007 | 0.5402 |
| 0.0288 | 4.9973 | 8435 | 3.1201 | 0.5475 |

- **Best Weighted F1**: **0.5588** at epoch 2; the final checkpoint reaches **0.5475**, while validation loss increases in later epochs, indicating overfitting to the training data.

---

## Inference Example

Below are two ways to use this model: via a **pipeline** and by using the **model & tokenizer** directly.

### 1) Quick Start with `pipeline`

```python
from transformers import pipeline

# Instantiate the text-classification pipeline with this checkpoint
classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small"
)

# Sample text
text = "The President pledges new infrastructure initiatives amid economic concerns."
outputs = classifier(text)

# Example output: [{'label': 'POLITICS', 'score': 0.95}]
print(outputs)
```
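
Continuing with the `classifier` created above, the pipeline also accepts a list of texts and a `top_k` argument, which is convenient for batch tagging; the sample headlines below are illustrative:

```python
headlines = [
    "Stocks rally as tech earnings beat expectations",
    "Five easy weeknight dinners for busy parents",
]

# Score a batch of headlines and keep the three most likely categories for each
results = classifier(headlines, top_k=3)
for headline, candidates in zip(headlines, results):
    print(headline)
    for candidate in candidates:
        print(f"  {candidate['label']}: {candidate['score']:.3f}")
```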

### 2) Direct Model Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"

# Load model & tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

sample_text = "Local authorities call for better healthcare policies."
inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)

# Forward pass without gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities
probs = F.softmax(logits, dim=1)[0]
predicted_label_id = torch.argmax(probs).item()

# Map the predicted id back to its label string
id2label = model.config.id2label
predicted_label = id2label[predicted_label_id]
confidence_score = probs[predicted_label_id].item()

print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")
```
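
Because the head covers all 15 categories, the same forward pass can also give a ranked view of the full distribution; a short continuation of the snippet above:

```python
# Continuing from the variables above: rank all 15 categories by probability
ranked = sorted(
    ((id2label[i], p.item()) for i, p in enumerate(probs)),
    key=lambda pair: pair[1],
    reverse=True,
)
for label, score in ranked:
    print(f"{label:>15s}  {score:.4f}")
```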

---

## Additional Information

- **Framework Versions**:
  - **Transformers**: 4.49.0.dev0
  - **PyTorch**: 2.5.1+cu121
  - **Datasets**: 3.2.0
  - **Tokenizers**: 0.21.0
- **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Intellectual Property**: The original ModernBERT base model is provided by [answerdotai](https://huggingface.co/answerdotai). This fine-tuned checkpoint inherits the same license.

---

**Citation**: If you use or extend this model in your research or applications, please consider citing it:

```
@misc{ModernBERTNewsClassifierENsmall,
  title={ModernBERT-NewsClassifier-EN-small},
  author={Mert Sengil},
  year={2025},
  howpublished={\url{https://huggingface.co/Sengil/ModernBERT-NewsClassifier-EN-small}},
}
```