# ModernBERT-NewsClassifier-EN-small
This model is a fine-tuned version of answerdotai/ModernBERT-base on an English news-category dataset covering 15 distinct topics (e.g., Politics, Sports, Business). It achieves the following results on the evaluation set:

- Validation Loss: 3.1201
- Weighted F1 Score: 0.5475
## Model Description
Architecture: This model is based on ModernBERT-base, an advanced Transformer architecture featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long context window (up to 8,192 tokens). For the classification task, a linear classification head is added on top of the BERT encoder outputs.
Task: Multi-class News Classification
- The model classifies English news headlines or short texts into one of 15 categories.
Use Cases:
- Automatically tagging news headlines with appropriate categories in editorial pipelines.
- Classifying short text blurbs for social media or aggregator systems.
- Building a quick filter for content-based recommendation engines (see the sketch below).
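As a hedged illustration of the last use case, the sketch below pre-filters a stream of headlines by predicted category using the `pipeline` API described later in this card; the headlines and the `allowed` set are invented for the example.

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small",
)

# Invented example headlines; any iterable of short English texts works.
headlines = [
    "Stocks rally as inflation cools",
    "Ten easy weeknight pasta recipes",
    "Senate passes the new budget bill",
]

# Keep only headlines whose predicted category is in the allowed set,
# e.g. as a cheap pre-filter for a business/politics news feed.
allowed = {"BUSINESS", "POLITICS"}
preds = classifier(headlines)  # one {'label', 'score'} dict per headline
filtered = [h for h, p in zip(headlines, preds) if p["label"] in allowed]
print(filtered)
```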
## Intended Uses & Limitations
- Intended for: Users who need to categorize short English news texts into broad topics.
- Language: Trained primarily on English texts. Performance on non-English text is not guaranteed.
- Limitations:
  - Certain categories (e.g., `BLACK VOICES`, `QUEER VOICES`) may contain nuanced language that could lead to misclassification if context is limited or the text is ambiguous.
## Training and Evaluation Data
- Dataset: Curated from an English news-category dataset with 15 labels (e.g., `POLITICS`, `ENTERTAINMENT`, `SPORTS`, `BUSINESS`).
- Data Size: ~30,000 samples in total, balanced at 2,000 samples per category.
- Split: 90% training (27,000 samples) and 10% testing (3,000 samples).
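The card does not name the source dataset. As a non-authoritative sketch, the snippet below shows one way to reproduce a balanced 2,000-per-class sample with a 90/10 stratified split, assuming a HuffPost-style news-category file with `headline` and `category` columns; the file name is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical input file; the actual source dataset is not stated in this card.
df = pd.read_json("news_category_dataset.json", lines=True)

# The 15 categories used by this model (see the list below).
categories = [
    "POLITICS", "WELLNESS", "ENTERTAINMENT", "TRAVEL", "STYLE & BEAUTY",
    "PARENTING", "HEALTHY LIVING", "QUEER VOICES", "FOOD & DRINK",
    "BUSINESS", "COMEDY", "SPORTS", "BLACK VOICES", "HOME & LIVING", "PARENTS",
]

# Sample 2,000 rows per category -> ~30,000 balanced samples.
balanced = (
    df[df["category"].isin(categories)]
    .groupby("category", group_keys=False)
    .sample(n=2000, random_state=42)
)

# 90/10 stratified split -> 27,000 train / 3,000 test.
train_df, test_df = train_test_split(
    balanced, test_size=0.1, stratify=balanced["category"], random_state=42
)
```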
### Categories
- POLITICS
- WELLNESS
- ENTERTAINMENT
- TRAVEL
- STYLE & BEAUTY
- PARENTING
- HEALTHY LIVING
- QUEER VOICES
- FOOD & DRINK
- BUSINESS
- COMEDY
- SPORTS
- BLACK VOICES
- HOME & LIVING
- PARENTS
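A quick sanity check, assuming the checkpoint's config carries the standard `id2label` mapping and ModernBERT's 8,192-token `max_position_embeddings`:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Sengil/ModernBERT-NewsClassifier-EN-small")

# The label set shipped with the checkpoint should match the 15 categories above.
for idx, label in sorted(config.id2label.items()):
    print(idx, label)

# ModernBERT's native long-context window.
print(config.max_position_embeddings)  # expected: 8192
```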
## Training Procedure

### Hyperparameters
| Hyperparameter | Value |
|---|---|
| learning_rate | 5e-05 |
| train_batch_size | 8 |
| eval_batch_size | 4 |
| seed | 42 |
| gradient_accumulation_steps | 2 |
| total_train_batch_size | 16 (8 x 2) |
| optimizer | adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08) |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 100 |
| num_epochs | 5 |
- Optimizer: AdamW with fused kernels (`adamw_torch_fused`) for efficiency.
- Loss Function: Cross-entropy, with weighted F1 as the evaluation metric.
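As a sketch, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows; `output_dir` is a placeholder, and per-epoch evaluation is inferred from the results table below.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-news-classifier",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # effective train batch size: 16
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
    seed=42,
    eval_strategy="epoch",           # inferred from the per-epoch results below
)
```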
### Training Results
| Training Loss | Epoch | Step | Validation Loss | F1 (Weighted) |
|---|---|---|---|---|
| 2.6251 | 1.0 | 1688 | 1.3810 | 0.5543 |
| 1.9267 | 2.0 | 3376 | 1.4378 | 0.5588 |
| 0.6349 | 3.0 | 5064 | 2.1705 | 0.5415 |
| 0.1273 | 4.0 | 6752 | 2.9007 | 0.5402 |
| 0.0288 | 4.9973 | 8435 | 3.1201 | 0.5475 |
- The best weighted F1 on the validation set (0.5588) was reached at epoch 2; the final checkpoint scores 0.5475. Training loss falls toward zero while validation loss rises after epoch 2, a sign of overfitting in the later epochs.
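If retraining, one mitigation is to keep the best checkpoint rather than the last one. A minimal sketch using the standard `Trainer` callbacks, where `model`, `train_ds`, `eval_ds`, and `compute_metrics` are placeholders assumed to be defined elsewhere (with `compute_metrics` returning an `"f1"` key):

```python
from transformers import Trainer, EarlyStoppingCallback

# Extend `training_args` from the sketch above with checkpoint selection.
training_args.save_strategy = "epoch"        # must match eval_strategy
training_args.load_best_model_at_end = True  # restore the best, not the last, checkpoint
training_args.metric_for_best_model = "f1"   # assumes compute_metrics returns an "f1" key

trainer = Trainer(
    model=model,                      # placeholder: the model being fine-tuned
    args=training_args,
    train_dataset=train_ds,           # placeholder datasets
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,  # placeholder: weighted-F1 metric function
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
# trainer.train()
```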
## Inference Example

Below are two ways to use this model: via a `pipeline`, and with the model & tokenizer directly.

### 1) Quick Start with `pipeline`
```python
from transformers import pipeline

# Instantiate the pipeline
classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small",
)

# Sample text
text = "The President pledges new infrastructure initiatives amid economic concerns."
outputs = classifier(text)

# Output: [{'label': 'POLITICS', 'score': 0.95}, ...]
print(outputs)
```
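To see scores for all 15 categories instead of just the top one, `top_k` can be set at call time (supported by recent `transformers` versions):

```python
# Return every category with its score, sorted from most to least likely.
all_scores = classifier(text, top_k=None)
print(all_scores[:3])  # the three most likely categories
```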
### 2) Direct Model Usage
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"

# Load model & tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

sample_text = "Local authorities call for better healthcare policies."
inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities
probs = F.softmax(logits, dim=1)[0]
predicted_label_id = torch.argmax(probs).item()

# Map the predicted id to its label string
id2label = model.config.id2label
predicted_label = id2label[predicted_label_id]
confidence_score = probs[predicted_label_id].item()

print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")
```
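For larger workloads, the same model can run batched on a GPU when one is available; a minimal sketch reusing `tokenizer` and `model` from above (the example texts are invented):

```python
# Batched inference on GPU if available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

texts = [
    "Markets slide after surprise rate hike.",
    "New study links sleep quality to heart health.",
]
batch = tokenizer(
    texts, return_tensors="pt", padding=True, truncation=True, max_length=512
).to(device)

with torch.no_grad():
    batch_logits = model(**batch).logits

for t, label_id in zip(texts, batch_logits.argmax(dim=-1).tolist()):
    print(f"{t} -> {model.config.id2label[label_id]}")
```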
## Additional Information
Framework Versions:
- Transformers: 4.49.0.dev0
- PyTorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0
License: Apache 2.0
Intellectual Property: The original ModernBERT base model is provided by answerdotai. This fine-tuned checkpoint inherits the same license.
Citation (if you use or extend this model in your research or applications, please consider citing it):

```bibtex
@misc{ModernBERTNewsClassifierENsmall,
  title={ModernBERT-NewsClassifier-EN-small},
  author={Mert Sengil},
  year={2025},
  howpublished={\url{https://huggingface.co/Sengil/ModernBERT-NewsClassifier-EN-small}},
}
```