ModernBERT-NewsClassifier-EN-small

This model is a fine-tuned version of answerdotai/ModernBERT-base on an English news-category dataset covering 15 distinct topics (e.g., Politics, Sports, Business). It achieves the following results on the evaluation set:

  • Validation Loss: 3.1201
  • Weighted F1 Score: 0.5475

Model Description

Architecture: This model is based on ModernBERT-base, an advanced Transformer architecture featuring Rotary Position Embeddings (RoPE), Flash Attention, and a native long context window (up to 8,192 tokens). For the classification task, a linear classification head is added on top of the BERT encoder outputs.
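For fine-tuning, the classification head is the standard sequence-classification head that transformers attaches when the base model is loaded with a label count. The snippet below is a minimal sketch of that setup, not the original training script; it assumes num_labels=15 with the category names listed further down this card.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

base_model = "answerdotai/ModernBERT-base"

# Load the base encoder with a freshly initialized 15-way classification head.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=15,  # one logit per news category
)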

Task: Multi-class News Classification

  • The model classifies English news headlines or short texts into one of 15 categories.

Use Cases:
  • Automatically tagging news headlines with appropriate categories in editorial pipelines.
  • Classifying short text blurbs for social media or aggregator systems.
  • Building a quick filter for content-based recommendation engines.

Intended Uses & Limitations

  • Intended for: Users who need to categorize short English news texts into broad topics.
  • Language: Trained primarily on English texts. Performance on non-English text is not guaranteed.
  • Limitations:
    • Certain categories (e.g., BLACK VOICES, QUEER VOICES) may contain nuanced language that could lead to misclassification if context is limited or if the text is ambiguous.

Training and Evaluation Data

  • Dataset: Curated from an English news-category dataset with 15 labels (e.g., POLITICS, ENTERTAINMENT, SPORTS, BUSINESS).
  • Data Size: ~30,000 samples in total, balanced at 2,000 samples per category.
  • Split: 90% training (27,000 samples) and 10% testing (3,000 samples).
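The balancing and split described above can be reproduced with the datasets library. The sketch below is illustrative only: the source dataset and its column name ("category") are assumptions, not part of this card.

from collections import defaultdict

def balance_per_category(dataset, per_label=2000, label_col="category"):
    """Keep at most per_label examples per category (dataset is a datasets.Dataset)."""
    counts = defaultdict(int)
    keep = []
    for i, label in enumerate(dataset[label_col]):
        if counts[label] < per_label:
            counts[label] += 1
            keep.append(i)
    return dataset.select(keep)

# balanced = balance_per_category(raw_dataset)              # ~30,000 rows for 15 labels
# split = balanced.train_test_split(test_size=0.1, seed=42)
# train_ds, test_ds = split["train"], split["test"]         # 27,000 / 3,000 samples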

Categories

  1. POLITICS
  2. WELLNESS
  3. ENTERTAINMENT
  4. TRAVEL
  5. STYLE & BEAUTY
  6. PARENTING
  7. HEALTHY LIVING
  8. QUEER VOICES
  9. FOOD & DRINK
  10. BUSINESS
  11. COMEDY
  12. SPORTS
  13. BLACK VOICES
  14. HOME & LIVING
  15. PARENTS
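The integer-to-label mapping actually used by the classification head is stored in the checkpoint's config (its order may differ from the list above) and can be inspected directly:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Sengil/ModernBERT-NewsClassifier-EN-small")

# Print the id -> label mapping stored in the fine-tuned checkpoint.
for idx, name in sorted(config.id2label.items()):
    print(idx, name)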

Training Procedure

Hyperparameters

Hyperparameter               Value
learning_rate                5e-05
train_batch_size             8
eval_batch_size              4
seed                         42
gradient_accumulation_steps  2
total_train_batch_size       16 (8 x 2)
optimizer                    adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08)
lr_scheduler_type            linear
lr_scheduler_warmup_steps    100
num_epochs                   5

Optimizer: Used AdamW with fused kernels (adamw_torch_fused) for efficiency.
Loss Function: Cross-entropy; weighted F1 is reported as the evaluation metric.
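The hyperparameters above map onto transformers.TrainingArguments roughly as follows. This is a hedged reconstruction, not the original training script; output_dir and the compute_metrics wiring are placeholders.

import numpy as np
from sklearn.metrics import f1_score
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-newsclassifier-en-small",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # effective train batch size: 16
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch_fused",
    seed=42,
)

def compute_metrics(eval_pred):
    """Weighted F1, the evaluation metric reported above."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds, average="weighted")}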


Training Results

Training Loss   Epoch    Step   Validation Loss   F1 (Weighted)
2.6251          1.0      1688   1.3810            0.5543
1.9267          2.0      3376   1.4378            0.5588
0.6349          3.0      5064   2.1705            0.5415
0.1273          4.0      6752   2.9007            0.5402
0.0288          4.9973   8435   3.1201            0.5475
  • Weighted F1 stays around 0.55 throughout training, peaking at ~0.56 after epoch 2, while validation loss rises in later epochs; the final checkpoint scores 0.5475 on the validation set.

Inference Example

Below are two ways to use this model: via a pipeline and by using the model & tokenizer directly.

1) Quick Start with pipeline

from transformers import pipeline

# Instantiate the pipeline
classifier = pipeline(
    "text-classification",
    model="Sengil/ModernBERT-NewsClassifier-EN-small"
)

# Sample text
text = "The President pledges new infrastructure initiatives amid economic concerns."
outputs = classifier(text)

# Output: [{'label': 'POLITICS', 'score': 0.95}, ...]
print(outputs)
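By default the pipeline returns only the highest-scoring label. To see scores for all 15 categories, the standard top_k argument can be passed:

# Return every category with its score, sorted from most to least likely.
all_scores = classifier(text, top_k=None)
print(all_scores[:3])  # the three most likely categories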

2) Direct Model Usage

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Sengil/ModernBERT-NewsClassifier-EN-small"

# Load model & tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

sample_text = "Local authorities call for better healthcare policies."
inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities
probs = F.softmax(logits, dim=1)[0]
predicted_label_id = torch.argmax(probs).item()

# Get the label string
id2label = model.config.id2label
predicted_label = id2label[predicted_label_id]
confidence_score = probs[predicted_label_id].item()

print(f"Predicted Label: {predicted_label} | Score: {confidence_score:.4f}")

Additional Information

  • Framework Versions:

    • Transformers: 4.49.0.dev0
    • PyTorch: 2.5.1+cu121
    • Datasets: 3.2.0
    • Tokenizers: 0.21.0
  • License: Apache 2.0

  • Intellectual Property: The original ModernBERT base model is provided by answerdotai. This fine-tuned checkpoint inherits the same license.


Citation

If you use or extend this model in your research or applications, please consider citing it:

@misc{ModernBERTNewsClassifierENsmall,
  title={ModernBERT-NewsClassifier-EN-small},
  author={Mert Sengil},
  year={2025},
  howpublished={\url{https://huggingface.co/Sengil/ModernBERT-NewsClassifier-EN-small}},
}