---
library_name: transformers
tags:
- bert
- berturk
language:
- tr
pipeline_tag: text-classification
---

# Model Card for TerminatorPower/bert-news-classif-turkish

Turkish news classifier.


### Model Description

12 classes are present:

`'turkiye': 0, 'ekonomi': 1, 'dunya': 2, 'spor': 3, 'magazin': 4, 'guncel': 5, 'genel': 6, 'siyaset': 7, 'saglik': 8, 'kultur-sanat': 9, 'teknoloji': 10, 'yasam': 11`

The model is a fine-tuned bert-base-multilingual-uncased model.
The base model is not originally a classifier, so the classifier head weights were trained from scratch on the Turkish dataset. 🤗
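
Because the classification head is newly initialized, loading the base checkpoint for fine-tuning looks roughly like the sketch below. This is a minimal illustration, not the exact training script; the `id2label`/`label2id` dictionaries mirror the class list above.

```python
from transformers import AutoModelForSequenceClassification

# Label mapping taken from the class list above.
id2label = {
    0: "turkiye", 1: "ekonomi", 2: "dunya", 3: "spor", 4: "magazin",
    5: "guncel", 6: "genel", 7: "siyaset", 8: "saglik",
    9: "kultur-sanat", 10: "teknoloji", 11: "yasam",
}
label2id = {name: idx for idx, name in id2label.items()}

# Keeps the pretrained encoder weights and attaches a fresh,
# randomly initialized 12-class classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-uncased",
    num_labels=12,
    id2label=id2label,
    label2id=label2id,
)
```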

- Eval loss: 0.8327703781731708
- Train loss: 0.8896290063858032
- Eval/train split: 0.2/0.8

- **Developed by:** Ezel Bayraktar
- **Model type:** Text classifier
- **Language(s) (NLP):** Turkish
- **License:** MIT License
- **Finetuned from model:** bert-base-multilingual-uncased


## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "TerminatorPower/bert-news-classif-turkish"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Run on GPU when available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Map class indices back to the label names listed above.
reverse_label_mapping = {
    0: "turkiye",
    1: "ekonomi",
    2: "dunya",
    3: "spor",
    4: "magazin",
    5: "guncel",
    6: "genel",
    7: "siyaset",
    8: "saglik",
    9: "kultur-sanat",
    10: "teknoloji",
    11: "yasam",
}

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding="max_length", max_length=512)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1)
    return reverse_label_mapping[prediction.item()]

if __name__ == "__main__":
    text = "Some example news text"
    print(f"Predicted label: {predict(text)}")
```


## Training Details

I trained on an RTX 3060 12 GB card; training took 245 minutes in total. Key hyperparameters (see the `Trainer` sketch after this list):

- learning_rate: 5e-5
- per_device_train_batch_size: 20
- per_device_eval_batch_size: 20
- num_train_epochs: 7
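
For reference, these hyperparameters map onto a standard `TrainingArguments`/`Trainer` setup roughly as sketched below. The `output_dir` name and the dataset variables are placeholders, not the exact script that produced the checkpoint.

```python
from transformers import Trainer, TrainingArguments

# Hyperparameters from above; output_dir is a placeholder name.
training_args = TrainingArguments(
    output_dir="bert-news-classif-turkish",
    learning_rate=5e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    num_train_epochs=7,
)

trainer = Trainer(
    model=model,                  # 12-class model initialized as sketched above
    args=training_args,
    train_dataset=train_dataset,  # tokenized 80% split (placeholder variable)
    eval_dataset=eval_dataset,    # tokenized 20% split (placeholder variable)
)
trainer.train()
```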

### Training Data

I used the Kemik 42bin haber dataset, which you can access from this link:
http://www.kemik.yildiz.edu.tr/veri_kumelerimiz.html
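
The 0.2/0.8 eval/train split mentioned above can be reproduced along these lines. This sketch assumes the corpus has already been read into parallel `texts` and `labels` lists; the on-disk layout of the download is not covered here, and the `stratify`/`random_state` choices are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split

# texts: list[str], labels: list[int] -- assumed to be loaded from the
# 42bin haber corpus beforehand.
train_texts, eval_texts, train_labels, eval_labels = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)
```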


## Model Card Contact

[email protected]