
---
library_name: transformers
tags:
- bert
- berturk
language:
- tr
pipeline_tag: text-classification
---

# Model Card for bert-news-classif-turkish
A BERT-based classifier that assigns Turkish news articles to one of 12 categories.


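### Model Description

The model predicts 12 classes:

| ID | Label | Meaning |
|---|---|---|
| 0 | turkiye | Turkey |
| 1 | ekonomi | economy |
| 2 | dunya | world |
| 3 | spor | sports |
| 4 | magazin | celebrity/entertainment |
| 5 | guncel | current affairs |
| 6 | genel | general |
| 7 | siyaset | politics |
| 8 | saglik | health |
| 9 | kultur-sanat | culture and arts |
| 10 | teknoloji | technology |
| 11 | yasam | lifestyle |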
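The same mapping can be written as the `label2id`/`id2label` pair that Transformers model configs use, which also makes it easy to check that the 12 IDs are contiguous. A minimal sketch; the variable names are illustrative, and whether the published config already stores these names is not stated in this card:

```python
# Category -> class index, as listed in the table above.
label2id = {
    "turkiye": 0, "ekonomi": 1, "dunya": 2, "spor": 3, "magazin": 4, "guncel": 5,
    "genel": 6, "siyaset": 7, "saglik": 8, "kultur-sanat": 9, "teknoloji": 10, "yasam": 11,
}
# Inverse mapping: class index -> category name.
id2label = {idx: label for label, idx in label2id.items()}

# Sanity check: 12 distinct classes with contiguous IDs 0..11.
num_labels = len(label2id)
assert sorted(id2label) == list(range(num_labels))
print(num_labels, id2label[7])
```

Because `from_pretrained` forwards unused keyword arguments to the model config, passing `id2label=id2label, label2id=label2id` to `AutoModelForSequenceClassification.from_pretrained(...)` should make predictions report these names if the config only has generic ones.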

The model is `bert-base-multilingual-uncased` fine-tuned for sequence classification.
The base checkpoint has no classification head, so the head was initialized from scratch and trained entirely on the Turkish dataset. 🤗

- Eval loss: 0.8327703781731708
- Train loss: 0.8896290063858032
- Train/eval split: 80% / 20%
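The card does not say how the 80/20 split was made. One common choice for a classifier is a stratified split, which keeps each category's share the same in both parts. A minimal standard-library sketch, assuming the data is a list of `(text, label_id)` pairs; the function name, seed and toy data are hypothetical:

```python
import random
from collections import defaultdict

def stratified_split(examples, eval_fraction=0.2, seed=42):
    """Split (text, label) pairs so every label keeps roughly the same train/eval ratio."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, evaluation = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_eval = round(len(items) * eval_fraction)
        evaluation.extend(items[:n_eval])
        train.extend(items[n_eval:])
    return train, evaluation

# Toy data: 10 examples each for "spor" (3) and "ekonomi" (1).
data = [(f"spor haberi {i}", 3) for i in range(10)] + [(f"ekonomi haberi {i}", 1) for i in range(10)]
train, evaluation = stratified_split(data)
print(len(train), len(evaluation))  # 16 4
```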

- Developed by: Ezel Bayraktar
- Model type: Text classifier (BERT with a sequence-classification head)
- Language(s) (NLP): Turkish
- License: MIT License
- Fine-tuned from model: bert-base-multilingual-uncased


## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "TerminatorPower/bert-news-classif-turkish"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Use the GPU when available; move the model once instead of on every call.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Label mapping used during training (see "Model Description").
label_mapping = {
    "turkiye": 0, "ekonomi": 1, "dunya": 2, "spor": 3, "magazin": 4, "guncel": 5,
    "genel": 6, "siyaset": 7, "saglik": 8, "kultur-sanat": 9, "teknoloji": 10, "yasam": 11,
}
# Invert it so a predicted class index can be turned back into a category name.
reverse_label_mapping = {idx: label for label, idx in label_mapping.items()}

def predict(text):
    # Tokenize, truncating to BERT's 512-token limit (a single text needs no padding).
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    inputs = {key: value.to(device) for key, value in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    # The highest-scoring logit is the predicted class index.
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return reverse_label_mapping[prediction]

if __name__ == "__main__":
    text = "Some example news text"
    print(f"Predicted label: {predict(text)}")
```
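`predict` returns only the single highest-scoring category. When a ranked list with confidences is more useful, the logits can be turned into probabilities with a softmax. A minimal plain-Python sketch; the `top_k` helper and the example logits are hypothetical:

```python
import math

# Class index -> category name, as listed in the model card.
id2label = {
    0: "turkiye", 1: "ekonomi", 2: "dunya", 3: "spor", 4: "magazin", 5: "guncel",
    6: "genel", 7: "siyaset", 8: "saglik", 9: "kultur-sanat", 10: "teknoloji", 11: "yasam",
}

def top_k(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs for one text's logits."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits for one article: strongest for "spor", then "ekonomi".
logits = [0.0] * 12
logits[3] = 5.0
logits[1] = 2.0
for label, prob in top_k(logits, id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

Inside `predict`, passing `outputs.logits[0].tolist()` and `reverse_label_mapping` to `top_k` would return the ranked categories for that input.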


## Training Details

Training ran on a single NVIDIA RTX 3060 (12 GB) GPU and took 245 minutes in total, with these hyperparameters:

- `learning_rate`: 5e-5
- `per_device_train_batch_size`: 20
- `per_device_eval_batch_size`: 20
- `num_train_epochs`: 7
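With these settings the number of optimizer steps follows from the training-set size. A small sketch, assuming a single GPU, no gradient accumulation, and the Hugging Face `Trainer` default schedule (linear decay to zero, no warmup); the card does not state the schedule. The training-set size is also an assumption: 80% of roughly 42,000 articles, as the dataset name "42bin haber" ("42 thousand news") suggests:

```python
import math

def total_steps(n_train, batch_size, epochs):
    """Optimizer steps for single-device training without gradient accumulation."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * epochs

def linear_lr(step, base_lr, num_steps, warmup=0):
    """Linear warmup to base_lr, then linear decay to 0 at num_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (num_steps - step) / max(1, num_steps - warmup))

# Assumed: about 42,000 articles, 80% of them used for training.
n_train = 42_000 * 8 // 10
steps = total_steps(n_train, batch_size=20, epochs=7)
print(n_train, steps)                      # 33600 11760
print(linear_lr(steps // 2, 5e-5, steps))  # 2.5e-05
```

Under these assumptions, 245 minutes works out to roughly 1.25 s per step, including any evaluation time.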

### Training Data

The model was trained on the Kemik 42bin haber ("42 thousand news") dataset, which is available at:
http://www.kemik.yildiz.edu.tr/veri_kumelerimiz.html


## Model Card Contact

[email protected]