---
datasets:
- jcblaise/fake_news_filipino
- SEACrowd/ph_fake_news_corpus
language:
- tl
- en
base_model:
- FacebookAI/xlm-roberta-base
pipeline_tag: text-classification
tags:
- fake-news-detection
- text-classification
- tagalog
- filipino
metrics:
- accuracy
- f1
- precision
- recall
---

# Tagalog Fake News Detection Model

## Overview

This project implements a fake news detection model for Tagalog/Filipino, built on the XLM-RoBERTa base model. It reaches an accuracy of **95.46%** on the evaluation set.

### Dataset

- Total Size: 18,522 samples
- Composition: 50/50 split of real and fake news
- Languages: Filipino, English

#### Dataset Split

- Train Set: ~12,968 samples
- Validation Set: ~2,784 samples
- Test Set: ~2,770 samples

### Performance Metrics (on Evaluation Set)

- Accuracy: 95.46%
- F1 Score: 95.40%
- Precision: 95.40%
- Recall: 95.40%

## Data Sources

The model was trained on a combined dataset from two primary sources:

1. [Fake News Filipino Dataset](https://huggingface.co/datasets/jcblaise/fake_news_filipino)
   - 3,206 rows used
2. [Philippine Fake News Corpus](https://huggingface.co/datasets/SEACrowd/ph_fake_news_corpus)
   - 15,312 rows used out of 22,458 available
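
The reported split sizes correspond to roughly 70% train, 15% validation, and 15% test. The card does not say how the split was produced, so the following is only a minimal sketch of one way to get a class-balanced split of that shape. The function name `stratified_split`, the ratios, the seed, and the `0 = real, 1 = fake` label encoding are all assumptions made for illustration, and the data here is a synthetic stand-in rather than the actual corpus.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that preserve class balance.

    Hypothetical helper: the ratios are inferred from the reported split
    sizes, not taken from the author's training code.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        # Shuffle within each class, then slice by ratio.
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, val, test


# Synthetic stand-in for an 18,522-sample dataset with a 50/50 class balance
# (assumed encoding: 0 = real, 1 = fake).
labels = [0] * 9261 + [1] * 9261
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With these assumed ratios the resulting sizes land close to, but not exactly on, the counts listed under Dataset Split, which is consistent with those counts being marked as approximate. Splitting per class keeps the 50/50 balance in every subset, so accuracy on the validation and test sets is not skewed by class imbalance.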