datasets:
- sanaa-11/fake-real-arabic-news
language:
- ar
pipeline_tag: text-classification
tags:
- sklearn
- text-classification
- random-forest
- fake-news-detection
Arabic Real and Fake News Classification Models Repository
Welcome to the repository containing multiple classification models trained and evaluated on an Arabic news classification task: labeling articles as fake or real. Below, you'll find details about each model, including its functionality, performance metrics, and potential use cases.
Models Overview and Performance
This repository includes the following models with their corresponding accuracies:
Model Name | Accuracy |
---|---|
Logistic Regression Model | 0.97 |
Decision Tree Model | 0.94 |
Gradient Boosting Classifier | 0.91 |
Random Forest Classifier | 0.99 |
Model Descriptions
1. Logistic Regression Model
Overview:
Logistic Regression is a statistical model used for binary or multiclass classification. It predicts the probability of an instance belonging to a specific class using a sigmoid function.
Features:
- Fast and computationally efficient.
- Performs well on linearly separable data.
- Provides probabilistic predictions.
Use Cases:
- Classifying Arabic news articles as fake or real, the task these models were trained on.
- Settings that call for an interpretable model, since the learned coefficients indicate how each feature influences the prediction.
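To make the sigmoid step concrete, here is a minimal sketch of how a logistic regression score becomes a probability. The weights, bias, and feature values are hypothetical and chosen only for illustration; they are not taken from the trained model in this repository.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of class 1 for a single feature vector."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


# Hypothetical coefficients for three illustrative features (assumption).
weights = [1.5, -2.0, 0.5]
bias = -0.25
features = [1.0, 0.0, 2.0]

p = predict_proba(features, weights, bias)
# Threshold at 0.5 to turn the probability into a class label.
predicted_class = 1 if p >= 0.5 else 0
print(f"P(class 1) = {p:.3f}, predicted class = {predicted_class}")
```

A positive weight pushes the probability toward class 1 and a negative weight pushes it toward class 0, which is what makes the coefficients readable.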
2. Decision Tree Model
Overview:
The Decision Tree model builds a tree-like structure where each node represents a decision rule and each leaf represents a class label. It is simple yet powerful for many classification tasks.
Features:
- Easy to interpret and visualize.
- Handles both numerical and categorical data.
- Prone to overfitting on noisy data unless the tree depth is limited or the tree is pruned.
Use Cases:
- Classifying Arabic news articles as fake or real.
- Tasks where interpretability is crucial.
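The interpretability point can be illustrated with scikit-learn's `export_text`, which prints the learned decision rules. This is a sketch on toy numeric data; the feature names `feature_a` and `feature_b` are hypothetical and unrelated to the features used by the model in this repository.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data with two hypothetical numeric features.
# Only the first feature separates the two classes.
X = [[0, 1], [1, 0], [2, 1], [3, 0]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Render the learned decision rules as indented plain text.
rules = export_text(tree, feature_names=["feature_a", "feature_b"])
print(rules)
```

Each printed line is one decision rule or leaf, so the path that leads to a prediction can be read directly.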
3. Gradient Boosting Classifier
Overview:
Gradient Boosting is an ensemble learning method that builds multiple weak learners (typically decision trees) and combines them to improve overall performance.
Features:
- Excellent for handling non-linear relationships.
- Robust to overfitting with proper hyperparameter tuning.
- Handles imbalanced datasets well.
Use Cases:
- Classifying complex Arabic news articles with nuanced patterns.
- Scenarios requiring high predictive performance.
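The idea of combining many weak learners can be sketched by tracking accuracy as trees are added one at a time. The synthetic dataset and the hyperparameters (`n_estimators=20`, `learning_rate=0.1`, `max_depth=2`) are assumptions for illustration, not the settings used to train the model in this repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Shallow trees act as the weak learners; each one corrects its predecessors.
model = GradientBoostingClassifier(
    n_estimators=20, learning_rate=0.1, max_depth=2, random_state=0
)
model.fit(X, y)

# staged_predict yields the ensemble's predictions after each added tree.
staged_accuracy = [accuracy_score(y, pred) for pred in model.staged_predict(X)]
print(f"after 1 tree: {staged_accuracy[0]:.2f}, "
      f"after {len(staged_accuracy)} trees: {staged_accuracy[-1]:.2f}")
```

These are training-set accuracies, so they show how the ensemble fits the data rather than how well it generalizes; a held-out set would be needed for that.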
4. Random Forest Classifier
Overview:
Random Forest is a powerful ensemble method that builds multiple decision trees and averages their predictions to improve accuracy and reduce overfitting.
Features:
- High accuracy and robustness to noise.
- Handles large, high-dimensional datasets.
- Reduces overfitting compared to individual decision trees.
Use Cases:
- Predicting whether an Arabic news article is fake or real.
- Applications requiring feature importance insights.
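Feature importance insights come from the fitted forest's `feature_importances_` attribute. The sketch below uses synthetic data with one informative column and one noise column, both hypothetical, to show how the scores are read.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends only on the first column.
rng = np.random.default_rng(0)
informative = rng.normal(size=200)
noise = rng.normal(size=200)
X = np.column_stack([informative, noise])
y = (informative > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Impurity-based importances, normalized to sum to 1 across features.
importances = forest.feature_importances_
for name, score in zip(["informative", "noise"], importances):
    print(f"{name}: {score:.3f}")
```

For a text model, the same attribute would hold one score per vocabulary feature, assuming the classifier was trained on vectorized text.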
How to Use the Models
All models are saved as .joblib files and can be easily loaded into your machine learning pipeline. Below is an example of how to use the Random Forest Classifier with Arabic news data:
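```python
import joblib

# Load the trained model from disk
model = joblib.load("RandomForestClassifier_model.joblib")

# Example input: Arabic news text
# (English: "The Moroccan national team announced the official lineup that
# will take part in the upcoming World Cup qualifying match.")
input_data = [
    "أعلن المنتخب الوطني المغربي عن التشكيلة الرسمية التي ستشارك في المباراة القادمة ضمن تصفيات كأس العالم."
]

# Get the prediction; predict() returns one label per input text.
# Note: passing raw text assumes the saved object bundles its text vectorizer.
prediction = model.predict(input_data)
print(f"Predicted class: {prediction}")
```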
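Calling `predict` on raw strings works when the saved object is a scikit-learn `Pipeline` that bundles a text vectorizer with the classifier. Whether the files in this repository are stored that way is not stated here, so the following is only a sketch of that pattern. The placeholder corpus, the labels, the TF-IDF step, and the file name `example_pipeline.joblib` are all assumptions.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Placeholder corpus and labels, purely illustrative (not from the dataset).
texts = [
    "alpha beta gamma",
    "alpha beta delta",
    "omega sigma theta",
    "omega sigma kappa",
]
labels = ["class_a", "class_a", "class_b", "class_b"]

# Vectorizer and classifier in one object, so predict() accepts raw text.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipeline.fit(texts, labels)

# Round-trip through joblib, mirroring how a .joblib file is saved and loaded.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_pipeline.joblib")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

restored_predictions = restored.predict(texts)
print(restored_predictions)
```

If a loaded model turns out to be a bare classifier instead, the input text would first need to be transformed with the same vectorizer that was fitted during training.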