---
datasets:
- sanaa-11/fake-real-arabic-news
language:
- ar
pipeline_tag: text-classification
tags:
- sklearn
- text-classification
- random-forest
- fake-news-detection
---
# **Arabic Real/Fake News Classification Models Repository**

Welcome to the repository of classification models trained and evaluated on the `sanaa-11/fake-real-arabic-news` dataset to label Arabic news articles as fake or real. Below, you'll find details about each model, including how it works, how it performed, and potential use cases.

---

## **Models Overview and Performance**

This repository includes the following models with their corresponding accuracies:

| Model Name                   | Accuracy |
|------------------------------|----------|
| Logistic Regression Model    | 0.97     |
| Decision Tree Model          | 0.94     |
| Gradient Boosting Classifier | 0.91     |
| Random Forest Classifier     | 0.99     |
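
Accuracy is the share of articles whose predicted label matches the true label. A minimal sketch of the metric in plain Python, using hypothetical labels (scikit-learn's `sklearn.metrics.accuracy_score` computes the same quantity):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical true and predicted labels for five articles
y_true = ["fake", "real", "real", "fake", "real"]
y_pred = ["fake", "real", "fake", "fake", "real"]
print(accuracy(y_true, y_pred))  # 0.8 (4 of 5 correct)
```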

---

## **Model Descriptions**

### **1. Logistic Regression Model**
#### Overview:
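Logistic Regression is a statistical model used for binary or multiclass classification. For a binary task such as fake vs. real, it passes a weighted sum of the input features through the sigmoid (logistic) function to estimate the probability that an instance belongs to the positive class; multiclass variants use the softmax function instead.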
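
The sigmoid step can be sketched in a few lines of plain Python. The weights, bias, feature values, and the choice of "fake" as the positive class are hypothetical; a trained model learns the weights from data, and for text the features would typically come from a vectorizer.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class (hypothetically 'fake') for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical learned parameters and a 3-feature document vector
weights = [1.5, -2.0, 0.5]
bias = -0.25
features = [1.0, 0.0, 2.0]

p = predict_proba(weights, bias, features)
label = "fake" if p >= 0.5 else "real"  # 0.5 decision threshold
print(f"P(fake) = {p:.3f} -> {label}")
```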

#### Features:
- Fast and computationally efficient.
- Performs well on linearly separable data.
- Provides probabilistic predictions.

#### Use Cases:
- Classifying Arabic news articles as fake or real.
- Settings where interpretability matters: the learned coefficients indicate how strongly each feature pushes a prediction toward one class.

---

### **2. Decision Tree Model**
#### Overview:
The Decision Tree model builds a tree-like structure where each node represents a decision rule and each leaf represents a class label. It is simple yet powerful for many classification tasks.
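
How a node picks its decision rule can be sketched with Gini impurity on a single feature. This is a simplified illustration with hypothetical data: it only tries observed values as thresholds, whereas implementations such as scikit-learn's search over all features and place thresholds between adjacent values.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2 (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Try each observed value t for the rule 'value <= t' and return the
    threshold whose children have the lowest weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    n = len(labels)
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical feature (e.g. a word count) and labels for six articles
values = [0, 1, 1, 3, 4, 5]
labels = ["real", "real", "real", "fake", "fake", "fake"]
t, score = best_threshold(values, labels)
print(f"split at value <= {t}, weighted Gini = {score:.3f}")
```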

#### Features:
- Easy to interpret and visualize.
- Handles both numerical and categorical data.
- Drawback: prone to overfitting on noisy data unless tree depth or leaf size is limited.

#### Use Cases:
- Classifying Arabic news articles as fake or real.
- Tasks where interpretability is crucial.

---

### **3. Gradient Boosting Classifier**
#### Overview:
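Gradient Boosting is an ensemble learning method that builds weak learners (typically shallow decision trees) sequentially: each new tree is fit to the errors of the ensemble built so far (more precisely, to the negative gradient of the loss), and the trees' scaled outputs are summed to form the final prediction.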
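
The sequential idea can be sketched in plain Python for binary log-loss with depth-1 trees (stumps). This is a simplified illustration under stated assumptions: one feature, hypothetical data with 1 = fake, a hypothetical learning rate, a starting score of 0, and leaf values equal to the mean residual. Libraries such as scikit-learn refine these details (for example, the initial score and how leaf values are computed).

```python
import math

LEARNING_RATE = 0.5  # hypothetical shrinkage applied to each tree's output

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Least-squares depth-1 tree on one feature: returns (threshold, left, right)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def raw_score(stumps, x):
    """Sum of the shrunken outputs of all trees (log-odds of class 1)."""
    return sum(LEARNING_RATE * (lv if x <= t else rv) for t, lv, rv in stumps)

def fit_boosting(xs, ys, n_rounds=20):
    """Each round fits a new stump to the current residuals y - p."""
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss w.r.t. the raw score is y - p
        residuals = [y - sigmoid(raw_score(stumps, x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return stumps

def predict_proba(stumps, x):
    return sigmoid(raw_score(stumps, x))

# Hypothetical 1-D feature with labels 1 = fake, 0 = real
xs = [0, 1, 1, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]
stumps = fit_boosting(xs, ys)
print(round(predict_proba(stumps, 0), 3), round(predict_proba(stumps, 5), 3))
```

Each round nudges the scores of misclassified-leaning points further in the right direction, which is why the probabilities move away from 0.5 as rounds accumulate.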

#### Features:
- Excellent for handling non-linear relationships.
- Robust to overfitting with proper hyperparameter tuning.
- Can cope with imbalanced datasets when combined with techniques such as sample weighting.

#### Use Cases:
- Classifying complex Arabic news articles with nuanced patterns.
- Scenarios requiring high predictive performance.

---

### **4. Random Forest Classifier**
#### Overview:
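Random Forest is a powerful ensemble method that trains many decision trees, each on a bootstrap sample of the training data and with a random subset of features considered at each split, and averages their predictions to improve accuracy and reduce overfitting.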
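
Bagging and averaging can be sketched in plain Python. In this simplified illustration each "tree" is a depth-1 stump trained on a bootstrap sample using one randomly chosen feature, and the forest averages the trees' estimated probability of "fake". The depth-1 trees, toy data, and class names are assumptions; scikit-learn's `RandomForestClassifier` grows deeper trees and samples candidate features at every split.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels (0.0 when empty or pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels, feature):
    """Best 'row[feature] <= t' split by weighted Gini; leaves store P(fake)."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            p_left = left.count("fake") / len(left)
            p_right = right.count("fake") / len(right) if right else p_left
            best = (score, t, p_left, p_right)
    _, t, p_left, p_right = best
    return feature, t, p_left, p_right

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feature = rng.randrange(n_features)         # random feature for this stump
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_proba(forest, row):
    """Average the per-tree probabilities of 'fake'."""
    probs = [pl if row[f] <= t else pr for f, t, pl, pr in forest]
    return sum(probs) / len(probs)

# Hypothetical two-feature vectors for six articles
rows = [[0, 1], [1, 0], [1, 2], [3, 4], [4, 5], [5, 6]]
labels = ["real", "real", "real", "fake", "fake", "fake"]
forest = fit_forest(rows, labels)
print(predict_proba(forest, [-1, -1]), predict_proba(forest, [10, 10]))
```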

#### Features:
- High accuracy and robustness to noise.
- Handles large datasets and high-dimensional feature spaces, such as those produced by text vectorization.
- Reduces overfitting compared to individual decision trees.

#### Use Cases:
- Classifying Arabic news articles as fake or real.
- Applications requiring feature importance insights.

---

## **How to Use the Models**

All models are saved as `.joblib` files and can be loaded with `joblib.load`. Below is an example of how to use the **Random Forest Classifier** with Arabic news data:

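```python
import joblib

# Load the saved model
model = joblib.load("RandomForestClassifier_model.joblib")

# Example input: Arabic news text
# ("The Moroccan national team announced the official lineup that will take
#  part in the upcoming World Cup qualifying match.")
input_data = [
    "أعلن المنتخب الوطني المغربي عن التشكيلة الرسمية التي ستشارك في المباراة القادمة ضمن تصفيات كأس العالم."
]

# Get prediction.
# Passing raw text works only if the saved object is a full pipeline that
# includes the text vectorizer. If only the classifier was saved, first
# transform the text with the same fitted vectorizer used during training.
prediction = model.predict(input_data)
print(f"Predicted class: {prediction[0]}")
```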
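
Passing raw text to `predict` works when the saved object is a scikit-learn `Pipeline` that bundles a text vectorizer with the classifier. A minimal sketch of building, saving, and reloading such a pipeline, assuming a TF-IDF vectorizer (the vectorizer actually used for these models is not specified here) and invented toy texts and labels:

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical toy training data; texts and labels are illustrative only
texts = [
    "أعلنت وزارة الصحة عن حملة تلقيح جديدة",
    "أكدت الحكومة موعد الانتخابات الرسمي",
    "خبر عاجل: اكتشاف علاج سحري يشفي كل الأمراض في يوم واحد",
    "عاجل: الأرض ستتوقف عن الدوران غدا",
]
labels = ["real", "real", "fake", "fake"]

# Vectorizer and classifier in one object, so predict() accepts raw strings
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipeline.fit(texts, labels)

# Save and reload the whole pipeline (hypothetical temporary path)
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipeline, path)
loaded = joblib.load(path)

prediction = loaded.predict(["عاجل: علاج سحري جديد"])
print(f"Predicted class: {prediction[0]}")
```

Saving the vectorizer and classifier together avoids a mismatch between the vocabulary seen at training time and the one used at prediction time.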
---
#### **This work was a collaboration between** Sanaa ABRIL and Sihame Mouanid