---
datasets:
- sanaa-11/fake-real-arabic-news
language:
- ar
pipeline_tag: text-classification
tags:
- sklearn
- text-classification
- random-forest
- fake-news-detection
---
# **Arabic Real/Fake News Classification Models Repository**
Welcome to the repository containing multiple classification models trained and evaluated on the task of classifying Arabic news as fake or real. Below, you'll find details about each model, including its functionality, performance metrics, and potential use cases.
---
## **Models Overview and Performance**
This repository includes the following models with their corresponding accuracies:
| Model Name | Accuracy |
|------------------------------|----------|
| Logistic Regression Model | 0.97 |
| Decision Tree Model | 0.94 |
| Gradient Boosting Classifier | 0.91 |
| Random Forest Classifier | 0.99 |
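
The model card does not say how these figures were computed. Assuming the usual definition, accuracy is the fraction of articles whose predicted label matches the true label. A minimal sketch with made-up labels (the `accuracy` helper and the `"fake"`/`"real"` label strings are illustrative assumptions, not taken from the repository):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels for illustration only (not taken from the dataset).
y_true = ["fake", "real", "real", "fake"]
y_pred = ["fake", "real", "fake", "fake"]
score = accuracy(y_true, y_pred)
print(score)
```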
---
## **Model Descriptions**
### **1. Logistic Regression Model**
#### Overview:
Logistic Regression is a statistical model used for binary or multiclass classification. In the binary case, it predicts the probability of an instance belonging to a class by passing a weighted sum of the features through a sigmoid function.
#### Features:
- Fast and computationally efficient.
- Performs well on linearly separable data.
- Provides probabilistic predictions.
#### Use Cases:
- Classifying Arabic news articles as real or fake when a fast, lightweight baseline is needed.
- Situations that call for an interpretable model, since the coefficients indicate feature importance.
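
As a rough illustration of the sigmoid step described in the overview, the sketch below turns a weighted sum of features into a probability and then into a label. The weights, bias, and feature values are hypothetical and are not the trained model's parameters:

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and features, for illustration only.
weights = [0.8, -1.2]
bias = 0.1
features = [1.0, 0.5]
p = predict_proba(weights, bias, features)
label = 1 if p >= 0.5 else 0  # threshold the probability at 0.5
print(p, label)
```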
---
### **2. Decision Tree Model**
#### Overview:
The Decision Tree model builds a tree-like structure where each node represents a decision rule and each leaf represents a class label. It is simple yet powerful for many classification tasks.
#### Features:
- Easy to interpret and visualize.
- Handles both numerical and categorical data.
- Prone to overfitting on noisy data.
#### Use Cases:
- Classifying Arabic news articles as real or fake.
- Tasks where interpretability is crucial.
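
To make the "node = decision rule, leaf = class label" structure concrete, here is a toy hand-written tree and the loop that walks it. The feature names and thresholds are invented for illustration; the repository's model presumably learns its own rules from text features:

```python
# A hand-written toy tree; the features and thresholds are hypothetical.
tree = {
    "feature": "exclamation_count",
    "threshold": 3,
    "left": {"label": "real"},  # taken when value <= threshold
    "right": {
        "feature": "has_named_source",
        "threshold": 0,
        "left": {"label": "fake"},
        "right": {"label": "real"},
    },
}


def predict(node, sample):
    """Follow decision rules from the root until a leaf is reached."""
    while "label" not in node:
        value = sample[node["feature"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["label"]


sample = {"exclamation_count": 5, "has_named_source": 0}
result = predict(tree, sample)
print(result)
```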
---
### **3. Gradient Boosting Classifier**
#### Overview:
Gradient Boosting is an ensemble learning method that builds multiple weak learners (typically decision trees) and combines them to improve overall performance.
#### Features:
- Excellent for handling non-linear relationships.
- Robust to overfitting with proper hyperparameter tuning.
- Handles imbalanced datasets well.
#### Use Cases:
- Classifying complex Arabic news articles with nuanced patterns.
- Scenarios requiring high predictive performance.
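
The idea of combining weak learners can be sketched in plain Python: each round fits a one-split stump to the errors left by the current ensemble and adds a shrunken copy of it. This is a simplified illustration using squared error on toy 1-D data, not the loss or implementation behind the saved classifier; `fit_stump`, `boost`, and `mse` are hypothetical helper names:

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Build an additive model: a constant plus shrunken stumps."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(rounds):
        # Each new weak learner is fitted to what the ensemble still gets wrong.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict, base


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy 1-D data, for illustration only (1 = fake, 0 = real is an arbitrary choice).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
model, base = boost(xs, ys)
initial_error = mse(lambda x: base, xs, ys)  # constant-only baseline
final_error = mse(model, xs, ys)             # after 20 boosting rounds
print(initial_error, final_error)
```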
---
### **4. Random Forest Classifier**
#### Overview:
Random Forest is a powerful ensemble method that builds multiple decision trees and averages their predictions to improve accuracy and reduce overfitting.
#### Features:
- High accuracy and robustness to noise.
- Handles large, high-dimensional datasets.
- Reduces overfitting compared to individual decision trees.
#### Use Cases:
- Predicting whether an Arabic news article is real or fake.
- Applications requiring feature importance insights.
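
The averaging step can be illustrated without any library: each tree outputs class probabilities, the forest averages them, and the class with the highest average wins. The per-tree numbers below are invented, and `forest_predict` is a hypothetical helper rather than part of the saved model:

```python
def forest_predict(tree_probas):
    """Average per-tree class probabilities and return the most likely class."""
    classes = list(tree_probas[0].keys())
    averaged = {
        c: sum(p[c] for p in tree_probas) / len(tree_probas) for c in classes
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical outputs of three trees for one article.
tree_probas = [
    {"fake": 0.9, "real": 0.1},
    {"fake": 0.4, "real": 0.6},
    {"fake": 0.7, "real": 0.3},
]
label, averaged = forest_predict(tree_probas)
print(label, averaged)
```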
---
## **How to Use the Models**
All models are saved as `.joblib` files and can be easily loaded into your machine learning pipeline. Below is an example of how to use the **Random Forest Classifier** with Arabic news data:
```python
import joblib

# Load the trained model from its .joblib file
model = joblib.load("RandomForestClassifier_model.joblib")

# Example input: Arabic news text
input_data = [
    "أعلن المنتخب الوطني المغربي عن التشكيلة الرسمية التي ستشارك في المباراة القادمة ضمن تصفيات كأس العالم."
]

# Get the prediction. Passing raw text assumes the saved object bundles its
# own text vectorizer (e.g. a scikit-learn Pipeline); otherwise vectorize first.
prediction = model.predict(input_data)
print(f"Predicted class: {prediction}")
```
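
For several articles at once, a small wrapper can pair each text with its predicted label and, where the model exposes `predict_proba`, a confidence score. This is a sketch under the same assumption as above (the saved object accepts raw text); `load_model` and `classify_articles` are hypothetical helper names, not part of the repository:

```python
def load_model(path):
    """Load a model saved with joblib (imported here so the helper below has no dependencies)."""
    import joblib
    return joblib.load(path)


def classify_articles(model, texts):
    """Return one dict per text with the predicted label and, if available, a confidence."""
    labels = model.predict(texts)
    if hasattr(model, "predict_proba"):
        # Confidence = highest class probability for each text.
        confidences = [max(row) for row in model.predict_proba(texts)]
    else:
        confidences = [None] * len(texts)
    return [
        {"text": text, "label": label, "confidence": confidence}
        for text, label, confidence in zip(texts, labels, confidences)
    ]


# Example usage (requires the model file from this repository):
# model = load_model("RandomForestClassifier_model.joblib")
# for row in classify_articles(model, input_data):
#     print(row["label"], row["confidence"])
```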
---
#### **This work was a collaboration between**: Sanaa ABRIL and Sihame Mouanid