---
datasets:
- sanaa-11/fake-real-arabic-news
language:
- ar
pipeline_tag: text-classification
tags:
- sklearn
- text-classification
- random-forest
- fake-news-detection
---

# **Arabic Real/Fake News Classification Models Repository**

Welcome to the repository containing multiple classification models trained and evaluated on an Arabic news classification task: labeling articles as fake or real. Below, you'll find details about each model, including how it works, its accuracy, and potential use cases.

---

## **Models Overview and Performance**

This repository includes the following models with their corresponding accuracies:

| Model Name                   | Accuracy |
|------------------------------|----------|
| Logistic Regression Model    | 0.97     |
| Decision Tree Model          | 0.94     |
| Gradient Boosting Classifier | 0.91     |
| Random Forest Classifier     | 0.99     |

---

## **Model Descriptions**

### **1. Logistic Regression Model**

#### Overview:
Logistic Regression is a statistical model used for binary or multiclass classification. It predicts the probability that an instance belongs to a specific class using a sigmoid function (softmax in the multiclass case).

#### Features:
- Fast and computationally efficient.
- Performs well on linearly separable data.
- Provides probabilistic predictions.
- Interpretable: the coefficients indicate how strongly each feature influences the prediction.

#### Use Cases:
- Classifying Arabic news articles as fake or real.
- Tasks that need a fast, interpretable baseline.

---

### **2. Decision Tree Model**

#### Overview:
The Decision Tree model builds a tree-like structure where each internal node represents a decision rule and each leaf represents a class label. It is simple yet powerful for many classification tasks.

#### Features:
- Easy to interpret and visualize.
- Handles both numerical and categorical data.
- Prone to overfitting on noisy data.

#### Use Cases:
- Classifying Arabic news articles as fake or real.
- Tasks where interpretability is crucial.

---

### **3. Gradient Boosting Classifier**

#### Overview:
Gradient Boosting is an ensemble learning method that builds multiple weak learners (typically decision trees) sequentially, each one correcting the errors of its predecessors, and combines them to improve overall performance.

#### Features:
- Excellent for handling non-linear relationships.
- Robust to overfitting with proper hyperparameter tuning.
- Can handle imbalanced datasets reasonably well.

#### Use Cases:
- Classifying complex Arabic news articles with nuanced patterns.
- Scenarios requiring high predictive performance.

---

### **4. Random Forest Classifier**

#### Overview:
Random Forest is a powerful ensemble method that builds multiple decision trees and combines their predictions to improve accuracy and reduce overfitting.

#### Features:
- High accuracy and robustness to noise.
- Handles large, high-dimensional datasets.
- Reduces overfitting compared to individual decision trees.

#### Use Cases:
- Predicting whether an Arabic news article is fake or real.
- Applications requiring feature importance insights.

---

## **How to Use the Models**

All models are saved as `.joblib` files and can be loaded directly into your machine learning pipeline. Below is an example of how to use the **Random Forest Classifier** with Arabic news data:

```python
import joblib

# Load the model.
# Assumption: the saved object bundles its own text vectorizer,
# because raw strings are passed straight to predict() below.
model = joblib.load("RandomForestClassifier_model.joblib")

# Example input: Arabic news text
input_data = [
    "أعلن المنتخب الوطني المغربي عن التشكيلة الرسمية التي ستشارك في المباراة القادمة ضمن تصفيات كأس العالم."
]

# Get the prediction (one label per input text)
prediction = model.predict(input_data)
print(f"Predicted class: {prediction[0]}")
```

---

#### **This work was a collaboration between**: Sanaa ABRIL and Sihame Mouanid