Space: miracle01/taiwo-spam-detection-project-hnd2

miracle01 committed on Mar 28, 2024

Commit 80a9c82 (verified) · 1 Parent(s): d0d3778

Upload 6 files

Files changed (6)

Naive_Bayes_Spam_Detection.joblib +3 -0
README.md +5 -4
app.py +79 -0
fcahpt.jpg +0 -0
requirements.txt +5 -0
tfidf_vectorizer.joblib +3 -0

Naive_Bayes_Spam_Detection.joblib ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e389ad0221c97b8034a27857fcc0fb707e4712dc73f46e22b20bb769a7ae35cc
+size 1062583

README.md CHANGED

@@ -1,12 +1,13 @@
 ---
-title: Taiwo Spam Detection Project Hnd2
-emoji: 📚
+title: SpamClassifierNaiveBayes
+emoji: 😻
 colorFrom: blue
-colorTo: indigo
+colorTo: red
 sdk: streamlit
-sdk_version: 1.32.2
+sdk_version: 1.29.0
 app_file: app.py
 pinned: false
+license: apache-2.0
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

app.py ADDED

	@@ -0,0 +1,79 @@

+from joblib import load
+from sklearn.feature_extraction.text import TfidfVectorizer
+import numpy as np
+import streamlit as st
+info = [
+    {"title": "NAME", "detail": "AKINBITAN TAIWO EMMANUEL"},
+    {"title": "MATRIC NO", "detail": "HNDCOM/22/032"},
+    {"title": "CLASS", "detail": "HND2"},
+    {"title": "LEVEL", "detail": "400L"},
+    {"title": "PROJECT SUPERVISOR", "detail": ""},
+]
+st.title("Project Information")
+for item in info:
+    st.write(f"{item['title']}: {item['detail']}")
+st.image('fcahpt.jpg', caption='Federal College of Animal Health and Production Technology')
+st.header('Spam Detection using Naive Bayes Classifier')
+st.write('This is a spam detector developed in Python using a Naive Bayes classifier.')
+vectorizer = load('tfidf_vectorizer.joblib')
+user_input = st.text_area("Enter some text:", "")
+if user_input.strip():  # st.text_area returns "" (never None) when empty, so check for real content
+    x = vectorizer.transform([user_input])
+    model = load('Naive_Bayes_Spam_Detection.joblib')
+    pred = model.predict(x)
+    if pred[0] == 1:
+        st.markdown("<b>Prediction: <span style='color:red'>The entered text is likely to be spam; be careful.</span></b>", unsafe_allow_html=True)
+    elif pred[0] == 0:
+        st.markdown("<b>Prediction: <span style='color:green'>The entered text is not spam and appears safe.</span></b>", unsafe_allow_html=True)
+    else:
+        st.write('Error, try again')
+st.header("Project Description")
+st.markdown("""
+    Spam detection with a Naive Bayes classifier is a classic and effective approach for automatically identifying spam emails or messages.
+    The sections below walk through how it works:
+""")
+st.header("1. Data Collection and Preprocessing:")
+st.markdown("""
+    - The process begins with collecting a dataset of emails or messages labeled as spam or non-spam (ham).
+    - Each message undergoes preprocessing steps such as removing HTML tags, punctuation, and stopwords (commonly occurring words like "and", "the", etc.).
+    - The text is then tokenized and transformed into numerical representations using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or Count Vectorization.
+""")
+st.header("2. Understanding Naive Bayes Classifier:")
+st.markdown("""
+    - Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem, which calculates the probability of a certain event happening given the occurrence of another event.
+    - The "naive" assumption in Naive Bayes is that the features are conditionally independent given the class label. This simplifies the calculation and makes the algorithm computationally efficient.
+""")
+st.header("3. Training the Naive Bayes Model:")
+st.markdown("""
+    - The dataset is split into training and testing sets.
+    - During training, the Naive Bayes classifier learns the probability distribution of words or features given each class (spam or ham).
+    - It calculates the prior probabilities of spam and ham messages and the likelihood probabilities of each word occurring in spam and ham messages.
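+    - These probabilities are estimated from the training data using maximum likelihood estimation, usually combined with a smoothing technique such as Laplace (add-one) smoothing so that words unseen in one class do not get zero probability.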
+""")
+st.header("4. Classification:")
+st.markdown("""
+    - Once the model is trained, it can classify new, unseen messages.
+    - Given a new message, the classifier calculates the probability that it belongs to each class (spam or ham) using Bayes' theorem.
+    - The final classification decision is the class with the highest posterior probability. With two classes this is the same as labelling a message spam when its spam probability exceeds 0.5; a stricter threshold can be chosen to reduce false positives.
+""")
+st.header("5. Model Evaluation:")
+st.markdown("""
+    - The performance of the Naive Bayes classifier is evaluated using metrics such as accuracy, precision, recall, and F1-score on a separate test dataset.
+    - These metrics help assess how well the model generalizes to unseen data and its effectiveness in distinguishing between spam and non-spam messages.
+""")
+st.header("6. Deployment and Fine-Tuning:")
+st.markdown("""
+    - Once the model is trained and evaluated, it can be deployed for real-world use.
+    - Deployment may involve integrating the model into email systems or messaging platforms to automatically filter spam messages.
+    - Periodic updates and fine-tuning of the model may be necessary to adapt to changing spamming techniques and patterns.
+""")
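
The commit ships the two `.joblib` artifacts but not the script that produced them. The steps described in app.py (vectorize, train, evaluate, save) can be sketched as follows. This is a minimal illustration, not the author's training code: the toy corpus, the `alpha` value, the `classify` helper, and the temporary output directory are all assumptions made for the example. Only the two file names come from the commit.

```python
import tempfile
from pathlib import Path

from joblib import dump, load
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy corpus invented for illustration only; 1 = spam, 0 = ham,
# matching the label convention app.py checks with pred[0].
messages = [
    "Win a free prize now, click here",
    "Congratulations, you won a cash reward",
    "Claim your free lottery ticket today",
    "Urgent: verify your account to receive money",
    "Cheap loans approved instantly, apply now",
    "Free entry in a weekly prize draw",
    "Are we still meeting for lunch tomorrow?",
    "Please send me the lecture notes",
    "I will call you when I get home",
    "The project report is due on Friday",
    "Can you pick up some bread on your way?",
    "Thanks for your help with the assignment",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Step 3 of the description: split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=0
)

# Step 1: turn text into TF-IDF features; fit on training data only.
vectorizer = TfidfVectorizer(stop_words="english")
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Step 3: multinomial Naive Bayes with Laplace smoothing (alpha=1.0).
model = MultinomialNB(alpha=1.0)
model.fit(X_train_vec, y_train)

# Step 5: precision, recall and F1 on the held-out messages.
print(classification_report(y_test, model.predict(X_test_vec), zero_division=0))


def classify(text, threshold=0.5):
    """Step 4: return (label, spam probability) for one message."""
    spam_column = list(model.classes_).index(1)
    proba_spam = float(model.predict_proba(vectorizer.transform([text]))[0, spam_column])
    return int(proba_spam >= threshold), proba_spam


# Save under the file names app.py loads. A temporary directory keeps the
# sketch free of side effects; in the Space the files sit next to app.py.
out_dir = Path(tempfile.mkdtemp())
dump(vectorizer, out_dir / "tfidf_vectorizer.joblib")
dump(model, out_dir / "Naive_Bayes_Spam_Detection.joblib")

reloaded_model = load(out_dir / "Naive_Bayes_Spam_Detection.joblib")
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary out of the features, and saving the fitted vectorizer next to the model is what lets app.py call `vectorizer.transform` without refitting.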

fcahpt.jpg ADDED

requirements.txt ADDED

	@@ -0,0 +1,5 @@

+scikit-learn
+joblib
+streamlit
+numpy
+pandas

tfidf_vectorizer.joblib ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2250f89134c52246b8898de941d5d36273433b5df1840d12379e459967e8e819
+size 1150476
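
The three `+` lines above are a Git LFS pointer, not the serialized vectorizer itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A downloaded artifact could be checked against such a pointer roughly as below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, introduced only for this sketch.

```python
import hashlib

# Pointer text copied from the tfidf_vectorizer.joblib hunk above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2250f89134c52246b8898de941d5d36273433b5df1840d12379e459967e8e819
size 1150476
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into hash algorithm, hex digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

To check a real download, read the file in binary mode and pass the bytes to `matches_pointer` together with the parsed pointer.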