Souha Ben Hassine committed
Commit 4bd843b · 1 Parent(s): 497f6e8
Files changed (2)
  1. README.md +9 -1
  2. app.py +61 -42
README.md CHANGED
@@ -17,4 +17,12 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 pip install -r requirements.txt
 python3 -m spacy download en_core_web_sm
 
-```
+```
+
+
+
+
+
+
+input_resume = "Abid Ali Awan Data Scientist I am a certified data scientist professional, who loves building machine learning models and blogs about the latest AI technologies. I am currently testing AI Products at PEC-PITC, which later gets approved for human trials. [email protected] +923456855126 Islamabad, Pakistan abidaliawan.me WORK EXPERIENCE Data Scientist Pakistan Innovation and Testing Center - PEC 04/2021 - Present, Islamabad, Pakistan Redesigned data of engineers that were mostly scattered and unavailable. Designed dashboard and data analysis report to help higher management make better decisions. Accessibility of key information has created a new culture of making data-driven decisions. Contact: Ali Raza Asif - [email protected] Data Scientist Freelancing/Kaggle 11/2020 - Present, Islamabad, Pakistan Engineered a healthcare system. Used machine learning to detect some of the common decisions. The project has paved the way for others to use new techniques to get better results. Participated in Kaggle machine learning competitions. Learned new techniques to get a better score and finally got to 1 percent rank. Researcher / Event Organizer CREDIT 02/2017 - 07/2017, Kuala Lumpur, Malaysia Marketing for newly build research lab. Organized technical events and successfully invited the multiple company's CEO for talks. Reduced the gap between industries and educational institutes. Research on new development in the IoT sector. Created research proposal for funding. Investigated the new communication protocol for IoT devices. Contact: Dr. Tan Chye Cheah - [email protected] EDUCATION MSc in Technology Management Staffordshire University 11/2015 - 04/2017, Postgraduate with Distinction Challenges in Implementing IoT-enabled Smart cities in Malaysia. Bachelors Electrical Telecommunication Engineering COMSATS Institute of Information Technology, Islamabad 08/2010 - 01/2014, CGPA: 3.09 Networking Satellite communications Programming/ Matlab Telecommunication Engineering SKILLS Designing Leadership Media/Marketing R/Python SQL Tableau NLP Data Analysis Machine learning Deep learning Webapp/Cloud Feature Engineering Ensembling Time Series Technology Management ACHIEVEMENTS 98th Hungry Geese Simulation Competition (08/2021) 2nd in Covid-19 vaccinations around the world (07/2021) 8th in Automatic Speech Recognition in WOLOF (06/2021) Top 10 in WiDS Datathon. (03/2021) 40th / 622 in MagNet: Model the Geomagnetic Field Hosted by NOAA (02/2021) 18th in Rock, Paper, Scissors/Designing AI Agent Competition. (02/2021) PROJECTS Goodreads Profile Analysis WebApp (09/2021) Data Analysis Web Scraping XLM Interactive Visualization Contributed in orchest.io (08/2021) Testing and Debuging Technical Article Proposing new was to Improve ML pipelines World Vaccine Update System (06/2021) Used sqlite3 for database Automated system for daily update the Kaggle DB and Analysis Interactive dashboard mRNA-Vaccine-Degradation-Prediction (06/2021) Explore our dataset and then preprocessed sequence, structure, and predicted loop type features Train deep learning GRU model Trip Advisor Data Analysis/ML (04/2021) Preprocessing Data, Exploratory Data analysis, Word clouds. Feature Engineering, Text processing. BiLSTM Model for predicting rating, evaluation, model performance. Jane Street Market Prediction (03/2021) EDA, Feature Engineering, experimenting with hyperparameters. Ensembling: Resnet, NN Embeddings, TF Simple NN model. Using simple MLP pytorch model. Achievements/Tasks Achievements/Tasks Achievements/Tasks Thesis Course"
+input_skills = "Data Science,Data Analysis,Database,Machine Learning,tableau"
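The sample `input_skills` string added above is comma-separated with no spaces, which matches how the new `find_matching_resumes` function in the app.py diff below tokenizes it with a plain `lower().split(",")`. A minimal sketch of that parsing, for reference:

```python
# Sketch of how app.py splits the README's sample skills string (see the
# find_matching_resumes diff below). A plain split keeps stray whitespace,
# so "Machine Learning, SQL" would yield " sql" rather than "sql" -- the
# sample string deliberately omits spaces after commas.
input_skills = "Data Science,Data Analysis,Database,Machine Learning,tableau"
required_skills = input_skills.lower().split(",")
print(required_skills)
# ['data science', 'data analysis', 'database', 'machine learning', 'tableau']
```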
app.py CHANGED
@@ -1,9 +1,6 @@
 import gradio as gr
 import pandas as pd
 import spacy
-from spacy.pipeline import EntityRuler
-from spacy.lang.en import English
-from spacy.tokens import Doc
 from spacy import displacy
 import plotly.express as px
 import numpy as np
@@ -14,26 +11,27 @@ from nltk.stem import WordNetLemmatizer
 
 nltk.download(['stopwords','wordnet'])
 nltk.download('omw-1.4')
+
 # Load the CSV file into a DataFrame
 dataset_path = "Resume.csv"
 df = pd.read_csv(dataset_path)
 df= df.reindex(np.random.permutation(df.index))
-data = df.copy().iloc[0:200,]
+data = df.copy().iloc[0:500,]
 
 # Load the spaCy English language model with large vocabulary and pre-trained word vectors
-nlp = spacy.load("en_core_web_lg")
+spacy_model = spacy.load("en_core_web_lg")
 
-# Path to the file containing skill patterns in JSONL format
+# Path to the file containing skill patterns in JSONL format (2129 skills)
 skill_pattern_path = "jz_skill_patterns.jsonl"
 
 # Add an entity ruler to the spaCy pipeline
-ruler = nlp.add_pipe("entity_ruler")
+ruler = spacy_model.add_pipe("entity_ruler")
 
 # Load skill patterns from disk into the entity ruler
 ruler.from_disk(skill_pattern_path)
 
 def get_unique_skills(text):
-    doc = nlp(text)
+    doc = spacy_model(text)
     skills = set()
     for ent in doc.ents:
         if ent.label_ == "SKILL":
@@ -61,12 +59,6 @@ data["Clean_Resume"] = data["Resume_str"].apply(preprocess_resume)
 # Extract skills from each preprocessed resume and store them in a new column
 data["skills"] = data["Clean_Resume"].str.lower().apply(get_unique_skills)
 
-print(data)
-
-Job_cat = data["Category"].unique()
-Job_cat = np.append(Job_cat, "ALL")
-Job_Category = "INFORMATION-TECHNOLOGY"
-
 def get_skills_distribution(Job_Category):
     if Job_Category != "ALL":
         filtered_data = data[data["Category"] == Job_Category]["skills"]
@@ -83,7 +75,6 @@ def get_skills_distribution(Job_Category):
 
     return fig.show()
 
-get_skills_distribution(Job_Category)
 
 # Apply the preprocess_resume function to each resume string and store the result in a new column
 data["Clean_Resume"] = data["Resume_str"].apply(preprocess_resume)
@@ -96,20 +87,8 @@ for a in patterns:
     ruler.add_patterns([{"label": "Job-Category", "pattern": a}])
 
 
-# Load the spaCy model
-nlp = spacy.load("en_core_web_sm")
+# Define the options for highlighting entities
 
-# Define the styles and options for highlighting entities
-colors = {
-    "Job-Category": "linear-gradient(90deg, #aa9cfc, #fc9ce7)",
-    "SKILL": "linear-gradient(90deg, #9BE15D, #00E3AE)",
-    "ORG": "#ffd966",
-    "PERSON": "#e06666",
-    "GPE": "#9fc5e8",
-    "DATE": "#c27ba0",
-    "ORDINAL": "#674ea7",
-    "PRODUCT": "#f9cb9c",
-}
 options = {
     "ents": [
         "Job-Category",
@@ -121,26 +100,66 @@ options = {
         "ORDINAL",
         "PRODUCT",
     ],
-    "colors": colors,
 }
 
 # Define a function to process the resume text and highlight entities
 def highlight_entities(resume_text):
     # Process the resume text with spaCy
-    doc = nlp(resume_text)
+    doc = spacy_model(resume_text)
     # Render the entities with displacy and return the HTML
     html = displacy.render(doc, style="ent", options=options, jupyter=False)
     return html
 
-# Create the Gradio interface
-iface = gr.Interface(
-    fn=highlight_entities,
-    inputs=gr.Textbox(lines=10, label="Input Resume Text"),
-    outputs=gr.HTML(label="Highlighted Entities"),
-    title="Resume Entity Highlighter",
-    description="Enter your resume text and see entities highlighted.",
-    theme="compact"
-)
-
-# Launch the interface
-iface.launch()
+def calculate_semantic_similarity(required_skills, resume_skills):
+    """
+    Calculate the semantic similarity between required skills and resume skills.
+    """
+    required_skills_str = " ".join(required_skills)
+    resume_skills_str = " ".join(resume_skills)
+    required_skills_doc = spacy_model(required_skills_str)
+    resume_skills_doc = spacy_model(resume_skills_str)
+    similarity_score = required_skills_doc.similarity(resume_skills_doc)
+    return similarity_score
+
+def find_matching_resumes(input_skills, n=5):
+    """
+    Find and rank the top matching resumes based on input skills.
+    """
+    req_skills = input_skills.lower().split(",")
+    ranked_resumes = []
+    for idx, row in data.iterrows():
+        resume_skills = row['skills']
+        similarity_score = calculate_semantic_similarity(req_skills, resume_skills)
+        ranked_resumes.append((idx, similarity_score))
+
+    # Sort resumes by similarity scores in descending order
+    ranked_resumes.sort(key=lambda x: x[1], reverse=True)
+
+    # Get the top N matching resumes
+    top_matching_resumes = ranked_resumes[:n]
+
+    # Construct output in a structured format
+    output = []
+    for resume_id, score in top_matching_resumes:
+        output.append(f"Similarity Score: {score}\nResume ID: {resume_id}")
+
+    return output
+
+
+
+with gr.Blocks() as demo:
+    gr.Markdown("Enter your resume text and perform NER, or enter the required skills and find the top matching resumes.")
+    with gr.Tab("Enter your resume text and perform NER"):
+        text_input = gr.Textbox(lines=10, label="Input Resume Text")
+        text_output = gr.HTML(label="Highlighted Entities")
+        text_button = gr.Button("Submit")
+    with gr.Tab("Enter the required skills (comma-separated) and find the top matching resumes."):
+
+        text_input2 = gr.Textbox(lines=5, label="Input Required Skills (comma-separated)")
+        text_output2 = gr.Textbox(label="Top Matching Resumes")
+        text_button2 = gr.Button("Submit")
+
+    text_button.click(highlight_entities, inputs=text_input, outputs=text_output)
+    text_button2.click(find_matching_resumes, inputs=text_input2, outputs=text_output2)
+
+demo.launch()
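For a quick local check of the new matching path without starting the Gradio server (importing app.py would run `demo.launch()` at module level), the similarity step can be exercised on its own. A minimal sketch, assuming `en_core_web_lg` is installed; the hand-picked `resume_skills` list stands in for a real `get_unique_skills` result:

```python
# Standalone sketch of the commit's semantic-matching step. Doc.similarity on
# vector models compares averaged word vectors, so an empty skill list yields
# a zero vector (spaCy emits a warning and the score is 0.0).
import spacy

spacy_model = spacy.load("en_core_web_lg")

# Sample inputs mirroring the strings added to the README.
required_skills = "Data Science,Data Analysis,Database,Machine Learning,tableau".lower().split(",")
resume_skills = ["python", "sql", "tableau", "machine learning", "data analysis"]  # hypothetical extraction result

required_doc = spacy_model(" ".join(required_skills))
resume_doc = spacy_model(" ".join(resume_skills))
print(f"Similarity Score: {required_doc.similarity(resume_doc):.3f}")
```

Because the score is a cosine over averaged vectors, it rewards topical overlap rather than exact skill matches, so two same-domain skill sets can score high even with few shared terms.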