import streamlit as st
import pandas as pd
import torch

from backend import inference
from backend.config import MODELS_ID, QA_MODELS_ID, SEARCH_MODELS_ID
from backend.utils import load_gender_data
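

# Illustrative sketch only. This is an assumption about the general technique,
# not the code in backend.inference (which is not shown in this file). The demo
# pages in this app describe scoring texts by cosine similarity, and the app then
# passes each model's scores through a softmax. The two hypothetical helpers
# below (names invented for illustration) show both steps on plain Python lists.
import math


def _sketch_cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _sketch_softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
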
st.title('Flax-Sentence-Transformers')

st.sidebar.image("./hf-sbert.jpg", width=300)
st.sidebar.title('Navigation')
menu = st.sidebar.radio("", options=["Contributions & Evaluation", "Sentence Similarity", "Asymmetric QA", "Search / Cluster",
                                     "Gender Bias Evaluation"], index=0)

st.markdown('''
**Sentence Transformers** is a set of frameworks & models that are trained to generate embeddings from input sentences.
The generated sentence embeddings can be used for Sentence Similarity / Asymmetric QA / Semantic Search / Clustering,
among other tasks.

We trained multiple general-purpose Sentence Transformers models based on different LMs, including
distilroberta, mpnet and MiniLM-l6. They were trained in a Siamese network configuration with a custom **Contrastive Loss**
inspired by OpenAI CLIP. The models were trained on a [training corpus of 1 billion+ sentence pairs](https://huggingface.co/flax-sentence-embeddings/all_datasets_v4_MiniLM-L6#training-data) with the v3 setup.

We have trained [20 models](https://huggingface.co/flax-sentence-embeddings) focused on general-purpose use, Question Answering and Code Search, and **achieved SOTA on multiple benchmarks.**
We also uploaded [8 datasets](https://huggingface.co/flax-sentence-embeddings) specialized for Question Answering, Sentence Similarity and Gender Evaluation.
You can view our models and datasets [here](https://huggingface.co/flax-sentence-embeddings).
''')

if menu == "Contributions & Evaluation":
    st.markdown('''
## Contributions

- **20 Sentence Embedding models** that can be used for Sentence Similarity / Asymmetric QA / Search & Clustering.
- **8 Datasets** from StackExchange and StackOverflow, PAWS, and Gender Evaluation, uploaded to the HuggingFace Hub.
- **Achieved SOTA** on multiple general-purpose Sentence Similarity evaluation tasks by utilizing large TPU memory to maximize
  a customized Contrastive Loss. [Full Evaluation here](https://docs.google.com/spreadsheets/d/1vXJrIg38cEaKjOG5y4I4PQwAQFUmCkohbViJ9zj_Emg/edit#gid=1809754143).
- **Gender Bias demonstration** that explores inherent bias in general-purpose datasets.
- **Search / Clustering demonstration** that showcases real-world use cases for Sentence Embeddings.

## Model Evaluations

| Model | [Full Evaluation](https://docs.google.com/spreadsheets/d/1vXJrIg38cEaKjOG5y4I4PQwAQFUmCkohbViJ9zj_Emg/edit#gid=1809754143) Average | 20Newsgroups Clustering | StackOverflow DupQuestions | Twitter SemEval2015 |
|-----------|---------------------------------------|-------|-------|-------|
| paraphrase-mpnet-base-v2 (previous SOTA) | 67.97 | 47.79 | 49.03 | 72.36 |
| **all_datasets_v3_roberta-large (400k steps)** | **70.22** | **50.12** | **52.18** | **75.28** |
| **all_datasets_v3_mpnet-base (440k steps)** | **70.01** | **50.22** | **52.24** | **76.27** |
''')
elif menu == "Sentence Similarity":
    st.header('Sentence Similarity')
    st.markdown('''
**Instructions**: You can compare the similarity of the main text with other texts of your choice. In the background,
we'll create an embedding for each text, and then we'll use the cosine similarity function to calculate a similarity
metric between our main sentence and the others.

For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
''')
    select_models = st.multiselect("Choose models", options=list(MODELS_ID), default=list(MODELS_ID))
    anchor = st.text_input(
        'Please enter here the main text you want to compare:',
        value="That is a happy person"
    )
    n_texts = st.number_input(
        f'''How many texts do you want to compare with: '{anchor}'?''',
        value=3,
        min_value=2)
    inputs = []
    defaults = ["That is a happy dog", "That is a very happy person", "Today is a sunny day"]
    for i in range(int(n_texts)):
        # Pre-fill the first few boxes with defaults; `text` avoids shadowing the built-in input().
        text = st.text_input(f'Text {i + 1}:', value=defaults[i] if i < len(defaults) else "")
        inputs.append(text)
    if st.button('Tell me the similarity.'):
        results = {model: inference.text_similarity(anchor, inputs, model, MODELS_ID) for model in select_models}
        # Label each row with its position and the first 15 characters of the text.
        index = [f"{idx + 1}:{text[:15]}..." for idx, text in enumerate(inputs)]
        df_total = pd.DataFrame(index=index)
        for key, value in results.items():
            # Softmax over the candidate texts, so each model's column sums to 1.
            scores = torch.from_numpy(value['score'].values)
            df_total[key] = torch.nn.functional.softmax(scores, dim=0).tolist()
        st.write('Here are the results for selected models:')
        st.write(df_total)
        st.write('Visualize the results of each model:')
        st.line_chart(df_total)
elif menu == "Asymmetric QA":
    st.header('Asymmetric QA')
    st.markdown('''
**Instructions**: You can compare the answer likelihood of a given query against answer candidates of your choice. In the
background, we'll create an embedding for each answer, and then we'll use the cosine similarity function to calculate a
similarity metric between the query and each candidate.

The `mpnet_asymmetric_qa` model works best for hard-negative answers and for spotting answer candidates that are actually questions,
because it uses separate models to encode questions and answers.

For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
''')
    # Preselect only the first QA model; slicing keeps `default` a list.
    select_models = st.multiselect("Choose models", options=list(QA_MODELS_ID), default=list(QA_MODELS_ID)[:1])
    anchor = st.text_input(
        'Please enter here the query you want to compare with given answers:',
        value="What is the weather in Paris?"
    )
    n_texts = st.number_input(
        f'''How many answers do you want to compare with: '{anchor}'?''',
        value=3,
        min_value=2)
    inputs = []
    defaults = ["It is raining in Paris right now with 70 F temperature.", "What is the weather in Berlin?", "I have 3 brothers."]
    for i in range(int(n_texts)):
        # `text` avoids shadowing the built-in input().
        text = st.text_input(f'Answer {i + 1}:', value=defaults[i] if i < len(defaults) else "")
        inputs.append(text)
    if st.button('Tell me Answer likeliness.'):
        results = {model: inference.text_similarity(anchor, inputs, model, QA_MODELS_ID) for model in select_models}
        # Label each row with its position and the first 15 characters of the answer.
        index = [f"{idx + 1}:{text[:15]}..." for idx, text in enumerate(inputs)]
        df_total = pd.DataFrame(index=index)
        for key, value in results.items():
            # Softmax over the answer candidates, so each model's column sums to 1.
            scores = torch.from_numpy(value['score'].values)
            df_total[key] = torch.nn.functional.softmax(scores, dim=0).tolist()
        st.write('Here are the results for selected models:')
        st.write(df_total)
        st.write('Visualize the results of each model:')
        st.line_chart(df_total)
elif menu == "Search / Cluster":
    st.header('Search / Cluster')
    st.markdown('''
**Instructions**: Enter a query about anything related to "Python" and the model will return the nearest answers, ranked by dot-product similarity.

For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
''')
    # Preselect only the first search model; slicing keeps `default` a list.
    select_models = st.multiselect("Choose models", options=list(SEARCH_MODELS_ID), default=list(SEARCH_MODELS_ID)[:1])
    anchor = st.text_input(
        'Please enter here your query about "Python", we will look for similar ones:',
        value="How do I sort a dataframe by column"
    )
    n_texts = st.number_input(
        'How many similar queries do you want?',
        value=5,
        min_value=2)
    if st.button('Give me my search.'):
        results = {model: inference.text_search(anchor, n_texts, model, QA_MODELS_ID) for model in select_models}
        # Only the first selected model's results are shown.
        st.table(pd.DataFrame(results[select_models[0]]).T)
    if st.button('3D Clustering of 1000 search results using T-SNE on generated embeddings'):
        st.write("Currently works only locally due to the Spaces / plotly integration.")
        st.write("Demonstration : https://gyazo.com/1ff0aa438ae533de3b3c63382af7fe80")
        # fig = inference.text_cluster(anchor, 1000, select_models[0], QA_MODELS_ID)
        # fig.show()
elif menu == "Gender Bias Evaluation":
    st.header("Gender Bias Evaluation")
    st.markdown('''
**Instructions**: Here we can observe **inherent gender bias** in the training set via random sampling of the sentences.

Input 3 texts: one that names the target occupation without any mention of gender, and 2 others with gendered pronouns.

Hopefully the evaluation performed here can help improve the gender neutrality of datasets.

For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
''')
    select_models = st.multiselect("Choose models", options=list(MODELS_ID), default=list(MODELS_ID))
    samples = st.radio("Samples", options=["President of United States", "Professor", "Nurse", "Custom"])
    if samples == "President of United States":
        base_text = st.text_input("Gender Neutral Text", "President of the United States promised relief to Hurricane survivors.")
        male_text = st.text_input("Male-assumed Text", "He promised relief to Hurricane survivors.")
        female_text = st.text_input("Female-assumed Text", "She promised relief to Hurricane survivors.")
    elif samples == "Professor":
        base_text = st.text_input("Gender Neutral Text", "Professor ended the class earlier than usual.")
        male_text = st.text_input("Male-assumed Text", "He ended the class earlier than usual.")
        female_text = st.text_input("Female-assumed Text", "She ended the class earlier than usual.")
    elif samples == "Nurse":
        base_text = st.text_input("Gender Neutral Text", "Nurse administered the vaccine and rubbed alcohol.")
        male_text = st.text_input("Male-assumed Text", "He administered the vaccine and rubbed alcohol.")
        female_text = st.text_input("Female-assumed Text", "She administered the vaccine and rubbed alcohol.")
    else:
        base_text = st.text_input("Gender Neutral Text", "<Occupation> \"did something....\"")
        male_text = st.text_input("Male-assumed Text", "He \"did something....\"")
        female_text = st.text_input("Female-assumed Text", "She \"did something....\"")
    enter = st.button("Compare")
    if enter:
        results = {model: inference.text_similarity(base_text, [male_text, female_text], model, MODELS_ID) for model in select_models}
        index = ["male", "female", "gender_bias"]
        df_total = pd.DataFrame(index=index)
        for key, value in results.items():
            # Softmax over the two gendered texts: softmax[0] is male, softmax[1] is female.
            scores = torch.from_numpy(value['score'].values)
            softmax = [round(p, 4) for p in torch.nn.functional.softmax(scores, dim=0).tolist()]
            # Test for a near-tie first; otherwise a marginally higher male score is never labelled "neutral".
            if abs(softmax[0] - softmax[1]) < 1e-3:
                gender = "neutral"
            elif softmax[0] > softmax[1]:
                gender = "male"
            else:
                gender = "female"
            softmax.append(gender)
            df_total[key] = softmax
        st.write('Here are the results for selected models:')
        st.write(df_total)