|
import streamlit as st
|
|
|
|
|
|
st.set_page_config(
|
|
layout="wide",
|
|
initial_sidebar_state="auto"
|
|
)
|
|
|
|
|
|
st.markdown("""
|
|
<style>
|
|
.main-title {
|
|
font-size: 36px;
|
|
color: #4A90E2;
|
|
font-weight: bold;
|
|
text-align: center;
|
|
}
|
|
.sub-title {
|
|
font-size: 24px;
|
|
color: #4A90E2;
|
|
margin-top: 20px;
|
|
}
|
|
.section {
|
|
background-color: #f9f9f9;
|
|
padding: 15px;
|
|
border-radius: 10px;
|
|
margin-top: 20px;
|
|
}
|
|
.section h2 {
|
|
font-size: 22px;
|
|
color: #4A90E2;
|
|
}
|
|
.section p, .section ul {
|
|
color: #666666;
|
|
}
|
|
.link {
|
|
color: #4A90E2;
|
|
text-decoration: none;
|
|
}
|
|
</style>
|
|
""", unsafe_allow_html=True)
|
|
|
|
|
|
st.markdown('<div class="main-title">Automatically Answer Questions (CLOSED BOOK)</div>', unsafe_allow_html=True)
|
|
|
|
|
|
st.markdown("""
|
|
<div class="section">
|
|
<p>Closed-book question answering is a challenging task where a model is expected to generate accurate answers to questions without access to external information or documents during inference. This approach relies solely on the pre-trained knowledge embedded within the model, making it ideal for scenarios where retrieval-based methods are not feasible.</p>
|
|
<p>In this page, we will explore how to implement a pipeline that can automatically answer questions in a closed-book setting using state-of-the-art NLP techniques. We utilize a T5 Transformer model fine-tuned for closed-book question answering, providing accurate and contextually relevant answers to a variety of trivia questions.</p>
|
|
</div>
|
|
""", unsafe_allow_html=True)
|
|
|
|
|
|
st.markdown('<div class="sub-title">Understanding the T5 Transformer for Closed-Book QA</div>', unsafe_allow_html=True)
|
|
|
|
st.markdown("""
|
|
<div class="section">
|
|
<p>The T5 (Text-To-Text Transfer Transformer) model by Google is a versatile transformer-based model designed to handle a wide range of NLP tasks in a unified text-to-text format. For closed-book question answering, T5 is fine-tuned to generate answers directly from its internal knowledge without relying on external sources.</p>
|
|
<p>The model processes input questions and, based on its training, generates a text response that is both relevant and accurate. This makes it particularly effective in applications where access to external data sources is limited or impractical.</p>
|
|
</div>
|
|
""", unsafe_allow_html=True)
|
|
|
|
|
|
st.markdown('<div class="sub-title">Performance and Benchmarks</div>', unsafe_allow_html=True)
|
|
|
|
st.markdown("""
|
|
<div class="section">
|
|
<p>The T5 model has been extensively benchmarked on various question-answering datasets, including natural questions and trivia challenges. In these evaluations, the closed-book variant of T5 has shown strong performance, often producing answers that are correct and contextually appropriate, even when the model is not allowed to reference any external data.</p>
|
|
<p>This makes the T5 model a powerful tool for generating answers in applications such as virtual assistants, educational tools, and any scenario where pre-trained knowledge is sufficient to provide responses.</p>
|
|
</div>
|
|
""", unsafe_allow_html=True)
|
|
|
|
|
|
st.markdown('<div class="sub-title">Implementing Closed-Book Question Answering</div>', unsafe_allow_html=True)
|
|
|
|
st.markdown("""
|
|
<div class="section">
|
|
<p>The following example demonstrates how to implement a closed-book question answering pipeline using Spark NLP. The pipeline includes a document assembler, a sentence detector to identify questions, and the T5 model to generate answers.</p>
|
|
</div>
|
|
""", unsafe_allow_html=True)
|
|
|
|
st.code('''
|
|
from sparknlp.base import *
|
|
from sparknlp.annotator import *
|
|
from pyspark.ml import Pipeline
|
|
from pyspark.sql.functions import col, expr
|
|
|
|
document_assembler = DocumentAssembler()\\
|
|
.setInputCol("text")\\
|
|
.setOutputCol("documents")
|
|
|
|
sentence_detector = SentenceDetectorDLModel\\
|
|
.pretrained("sentence_detector_dl", "en")\\
|
|
.setInputCols(["documents"])\\
|
|
.setOutputCol("questions")
|
|
|
|
t5 = T5Transformer()\\
|
|
.pretrained("google_t5_small_ssm_nq")\\
|
|
.setTask('trivia question:')\\
|
|
.setInputCols(["questions"])\\
|
|
.setOutputCol("answers")
|
|
|
|
pipeline = Pipeline().setStages([document_assembler, sentence_detector, t5])
|
|
|
|
data = spark.createDataFrame([["What is the capital of France?"]]).toDF("text")
|
|
result = pipeline.fit(data).transform(data)
|
|
result.select("answers.result").show(truncate=False)
|
|
''', language='python')
|
|
|
|
|
|
st.text("""
|
|
+---------------------------+
|
|
|answers.result |
|
|
+---------------------------+
|
|
|[Paris] |
|
|
+---------------------------+
|
|
""")
|
|
|
|
|
|
st.markdown('<div class="sub-title">Choosing the Right T5 Model</div>', unsafe_allow_html=True)
|
|
|
|
st.markdown("""
|
|
<div class="section">
|
|
<p>Several T5 models are available, each pre-trained on different datasets and tasks. For closed-book question answering, it's important to select a model that has been fine-tuned specifically for this task. The model used in the example, "google_t5_small_ssm_nq," is optimized for answering trivia questions in a closed-book setting.</p>
|
|
<p>For more complex or varied question-answering tasks, consider using larger T5 models like T5-Base or T5-Large, which may offer improved accuracy and context comprehension. Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Models Hub</a> to find the best fit for your application.</p>
|
|
</div>
|
|
""", unsafe_allow_html=True)
|
|
|
|
|
|
|
|
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
|
|
|
|
st.markdown("""
|
|
<div class="section">
|
|
<ul>
|
|
<li><a class="link" href="https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html" target="_blank">Google AI Blog</a>: Exploring Transfer Learning with T5</li>
|
|
<li><a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Model Hub</a>: Explore T5 models</li>
|
|
<li>Model used: <a class="link" href="https://sparknlp.org/2022/05/31/google_t5_small_ssm_nq_en_3_0.html" target="_blank">google_t5_small_ssm_nq</a></li>
|
|
<li><a class="link" href="https://github.com/google-research/text-to-text-transfer-transformer" target="_blank">GitHub</a>: T5 Transformer repository</li>
|
|
<li><a class="link" href="https://arxiv.org/abs/1910.10683" target="_blank">T5 Paper</a>: Detailed insights from the developers</li>
|
|
</ul>
|
|
</div>
|
|
""", unsafe_allow_html=True)
|
|
|
|
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
|
|
|
|
st.markdown("""
|
|
<div class="section">
|
|
<ul>
|
|
<li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
|
|
<li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
|
|
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
|
|
<li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
|
|
<li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
|
|
</ul>
|
|
</div>
|
|
""", unsafe_allow_html=True)
|
|
|
|
st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)
|
|
|
|
st.markdown("""
|
|
<div class="section">
|
|
<ul>
|
|
<li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
|
|
<li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
|
|
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
|
|
<li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
|
|
</ul>
|
|
</div>
|
|
""", unsafe_allow_html=True)
|
|
|