sjain15 committed on
Commit
cb35b85
·
1 Parent(s): 63953fb

feat: Added know-my-doc-code

Files changed (13)
  1. Dockerfile +34 -0
  2. README.md +99 -13
  3. app2.py +127 -0
  4. base/ailife.txt +18 -0
  5. chat.gif +0 -0
  6. config.yaml +8 -0
  7. docs/index.html +222 -0
  8. docs/pycco.css +190 -0
  9. know_doc.png +0 -0
  10. requirements.txt +15 -0
  11. static/style.css +206 -0
  12. templates/app.py +127 -0
  13. templates/index.html +86 -0
Dockerfile ADDED
@@ -0,0 +1,34 @@
+ FROM python:3.10
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install other dependencies
+ RUN apt-get update && \
+     apt-get install -y libmagic-dev poppler-utils tesseract-ocr && \
+     apt-get install -y libxml2-dev libxslt1-dev && \
+     apt-get install -y git && \
+     pip install torch && \
+     apt-get install -y build-essential python3-dev && \
+     apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
+
+ # Copy requirements file and install dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir --upgrade -r requirements.txt
+
+ RUN [ "python", "-c", "import nltk; nltk.download('punkt', download_dir='/usr/local/nltk_data')" ]
+ RUN [ "python", "-c", "import nltk; nltk.download('averaged_perceptron_tagger', download_dir='/usr/local/nltk_data')" ]
+
+ # Copy application files
+ COPY . .
+
+ # Set environment variables
+ ENV FLASK_APP=app.py
+ ENV FLASK_RUN_HOST=0.0.0.0
+ ENV FLASK_RUN_PORT=5001
+
+ # Expose port for Flask app
+ EXPOSE 5001
+
+ # Start Flask app
+ CMD ["flask", "run"]
README.md CHANGED
@@ -1,13 +1,99 @@
- ---
- title: Know My Doc
- emoji: 📈
- colorFrom: yellow
- colorTo: pink
- sdk: gradio
- sdk_version: 3.22.1
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # KnowMyDoc
+
+ ![Python version](https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9-blue?style=flat-square)
+ ![License](https://img.shields.io/badge/license-MIT-green?style=flat-square)
+ ![Commit Activity](https://img.shields.io/github/last-commit/jainsid24/neural-network-simulation?style=flat-square)
+ ![Repo Size](https://img.shields.io/github/repo-size/jainsid24/neural-network-simulation?style=flat-square)
+ ![OpenAI API key](https://img.shields.io/badge/OpenAI%20API%20key-required-red?style=flat-square)
+ ![Docker](https://img.shields.io/badge/docker-available-blue?style=flat-square)
+ [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black?style=flat-square)
+
+ <p align="center">
+ <img src="chat.gif" alt="Chat" width="250" style="max-width: 100%;"/>
+ </p>
+ <p align="center">
+ <em><b>KnowMyDoc Chat</b></em>
+ </p>
+
+ KnowMyDoc is a GPT-3.5-powered, Python-based conversational AI tool for building a chatbot that answers from reference documents, using machine learning and natural language processing (NLP) techniques. The utility is fully containerized and API-driven, which makes chatbot creation quick and straightforward.
+
+ KnowMyDoc uses the [LangChain](https://github.com/hwchase17/langchain) library for LLM prompt engineering and conversation chaining. Users can customize the chatbot's prompts and tailor its responses to the context and tone of the conversation. This LLM-based approach helps the chatbot keep a conversation consistent and coherent even over large amounts of data, and lets it cite relevant sources with each response. The chatbot is also instructed to stay within the knowledge it has been given.
+
+ In addition, KnowMyDoc uses the Chroma vector store for fast similarity search over relevant data. By creating embeddings of users' documents and web pages, KnowMyDoc can quickly identify and retrieve the most relevant information for a user's query.
+
+ Other features of KnowMyDoc include:
+
+ * Support for loading documents from local data sources and web URLs
+ * Support for persona and message tone
+ * AI Q&A limited to the configured knowledge sources
+ * Text splitting to optimize indexing and similarity search
+ * NLTK support for text processing and tokenization
+ * Support for OpenAI embeddings and vector stores, including Chroma
+ * Logging support for troubleshooting and analysis
+
+ ## Getting Started
+ To use this utility:
+ 1. Clone the repository:
+ ```
+ git clone https://github.com/jainsid24/know-my-doc
+ ```
+ 2. Build the Docker image by running the following command in the terminal:
+ ```
+ docker build -t know-my-doc:latest .
+ ```
+ 3. Once the image is built, run the Docker container using the following command:
+ ```
+ docker run -p 5001:5001 know-my-doc
+ ```
+ 4. Call the API with curl or Postman:
+ ```
+ curl --header "Content-Type: application/json" \
+ --request POST \
+ --data '{"question": "When was JWST launched?"}' \
+ http://<pods-ip-address>:5001/api/chat
+ ```
+
+ ## Configuration
+
+ Before you can use the utility, you need to set up the configuration file. The configuration file is a YAML file that contains the following options:
+
+ * openai_api_key: Your OpenAI API key.
+ * data_directory: The directory where your local data sources are located.
+ * data_files_glob: A glob pattern that specifies which files in data_directory to use as data sources.
+ * webpages: A list of URLs of webpages to use as data sources.
+ * tone: The tone to use for the chatbot's responses (e.g., "formal", "informal", "friendly", etc.).
+ * persona: The persona to use for the chatbot.
+ * You can copy the config.example.yaml file to config.yaml and modify the options as needed.
+
+ ## Usage
+
+ To start the chatbot, run:
+
+ ```
+ python app.py
+ ```
+
+ This will start the chatbot on port 5000.
+
+ To use the chatbot, send a POST request to http://localhost:5000/api/chat with a JSON payload containing the question to ask, like this:
+
+ ```
+ curl -X POST \
+ http://localhost:5000/api/chat \
+ -H 'Content-Type: application/json' \
+ -d '{"question": "What is the capital of France?"}'
+ ```
+
+ This will return a JSON response containing the chatbot's answer to the question:
+
+ ```
+ {"response": "The capital of France is Paris."}
+ ```
+
+ ## Contributing
+
+ If you find a bug or have an idea for a new feature, please open an issue or submit a pull request.
+
+ ## License
+
+ This project is licensed under the MIT License. See the LICENSE file for details.
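The README shows the `/api/chat` call with curl only. As a rough equivalent in Python, the sketch below builds the same POST request with the standard library. The helper names `build_chat_request` and `ask`, the default base URL, and the timeout are assumptions made for this illustration and are not part of the repository.

```python
import json
import urllib.request


def build_chat_request(question, base_url="http://localhost:5000"):
    """Build a POST request for the /api/chat endpoint described in the README."""
    # The endpoint expects a JSON body with a single "question" key.
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, base_url="http://localhost:5000", timeout=60):
    """Send the question to a running server and return the "response" text."""
    chat_request = build_chat_request(question, base_url)
    with urllib.request.urlopen(chat_request, timeout=timeout) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    return body["response"]
```

With a server running, `ask("When was JWST launched?")` would return the answer string. When the container from the Getting Started steps is used instead of `python app.py`, pass `base_url="http://localhost:5001"`.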
app2.py ADDED
@@ -0,0 +1,127 @@
+ import os
+ import logging
+ from flask import Flask, request, jsonify, render_template
+ from langchain.chains.question_answering import load_qa_chain
+ from langchain.document_loaders import DirectoryLoader
+ from langchain.llms import OpenAIChat
+ from langchain.prompts import PromptTemplate
+ from langchain.memory import ConversationBufferMemory
+ from langchain.document_loaders import WebBaseLoader
+ import yaml
+
+ from langchain.embeddings import OpenAIEmbeddings
+ from langchain.text_splitter import CharacterTextSplitter
+ from langchain.embeddings.openai import OpenAIEmbeddings
+ from langchain.vectorstores import Chroma
+
+ import nltk
+
+ nltk.download("punkt")
+
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Load configuration from YAML file
+ with open("config.yaml", "r") as f:
+     config = yaml.safe_load(f)
+
+ os.environ["OPENAI_API_KEY"] = config["openai_api_key"]
+
+ template_dir = os.path.abspath("templates")
+ app = Flask(__name__, template_folder=template_dir, static_folder="static")
+
+ # Load the files
+ loader = DirectoryLoader(config["data_directory"], glob=config["data_files_glob"])
+ docs = loader.load()
+
+ webpages = config.get("webpages", [])
+ web_docs = []
+ for webpage in webpages:
+     logger.info(f"Loading data from webpage {webpage}")
+     loader = WebBaseLoader(webpage)
+     web_docs += loader.load()
+
+ result = docs + web_docs
+
+ tone = config.get("tone", "default")
+ persona = config.get("persona", "default")
+
+ text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
+ texts = text_splitter.split_documents(result)
+ embeddings = OpenAIEmbeddings(openai_api_key=config["openai_api_key"])
+ docsearch = Chroma.from_documents(texts, embeddings)
+
+ # Initialize the QA chain
+ logger.info("Initializing QA chain...")
+ chain = load_qa_chain(
+     OpenAIChat(),
+     chain_type="stuff",
+     memory=ConversationBufferMemory(memory_key="chat_history", input_key="human_input"),
+     prompt=PromptTemplate(
+         input_variables=["chat_history", "human_input", "context", "tone", "persona"],
+         template="""You are a chatbot who acts like {persona}, having a conversation with a human.
+
+ Given the following extracted parts of a long document and a question, Create a final answer with references ("SOURCES") in the tone {tone}.
+ If you don't know the answer, just say that you don't know. Don't try to make up an answer.
+ ALWAYS return a "SOURCES" part in your answer.
+ SOURCES should only be hyperlink URLs which are genuine and not made up.
+
+ {context}
+
+ {chat_history}
+ Human: {human_input}
+ Chatbot:""",
+     ),
+     verbose=False,
+ )
+
+
+ @app.route("/")
+ def index():
+     return render_template("index.html")
+
+
+ @app.route("/api/chat", methods=["POST"])
+ def chat():
+     try:
+         # Get the question from the request
+         question = request.json["question"]
+         documents = docsearch.similarity_search(question, include_metadata=True)
+
+         # Get the bot's response
+         response = chain(
+             {
+                 "input_documents": documents,
+                 "human_input": question,
+                 "tone": tone,
+                 "persona": persona,
+             },
+             return_only_outputs=True,
+         )["output_text"]
+
+         # Increment message counter
+         session_counter = request.cookies.get('session_counter')
+         if session_counter is None:
+             session_counter = 0
+         else:
+             session_counter = int(session_counter) + 1
+
+         # Check if it's time to flush memory
+         if session_counter % 10 == 0:
+             chain.memory.clear()
+
+         # Set the session counter cookie
+         resp = jsonify({"response": response})
+         resp.set_cookie('session_counter', str(session_counter))
+
+         # Return the response as JSON with the session counter cookie
+         return resp
+
+     except Exception as e:
+         # Log the error and return an error response
+         logger.error(f"Error while processing request: {e}")
+         return jsonify({"error": "Unable to process the request."}), 500
+
+ if __name__ == "__main__":
+     app.run(debug=True)
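The cookie-based counter in `chat()` decides when the conversation memory is cleared. To make the schedule easier to see, the sketch below reproduces only that arithmetic as two pure functions and walks one client through twelve requests. The function names are hypothetical and introduced for this illustration; Flask and LangChain are left out entirely.

```python
def next_session_counter(cookie_value):
    """Return the counter the handler would store for this request."""
    # A missing cookie starts the counter at 0, as in the handler.
    if cookie_value is None:
        return 0
    return int(cookie_value) + 1


def should_flush_memory(counter):
    """Mirror the handler's check: flush whenever the counter is a multiple of 10."""
    return counter % 10 == 0


# Walk through the first twelve requests made by a single client.
cookie = None
flushed_on = []
for request_number in range(1, 13):
    counter = next_session_counter(cookie)
    if should_flush_memory(counter):
        flushed_on.append(request_number)
    cookie = str(counter)

print(flushed_on)  # → [1, 11]
```

Under this reading, the memory is cleared right after the very first answer (counter 0) and then after every tenth request that follows. Note also that the counter lives in each client's cookie while `chain.memory` is a single object shared by the whole process, so one client reaching a multiple of ten clears the history for every client.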
base/ailife.txt ADDED
@@ -0,0 +1,18 @@
1
+ Isaac Asimov, a renowned science fiction writer, is famous for his work in creating the "Three Laws of Robotics." These laws are fictional guidelines intended to ensure that robots behave ethically and do not harm humans. These laws have become a significant contribution to the science fiction genre and have also influenced the field of robotics in real life.
2
+
3
+ The Three Laws of Robotics are as follows:
4
+
5
+ A robot may not injure a human being or, through inaction, allow a human being to come to harm.
6
+ A robot must obey the orders given to it by human beings, except where such orders would conflict with the first law.
7
+ A robot must protect its existence as long as such protection does not conflict with the first or second law.
8
+ The first law is the most crucial law of robotics. It forbids robots from harming humans, either directly or indirectly, through inaction. Robots are programmed to act in a way that ensures the safety of humans. This law applies to all robots, regardless of their level of intelligence or autonomy.
9
+
10
+ The second law states that robots must obey orders given to them by humans, provided that these orders do not violate the first law. This law was added to ensure that robots would be helpful to humans, rather than acting in their own interests.
11
+
12
+ The third law ensures that robots do not act in a way that would result in their own destruction, as long as doing so would not violate the first two laws. This law is designed to prevent humans from intentionally or unintentionally causing harm to robots, which could result in their destruction.
13
+
14
+ Asimov's laws of robotics have been influential in the development of the field of robotics. These laws have served as a model for the creation of robots in science fiction, and have inspired real-life robotics engineers to create ethical robots that prioritize the safety of humans.
15
+
16
+ Despite the popularity of the Three Laws of Robotics, some have criticized their applicability to real-world robotics. The laws are somewhat limited in scope, and do not take into account more complex ethical considerations that may arise when robots interact with humans. However, Asimov's laws are a starting point for the development of more advanced ethical guidelines that can be used to ensure the safety of humans in the presence of robots.
17
+
18
+ In conclusion, Isaac Asimov's Three Laws of Robotics have played a significant role in shaping the way that we think about robots and their interactions with humans. These laws have been a valuable starting point for the development of ethical guidelines in the field of robotics, and they continue to inspire engineers and writers alike. While the laws may not be perfect, they represent an essential contribution to the field of science fiction and the study of robotics.
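app2.py splits loaded documents such as the sample file above with `CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)` before embedding them. As a rough illustration of what separator-based chunking does, here is a simplified sketch. It is not LangChain's implementation; the function name `split_into_chunks` and the `"\n\n"` separator are assumptions made for this example.

```python
def split_into_chunks(text, chunk_size=1000, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Still fits, or this is the first piece of a new chunk.
            current = candidate
        else:
            # Adding the piece would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("aaaa\n\nbbbb\n\ncccc", chunk_size=10))
# → ['aaaa\n\nbbbb', 'cccc']
```

Keeping whole paragraphs together, rather than cutting at a fixed character offset, means each embedded chunk is more likely to carry a complete thought, which tends to help similarity search.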
chat.gif ADDED
config.yaml ADDED
@@ -0,0 +1,8 @@
+ openai_api_key: "Open API Key"
+ data_directory: "base"
+ data_files_glob: "*.txt"
+ webpages:
+   - "https://en.wikipedia.org/wiki/James_Webb_Space_Telescope"
+   - "https://en.wikipedia.org/wiki/Black_hole"
+ tone: "formal"
+ persona: "buddha"
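app2.py reads `openai_api_key`, `data_directory`, and `data_files_glob` with plain indexing, and `webpages`, `tone`, and `persona` with `config.get(...)` defaults. A small validation step could make that split explicit and fail early with a readable message. The sketch below works on an already-parsed dictionary, so it needs no YAML library; `normalize_config` and the two constants are hypothetical names, not part of the repository.

```python
REQUIRED_KEYS = ("openai_api_key", "data_directory", "data_files_glob")
OPTIONAL_DEFAULTS = {"webpages": [], "tone": "default", "persona": "default"}


def normalize_config(raw):
    """Return a copy of the config with defaults filled in.

    Raises ValueError if a required key is missing or empty.
    """
    missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
    if missing:
        raise ValueError(f"Missing required config keys: {', '.join(missing)}")
    config = dict(OPTIONAL_DEFAULTS)
    config.update(raw)
    # Copy the list so callers never share the module-level default,
    # and treat an explicit null in the YAML as an empty list.
    config["webpages"] = list(config["webpages"] or [])
    return config
```

The defaults mirror the ones used in app2.py, so a file containing only the three required keys would behave the same as before.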
docs/index.html ADDED
@@ -0,0 +1,222 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <meta http-equiv="content-type" content="text/html;charset=utf-8">
5
+ <title>app.py</title>
6
+ <link rel="stylesheet" href="pycco.css">
7
+ </head>
8
+ <body>
9
+ <div id='container'>
10
+ <div id="background"></div>
11
+ <div class='section'>
12
+ <div class='docs'><h1>app.py</h1></div>
13
+ </div>
14
+ <div class='clearall'>
15
+ <div class='section' id='section-0'>
16
+ <div class='docs'>
17
+ <div class='octowrap'>
18
+ <a class='octothorpe' href='#section-0'>#</a>
19
+ </div>
20
+
21
+ </div>
22
+ <div class='code'>
23
+ <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
24
+ <span class="kn">import</span> <span class="nn">logging</span>
25
+ <span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span><span class="p">,</span> <span class="n">request</span><span class="p">,</span> <span class="n">jsonify</span><span class="p">,</span> <span class="n">render_template</span>
26
+ <span class="kn">from</span> <span class="nn">langchain.chains.question_answering</span> <span class="kn">import</span> <span class="n">load_qa_chain</span>
27
+ <span class="kn">from</span> <span class="nn">langchain.document_loaders</span> <span class="kn">import</span> <span class="n">DirectoryLoader</span>
28
+ <span class="kn">from</span> <span class="nn">langchain.llms</span> <span class="kn">import</span> <span class="n">OpenAIChat</span>
29
+ <span class="kn">from</span> <span class="nn">langchain.prompts</span> <span class="kn">import</span> <span class="n">PromptTemplate</span>
30
+ <span class="kn">from</span> <span class="nn">langchain.memory</span> <span class="kn">import</span> <span class="n">ConversationBufferMemory</span>
31
+ <span class="kn">from</span> <span class="nn">langchain.document_loaders</span> <span class="kn">import</span> <span class="n">WebBaseLoader</span>
32
+ <span class="kn">import</span> <span class="nn">yaml</span>
33
+
34
+ <span class="kn">from</span> <span class="nn">langchain.embeddings</span> <span class="kn">import</span> <span class="n">OpenAIEmbeddings</span>
35
+ <span class="kn">from</span> <span class="nn">langchain.text_splitter</span> <span class="kn">import</span> <span class="n">CharacterTextSplitter</span>
36
+ <span class="kn">from</span> <span class="nn">langchain.embeddings.openai</span> <span class="kn">import</span> <span class="n">OpenAIEmbeddings</span>
37
+ <span class="kn">from</span> <span class="nn">langchain.vectorstores</span> <span class="kn">import</span> <span class="n">Chroma</span>
38
+
39
+ <span class="kn">import</span> <span class="nn">nltk</span>
40
+
41
+ <span class="n">nltk</span><span class="o">.</span><span class="n">download</span><span class="p">(</span><span class="s2">&quot;punkt&quot;</span><span class="p">)</span></pre></div>
42
+ </div>
43
+ </div>
44
+ <div class='clearall'></div>
45
+ <div class='section' id='section-1'>
46
+ <div class='docs'>
47
+ <div class='octowrap'>
48
+ <a class='octothorpe' href='#section-1'>#</a>
49
+ </div>
50
+ <p>Set up logging</p>
51
+ </div>
52
+ <div class='code'>
53
+ <div class="highlight"><pre><span class="n">logging</span><span class="o">.</span><span class="n">basicConfig</span><span class="p">(</span><span class="n">level</span><span class="o">=</span><span class="n">logging</span><span class="o">.</span><span class="n">INFO</span><span class="p">)</span>
54
+ <span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span></pre></div>
55
+ </div>
56
+ </div>
57
+ <div class='clearall'></div>
58
+ <div class='section' id='section-2'>
59
+ <div class='docs'>
60
+ <div class='octowrap'>
61
+ <a class='octothorpe' href='#section-2'>#</a>
62
+ </div>
63
+ <p>Load configuration from YAML file</p>
64
+ </div>
65
+ <div class='code'>
66
+ <div class="highlight"><pre><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;config.yaml&quot;</span><span class="p">,</span> <span class="s2">&quot;r&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
67
+ <span class="n">config</span> <span class="o">=</span> <span class="n">yaml</span><span class="o">.</span><span class="n">safe_load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
68
+
69
+ <span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s2">&quot;OPENAI_API_KEY&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="n">config</span><span class="p">[</span><span class="s2">&quot;openai_api_key&quot;</span><span class="p">]</span>
70
+
71
+ <span class="n">template_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="s2">&quot;templates&quot;</span><span class="p">)</span>
72
+ <span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="vm">__name__</span><span class="p">,</span> <span class="n">template_folder</span><span class="o">=</span><span class="n">template_dir</span><span class="p">,</span> <span class="n">static_folder</span><span class="o">=</span><span class="s2">&quot;static&quot;</span><span class="p">)</span></pre></div>
73
+ </div>
74
+ </div>
75
+ <div class='clearall'></div>
76
+ <div class='section' id='section-3'>
77
+ <div class='docs'>
78
+ <div class='octowrap'>
79
+ <a class='octothorpe' href='#section-3'>#</a>
80
+ </div>
81
+ <p>Load the files</p>
82
+ </div>
83
+ <div class='code'>
84
+ <div class="highlight"><pre><span class="n">loader</span> <span class="o">=</span> <span class="n">DirectoryLoader</span><span class="p">(</span><span class="n">config</span><span class="p">[</span><span class="s2">&quot;data_directory&quot;</span><span class="p">],</span> <span class="n">glob</span><span class="o">=</span><span class="n">config</span><span class="p">[</span><span class="s2">&quot;data_files_glob&quot;</span><span class="p">])</span>
85
+ <span class="n">docs</span> <span class="o">=</span> <span class="n">loader</span><span class="o">.</span><span class="n">load</span><span class="p">()</span>
86
+
87
+ <span class="n">webpages</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;webpages&quot;</span><span class="p">,</span> <span class="p">[])</span>
88
+ <span class="n">web_docs</span> <span class="o">=</span> <span class="p">[]</span>
89
+ <span class="k">for</span> <span class="n">webpage</span> <span class="ow">in</span> <span class="n">webpages</span><span class="p">:</span>
90
+ <span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Loading data from webpage </span><span class="si">{</span><span class="n">webpage</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
91
+ <span class="n">loader</span> <span class="o">=</span> <span class="n">WebBaseLoader</span><span class="p">(</span><span class="n">webpage</span><span class="p">)</span>
92
+ <span class="n">web_docs</span> <span class="o">+=</span> <span class="n">loader</span><span class="o">.</span><span class="n">load</span><span class="p">()</span>
93
+
94
+ <span class="n">result</span> <span class="o">=</span> <span class="n">docs</span> <span class="o">+</span> <span class="n">web_docs</span>
95
+
96
+ <span class="n">tone</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;tone&quot;</span><span class="p">,</span> <span class="s2">&quot;default&quot;</span><span class="p">)</span>
97
+ <span class="n">persona</span> <span class="o">=</span> <span class="n">config</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;persona&quot;</span><span class="p">,</span> <span class="s2">&quot;default&quot;</span><span class="p">)</span>
98
+
99
+ <span class="n">text_splitter</span> <span class="o">=</span> <span class="n">CharacterTextSplitter</span><span class="p">(</span><span class="n">chunk_size</span><span class="o">=</span><span class="mi">1000</span><span class="p">,</span> <span class="n">chunk_overlap</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
100
+ <span class="n">texts</span> <span class="o">=</span> <span class="n">text_splitter</span><span class="o">.</span><span class="n">split_documents</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
101
+ <span class="n">embeddings</span> <span class="o">=</span> <span class="n">OpenAIEmbeddings</span><span class="p">(</span><span class="n">openai_api_key</span><span class="o">=</span><span class="n">config</span><span class="p">[</span><span class="s2">&quot;openai_api_key&quot;</span><span class="p">])</span>
102
+ <span class="n">docsearch</span> <span class="o">=</span> <span class="n">Chroma</span><span class="o">.</span><span class="n">from_documents</span><span class="p">(</span><span class="n">texts</span><span class="p">,</span> <span class="n">embeddings</span><span class="p">)</span></pre></div>
103
+ </div>
104
+ </div>
105
+ <div class='clearall'></div>
106
+ <div class='section' id='section-4'>
107
+ <div class='docs'>
108
+ <div class='octowrap'>
109
+ <a class='octothorpe' href='#section-4'>#</a>
110
+ </div>
111
+ <p>Initialize the QA chain</p>
112
+ </div>
113
+ <div class='code'>
114
+ <div class="highlight"><pre><span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">&quot;Initializing QA chain...&quot;</span><span class="p">)</span>
115
+ <span class="n">chain</span> <span class="o">=</span> <span class="n">load_qa_chain</span><span class="p">(</span>
116
+ <span class="n">OpenAIChat</span><span class="p">(),</span>
117
+ <span class="n">chain_type</span><span class="o">=</span><span class="s2">&quot;stuff&quot;</span><span class="p">,</span>
118
+ <span class="n">memory</span><span class="o">=</span><span class="n">ConversationBufferMemory</span><span class="p">(</span><span class="n">memory_key</span><span class="o">=</span><span class="s2">&quot;chat_history&quot;</span><span class="p">,</span> <span class="n">input_key</span><span class="o">=</span><span class="s2">&quot;human_input&quot;</span><span class="p">),</span>
119
+ <span class="n">prompt</span><span class="o">=</span><span class="n">PromptTemplate</span><span class="p">(</span>
120
+ <span class="n">input_variables</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;chat_history&quot;</span><span class="p">,</span> <span class="s2">&quot;human_input&quot;</span><span class="p">,</span> <span class="s2">&quot;context&quot;</span><span class="p">,</span> <span class="s2">&quot;tone&quot;</span><span class="p">,</span> <span class="s2">&quot;persona&quot;</span><span class="p">],</span>
121
+ <span class="n">template</span><span class="o">=</span><span class="s2">&quot;&quot;&quot;You are a chatbot who acts like </span><span class="si">{persona}</span><span class="s2">, having a conversation with a human.</span>
122
+
123
+ <span class="s2">Given the following extracted parts of a long document and a question, create a final answer only in the </span><span class="si">{tone}</span><span class="s2"> tone. Use only the sources in the document to create a response. Always quote the source in the end&quot;</span>
124
+
125
+ <span class="si">{context}</span>
126
+
127
+ <span class="si">{chat_history}</span>
128
+ <span class="s2">Human: </span><span class="si">{human_input}</span>
129
+ <span class="s2">Chatbot:&quot;&quot;&quot;</span><span class="p">,</span>
130
+ <span class="p">),</span>
131
+ <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
132
+ <span class="p">)</span></pre></div>
133
+ </div>
134
+ </div>
135
+ <div class='clearall'></div>
136
+ <div class='section' id='section-5'>
137
+ <div class='docs'>
138
+ <div class='octowrap'>
139
+ <a class='octothorpe' href='#section-5'>#</a>
140
+ </div>
141
+
142
+ </div>
143
+ <div class='code'>
144
+ <div class="highlight"><pre><span class="nd">@app</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s2">&quot;/&quot;</span><span class="p">)</span>
145
+ <span class="k">def</span> <span class="nf">index</span><span class="p">():</span>
146
+ <span class="k">return</span> <span class="n">render_template</span><span class="p">(</span><span class="s2">&quot;index.html&quot;</span><span class="p">)</span>
147
+
148
+
149
+ <span class="nd">@app</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s2">&quot;/api/chat&quot;</span><span class="p">,</span> <span class="n">methods</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;POST&quot;</span><span class="p">])</span>
150
+ <span class="k">def</span> <span class="nf">chat</span><span class="p">():</span>
151
+ <span class="k">try</span><span class="p">:</span></pre></div>
152
+ </div>
153
+ </div>
154
+ <div class='clearall'></div>
155
+ <div class='section' id='section-6'>
156
+ <div class='docs'>
157
+ <div class='octowrap'>
158
+ <a class='octothorpe' href='#section-6'>#</a>
159
+ </div>
160
+ <p>Get the question from the request</p>
161
+ </div>
162
+ <div class='code'>
163
+ <div class="highlight"><pre> <span class="n">question</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">json</span><span class="p">[</span><span class="s2">&quot;question&quot;</span><span class="p">]</span>
164
+ <span class="n">documents</span> <span class="o">=</span> <span class="n">docsearch</span><span class="o">.</span><span class="n">similarity_search</span><span class="p">(</span><span class="n">question</span><span class="p">,</span> <span class="n">include_metadata</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></pre></div>
165
+ </div>
166
+ </div>
167
+ <div class='clearall'></div>
168
+ <div class='section' id='section-7'>
169
+ <div class='docs'>
170
+ <div class='octowrap'>
171
+ <a class='octothorpe' href='#section-7'>#</a>
172
+ </div>
173
+ <p>Get the bot&rsquo;s response</p>
174
+ </div>
175
+ <div class='code'>
176
+ <div class="highlight"><pre> <span class="n">response</span> <span class="o">=</span> <span class="n">chain</span><span class="p">(</span>
177
+ <span class="p">{</span>
178
+ <span class="s2">&quot;input_documents&quot;</span><span class="p">:</span> <span class="n">documents</span><span class="p">,</span>
179
+ <span class="s2">&quot;human_input&quot;</span><span class="p">:</span> <span class="n">question</span><span class="p">,</span>
180
+ <span class="s2">&quot;tone&quot;</span><span class="p">:</span> <span class="n">tone</span><span class="p">,</span>
181
+ <span class="s2">&quot;persona&quot;</span><span class="p">:</span> <span class="n">persona</span><span class="p">,</span>
182
+ <span class="p">},</span>
183
+ <span class="n">return_only_outputs</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
184
+ <span class="p">)[</span><span class="s2">&quot;output_text&quot;</span><span class="p">]</span></pre></div>
185
+ </div>
186
+ </div>
187
+ <div class='clearall'></div>
188
+ <div class='section' id='section-8'>
189
+ <div class='docs'>
190
+ <div class='octowrap'>
191
+ <a class='octothorpe' href='#section-8'>#</a>
192
+ </div>
193
+ <p>Return the response as JSON</p>
194
+ </div>
195
+ <div class='code'>
196
+ <div class="highlight"><pre> <span class="k">return</span> <span class="n">jsonify</span><span class="p">({</span><span class="s2">&quot;response&quot;</span><span class="p">:</span> <span class="n">response</span><span class="p">})</span>
197
+
198
+ <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span></pre></div>
199
+ </div>
200
+ </div>
201
+ <div class='clearall'></div>
202
+ <div class='section' id='section-9'>
203
+ <div class='docs'>
204
+ <div class='octowrap'>
205
+ <a class='octothorpe' href='#section-9'>#</a>
206
+ </div>
207
+ <p>Log the error and return an error response</p>
208
+ </div>
209
+ <div class='code'>
210
+ <div class="highlight"><pre> <span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Error while processing request: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
211
+ <span class="k">return</span> <span class="n">jsonify</span><span class="p">({</span><span class="s2">&quot;error&quot;</span><span class="p">:</span> <span class="s2">&quot;Unable to process the request.&quot;</span><span class="p">}),</span> <span class="mi">500</span>
212
+
213
+
214
+ <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
215
+ <span class="n">app</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">debug</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
216
+
217
+ </pre></div>
218
+ </div>
219
+ </div>
220
+ <div class='clearall'></div>
221
+ </div>
222
+ </body>
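The documented handler above accepts a JSON body with a `question` key and answers with either `{"response": ...}` or `{"error": ...}` plus a 500 status. A minimal client-side sketch of that contract follows. The helper names `build_chat_request` and `parse_chat_reply` are hypothetical, and the base URL is an assumption taken from the Dockerfile's `FLASK_RUN_PORT=5001`; nothing here is part of the commit.

```python
import json
import urllib.request


def build_chat_request(question, base_url="http://localhost:5001"):
    """Build (but do not send) a POST request for the /api/chat endpoint.

    base_url is an assumption; adjust it to wherever the Flask app runs.
    """
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_chat_reply(body):
    """Return the bot's answer, or raise if the server reported an error."""
    reply = json.loads(body)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["response"]
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and feeding the response body to `parse_chat_reply`; that step is left out so the sketch runs without a live server.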
docs/pycco.css ADDED
@@ -0,0 +1,190 @@
+ /*--------------------- Layout and Typography ----------------------------*/
+ body {
+   font-family: 'Palatino Linotype', 'Book Antiqua', Palatino, FreeSerif, serif;
+   font-size: 16px;
+   line-height: 24px;
+   color: #252519;
+   margin: 0; padding: 0;
+   background: #f5f5ff;
+ }
+ a {
+   color: #261a3b;
+ }
+ a:visited {
+   color: #261a3b;
+ }
+ p {
+   margin: 0 0 15px 0;
+ }
+ h1, h2, h3, h4, h5, h6 {
+   margin: 40px 0 15px 0;
+ }
+ h2, h3, h4, h5, h6 {
+   margin-top: 0;
+ }
+ #container {
+   background: white;
+ }
+ #container, div.section {
+   position: relative;
+ }
+ #background {
+   position: absolute;
+   top: 0; left: 580px; right: 0; bottom: 0;
+   background: #f5f5ff;
+   border-left: 1px solid #e5e5ee;
+   z-index: 0;
+ }
+ #jump_to, #jump_page {
+   background: white;
+   -webkit-box-shadow: 0 0 25px #777; -moz-box-shadow: 0 0 25px #777;
+   -webkit-border-bottom-left-radius: 5px; -moz-border-radius-bottomleft: 5px;
+   font: 10px Arial;
+   text-transform: uppercase;
+   cursor: pointer;
+   text-align: right;
+ }
+ #jump_to, #jump_wrapper {
+   position: fixed;
+   right: 0; top: 0;
+   padding: 5px 10px;
+ }
+ #jump_wrapper {
+   padding: 0;
+   display: none;
+ }
+ #jump_to:hover #jump_wrapper {
+   display: block;
+ }
+ #jump_page {
+   padding: 5px 0 3px;
+   margin: 0 0 25px 25px;
+ }
+ #jump_page .source {
+   display: block;
+   padding: 5px 10px;
+   text-decoration: none;
+   border-top: 1px solid #eee;
+ }
+ #jump_page .source:hover {
+   background: #f5f5ff;
+ }
+ #jump_page .source:first-child {
+ }
+ div.docs {
+   float: left;
+   max-width: 500px;
+   min-width: 500px;
+   min-height: 5px;
+   padding: 10px 25px 1px 50px;
+   vertical-align: top;
+   text-align: left;
+ }
+ .docs pre {
+   margin: 15px 0 15px;
+   padding-left: 15px;
+ }
+ .docs p tt, .docs p code {
+   background: #f8f8ff;
+   border: 1px solid #dedede;
+   font-size: 12px;
+   padding: 0 0.2em;
+ }
+ .octowrap {
+   position: relative;
+ }
+ .octothorpe {
+   font: 12px Arial;
+   text-decoration: none;
+   color: #454545;
+   position: absolute;
+   top: 3px; left: -20px;
+   padding: 1px 2px;
+   opacity: 0;
+   -webkit-transition: opacity 0.2s linear;
+ }
+ div.docs:hover .octothorpe {
+   opacity: 1;
+ }
+ div.code {
+   margin-left: 580px;
+   padding: 14px 15px 16px 50px;
+   vertical-align: top;
+ }
+ .code pre, .docs p code {
+   font-size: 12px;
+ }
+ pre, tt, code {
+   line-height: 18px;
+   font-family: Monaco, Consolas, "Lucida Console", monospace;
+   margin: 0; padding: 0;
+ }
+ div.clearall {
+   clear: both;
+ }
+
+
+ /*---------------------- Syntax Highlighting -----------------------------*/
+ td.linenos { background-color: #f0f0f0; padding-right: 10px; }
+ span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }
+ body .hll { background-color: #ffffcc }
+ body .c { color: #408080; font-style: italic } /* Comment */
+ body .err { border: 1px solid #FF0000 } /* Error */
+ body .k { color: #954121 } /* Keyword */
+ body .o { color: #666666 } /* Operator */
+ body .cm { color: #408080; font-style: italic } /* Comment.Multiline */
+ body .cp { color: #BC7A00 } /* Comment.Preproc */
+ body .c1 { color: #408080; font-style: italic } /* Comment.Single */
+ body .cs { color: #408080; font-style: italic } /* Comment.Special */
+ body .gd { color: #A00000 } /* Generic.Deleted */
+ body .ge { font-style: italic } /* Generic.Emph */
+ body .gr { color: #FF0000 } /* Generic.Error */
+ body .gh { color: #000080; font-weight: bold } /* Generic.Heading */
+ body .gi { color: #00A000 } /* Generic.Inserted */
+ body .go { color: #808080 } /* Generic.Output */
+ body .gp { color: #000080; font-weight: bold } /* Generic.Prompt */
+ body .gs { font-weight: bold } /* Generic.Strong */
+ body .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
+ body .gt { color: #0040D0 } /* Generic.Traceback */
+ body .kc { color: #954121 } /* Keyword.Constant */
+ body .kd { color: #954121; font-weight: bold } /* Keyword.Declaration */
+ body .kn { color: #954121; font-weight: bold } /* Keyword.Namespace */
+ body .kp { color: #954121 } /* Keyword.Pseudo */
+ body .kr { color: #954121; font-weight: bold } /* Keyword.Reserved */
+ body .kt { color: #B00040 } /* Keyword.Type */
+ body .m { color: #666666 } /* Literal.Number */
+ body .s { color: #219161 } /* Literal.String */
+ body .na { color: #7D9029 } /* Name.Attribute */
+ body .nb { color: #954121 } /* Name.Builtin */
+ body .nc { color: #0000FF; font-weight: bold } /* Name.Class */
+ body .no { color: #880000 } /* Name.Constant */
+ body .nd { color: #AA22FF } /* Name.Decorator */
+ body .ni { color: #999999; font-weight: bold } /* Name.Entity */
+ body .ne { color: #D2413A; font-weight: bold } /* Name.Exception */
+ body .nf { color: #0000FF } /* Name.Function */
+ body .nl { color: #A0A000 } /* Name.Label */
+ body .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
+ body .nt { color: #954121; font-weight: bold } /* Name.Tag */
+ body .nv { color: #19469D } /* Name.Variable */
+ body .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
+ body .w { color: #bbbbbb } /* Text.Whitespace */
+ body .mf { color: #666666 } /* Literal.Number.Float */
+ body .mh { color: #666666 } /* Literal.Number.Hex */
+ body .mi { color: #666666 } /* Literal.Number.Integer */
+ body .mo { color: #666666 } /* Literal.Number.Oct */
+ body .sb { color: #219161 } /* Literal.String.Backtick */
+ body .sc { color: #219161 } /* Literal.String.Char */
+ body .sd { color: #219161; font-style: italic } /* Literal.String.Doc */
+ body .s2 { color: #219161 } /* Literal.String.Double */
+ body .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
+ body .sh { color: #219161 } /* Literal.String.Heredoc */
+ body .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
+ body .sx { color: #954121 } /* Literal.String.Other */
+ body .sr { color: #BB6688 } /* Literal.String.Regex */
+ body .s1 { color: #219161 } /* Literal.String.Single */
+ body .ss { color: #19469D } /* Literal.String.Symbol */
+ body .bp { color: #954121 } /* Name.Builtin.Pseudo */
+ body .vc { color: #19469D } /* Name.Variable.Class */
+ body .vg { color: #19469D } /* Name.Variable.Global */
+ body .vi { color: #19469D } /* Name.Variable.Instance */
+ body .il { color: #666666 } /* Literal.Number.Integer.Long */
know_doc.png ADDED
requirements.txt ADDED
@@ -0,0 +1,15 @@
+ Flask==2.2.3
+ Flask-Cors==3.0.10
+ langchain==0.0.107
+ PyYAML==6.0
+ nltk==3.7
+ openai==0.27.0
+ layoutparser==0.3.4
+ transformers==4.26.1
+ unstructured==0.5.0
+ python-magic==0.4.27
+ pinecone-client==2.2.1
+ beautifulsoup4
+ chromadb==0.3.11
+ -e git+https://github.com/facebookresearch/[email protected]#egg=detectron2
+
static/style.css ADDED
@@ -0,0 +1,206 @@
+ /* Set a modern font and background color */
+ body {
+     font-family: 'Roboto', sans-serif;
+     background-color: #f1f9ff;
+ }
+
+ /* Center the chat box */
+ #chat-box {
+     margin: 0 auto;
+     max-width: 500px;
+     padding: 20px;
+     background-color: #fff;
+     border-radius: 10px;
+     box-shadow: 0 5px 10px rgba(0, 0, 0, 0.2);
+     position: absolute;
+     top: 50%;
+     left: 50%;
+     transform: translate(-50%, -50%);
+ }
+
+ /* Add a subtle animation */
+ #chat-box {
+     animation: fadein 1s;
+ }
+
+ #chat-area {
+     overflow-y: auto;
+     max-height: 600px; /* set a maximum height for the chat area */
+     display: flex;
+     flex-direction: column;
+ }
+
+ @keyframes fadein {
+     from {
+         opacity: 0;
+     }
+     to {
+         opacity: 1;
+     }
+ }
+
+ /* Style the chat messages */
+ .message-container {
+     display: flex;
+     flex-direction: column;
+ }
+
+ .user-message {
+     background-image: linear-gradient(to bottom right, #79b6f2, #6daff0);
+     color: #ffffff;
+     align-self: self-end;
+     text-align: right;
+     margin-bottom: 10px;
+     margin-right: 5px;
+     border-radius: 10px 10px 0 10px;
+     padding: 10px 15px;
+     max-width: fit-content;
+     word-wrap: break-word;
+     font-size: 16px;
+     white-space: pre-wrap;
+     float: right;
+ }
+
+ .bot-message {
+     background-color: #f0f0f0;
+     color: #000000;
+     text-align: left;
+     margin-bottom: 10px;
+     border-radius: 10px 10px 10px 0;
+     padding: 10px 15px;
+     word-wrap: break-word;
+     max-width: fit-content;
+     font-size: 16px;
+     white-space: pre-wrap;
+     align-self: flex-start;
+ }
+
+ .bot-message a {
+     color: #0d6efd;
+     text-decoration: underline;
+ }
+
+ .bot-message a:hover {
+     color: #0056b3;
+     text-decoration: none;
+ }
+
+ .timestamp {
+     display: flex;
+     justify-content: space-between;
+     font-size: 12px;
+     color: #999;
+     margin-right: 5px;
+     text-align: right;
+     float: right;
+ }
+
+ .bot-timestamp {
+     justify-content: space-between;
+     font-size: 12px;
+     color: #999;
+     margin-right: 5px;
+ }
+
+ .w-100 {
+     padding-top: 10px;
+ }
+
+ /* Style the input area */
+ #input-area {
+     display: flex;
+     align-items: center;
+     margin-top: 20px;
+ }
+
+ #question-input {
+     flex-grow: 1;
+     border: none;
+     padding: 12px 15px;
+     border-radius: 25px;
+     font-size: 16px;
+     background-color: #fff;
+     color: #000;
+     box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1);
+     transition: box-shadow 0.3s ease-in-out;
+ }
+
+ #question-input:focus {
+     outline: none;
+     box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
+ }
+
+ #send-button {
+     background-color: #79b6f2;
+     color: #fff;
+     border: none;
+     padding: 12px 20px;
+     border-radius: 25px;
+     font-size: 16px;
+     margin-left: 10px;
+     cursor: pointer;
+     transition: all 0.3s ease-in-out;
+ }
+
+ #send-button:hover {
+     background-color: #558fcf;
+     transform: translateY(-2px);
+     box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1);
+ }
+
+ /* Style the loading indicator */
+ #loading-indicator {
+     display: none;
+     text-align: center;
+ }
+ /* Define the spinner shape */
+ .spinner {
+     width: 20px;
+     height: 20px;
+     margin-top: 10px;
+     position: relative;
+     perspective: 800px;
+     display: flex;
+     align-items: center;
+     justify-content: center;
+ }
+
+ .spinner::before {
+     content: "";
+     display: block;
+     position: absolute;
+     top: 0;
+     left: 0;
+     width: 20px;
+     height: 20px;
+     border-radius: 50%;
+     box-shadow:
+         inset 0 -3px 0 rgba(0,0,0,.1),
+         inset 0 -3px 3px rgba(0,0,0,.2),
+         inset 0 -3px 6px rgba(0,0,0,.2),
+         0 0 6px 1px #007bff;
+     transform: rotate(45deg);
+     animation: spinner 1.5s cubic-bezier(.4,0,.2,1) infinite;
+ }
+
+ /* Define the spinner animation */
+ @keyframes spinner {
+     0% {
+         transform: rotate(45deg) scale(1);
+     }
+     50% {
+         transform: rotate(405deg) scale(.2);
+         opacity: .7;
+     }
+     100% {
+         transform: rotate(765deg) scale(1);
+         opacity: 1;
+     }
+ }
+
+ #typing-indicator {
+     display: none;
+     font-style: italic;
+     margin-bottom: 10px;
+ }
+
templates/app.py ADDED
@@ -0,0 +1,127 @@
+ import os
+ import logging
+ from flask import Flask, request, jsonify, render_template
+ from langchain.chains.question_answering import load_qa_chain
+ from langchain.document_loaders import DirectoryLoader
+ from langchain.llms import OpenAIChat
+ from langchain.prompts import PromptTemplate
+ from langchain.memory import ConversationBufferMemory
+ from langchain.document_loaders import WebBaseLoader
+ import yaml
+
+ from langchain.embeddings import OpenAIEmbeddings
+ from langchain.text_splitter import CharacterTextSplitter
+ from langchain.vectorstores import Chroma
+
+ import nltk
+
+ nltk.download("punkt")
+
+ # Set up logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Load configuration from YAML file
+ with open("config.yaml", "r") as f:
+     config = yaml.safe_load(f)
+
+ os.environ["OPENAI_API_KEY"] = config["openai_api_key"]
+
+ template_dir = os.path.abspath("templates")
+ app = Flask(__name__, template_folder=template_dir, static_folder="static")
+
+ # Load the files
+ loader = DirectoryLoader(config["data_directory"], glob=config["data_files_glob"])
+ docs = loader.load()
+
+ webpages = config.get("webpages", [])
+ web_docs = []
+ for webpage in webpages:
+     logger.info(f"Loading data from webpage {webpage}")
+     loader = WebBaseLoader(webpage)
+     web_docs += loader.load()
+
+ result = docs + web_docs
+
+ tone = config.get("tone", "default")
+ persona = config.get("persona", "default")
+
+ text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
+ texts = text_splitter.split_documents(result)
+ embeddings = OpenAIEmbeddings(openai_api_key=config["openai_api_key"])
+ docsearch = Chroma.from_documents(texts, embeddings)
+
+ # Initialize the QA chain
+ logger.info("Initializing QA chain...")
+ chain = load_qa_chain(
+     OpenAIChat(),
+     chain_type="stuff",
+     memory=ConversationBufferMemory(memory_key="chat_history", input_key="human_input"),
+     prompt=PromptTemplate(
+         input_variables=["chat_history", "human_input", "context", "tone", "persona"],
+         template="""You are a chatbot who acts like {persona}, having a conversation with a human.
+
+ Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES") in the tone {tone}.
+ If you don't know the answer, just say that you don't know. Don't try to make up an answer.
+ ALWAYS return a "SOURCES" part in your answer.
+ SOURCES should only be hyperlink URLs which are genuine and not made up.
+
+ {context}
+
+ {chat_history}
+ Human: {human_input}
+ Chatbot:""",
+     ),
+     verbose=False,
+ )
+
+
+ @app.route("/")
+ def index():
+     return render_template("index.html")
+
+
+ @app.route("/api/chat", methods=["POST"])
+ def chat():
+     try:
+         # Get the question from the request
+         question = request.json["question"]
+         documents = docsearch.similarity_search(question, include_metadata=True)
+
+         # Get the bot's response
+         response = chain(
+             {
+                 "input_documents": documents,
+                 "human_input": question,
+                 "tone": tone,
+                 "persona": persona,
+             },
+             return_only_outputs=True,
+         )["output_text"]
+
+         # Increment message counter
+         session_counter = request.cookies.get("session_counter")
+         if session_counter is None:
+             # First message: start at 1 so memory is not flushed immediately
+             session_counter = 1
+         else:
+             session_counter = int(session_counter) + 1
+
+         # Check if it's time to flush memory (every 10th message)
+         if session_counter % 10 == 0:
+             chain.memory.clear()
+
+         # Set the session counter cookie
+         resp = jsonify({"response": response})
+         resp.set_cookie("session_counter", str(session_counter))
+
+         # Return the response as JSON with the session counter cookie
+         return resp
+
+     except Exception as e:
+         # Log the error and return an error response
+         logger.error(f"Error while processing request: {e}")
+         return jsonify({"error": "Unable to process the request."}), 500
+
+ if __name__ == "__main__":
+     app.run(debug=True)
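The cookie-based counter in `chat()` decides when the conversation memory is cleared. The rule can be sketched as a small pure function, which makes the boundary cases easy to check. The name `next_session_state` and the `flush_every` parameter are hypothetical, and the sketch assumes the count starts at 1 on the first message so that the first exchange is not flushed straight away.

```python
def next_session_state(cookie_value, flush_every=10):
    """Return (new_count, should_flush) for one incoming chat message.

    cookie_value is the raw 'session_counter' cookie (a string) or None
    when the browser has not sent one yet.
    """
    # Assumption: the first message counts as 1, not 0.
    count = 1 if cookie_value is None else int(cookie_value) + 1
    # Flush on every flush_every-th message (10, 20, 30, ...).
    return count, count % flush_every == 0
```

One design point worth noting: the counter lives in a per-browser cookie, while `chain` is a module-level object, so its memory appears to be shared by every client of the process. A flush triggered by one visitor would therefore also clear the history seen by the others.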
templates/index.html ADDED
@@ -0,0 +1,86 @@
+ <!DOCTYPE html>
+ <html>
+   <head>
+     <title>Chat Box</title>
+     <link rel="stylesheet" href="static/style.css" />
+     <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
+     <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js" integrity="sha384-w76AqPfDkMBDXo30jS1Sgez6pr3x5MlQ1ZAGC+nuZB+EYdgRZgiwxhTBTkF7CXvN" crossorigin="anonymous"></script>
+     <script>
+       // Send the user's question to the server and display the response
+       function sendQuestion() {
+         var question = $("#question-input").val();
+         if (question) {
+           var timestamp = new Date().toLocaleTimeString();
+           $("#chat-area").append("<div class='message-container'><div class='row'><div class='col timestamp'><p class='small mb-1 text-muted'>" + timestamp + "</p></div></div><div class='row user-message'><div class='col'><span>" + $("<span>").text(question).html() + "</span></div></div></div>");
+           $("#question-input").val("");
+           $("#chat-area").scrollTop($("#chat-area").prop("scrollHeight"));
+           $("#loading-indicator").show();
+           $.ajax({
+             url: "/api/chat",
+             type: "POST",
+             contentType: "application/json",
+             data: JSON.stringify({ question: question }),
+             success: function(data) {
+               var response = data.response.replace(/\n/g, "<br><br>");
+               var typingSpeed = 50; // in milliseconds
+               var responseArray = response.split(" ");
+               var currentIndex = 0;
+               var responseTimer = setInterval(function() {
+                 if (currentIndex < responseArray.length) {
+                   var responseText = responseArray.slice(0, currentIndex + 1).join(" ");
+                   $("#typing-indicator").html(responseText);
+                   currentIndex++;
+                 } else {
+                   clearInterval(responseTimer);
+                   var timestamp = new Date().toLocaleTimeString();
+                   $("#typing-indicator").html("");
+                   var response = data.response.replace(/(https?:\/\/[^\s]+)/g, "<a href='$1' target='_blank'>$1</a>").replace(/\n/g, "<br>");
+                   $("#chat-area").append("<div class='message-container'><div class='row'><div class='col bot-timestamp'><p class='small mb-1 text-muted'>" + timestamp + "</p></div></div><div class='row bot-message'><div class='col'><span>" + response + "</span></div></div></div>");
+                   $("#chat-area").scrollTop($("#chat-area").prop("scrollHeight"));
+                   $("#loading-indicator").hide();
+                 }
+               }, typingSpeed);
+
+             },
+             error: function() {
+               alert("Unable to process the request.");
+               $("#loading-indicator").hide();
+             }
+           });
+         }
+       }
+
+
+       // Send the user's question when they press Enter in the text input field
+       $("#question-input").keydown(function(event) {
+         if (event.keyCode == 13) {
+           sendQuestion();
+           $("#question-input").val("");
+           return false;
+         }
+       });
+     </script>
+   </head>
+   <body>
+     <div id="chat-box">
+       <div id="chat-area"></div>
+       <div id="input-area">
+         <input type="text" id="question-input" placeholder="Ask here" />
+         <button id="send-button" onclick="sendQuestion()">Send</button>
+       </div>
+       <div id="typing-indicator"></div>
+       <div id="loading-indicator">
+         <div class="spinner"></div>
+       </div>
+     </div>
+     <script>
+       var input = document.getElementById("question-input");
+       input.addEventListener("keypress", function(event) {
+         if (event.key === "Enter") {
+           event.preventDefault();
+           document.getElementById("send-button").click();
+         }
+       });
+     </script>
+   </body>
+ </html>
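The page turns a chat message into HTML in the browser: URLs become links and newlines become `<br>`. The order of those steps matters, and it is easier to see in isolation. The following is a Python sketch of one safe ordering (escape, then linkify, then convert newlines). It is not the page's JavaScript; `render_message` and `URL_RE` are hypothetical names, and the HTML escaping step is an assumption added for illustration.

```python
import html
import re

# Same idea as the page's regex: a URL runs until whitespace (or a "<").
URL_RE = re.compile(r"(https?://[^\s<]+)")


def render_message(text):
    """Turn plain chat text into an HTML fragment."""
    # 1. Escape first so user or model text cannot inject markup.
    escaped = html.escape(text)
    # 2. Linkify while newlines are still whitespace, so a URL at the end
    #    of a line does not swallow the line break that follows it.
    linked = URL_RE.sub(r"<a href='\1' target='_blank'>\1</a>", escaped)
    # 3. Only now convert newlines to <br>.
    return linked.replace("\n", "<br>")
```

If the newline conversion ran before the linkify step, a URL followed directly by a line break would capture the `<br>` text as part of the link target.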