nehulagrawal committed on
Commit
393c3d2
·
verified ·
1 Parent(s): 43a4e2c

Create README.md

  1. README.md +94 -0
README.md ADDED
@@ -0,0 +1,94 @@
---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- VoiceAssistant
- SpeechRecognition
- RAG
- TextToSpeech
- Langchain
- FAISS
- Ollama
- ConversationalAI
- VectorDatabase
- LLM
---
# Voice Assistant with RAG and Speech Recognition

This project implements a voice assistant that combines Retrieval-Augmented Generation (RAG) with speech recognition to answer user queries. The assistant listens to voice input, transcribes it, and replies with synthesized speech grounded in the knowledge base you provide.

## Features

- Speech recognition using Google Speech Recognition
- Text-to-Speech (TTS) using Mozilla TTS
- RAG-based question answering using Langchain and FAISS
- Integration with Ollama for language model processing

## Prerequisites

Before running this project, make sure you have the following dependencies installed:

- Python 3.7+
- PyTorch
- Transformers
- SpeechRecognition
- pyttsx3
- soundfile
- playsound
- TTS
- Langchain
- FAISS
- Ollama

Create a Conda environment:
```
conda create -n VoiceAI python=3.10
conda activate VoiceAI
```
You can install most of these dependencies using pip:
```
pip install torch transformers speechrecognition pyttsx3 soundfile playsound TTS langchain faiss-cpu
```

For Ollama, follow the installation instructions on the official website. The `llama3` model page is at https://ollama.com/library/llama3.

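
As a quick sanity check after installation, the following minimal sketch reports which of the pip packages listed above cannot be found in the active environment. The mapping from pip package names to import names is an assumption based on how these packages are commonly imported, and `find_missing` is a hypothetical helper, not part of the project.

```python
import importlib.util

# Assumed mapping: pip package name -> module name used in `import`.
# The two can differ (for example, faiss-cpu is imported as `faiss`).
REQUIRED_MODULES = {
    "torch": "torch",
    "transformers": "transformers",
    "speechrecognition": "speech_recognition",
    "pyttsx3": "pyttsx3",
    "soundfile": "soundfile",
    "playsound": "playsound",
    "TTS": "TTS",
    "langchain": "langchain",
    "faiss-cpu": "faiss",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the pip package names whose top-level module cannot be located."""
    missing = []
    for package, module in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

This only checks that the modules can be located; it does not import them, so it will not catch a broken installation of an otherwise present package.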
## Setup

1. Clone this repository to your local machine.
2. Install the required dependencies as described above.
3. Place the `KnowledgeBase.pdf` file in the same directory as the script. This file is used to build the knowledge base for the RAG system.
4. Ensure that Ollama is running on `http://localhost:11434` with the `llama3` model available.

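
Steps 3 and 4 can be verified before launching the assistant. The sketch below is one possible preflight check, not part of the project: it looks for the PDF in the current working directory and queries Ollama's `/api/tags` endpoint, which lists locally available models. The helper names `model_available` and `preflight` are hypothetical.

```python
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434"  # address given in step 4


def model_available(tags_payload, model="llama3"):
    """Return True if `model` appears in a parsed /api/tags response."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Ollama reports names with a tag suffix such as "llama3:latest".
    return any(n == model or n.split(":")[0] == model for n in names)


def preflight(pdf_path="KnowledgeBase.pdf", base_url=OLLAMA_URL, model="llama3"):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not Path(pdf_path).is_file():
        problems.append(f"{pdf_path} not found in the current directory")
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
        if not model_available(payload, model):
            problems.append(f"model '{model}' is not available in Ollama")
    except (OSError, ValueError) as exc:
        # URLError subclasses OSError; ValueError covers malformed JSON.
        problems.append(f"Ollama check failed at {base_url}: {exc}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Problem:", problem)
```

Run it from the directory that contains the script and the PDF; no output means both checks passed.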
## Usage

To run the voice assistant, execute the following command in your terminal:

```
python voice_assistant.py
```

The assistant will start listening for your voice input. Speak clearly into your microphone to ask questions or give commands. The assistant will process your input and respond with synthesized speech.

## How It Works

1. The script loads the knowledge base from `KnowledgeBase.pdf` and creates a FAISS vector store using sentence embeddings.
2. It sets up a Retrieval QA chain using Ollama as the language model and the FAISS vector store as the retriever.
3. The main loop continuously listens for voice input using the computer's microphone.
4. When speech is detected, it's converted to text using Google's Speech Recognition service.
5. The text query is then processed by the RAG system to generate a response.
6. The response is converted to speech using Mozilla TTS and played back to the user.

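
The control flow in steps 3 to 6 can be sketched as a loop over three pluggable steps. This is an illustration of the flow described above, not the code in `voice_assistant.py`: `run_assistant`, `listen_with_google`, and the `max_turns` parameter are hypothetical names. The listening helper uses the SpeechRecognition package, whose `Microphone` class additionally requires PyAudio.

```python
def run_assistant(listen, answer, speak, max_turns=None):
    """Run the listen -> answer -> speak loop and return the number of turns.

    listen():     returns recognized text, or None if nothing was understood.
    answer(text): returns the response string (for example, from the RAG chain).
    speak(text):  synthesizes and plays the response.
    max_turns:    optional cap on iterations (hypothetical, useful for testing).
    """
    turns = 0
    while max_turns is None or turns < max_turns:
        turns += 1
        query = listen()
        if not query:
            continue  # nothing recognized; go back to listening
        speak(answer(query))
    return turns


def listen_with_google():
    """Capture one utterance from the microphone and transcribe it."""
    # Imported lazily so run_assistant can be exercised without audio packages.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        # Unintelligible speech, or the recognition service was unreachable.
        return None


if __name__ == "__main__":
    # Stand-in steps: echo the query instead of calling a RAG chain or TTS.
    run_assistant(listen_with_google, lambda q: f"You said: {q}", print, max_turns=1)
```

Passing the three steps in as callables keeps the loop independent of any particular recognizer, language model, or TTS engine, which also makes it straightforward to test with stand-in functions.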
## Customization

- To use a different knowledge base, replace `KnowledgeBase.pdf` with your own PDF file and update the filename in the script.
- You can experiment with different embedding models by changing the `model_name` in the `HuggingFaceEmbeddings` initialization.
- To use a different Ollama model, update the `model` parameter in the `Ollama` initialization.
- To try a different TTS framework, consider MeloTTS, Coqui TTS, or MARS5-TTS.

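
To show where these settings typically live, here is a hedged sketch that gathers them in one place and wires them into a retrieval chain. It is not the project's actual script. `AssistantConfig` and `build_qa_chain` are hypothetical names, the default embedding model and the chunk sizes are assumptions, and the import paths assume a LangChain release that provides `langchain_community` and `langchain.chains` (these paths have moved between versions). `PyPDFLoader` also needs the `pypdf` package, and `HuggingFaceEmbeddings` needs `sentence-transformers`.

```python
from dataclasses import dataclass


@dataclass
class AssistantConfig:
    pdf_path: str = "KnowledgeBase.pdf"
    # Assumed default; any sentence-embedding model name can be used here.
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    ollama_model: str = "llama3"
    ollama_url: str = "http://localhost:11434"


def build_qa_chain(config):
    """Build a Retrieval QA chain from the settings in `config`."""
    # Local imports: these paths are assumptions and depend on the installed
    # LangChain version.
    from langchain.chains import RetrievalQA
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.llms import Ollama
    from langchain_community.vectorstores import FAISS

    # Load the PDF and split it into overlapping chunks (sizes are assumptions).
    pages = PyPDFLoader(config.pdf_path).load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(pages)

    # Embed the chunks and index them in a FAISS vector store.
    embeddings = HuggingFaceEmbeddings(model_name=config.embedding_model)
    store = FAISS.from_documents(chunks, embeddings)

    # Use the Ollama model as the LLM and the FAISS store as the retriever.
    llm = Ollama(model=config.ollama_model, base_url=config.ollama_url)
    return RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())
```

With this layout, switching the knowledge base, the embedding model, or the Ollama model amounts to passing a different `AssistantConfig`, for example `AssistantConfig(pdf_path="Manual.pdf", ollama_model="mistral")`, where both values are placeholders.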
## Troubleshooting

- If you encounter issues with speech recognition, ensure your microphone is properly connected and configured.
- For TTS issues, make sure you have the necessary audio drivers installed on your system.
- If the RAG system is not working as expected, check that your knowledge base PDF is properly formatted and contains relevant information.