---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- VoiceAssistant
- SpeechRecognition
- RAG
- TextToSpeech
- Langchain
- FAISS
- Ollama
- ConversationalAI
- VectorDatabase
- LLM
---
# Voice Assistant with RAG

<p align="center">
<!-- Smaller size image -->
<img src="https://img.freepik.com/free-vector/people-with-mobile-phones-using-smart-voice-assistant-software-man-woman-near-screen-with-microphone-soundwaves-sound-recording-app-interface-ai-technology-concept_74855-10131.jpg?w=740&t=st=1721482446~exp=1721483046~hmac=3dbfb5f5a3ff560d4314c4f8be7fdc83727aee234e2efc837b51c3e75c5acb2b" alt="Image" style="width:600px; height:300px;">
</p>

This project implements a voice assistant that combines Retrieval-Augmented Generation (RAG) with speech recognition to answer user queries. The assistant listens for voice input, processes it, and responds with synthesized speech grounded in the knowledge base you provide.

## Features

- Speech recognition using Google Speech Recognition
- Text-to-Speech (TTS) using Mozilla TTS
- RAG-based question answering using Langchain and FAISS
- Integration with Ollama for language model processing

## Prerequisites

Before running this project, make sure you have the following dependencies installed:

- Python 3.7+
- PyTorch
- Transformers
- SpeechRecognition
- soundfile
- playsound
- TTS
- Langchain
- FAISS
- Ollama

<p align="center">
<!-- Smaller size image -->
<img src="https://huggingface.co/foduucom/VoiceGrit/resolve/main/Flow%20chart%205.jpg" alt="Image" style="width:600px; height:300px;">
</p>

# How to Get Started with the Project

1. Clone this repository.
```
git clone https://huggingface.co/foduucom/Voice-Assistant-using-RAG
```
2. Create conda environment.
```
conda create -n VoiceAI python=3.10
conda activate VoiceAI
```
3. Install the dependencies with pip:
```
pip install -r requirements.txt
```

For Ollama, follow the installation instructions on the official website (https://ollama.com/library/llama3), then pull the model with `ollama pull llama3`.

## Setup

1. Clone this repository to your local machine.
2. Install the required dependencies as mentioned above.
3. Make sure you have the `KnowledgeBase.pdf` file in the same directory as the script. This file will be used to create the knowledge base for the RAG system.
4. Ensure that Ollama is running on `http://localhost:11434` with the `llama3` model loaded.
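Before launching the assistant, you can verify that the Ollama server is reachable. The helper below is a hypothetical addition (not part of the original script), using only the standard library; the default URL matches the endpoint mentioned above.

```python
# Hypothetical pre-flight check: returns True if an HTTP server answers at
# base_url within the timeout. Ollama's root endpoint replies with HTTP 200
# ("Ollama is running") when the server is up.
from urllib.request import urlopen
from urllib.error import URLError


def is_ollama_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, or malformed URL
        return False
```

If this returns `False`, start the server (e.g. with `ollama serve`) before running the assistant.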

## Usage

To run the voice assistant, execute the following command in your terminal:

```
python Voice_Assistant.py
```

The assistant will start listening for your voice input. Speak clearly into your microphone to ask questions or give commands. The assistant will process your input and respond with synthesized speech.
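The listen-answer-speak cycle described above can be sketched as a single function. This is an illustrative outline, not the script's actual code: the three stages are passed in as callables so the control flow can be followed (and tested) without a microphone or a running model.

```python
# Hypothetical sketch of one assistant turn. In the real script, `listen`
# would wrap Google Speech Recognition, `answer` the RetrievalQA chain, and
# `speak` Mozilla TTS playback.
from typing import Callable, Optional


def assistant_turn(
    listen: Callable[[], Optional[str]],
    answer: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one listen -> answer -> speak turn; return the spoken reply."""
    query = listen()
    if not query:  # nothing was recognized; stay silent and try again
        return None
    reply = answer(query)
    speak(reply)
    return reply
```

In `Voice_Assistant.py` a turn like this would run inside a `while True:` loop until the user exits.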

## How It Works

1. The script loads the knowledge base from `KnowledgeBase.pdf` and creates a FAISS vector store using sentence embeddings.
2. It sets up a Retrieval QA chain using Ollama as the language model and the FAISS vector store as the retriever.
3. The main loop continuously listens for voice input using the computer's microphone.
4. When speech is detected, it's converted to text using Google's Speech Recognition service.
5. The text query is then processed by the RAG system to generate a response.
6. The response is converted to speech using Mozilla TTS and played back to the user.
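Steps 1 and 2 above can be sketched with Langchain as follows. This is a minimal, assumed reconstruction rather than the script itself: the PDF filename, embedding model, and chain settings are illustrative defaults, and the imports are deferred into the function so the sketch can be read without the heavy dependencies installed.

```python
# Hypothetical sketch of building the RAG chain (steps 1-2 above), assuming
# the langchain / langchain-community packages from the prerequisites.
def build_qa_chain(pdf_path: str = "KnowledgeBase.pdf"):
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain_community.llms import Ollama
    from langchain.chains import RetrievalQA

    docs = PyPDFLoader(pdf_path).load()            # step 1: load the PDF
    embeddings = HuggingFaceEmbeddings(            # sentence embeddings
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    store = FAISS.from_documents(docs, embeddings)  # FAISS vector store
    llm = Ollama(model="llama3")                    # local Ollama model
    return RetrievalQA.from_chain_type(             # step 2: retrieval QA chain
        llm=llm, retriever=store.as_retriever()
    )
```

The returned chain can then answer a text query with `chain.invoke({"query": question})`.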

## Customization

- To use a different knowledge base, replace `KnowledgeBase.pdf` with your own PDF file and update the filename in the script.
- You can experiment with different embedding models by changing the `model_name` in the `HuggingFaceEmbeddings` initialization.
- To use a different Ollama model, update the `model` parameter in the `Ollama` initialization.
- Experiment with other TTS frameworks such as MeloTTS, Coqui TTS, or MARS5 TTS.

## Further Improvements

1. Achieve lower response latency.
2. Improve speech recognition in the presence of background noise and varied accents.
3. Add support for more languages.
4. Maintain conversation history so follow-up questions have context.
5. Make the synthesized speech sound more natural, possibly with emotional expression.
## Troubleshooting

- If you encounter issues with speech recognition, ensure your microphone is properly connected and configured.
- For TTS issues, make sure you have the necessary audio drivers installed on your system.
- If the RAG system is not working as expected, check that your knowledge base PDF is properly formatted and contains relevant information.


## Model Card Contact

For inquiries and contributions, please contact us at [email protected].

```bibtex
@ModelCard{voice_assistant_rag_2024,
    author = {Nehul Agrawal and Roshan Kshirsagar},
    title  = {Voice Assistant},
    year   = {2024}
}
```