# Documentation
## File Structure
- `docs/` - Documentation files
- `code/` - Code files
- `storage/` - Storage files
- `vectorstores/` - Vector Databases
- `.env` - Environment Variables
- `Dockerfile` - Dockerfile for Hugging Face
- `.chainlit` - Chainlit Configuration
- `chainlit.md` - Chainlit README
- `README.md` - Repository README
- `.gitignore` - Gitignore file
- `requirements.txt` - Python Requirements
- `.gitattributes` - Gitattributes file
## Code Structure
- `code/main.py` - Main Chainlit App
- `code/config.yaml` - Configuration File for Embedding-related, Vector Database-related, and Chat Model-related parameters
- `code/modules/vector_db.py` - Vector Database Creation
- `code/modules/chat_model_loader.py` - Chat Model Loader (Creates the Chat Model)
- `code/modules/constants.py` - Constants (Loads the Environment Variables, Prompts, Model Paths, etc.)
- `code/modules/data_loader.py` - Loads and Chunks the Data
- `code/modules/embedding_model.py` - Creates the Embedding Model to Embed the Data
- `code/modules/llm_tutor.py` - Creates the RAG LLM Tutor
    - The function `qa_bot()` loads the vector database and the chat model, and sets the prompt to pass to the chat model.
- `code/modules/helpers.py` - Helper Functions
## Storage and Vectorstores
- `storage/data/` - Data Storage (Put your PDF files under this directory, and your URLs in the `urls.txt` file)
- `storage/models/` - Model Storage (Put your local LLMs under this directory)
- `vectorstores/` - Vector Databases (Stores the Vector Databases generated from `code/modules/vector_db.py`)
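
As a rough illustration of the data layout above, the sketch below gathers the PDF paths and URLs that a loader would consume. It is not the code in `code/modules/data_loader.py`: the function name `collect_sources` is hypothetical, and it assumes that `urls.txt` lives directly under `storage/data/` with one URL per line.

```python
from pathlib import Path


def collect_sources(data_dir="storage/data"):
    """Return (pdf_paths, urls) found under the data directory.

    Assumption: urls.txt sits directly in data_dir, one URL per line.
    """
    data_path = Path(data_dir)

    # Pick up PDF files placed anywhere under the data directory.
    pdf_paths = sorted(data_path.rglob("*.pdf"))

    urls = []
    urls_file = data_path / "urls.txt"
    if urls_file.exists():
        for line in urls_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank lines
                urls.append(line)
    return pdf_paths, urls
```

Calling `collect_sources()` from the repository root returns both lists, which can then be handed to whatever chunking step follows.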
## Useful Configurations
Set these in `code/config.yaml`:
* ``["embedding_options"]["embedd_files"]`` - If set to True, embeds the files from the storage directory every time you run the chainlit command. If set to False, uses the stored vector database.
* ``["embedding_options"]["expand_urls"]`` - If set to True, fetches and reads the data from all the links under the URL provided. If set to False, reads only the data at the URL provided.
* ``["embedding_options"]["search_top_k"]`` - Number of sources that the retriever returns
* ``["llm_params"]["use_history"]`` - Whether to use history in the prompt or not
* ``["llm_params"]["memory_window"]`` - Number of interactions to keep track of in the history
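
To make the key paths above concrete, here is a minimal sketch that mirrors the nesting of `code/config.yaml` as a plain Python dict and shows what a history window of size `memory_window` means in practice. The values are made up for illustration, `make_history` is a hypothetical helper, and the repository may well rely on a LangChain memory class instead of a `deque`.

```python
from collections import deque

# Hypothetical values; only the key names come from the list above.
config = {
    "embedding_options": {
        "embedd_files": False,  # reuse the stored vector database
        "expand_urls": True,    # follow links under each provided URL
        "search_top_k": 3,      # sources returned by the retriever
    },
    "llm_params": {
        "use_history": True,
        "memory_window": 3,
    },
}


def make_history(config):
    """Return a bounded history buffer, or None when history is disabled."""
    llm_params = config["llm_params"]
    if not llm_params["use_history"]:
        return None
    # deque(maxlen=N) drops the oldest interaction once N are stored.
    return deque(maxlen=llm_params["memory_window"])


history = make_history(config)
for turn in range(1, 6):
    history.append((f"question {turn}", f"answer {turn}"))

# Only the last `memory_window` interactions remain in the buffer.
print(list(history))
```

With `memory_window` set to 3, the first two of the five interactions are discarded, so only the most recent exchanges would reach the prompt.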
## LlamaCpp
* https://python.langchain.com/docs/integrations/llms/llamacpp
## Hugging Face Models
* Download the ``.gguf`` files for your local LLM from Hugging Face (Example: https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF) and place them under `storage/models/`
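
Once a ``.gguf`` file is in place, loading it through LangChain's LlamaCpp integration might look like the sketch below. This is an assumption about usage, not the code in `code/modules/chat_model_loader.py`: the helper names are hypothetical, the parameter values are illustrative, and `load_local_llm` needs the `langchain-community` and `llama-cpp-python` packages installed.

```python
from pathlib import Path


def find_gguf_models(models_dir="storage/models"):
    """List the .gguf model files found directly under the models directory."""
    return sorted(Path(models_dir).glob("*.gguf"))


def load_local_llm(model_path):
    """Create a LangChain LlamaCpp model from a local .gguf file."""
    # Imported lazily so that listing models works without the heavy dependencies.
    from langchain_community.llms import LlamaCpp

    return LlamaCpp(
        model_path=str(model_path),
        n_ctx=2048,       # context window size (illustrative value)
        temperature=0.7,  # sampling temperature (illustrative value)
    )
```

A typical call would be `load_local_llm(find_gguf_models()[0])`, after checking that the list is not empty.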