
Goal: Create a PDF RAG chatbot to work on Hugging Face Spaces. Key points:

Use Hugging Face Spaces to host the chatbot, with static HTML on the Space and Flask locally to test out features.

References:

- huggingface.co/docs
- huggingface.co/docs/hub/spaces-sdks-python
- Hugging Face static Spaces docs

Objectives:

- Create a PDF-based RAG (Retrieval-Augmented Generation) chatbot.
- Implement character-based interactions, where the chatbot embodies a persona based on the PDF content.
- Deploy the chatbot on Hugging Face Spaces using a static HTML frontend and Flask backend.
- Develop a local Flask setup for testing and development purposes.
- Implement efficient PDF processing, including text extraction and chunking.
- Use Hugging Face models for text embedding and generation.
- Create a user-friendly web interface for interacting with the chatbot.
- Ensure the chatbot provides contextually relevant responses based on the PDF content.

Requirements:

- Create a RAG (Retrieval-Augmented Generation) chatbot.
- Use a PDF file as the knowledge base.
- Have the chatbot take on the role of a character; users will interact with it as though it were a living version of the data.
- Deploy the project on Hugging Face Spaces.
- Use static HTML for the frontend on Hugging Face Spaces.
- Use Flask locally to test out features.
- Focus on PDF functionality for now (VTT and JSON are stretch goals).
- Store the PDF file in a `data/` folder within the project structure.

Hosting and processing:

- PDF Storage: All PDF files should be stored in the Hugging Face repository, not locally.
- Model Hosting: All machine learning models (for embedding, text generation, etc.) should be hosted on Hugging Face, not run locally.
- Heavy Computations: All computationally intensive tasks, such as PDF processing, text embedding, and response generation, should be performed on Hugging Face's servers.
- API Usage: Interact with Hugging Face models and services via their API, sending requests from your local machine but having the processing done remotely.

Local Testing:

- Flask Server: Run a local Flask server for development and testing purposes.
- Minimal Local Dependencies: Keep local dependencies to a minimum, mainly Flask and the libraries needed for API interactions.
- Local Web Interface: Serve a simple HTML/JavaScript frontend locally for testing the chatbot interface.
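The chunking, retrieval, and persona points above can be sketched in plain Python. This is a minimal sketch with several assumptions. Text has already been extracted from the PDF (for example with a library such as pypdf). Chunking is by word count with overlap. The embedding vectors come from a Hugging Face embedding model called elsewhere. The function names `chunk_text`, `top_k_chunks`, and `build_persona_prompt`, the default sizes, and the prompt wording are hypothetical choices for illustration, not part of the project.

```python
import math
from typing import List, Sequence


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split extracted PDF text into overlapping chunks of `chunk_size` words.

    Hypothetical helper; the default sizes are illustrative, not tuned values.
    """
    if chunk_size <= 0 or not (0 <= overlap < chunk_size):
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()  # also normalizes the irregular whitespace PDF extraction produces
    if not words:
        return []
    step = chunk_size - overlap
    # Window starts stop before len(words) - overlap, so the final window reaches the
    # end of the text without emitting a tail already covered by the previous window.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]],
                 chunks: Sequence[str], k: int = 3) -> List[str]:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
                    reverse=True)
    return [chunks[i] for i in ranked[:k]]


def build_persona_prompt(persona: str, context_chunks: Sequence[str], question: str) -> str:
    """Assemble a prompt that keeps the model in character and grounded in the PDF.

    The wording is a hypothetical template, not a prescribed format.
    """
    context = "\n\n".join(context_chunks)
    return (
        f"You are {persona}. Stay in character and answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"User: {question}\n"
        f"{persona}:"
    )
```

In use, `chunk_vecs` and `query_vec` would come from the same hosted embedding model, and the returned prompt would be sent to a hosted generation model. Overlapping chunks keep a sentence that straddles a boundary retrievable from either side. Character-based or sentence-based splitting would work just as well; the choice is a tuning decision.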

API Key Management: Use environment variables to manage API keys locally, so they are never hard-coded or committed to the repository.
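For the API-usage and key-management points, here is a standard-library sketch that reads the token from an environment variable and sends a request to a hosted model. It assumes the variable is named `HF_TOKEN` (the name the `huggingface_hub` library also reads) and that the endpoint pattern in `HF_INFERENCE_URL` is still current. Both assumptions, along with the hypothetical helper names, should be checked against the Hugging Face Inference documentation.

```python
import json
import os
import urllib.request
from typing import Any

# Assumed endpoint pattern for hosted inference; verify against current Hugging Face docs.
HF_INFERENCE_URL = "https://router.huggingface.co/hf-inference/models/{model_id}"


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the API token from the environment so it never appears in source code."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before starting the server.")
    return token


def build_inference_request(model_id: str, inputs: Any, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking a hosted model to process `inputs`."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        HF_INFERENCE_URL.format(model_id=model_id),
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_model(model_id: str, inputs: Any, timeout: float = 60.0) -> Any:
    """Send the request and decode the JSON reply; the computation runs on Hugging Face's side."""
    request = build_inference_request(model_id, inputs, get_hf_token())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `query_model("sentence-transformers/all-MiniLM-L6-v2", ["first chunk", "second chunk"])` would ask a hosted sentence-embedding model for embeddings of the two chunks. The model ID is only an illustration. Once the prototype works, the official `huggingface_hub.InferenceClient` wraps the same kind of calls and may be preferable to raw HTTP.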