Best option for deployment?

Hi everyone, I’m new here. Looking forward to getting to know more people in the AI space. I’m still a junior in AI and ML but I’ve been a software engineer for 10 years.

I recently built a chatbot using llamaindex with RAG. I am using Ollama to host Llama 3 LLM locally. I can run everything locally on my pc, where I have a Flask API as my end point, that when the end point is hit with a question, the api calls into my llamaindex code. This works really well on my pc with a strong GPU.

My question is, what are my options for deployment in AWS? The llama3 is 4gb, plus another LLM for embeddings for RAG. There’s so much information on tutorials on how to build these AI apps, but very little on deployment in production.

How are companies doing these types of deployments?
Is this the best way to go about deployment?
Is it better to use chapgpt and pay for tokens?

Please, any advice or guidance would really be appreciated.

2 Likes

This is something I’m trying to figure out too.

This example is one I’m going to try and modify for an MVP: SpacesExamples/fastapi_t5 at main

Otherwise, you can use things like AWS sage, Azure, GCP, Paperspace.

I think you’d have to swap out Ollama for llama.cpp or HFUF for production.

Let’s jam on this!

Check out this: