Too slow with CPU #2
opened by cmaire
Hi,
I tried your model, but it is so slow. What can I do?
Thank you
Hi @cmaire,
Since I don't have access to a GPU-enabled space, here's how to run the model locally:
Prerequisites:
- Request access to meta-llama/Llama-3.1-8B
- Generate a Hugging Face token
- Install Docker with NVIDIA GPU libraries on your CUDA-enabled machine
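Before launching the container, it can save time to sanity-check the environment. This is a minimal sketch, assuming the token is exported as `HF_TOKEN`; the `hf_` prefix check is an assumption about the usual shape of Hugging Face user access tokens, and the PATH check only confirms a `docker` executable exists, not that the NVIDIA runtime is configured:

```python
# Hedged sketch: quick pre-flight checks before `docker run`.
import os
import shutil


def looks_like_hf_token(token) -> bool:
    """Rough plausibility check: HF user access tokens usually start with 'hf_'."""
    return bool(token) and token.startswith("hf_") and len(token) > 10


def docker_available() -> bool:
    """True if a `docker` executable is on PATH (does NOT verify the NVIDIA runtime)."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    print("HF_TOKEN plausible:", looks_like_hf_token(os.environ.get("HF_TOKEN")))
    print("docker on PATH:", docker_available())
```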
For Llama 3.1 (≈3 min/query on my poor RTX2860):
docker run --gpus=all -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="hf_your_token" \
registry.hf.space/eltorio-llama-3-1-8b-appreciation:latest python app.py
For faster responses, try Llama 3.2 3B (<10s/query):
- Request access to meta-llama/Llama-3.2-3B
- Run:
docker run --gpus=all -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="hf_your_token" \
registry.hf.space/eltorio-llama-3-2-3b-appreciation:latest python app.py
Access the interface at http://localhost:7860
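The container takes a while to pull the model weights on first start, so the UI is not immediately reachable. A small polling helper can tell you when it is up; this is a sketch assuming a standard Gradio app on the default port 7860 (the hostname, port, and timeout values are just illustrative):

```python
# Hedged sketch: poll the local Gradio UI until the container answers HTTP.
import time
import urllib.error
import urllib.request


def gradio_url(host: str = "localhost", port: int = 7860) -> str:
    """Build the URL published by the `-p 7860:7860` mapping above."""
    return f"http://{host}:{port}"


def wait_until_ready(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Return True once the app answers with HTTP 200, False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            time.sleep(interval)
    return False


if __name__ == "__main__":
    url = gradio_url()
    print(f"{url} ready:", wait_until_ready(url))
```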
Best regards,
Ronan
Thank you for your answer, but my graphics card is an AMD one. Is that OK?
Short answer: no.
In fact it should be possible, but I never tried it; see https://huggingface.co/blog/huggingface-and-optimum-amd
eltorio changed discussion status to closed