vLLM on A100s
#19
by fsaudm
Has anyone successfully served this model with vLLM? I am trying to load and serve it on a cluster where I have access to 7 nodes with 2 A100s each. No OOM issues, but I don't have Docker. I was:
- Starting a Ray cluster with a head node and all other nodes as workers, all with the same conda env. Nothing fancy, just `ray start ...` (see the sketch after this list).
- Running `vllm serve ...` with:
  - `--tensor-parallel-size 2` (no. of GPUs per node)
  - `--pipeline-parallel-size 6` (no. of nodes)
...and then I get all sorts of errors. Any insights?
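For context, the Ray bootstrap I'm doing looks roughly like this (the env name, port, and head-node address are placeholders, not my actual values):

```bash
# On the head node
conda activate vllm-env          # same conda env on every node
ray start --head --port=6379

# On each worker node, pointing at the head node's address
conda activate vllm-env
ray start --address=<head-node-ip>:6379

# Sanity check from the head node: should list all nodes and GPUs
ray status
```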
Also, since A100s can't do fp8, I was setting `--dtype bfloat16`, `--quantization none`, etc. But I'm not sure if this is enough, and I haven't seen any bf16 version uploaded.
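For completeness, the launch I've been attempting looks roughly like this (the model id is a placeholder, and whether `--dtype bfloat16` alone is enough to deal with the fp8 checkpoint on A100s is exactly what I'm unsure about):

```bash
# Run on the head node once `ray status` shows all nodes.
# --tensor-parallel-size = GPUs per node, --pipeline-parallel-size = no. of nodes used.
vllm serve <model-id> \
    --tensor-parallel-size 2 \
    --pipeline-parallel-size 6 \
    --distributed-executor-backend ray \
    --dtype bfloat16
```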
You would need 40 A100s with 40 GB of GPU RAM each (20 if the GPUs have 80 GB each), i.e. about 1600 GB of GPU RAM in total, since bf16 is double the size of the fp8 release. You also need software that runs the model across GPUs in parallel.
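Rough arithmetic behind that estimate (the ~1600 GB figure is a ballpark for the bf16 weights plus overhead, not a measured number):

```bash
# Ballpark capacity check, assuming ~1600 GB of GPU RAM is needed for bf16
TOTAL_GB=1600
echo "A100 40GB cards needed: $((TOTAL_GB / 40))"   # 40
echo "A100 80GB cards needed: $((TOTAL_GB / 80))"   # 20
```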
I have never used vLLM, only llama.cpp with other models, and this V3 model is not supported by llama.cpp.