engine
I tried this on:
- vllm
- sglang
- tgi
None of them work.
How did you run this model? Could you please give a walkthrough?
I have updated the README with instructions for running this model on CPU.
Could you attach more logs for vLLM? The model in the main branch is in the typical AutoGPTQ format, which should be supported by these backends. With Transformers, we found it may need 8x80G for this model, even though the model itself is only ~330G.
We will test some of the other models we exported to check whether this is an exporting issue.
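
For reference, one quick way to confirm that the checkpoint in the main branch carries an AutoGPTQ-style export is to inspect the quantization_config block in its config.json. This is only a minimal sketch; it downloads just that one file from the repo referenced in this thread:

import json
from huggingface_hub import hf_hub_download

# Download only config.json from the model repo discussed in this thread.
config_path = hf_hub_download(
    repo_id="OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview",
    filename="config.json",
)
with open(config_path) as f:
    config = json.load(f)

# An AutoGPTQ-style export should list fields such as quant_method, bits, group_size, sym.
print(json.dumps(config.get("quantization_config", {}), indent=2))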
I launched vLLM on 8 x H100 SXM GPUs using the following command to start the server:
vllm serve --model OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview --tensor-parallel-size 8 --max-model-len 16384 --max_num_seqs 1 --trust-remote-code
Then, I made a simple “Hello” request like this:
from openai import OpenAI

# Assumed client setup: the default `vllm serve` OpenAI-compatible endpoint (port 8000) with a dummy API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_completion = client.chat.completions.create(
    model="OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview",
    messages=[
        {
            "role": "user",
            "content": "Hello",
        },
    ],
    stream=True,
)

# Print the streamed tokens as they arrive.
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
However, the output I received was quite strange, and I’m unsure why this is happening. Any insights into what might be causing this anomaly would be greatly appreciated.
AAAA隨後ffi®AAAAAAAAÄAAAAAAAAICAgICAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAflaaaaaaaaAAAAAh–
AAAAAAAAAAAAепflaaaaAAAA–
AAA mootÎ AAA AAAÎ巴克ÎiaisÎðStringÍABCDíð®AAA GuineaPokAAAAifiÎffffffialiíí @)]DotíÎ84ok安置куп Î HSÎÎ708414 followers Macrom Pembíð Including756HKiOS®157Indonesia Ronald HN仅有 ®ízi ÎIEEE
...