engine

#3
by ehartford - opened

I tried this on:

  • vllm
  • sglang
  • tgi

None of them worked.

How did you run this model? Could you please give a walkthrough?

Open Platform for Enterprise AI org
edited 4 days ago

I have updated the README with instructions for running this model on CPU.
Could you attach more logs for vLLM? The model in the main branch is in the typical AutoGPTQ format, which should be supported by these backends. In transformers, we found this model may need 8x80GB of GPU memory, even though the checkpoint is only ~330GB.
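For reference, loading the checkpoint on CPU with transformers generally follows the pattern below. This is a minimal sketch, not necessarily the exact steps from the updated README; it assumes a transformers build whose GPTQ backend supports CPU execution, and bf16 as the compute dtype:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview"

# trust_remote_code is required because DeepSeek-V3 ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="cpu",
    torch_dtype=torch.bfloat16,  # assumption: bf16 compute dtype
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))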

We will test some of the other models we exported to check whether it's an issue with the export.

I launched vLLM (8x H100 SXM) using the following command to start the server:

vllm serve --model OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview --tensor-parallel-size 8 --max-model-len 16384 --max-num-seqs 1 --trust-remote-code

Then, I made a simple “Hello” request like this:

from openai import OpenAI

# vLLM's OpenAI-compatible server defaults to http://localhost:8000/v1;
# the API key is not checked by default, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_completion = client.chat.completions.create(
    model="OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview",
    messages=[
        {
            "role": "user",
            "content": "Hello",
        },
    ],
    stream=True,
)

# Print the streamed tokens as they arrive.
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")

However, the output I received was quite strange, and I’m unsure why this is happening. Any insights into what might be causing this anomaly would be greatly appreciated.

AAAA隨後ffi®AAAAAAAAÄAAAAAAAAICAgICAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAflaaaaaaaaAAAAAh–
AAAAAAAAAAAAепflaaaaAAAA–
 AAA mootÎ AAA AAAÎ巴克ÎiaisÎðStringÍABCDíð®AAA GuineaPokAAAAifiÎffffffialiíí @)]DotíÎ84ok安置куп Î HSÎÎ708414 followers Macrom Pembíð Including756HKiOS®157Indonesia Ronald HN仅有 ®ízi ÎIEEE
...
