engine
I tried this on:
- vllm
- sglang
- tgi
None of them work.
How did you run this model? Could you please give a walkthrough?
I have updated the README with instructions for running this model on CPU.
Could you attach more logs for vLLM? The model in the main branch is in the typical AutoGPTQ format, which should be supported by these backends. With Transformers, we found it may need 8x80G for this model, even though the model itself is only ~330G.
We will test some of the other models we exported to check whether this is an exporting issue.
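
For reference, one quick way to confirm that the checkpoint in the main branch carries an AutoGPTQ-style export is to inspect the quantization_config block in its config.json. This is only a minimal sketch; it downloads just that one file from the repo referenced in this thread:

import json
from huggingface_hub import hf_hub_download

# Download only config.json from the model repo discussed in this thread.
config_path = hf_hub_download(
    repo_id="OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview",
    filename="config.json",
)
with open(config_path) as f:
    config = json.load(f)

# An AutoGPTQ-style export should list fields such as quant_method, bits, group_size, sym.
print(json.dumps(config.get("quantization_config", {}), indent=2))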
I launched vLLM on 8 x H100 SXM GPUs using the following command to start the server:
vllm serve --model OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview --tensor-parallel-size 8 --max-model-len 16384 --max_num_seqs 1 --trust-remote-code
Then, I made a simple “Hello” request like this:
from openai import OpenAI

# Assumed client setup: the default `vllm serve` OpenAI-compatible endpoint (port 8000) with a dummy API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

chat_completion = client.chat.completions.create(
    model="OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview",
    messages=[
        {
            "role": "user",
            "content": "Hello",
        },
    ],
    stream=True,
)

# Print the streamed tokens as they arrive.
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")
However, the output I received was quite strange, and I’m unsure why this is happening. Any insights into what might be causing this anomaly would be greatly appreciated.
AAAA隨後ffi®AAAAAAAAÄAAAAAAAAICAgICAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAaaaaaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAflaaaaaaaaAAAAAh–
AAAAAAAAAAAAепflaaaaAAAA–
AAA mootÎ AAA AAAÎ巴克ÎiaisÎðStringÍABCDíð®AAA GuineaPokAAAAifiÎffffffialiíí @)]DotíÎ84ok安置куп Î HSÎÎ708414 followers Macrom Pembíð Including756HKiOS®157Indonesia Ronald HN仅有 ®ízi ÎIEEE
...