Running the sample code fails with a CUDA error
```
cheng@cheng:~$ python qwq.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 17/17 [00:10<00:00, 1.61it/s]
Traceback (most recent call last):
File "/home/cheng/qwq.py", line 24, in
generated_ids = model.generate(
^^^^^^^^^^^^^^^
File "/home/cheng/anaconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/cheng/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/generation/utils.py", line 2252, in generate
result = self._sample(
^^^^^^^^^^^^^
File "/home/cheng/anaconda3/envs/vllm/lib/python3.11/site-packages/transformers/generation/utils.py", line 3297, in _sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
../aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
```
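The assert fires inside `torch.multinomial` when the sampling probabilities contain `inf` or `nan`, which usually points at corrupted weights or a dtype problem rather than at the generation call itself. For context, here is a minimal sketch of the kind of sample code that produces the trace above (assumed to follow the Qwen/QwQ-32B-Preview model card; the actual contents of qwq.py are not shown in this issue):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B-Preview"  # assumption: the QwQ preview checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # corrupted shards or a dtype mismatch here can yield nan logits
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# The device-side assert is raised from torch.multinomial during sampling;
# rerunning with CUDA_LAUNCH_BLOCKING=1 makes the reported stack trace
# point at the actual failing kernel instead of a later API call.
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```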
I've re-downloaded the HF model and updated vLLM to the latest version, 0.6.5, and it's working perfectly now. Thank you!
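For anyone landing here with the same assert: after re-downloading the weights and upgrading, a quick sanity check under vLLM looks roughly like this (a hedged sketch; the model id and sampling settings are assumptions, not taken from this thread):

```python
from vllm import LLM, SamplingParams

# Assumed model id; substitute whichever checkpoint you re-downloaded.
llm = LLM(model="Qwen/QwQ-32B-Preview")
params = SamplingParams(temperature=0.7, max_tokens=256)

# If the weights are intact, this completes without the multinomial assert.
outputs = llm.generate(["How many r's are in 'strawberry'?"], params)
print(outputs[0].outputs[0].text)
```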
Although this model is experimental, it currently offers the best experience among open-source models, even surpassing some closed-source options. Thanks to Alibaba Cloud for open-sourcing it!