Spaces: yusufs / vllm-inference


vllm-inference / run-llama.sh

Commit History

fix(runner.sh): --enforce-eager does not accept a value

cb15911

yusufs committed 13 days ago
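
The fix above turns on `--enforce-eager` being a boolean switch in vLLM's command line: it is either present or absent and takes no value. A minimal sketch of how a runner script might handle that, assuming a hypothetical `ENFORCE_EAGER` environment variable and a hypothetical `build_vllm_args` helper (neither comes from the actual run-llama.sh):

```shell
# Hypothetical helper: prints the vLLM server arguments on one line.
# ENFORCE_EAGER is an assumed environment variable, not part of the original script.
build_vllm_args() {
  # $1 is the model id; positional parameters are reused as the argument list.
  set -- --model "$1"
  # --enforce-eager is a bare switch: never pass "--enforce-eager false".
  # Leaving it out keeps CUDA graphs enabled, which is the default.
  if [ "${ENFORCE_EAGER:-0}" = "1" ]; then
    set -- "$@" --enforce-eager
  fi
  printf '%s\n' "$*"
}
```

The output could then be spliced into the server command, for example `python -m vllm.entrypoints.openai.api_server $(build_vllm_args meta-llama/Llama-3.2-3B-Instruct)`.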

fix(runner.sh): explicitly disabling enforce_eager

266e7dd

yusufs committed 13 days ago

fix(runner.sh): disable eager mode so it uses CUDA graphs (for parallel and faster processing)

6bb48e9

yusufs committed 13 days ago

feat(seed): Random seed for reproducibility.

d4b0956

yusufs committed on Dec 26, 2024

feat(quantization): T4 does not support bfloat16

0345d26

yusufs committed on Nov 29, 2024
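
The T4 is a compute capability 7.5 GPU, and bfloat16 needs 8.0 or newer, which is why the dtype had to change. The commit does not say which dtype replaced it. The sketch below assumes float16 (`half`) and uses a hypothetical `pick_dtype` helper that takes the compute capability as an argument:

```shell
# Hypothetical helper: choose a value for vLLM's --dtype from a CUDA
# compute capability string such as "7.5". Not part of the original script.
pick_dtype() {
  major="${1%%.*}"   # keep the part before the dot, e.g. "7" from "7.5"
  if [ "$major" -ge 8 ]; then
    echo bfloat16    # compute capability 8.0 and newer supports bfloat16
  else
    echo half        # older GPUs such as the T4 (7.5) fall back to float16 (assumed)
  fi
}
```

On recent drivers the capability can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, so a script might call `pick_dtype "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"`.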

feat(llama3.2): run llama3.2 using bfloat16 with cache dtype fp8 with same model len

38d356a

yusufs committed on Nov 29, 2024

feat(llama3.2): using Llama-3.2-3B-Instruct 0cb88a4f764b7a12671c53f0838cd831a0843b95

8b37c20

yusufs committed on Nov 29, 2024
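
Pinning a model to a fixed revision keeps builds reproducible even if the upstream repository changes. A rough sketch of how that might look, assuming the hash in the commit message is a Hugging Face model revision and that the model lives under the `meta-llama/` organisation; `print_download_cmd` is a hypothetical dry-run helper, not part of the original script:

```shell
# Values taken from the commit message; the meta-llama/ prefix is an assumption.
MODEL_ID="meta-llama/Llama-3.2-3B-Instruct"
MODEL_REV="0cb88a4f764b7a12671c53f0838cd831a0843b95"

# Hypothetical helper: prints the download command instead of running it (dry run).
print_download_cmd() {
  printf 'huggingface-cli download %s --revision %s\n' "$MODEL_ID" "$MODEL_REV"
}
```

Running the printed command during the image build would place the pinned snapshot in the Hugging Face cache; the same hash could then be passed to vLLM through its `--revision` option so the server loads exactly that snapshot.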

feat(add-model): always download the model during build; it will be cached in subsequent builds

8679a35

yusufs committed on Nov 27, 2024