vllm-inference / run-sailor.sh

Commit History

feat(seed): Random seed for reproducibility.
d4b0956

yusufs committed on

docs(sailor): add note about minimum resources for sailor
6dac0d0

yusufs committed on

feat(sailorchat): using sailor chat model
0f3cd25

yusufs committed on

feat(quantization): T4 does not support bfloat16
0345d26

yusufs committed on

feat(llama3.2): run llama3.2 using bfloat16 with cache dtype fp8 with same model len
38d356a

yusufs committed on

feat(sail/Sailor-4B-Chat): try increasing gpu-memory-utilization to 0.9 before changing the token length
4a9e328

yusufs committed on

feat(llama3.2): using Llama-3.2-3B-Instruct 0cb88a4f764b7a12671c53f0838cd831a0843b95
8b37c20

yusufs committed on

feat(add-model): always download model during build; it will be cached in subsequent builds
8679a35

yusufs committed on