Every response starts with <|start_header_id|>assistant<|end_header_id|>

#2
by notadib - opened

vLLM parameters:

vllm serve cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic --max-model-len 32000 --max_num_batched_tokens 32000 -tp 2 --max_num_seqs 256 --gpu-memory-utilization 0.95 --tokenizer-pool-size 4 --num_scheduler_steps 16 --max_logprobs 20 
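Until the root cause is found, one possible client-side stopgap is to strip the leaked header from the returned text. This is only a minimal sketch and does not fix the server-side behavior: `strip_leaked_header` and `LEAKED_HEADER` are hypothetical names for illustration, not part of vLLM, and it assumes the header appears once at the very start of the response.

```python
# Hypothetical helper: the names here are illustrative and not part of vLLM.
LEAKED_HEADER = "<|start_header_id|>assistant<|end_header_id|>"


def strip_leaked_header(text: str) -> str:
    """Remove one leaked assistant header from the start of a response.

    Assumes the header appears at most once, at the beginning, optionally
    preceded by whitespace. Text without the header is returned unchanged.
    """
    stripped = text.lstrip()
    if stripped.startswith(LEAKED_HEADER):
        # Drop the header and the blank lines that usually follow it.
        return stripped[len(LEAKED_HEADER):].lstrip()
    return text
```

Applied to each response string before further processing, this leaves normal responses untouched and only trims those that begin with the header.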
