vllm parameters:
vllm serve cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic --max-model-len 32000 --max_num_batched_tokens 32000 -tp 2 --max_num_seqs 256 --gpu-memory-utilization 0.95 --tokenizer-pool-size 4 --num_scheduler_steps 16 --max_logprobs 20
· Sign up or log in to comment