Every response starts with <|start_header_id|>assistant<|end_header_id|>

by notadib - opened about 8 hours ago

vLLM parameters:

vllm serve cortecs/Llama-3.3-70B-Instruct-FP8-Dynamic --max-model-len 32000 --max_num_batched_tokens 32000 -tp 2 --max_num_seqs 256 --gpu-memory-utilization 0.95 --tokenizer-pool-size 4 --num_scheduler_steps 16 --max_logprobs 20
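Until the cause is found, a client-side workaround may be to strip the stray header from each completion before using it. This is only a sketch: `strip_assistant_header` is a hypothetical helper, not part of vLLM, and it assumes the header appears at the very start of the returned text, as described in the title. It does not fix whatever is making the server emit the header.

```python
# Hypothetical client-side workaround: remove the leading assistant header
# from a completion string. Assumes the header shows up at the start of the text.
HEADER = "<|start_header_id|>assistant<|end_header_id|>"


def strip_assistant_header(text: str) -> str:
    """Return text without a leading assistant header, if one is present."""
    stripped = text.lstrip()
    if stripped.startswith(HEADER):
        # Drop the header and the blank lines that normally follow it.
        return stripped[len(HEADER):].lstrip("\n")
    return text


if __name__ == "__main__":
    raw = "<|start_header_id|>assistant<|end_header_id|>\n\nHello there"
    print(strip_assistant_header(raw))
```

Responses that do not start with the header are returned unchanged, so the helper should be safe to apply to every completion.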
