bowenbaoamd committed
Commit d544aec · verified · 1 parent: 4fba5f5

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +4 -2
README.md CHANGED
````diff
@@ -24,7 +24,8 @@ python3 quantize_quark.py \
     --kv_cache_dtype fp8 \
     --num_calib_data 128 \
     --model_export quark_safetensors \
-    --no_weight_matrix_merge
+    --no_weight_matrix_merge \
+    --custom_mode fp8
 # If model size is too large for single GPU, please use multi GPU instead.
 python3 quantize_quark.py \
     --model_dir $MODEL_DIR \
@@ -34,7 +35,8 @@ python3 quantize_quark.py \
     --num_calib_data 128 \
     --model_export quark_safetensors \
     --no_weight_matrix_merge \
-    --multi_gpu
+    --multi_gpu \
+    --custom_mode fp8
 ```
 ## Deployment
 Quark has its own export format and allows FP8 quantized models to be efficiently deployed using the vLLM backend (vLLM-compatible).
````
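
For context, the full single-GPU command after this change reads roughly as sketched below. Only the flags visible in the diff are confirmed by this commit; `--output_dir` and `--quant_scheme w_fp8_a_fp8` are assumptions drawn from Quark's published quantization examples, and the model path is a placeholder.

```bash
# Sketch of the complete single-GPU invocation after this commit.
# --output_dir and --quant_scheme are assumptions (not shown in the diff);
# MODEL_DIR is a placeholder for the actual checkpoint path.
MODEL_DIR=./Llama-2-70b-chat-hf

python3 quantize_quark.py \
    --model_dir $MODEL_DIR \
    --output_dir ./quantized_model \
    --quant_scheme w_fp8_a_fp8 \
    --kv_cache_dtype fp8 \
    --num_calib_data 128 \
    --model_export quark_safetensors \
    --no_weight_matrix_merge \
    --custom_mode fp8
```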
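Given the vLLM-compatible export that `--custom_mode fp8` produces, serving the quantized checkpoint could look like the sketch below. The model path is a placeholder, and the quantization and kv-cache flags are stock vLLM options assumed to match this export.

```bash
# Sketch: serve the Quark-exported FP8 checkpoint with vLLM.
# ./quantized_model is a placeholder path; --quantization fp8 and
# --kv-cache-dtype fp8 are standard vLLM flags assumed to apply here.
vllm serve ./quantized_model \
    --quantization fp8 \
    --kv-cache-dtype fp8
```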