neuralmagic/TinyLlama-1.1B-Chat-v1.0-marlin

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

robertgshaw2 committed on Mar 6, 2024

Commit bd74ab9 · verified · 1 Parent(s): 8680e42

Update README.md

Files changed (1)

README.md +1 -2

README.md CHANGED

@@ -59,9 +59,8 @@ Instructions:
 ```
 ## Quantization
-For details on how this model was quantized and converted to marlin format, see the `quantization/apply_gptq_save_marlin.py` script in the model card.
-Run the following
 ```bash
 pip install -r quantization/requirements.txt
 CUDA_VISIBLE_DEVICES=0 python3 quantization/apply_gptq_save_marlin.py --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --save-dir ./tinyllama-marlin

 ```
 ## Quantization
+For details on how this model was quantized and converted to marlin format, run the `quantization/apply_gptq_save_marlin.py` script:
 ```bash
 pip install -r quantization/requirements.txt
 CUDA_VISIBLE_DEVICES=0 python3 quantization/apply_gptq_save_marlin.py --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0 --save-dir ./tinyllama-marlin