InferenceIllusionist/WizardLM-2-8x22B-iMat-GGUF


InferenceIllusionist committed on Apr 26, 2024

Commit 7ba170d · verified · 1 Parent(s): c5ec214

Update README.md

Files changed (1)

README.md +3 -2


@@ -10,8 +10,9 @@ tags:
 # Wizard-LM-2-8x22-iMat-GGUF
-Quantized from fp32 with love.
-* Weighted quantizations created with .imatrix file calculated in 105 chunks with n_ctx=512 using groups_merged.txt
+Quantized from fp32 with love. If you're using the latest version of llama.cpp you should no longer need to combine files before loading.
+* Weighted quantizations created using [imatrix file](https://huggingface.co/jukofyork/WizardLM-2-8x22B-imatrix) provided by jukofyork
+* Calculated in 105 chunks with n_ctx=512 using groups_merged.txt
 For a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)
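
The updated README says split files no longer need to be combined before loading. As a rough illustration, the sketch below checks that a download contains every shard of a split quant and returns the first one, which is the file you would then point llama.cpp at. It assumes shards are named like `<prefix>-00001-of-00005.gguf`. That pattern, the helper `first_shard`, and the example filenames are assumptions for illustration and are not taken from this repository.

```python
import re
from typing import Optional

# Assumed shard naming: <prefix>-00001-of-00005.gguf (not confirmed by this repo).
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def first_shard(filenames: list[str]) -> Optional[str]:
    """Return the first shard if every shard of the set is present, else None.

    Assumes the list holds shards of a single quant; names that do not
    match the shard pattern are ignored.
    """
    shards: dict[int, str] = {}
    total: Optional[int] = None
    for name in filenames:
        match = SHARD_RE.match(name)
        if match is None:
            continue
        shards[int(match.group("index"))] = name
        total = int(match.group("total"))
    # Incomplete downloads (or no shards at all) yield None.
    if total is None or set(shards) != set(range(1, total + 1)):
        return None
    return shards[1]


# Hypothetical filenames, for illustration only.
example = [
    "model-00002-of-00003.gguf",
    "model-00001-of-00003.gguf",
    "model-00003-of-00003.gguf",
]
print(first_shard(example))
```

If the helper returns `None`, a shard is likely missing or the files follow a different naming scheme, in which case the check does not apply.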