InferenceIllusionist committed "Update README.md"

Quantized from fp32 with love. If you're on the latest release of llama.cpp, you should no longer need to combine files before loading.

* Weighted quantizations created using Wizard-LM-2-8x22 [imatrix file](https://huggingface.co/jukofyork/WizardLM-2-8x22B-imatrix) provided by jukofyork
* Calculated in 105 chunks with n_ctx=512 using groups_merged.txt
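
The settings above can be sketched as a pair of llama.cpp commands. This is a hypothetical reconstruction, not the author's exact invocation: the model filenames and the `IQ4_XS` output type are placeholders, and it assumes a llama.cpp build that ships the `llama-imatrix` and `llama-quantize` tools.

```shell
# Hypothetical sketch: compute an importance matrix over groups_merged.txt
# with a 512-token context (n_ctx=512, as stated in the card), then feed it
# into quantization. Model filenames and quant type are placeholders.
./llama-imatrix -m wizardlm-2-8x22b-f32.gguf -f groups_merged.txt -c 512 -o imatrix.dat
./llama-quantize --imatrix imatrix.dat wizardlm-2-8x22b-f32.gguf wizardlm-2-8x22b-IQ4_XS.gguf IQ4_XS
```

The chunk count (105 here) falls out of the calibration file's length divided by the context size, rather than being set directly.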

For a brief rundown of iMatrix quant performance please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)