ibrahimkettaneh
/

QwQ-32B-Preview-abliterated-4.5bpw-h8-exl2

@@ -27,6 +27,35 @@ Source: [🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) throug
 Credits go to for their helpful and informative benchmark: [Wolfram Ravenwolf](https://huggingface.co/wolfram)
 # Context
 This is an uncensored version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).

 Credits go to for their helpful and informative benchmark: [Wolfram Ravenwolf](https://huggingface.co/wolfram)
+# Recommendation for Best Performance
+To increase performance, increase the max new output when running inference from the default to 16384 tokens.
+## Detailed Table
+| Duration | Total | % | TIGER-Lab | Correct Random Guesses | Prompt tokens | tk/s | Completion tokens | tk/s |
+|----------|--------|---|-----------|----------------------|----------------|-------|-------------------|-------|
+| QwQ-32B-Preview (8.0bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38436MiB | 1/2 | 2h 3m 30s | 325/410 | 79.27% |  | 0/2, 0.00% | 656716 | 88.58 | 327825 | 44.22 |
+| QwQ-32B-Preview (8.0bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38436MiB | 2/2 | 2h 3m 35s | 324/410 | 79.02% |  |  | 656716 | 88.52 | 343440 | 46.29 |
+| QwQ-32B-Preview (4.25bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27636MiB | 1/2 | 1h 56m 8s | 319/410 | 77.80% |  | 0/1, 0.00% | 656716 | 94.20 | 374973 | 53.79 |
+| QwQ-32B-Preview (4.25bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27636MiB | 2/2 | 1h 55m 44s | 318/410 | 77.56% |  |  | 656716 | 94.45 | 377638 | 54.31 |
+| QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38528MiB | 1/4 | 1h 29m 49s | 324/410 | 79.02% |  | 0/1, 0.00% | 656716 | 121.70 | 229008 | 42.44 |
+| QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38528MiB | 2/4 | 1h 32m 30s | 314/410 | 76.59% |  | 0/2, 0.00% | 656716 | 118.24 | 239161 | 43.06 |
+| QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 37000MiB | 3/4 | 2h 25m 24s | 308/410 | 75.12% |  | 0/2, 0.00% | 656716 | 75.23 | 232208 | 26.60 |
+| QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 37000MiB | 4/4 | 2h 27m 27s | 305/410 | 74.39% |  | 0/3, 0.00% | 656716 | 74.19 | 235650 | 26.62 |
+| QwQ-32B-Preview-abliterated (4.5bpw EXL2, max_tokens=16384) | ibrahimkettaneh_QwQ-32B-Preview-abliterated-4.5bpw-h8-exl2 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 28556MiB | 1/2 | 2h 10m 53s | 310/410 | 75.61% |  |  | 656716 | 83.59 | 412512 | 52.51 |
+| QwQ-32B-Preview-abliterated (4.5bpw EXL2, max_tokens=16384) | ibrahimkettaneh_QwQ-32B-Preview-abliterated-4.5bpw-h8-exl2 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 28556MiB | 2/2 | 2h 25m 29s | 310/410 | 75.61% |  |  | 656716 | 75.20 | 478590 | 54.80 |
+| QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 26198MiB | 1/4 | 1h 39m 49s | 308/410 | 75.12% |  | 0/1, 0.00% | 656716 | 109.59 | 243552 | 40.64 |
+| QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27750MiB | 2/4 | 1h 22m 12s | 304/410 | 74.15% |  |  | 656716 | 133.04 | 247314 | 50.10 |
+| QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27750MiB | 3/4 | 1h 21m 39s | 296/410 | 72.20% |  |  | 656716 | 133.94 | 246020 | 50.18 |
+| QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 26198MiB | 4/4 | 1h 42m 33s | 294/410 | 71.71% |  |  | 656716 | 106.63 | 250222 | 40.63 |
+| QwQ-32B-Preview (3.0bpw EXL2, max_tokens=8192) | bartowski/QwQ-32B-Preview-exl2_3_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 22990MiB | 1/2 | 1h 15m 18s | 289/410 | 70.49% |  |  | 656716 | 145.23 | 269937 | 59.69 |
+| QwQ-32B-Preview (3.0bpw EXL2, max_tokens=8192) | bartowski/QwQ-32B-Preview-exl2_3_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 22990MiB | 2/2 | 1h 19m 50s | 274/410 | 66.83% |  | 0/2, 0.00% | 656716 | 137.01 | 291818 | 60.88 |
+| QwQ-32B-Preview (3.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_3_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 21574MiB | 1/2 | 1h 5m 30s | 268/410 | 65.37% |  | 1/3, 33.33% | 656716 | 166.95 | 205218 | 52.17 |
+| QwQ-32B-Preview (3.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_3_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 21574MiB | 2/2 | 1h 8m 44s | 266/410 | 64.88% |  |  | 656716 | 159.10 | 215616 | 52.24 |
+For more context, details, and comparisons, you can refer to [the original article by Ravenwolf](https://huggingface.co/blog/wolfram/llm-comparison-test-2024-12-04).
 # Context
 This is an uncensored version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).