ibrahimkettaneh
commited on
Update README.md
Browse files
README.md
CHANGED
@@ -27,6 +27,35 @@ Source: [🐺🐦⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) throug
|
|
27 |
|
28 |
Credits go to for their helpful and informative benchmark: [Wolfram Ravenwolf](https://huggingface.co/wolfram)
|
29 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
30 |
# Context
|
31 |
|
32 |
This is an uncensored version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).
|
|
|
27 |
|
28 |
Credits go to for their helpful and informative benchmark: [Wolfram Ravenwolf](https://huggingface.co/wolfram)
|
29 |
|
30 |
+
# Recommendation for Best Performance
|
31 |
+
|
32 |
+
To increase performance, increase the max new output when running inference from the default to 16384 tokens.
|
33 |
+
|
34 |
+
## Detailed Table
|
35 |
+
|
36 |
+
| Duration | Total | % | TIGER-Lab | Correct Random Guesses | Prompt tokens | tk/s | Completion tokens | tk/s |
|
37 |
+
|----------|--------|---|-----------|----------------------|----------------|-------|-------------------|-------|
|
38 |
+
| QwQ-32B-Preview (8.0bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38436MiB | 1/2 | 2h 3m 30s | 325/410 | 79.27% | | 0/2, 0.00% | 656716 | 88.58 | 327825 | 44.22 |
|
39 |
+
| QwQ-32B-Preview (8.0bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38436MiB | 2/2 | 2h 3m 35s | 324/410 | 79.02% | | | 656716 | 88.52 | 343440 | 46.29 |
|
40 |
+
| QwQ-32B-Preview (4.25bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27636MiB | 1/2 | 1h 56m 8s | 319/410 | 77.80% | | 0/1, 0.00% | 656716 | 94.20 | 374973 | 53.79 |
|
41 |
+
| QwQ-32B-Preview (4.25bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27636MiB | 2/2 | 1h 55m 44s | 318/410 | 77.56% | | | 656716 | 94.45 | 377638 | 54.31 |
|
42 |
+
| QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38528MiB | 1/4 | 1h 29m 49s | 324/410 | 79.02% | | 0/1, 0.00% | 656716 | 121.70 | 229008 | 42.44 |
|
43 |
+
| QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38528MiB | 2/4 | 1h 32m 30s | 314/410 | 76.59% | | 0/2, 0.00% | 656716 | 118.24 | 239161 | 43.06 |
|
44 |
+
| QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 37000MiB | 3/4 | 2h 25m 24s | 308/410 | 75.12% | | 0/2, 0.00% | 656716 | 75.23 | 232208 | 26.60 |
|
45 |
+
| QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 37000MiB | 4/4 | 2h 27m 27s | 305/410 | 74.39% | | 0/3, 0.00% | 656716 | 74.19 | 235650 | 26.62 |
|
46 |
+
| QwQ-32B-Preview-abliterated (4.5bpw EXL2, max_tokens=16384) | ibrahimkettaneh_QwQ-32B-Preview-abliterated-4.5bpw-h8-exl2 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 28556MiB | 1/2 | 2h 10m 53s | 310/410 | 75.61% | | | 656716 | 83.59 | 412512 | 52.51 |
|
47 |
+
| QwQ-32B-Preview-abliterated (4.5bpw EXL2, max_tokens=16384) | ibrahimkettaneh_QwQ-32B-Preview-abliterated-4.5bpw-h8-exl2 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 28556MiB | 2/2 | 2h 25m 29s | 310/410 | 75.61% | | | 656716 | 75.20 | 478590 | 54.80 |
|
48 |
+
| QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 26198MiB | 1/4 | 1h 39m 49s | 308/410 | 75.12% | | 0/1, 0.00% | 656716 | 109.59 | 243552 | 40.64 |
|
49 |
+
| QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27750MiB | 2/4 | 1h 22m 12s | 304/410 | 74.15% | | | 656716 | 133.04 | 247314 | 50.10 |
|
50 |
+
| QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27750MiB | 3/4 | 1h 21m 39s | 296/410 | 72.20% | | | 656716 | 133.94 | 246020 | 50.18 |
|
51 |
+
| QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 26198MiB | 4/4 | 1h 42m 33s | 294/410 | 71.71% | | | 656716 | 106.63 | 250222 | 40.63 |
|
52 |
+
| QwQ-32B-Preview (3.0bpw EXL2, max_tokens=8192) | bartowski/QwQ-32B-Preview-exl2_3_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 22990MiB | 1/2 | 1h 15m 18s | 289/410 | 70.49% | | | 656716 | 145.23 | 269937 | 59.69 |
|
53 |
+
| QwQ-32B-Preview (3.0bpw EXL2, max_tokens=8192) | bartowski/QwQ-32B-Preview-exl2_3_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 22990MiB | 2/2 | 1h 19m 50s | 274/410 | 66.83% | | 0/2, 0.00% | 656716 | 137.01 | 291818 | 60.88 |
|
54 |
+
| QwQ-32B-Preview (3.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_3_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 21574MiB | 1/2 | 1h 5m 30s | 268/410 | 65.37% | | 1/3, 33.33% | 656716 | 166.95 | 205218 | 52.17 |
|
55 |
+
| QwQ-32B-Preview (3.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_3_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 21574MiB | 2/2 | 1h 8m 44s | 266/410 | 64.88% | | | 656716 | 159.10 | 215616 | 52.24 |
|
56 |
+
|
57 |
+
For more context, details, and comparisons, you can refer to [the original article by Ravenwolf](https://huggingface.co/blog/wolfram/llm-comparison-test-2024-12-04).
|
58 |
+
|
59 |
# Context
|
60 |
|
61 |
This is an uncensored version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).
|