ibrahimkettaneh commited on
Commit
5d73b57
·
verified ·
1 Parent(s): 62cc1f4

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +29 -0
README.md CHANGED
@@ -27,6 +27,35 @@ Source: [🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) throug
27
 
28
  Credits go to for their helpful and informative benchmark: [Wolfram Ravenwolf](https://huggingface.co/wolfram)
29
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
30
  # Context
31
 
32
  This is an uncensored version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).
 
27
 
28
  Credits go to for their helpful and informative benchmark: [Wolfram Ravenwolf](https://huggingface.co/wolfram)
29
 
30
+ # Recommendation for Best Performance
31
+
32
+ To increase performance, increase the max new output when running inference from the default to 16384 tokens.
33
+
34
+ ## Detailed Table
35
+
36
+ | Duration | Total | % | TIGER-Lab | Correct Random Guesses | Prompt tokens | tk/s | Completion tokens | tk/s |
37
+ |----------|--------|---|-----------|----------------------|----------------|-------|-------------------|-------|
38
+ | QwQ-32B-Preview (8.0bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38436MiB | 1/2 | 2h 3m 30s | 325/410 | 79.27% | | 0/2, 0.00% | 656716 | 88.58 | 327825 | 44.22 |
39
+ | QwQ-32B-Preview (8.0bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38436MiB | 2/2 | 2h 3m 35s | 324/410 | 79.02% | | | 656716 | 88.52 | 343440 | 46.29 |
40
+ | QwQ-32B-Preview (4.25bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27636MiB | 1/2 | 1h 56m 8s | 319/410 | 77.80% | | 0/1, 0.00% | 656716 | 94.20 | 374973 | 53.79 |
41
+ | QwQ-32B-Preview (4.25bpw EXL2, max_tokens=16384) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27636MiB | 2/2 | 1h 55m 44s | 318/410 | 77.56% | | | 656716 | 94.45 | 377638 | 54.31 |
42
+ | QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38528MiB | 1/4 | 1h 29m 49s | 324/410 | 79.02% | | 0/1, 0.00% | 656716 | 121.70 | 229008 | 42.44 |
43
+ | QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 38528MiB | 2/4 | 1h 32m 30s | 314/410 | 76.59% | | 0/2, 0.00% | 656716 | 118.24 | 239161 | 43.06 |
44
+ | QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 37000MiB | 3/4 | 2h 25m 24s | 308/410 | 75.12% | | 0/2, 0.00% | 656716 | 75.23 | 232208 | 26.60 |
45
+ | QwQ-32B-Preview (8.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_8_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 37000MiB | 4/4 | 2h 27m 27s | 305/410 | 74.39% | | 0/3, 0.00% | 656716 | 74.19 | 235650 | 26.62 |
46
+ | QwQ-32B-Preview-abliterated (4.5bpw EXL2, max_tokens=16384) | ibrahimkettaneh_QwQ-32B-Preview-abliterated-4.5bpw-h8-exl2 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 28556MiB | 1/2 | 2h 10m 53s | 310/410 | 75.61% | | | 656716 | 83.59 | 412512 | 52.51 |
47
+ | QwQ-32B-Preview-abliterated (4.5bpw EXL2, max_tokens=16384) | ibrahimkettaneh_QwQ-32B-Preview-abliterated-4.5bpw-h8-exl2 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 28556MiB | 2/2 | 2h 25m 29s | 310/410 | 75.61% | | | 656716 | 75.20 | 478590 | 54.80 |
48
+ | QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 26198MiB | 1/4 | 1h 39m 49s | 308/410 | 75.12% | | 0/1, 0.00% | 656716 | 109.59 | 243552 | 40.64 |
49
+ | QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27750MiB | 2/4 | 1h 22m 12s | 304/410 | 74.15% | | | 656716 | 133.04 | 247314 | 50.10 |
50
+ | QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 27750MiB | 3/4 | 1h 21m 39s | 296/410 | 72.20% | | | 656716 | 133.94 | 246020 | 50.18 |
51
+ | QwQ-32B-Preview (4.25bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_4_25 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 26198MiB | 4/4 | 1h 42m 33s | 294/410 | 71.71% | | | 656716 | 106.63 | 250222 | 40.63 |
52
+ | QwQ-32B-Preview (3.0bpw EXL2, max_tokens=8192) | bartowski/QwQ-32B-Preview-exl2_3_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 22990MiB | 1/2 | 1h 15m 18s | 289/410 | 70.49% | | | 656716 | 145.23 | 269937 | 59.69 |
53
+ | QwQ-32B-Preview (3.0bpw EXL2, max_tokens=8192) | bartowski/QwQ-32B-Preview-exl2_3_0 | Qwen/Qwen2.5-Coder-0.5B-Instruct | 32B | EXL2 | TabbyAPI | RTX 6000 | 22990MiB | 2/2 | 1h 19m 50s | 274/410 | 66.83% | | 0/2, 0.00% | 656716 | 137.01 | 291818 | 60.88 |
54
+ | QwQ-32B-Preview (3.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_3_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 21574MiB | 1/2 | 1h 5m 30s | 268/410 | 65.37% | | 1/3, 33.33% | 656716 | 166.95 | 205218 | 52.17 |
55
+ | QwQ-32B-Preview (3.0bpw EXL2) | bartowski/QwQ-32B-Preview-exl2_3_0 | - | 32B | EXL2 | TabbyAPI | RTX 6000 | 21574MiB | 2/2 | 1h 8m 44s | 266/410 | 64.88% | | | 656716 | 159.10 | 215616 | 52.24 |
56
+
57
+ For more context, details, and comparisons, you can refer to [the original article by Ravenwolf](https://huggingface.co/blog/wolfram/llm-comparison-test-2024-12-04).
58
+
59
  # Context
60
 
61
  This is an uncensored version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).