---
license: llama3.1
---

# 🚀 Custom quantizations of the base [Meta-Llama-3.1-405B](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B) 🖥️

>[!TIP]
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`
>
>Feel free to paste the commands below all at once or one at a time.
>
>For faster downloads, paste each one separately, for example into its own terminal window so the parts download in parallel.

Then copy and paste the commands into your terminal to download at the fastest speed on either Mac or Linux.

### q3q8 custom quant optimized for M2 Ultra 192GB

```bash
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00001-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00001-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00002-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00002-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00003-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00003-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00004-of-00004.gguf
```

### Perplexity benchmarks (WORK IN PROGRESS, THIS IS JUST A DUMP)

```text
llama 405b - instruct - old (pre-update) BF16
perplexity: 2197.87 seconds per pass - ETA 1 hours 49.88 min
[1]2.1037,[2]2.4201,[3]2.0992,[4]1.8446,[5]1.6823,[6]1.5948,[7]1.5575,[8]1.5121,[9]1.4750,[10]1.4570,[11]1.4567,[12]1.4666,
Final estimate: PPL = 1.4666 +/- 0.03184

Hermes 405b-Q8_0
perplexity: 716.47 seconds per pass - ETA 35.82 min
[1]1.5152,[2]1.8253,[3]1.6906,[4]1.5438,[5]1.4252,[6]1.3592,[7]1.3464,[8]1.3212,[9]1.2882,[10]1.2663,[11]1.2626,[12]1.2698,
Final estimate: PPL = 1.2698 +/- 0.02620

Hermes 405b-BF16
perplexity: 592.52 seconds per pass - ETA 1 hours 58.50 min
[1]1.5147,[2]1.8220,[3]1.6890,[4]1.5437,[5]1.4250,[6]1.3588,[7]1.3458,[8]1.3216,[9]1.2887,[10]1.2667,[11]1.2630,[12]1.2693,
Final estimate: PPL = 1.2693 +/- 0.02605

meta-405b-base-q8
perplexity: 167.37 seconds per pass - ETA 33.47 minutes
[1]1.3927,[2]1.6952,[3]1.5905,[4]1.4674,[5]1.3652,[6]1.3054,[7]1.2885,[8]1.2673,[9]1.2397,[10]1.2179,[11]1.2149,[12]1.2162,
Final estimate: PPL = 1.2162 +/- 0.02128

meta-base-q3q8
perplexity: 92.20 seconds per pass - ETA 4.60 minutes
[1]1.6445,[2]2.0909,[3]1.8369,[4]1.6788,[5]1.5438,[6]1.4754,[7]1.4604,[8]1.4321,[9]1.3941,[10]1.3698,[11]1.3691,[12]1.3845,
Final estimate: PPL = 1.3845 +/- 0.02785

meta-base-2bit
perplexity: 35.04 seconds per pass - ETA 7.00 minutes
[1]2.9667,[2]3.5432,[3]3.0714,[4]2.9515,[5]2.8404,[6]2.8713,[7]2.9628,[8]2.9945,[9]3.0155,[10]2.9973,[11]3.0522,[12]3.1619,
Final estimate: PPL = 3.1619 +/- 0.10580
```
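For reference, perplexity is the exponential of the average negative log-likelihood the model assigns to each token of the evaluation text. Lower is better, and a quant whose PPL stays close to the higher-precision run has lost little predictive quality. Over $N$ scored tokens $x_1, \dots, x_N$, with $p_\theta$ the model's predicted probability:

$$
\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)
$$

In the logs, the bracketed `[k]` values appear to be the running estimate after `k` chunks, which is why `[12]` matches the final estimate. The `+/-` term appears to be an uncertainty on that final value. The evaluation text is not stated. PPL is only comparable between runs of the same model on the same text and settings, so the instruct, Hermes, and base rows are not directly comparable to one another.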
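The base-model rows are easiest to read relative to the base q8 run. The Hermes pair (Q8_0 1.2698 vs BF16 1.2693) suggests 8-bit costs very little, so q8 is a reasonable reference. The sketch below does that arithmetic with `awk`. The helper name `pct_change` is hypothetical, and the PPL values are copied by hand from the `Final estimate` lines above rather than parsed from a log.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pct_change NEW REF prints (NEW / REF - 1) * 100 as a signed percentage.
pct_change() {
  awk -v new="$1" -v ref="$2" 'BEGIN { printf "%+.1f%%\n", (new / ref - 1) * 100 }'
}

# Final base-model PPL estimates, copied by hand from the log above.
ref=1.2162  # meta-405b-base-q8, used as the reference

# One "name ppl" pair per line; print each quant's PPL change relative to q8.
while read -r name ppl; do
  printf '%s vs q8: %s\n' "$name" "$(pct_change "$ppl" "$ref")"
done <<'EOF'
q3q8 1.3845
2bit 3.1619
EOF
```

Running it prints `q3q8 vs q8: +13.8%` and `2bit vs q8: +160.0%`. This puts the q3q8 quant much closer to q8 than the 2-bit quant on this (unspecified) evaluation text.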