hgloow committed
Commit d23fbb2 · 1 Parent(s): f2ae149

Update README.md

Files changed (1): README.md (+2, −3)

README.md CHANGED
@@ -18,11 +18,10 @@ tags:
 - [VAGOsolutions/SauerkrautLM-SOLAR-Instruct](https://huggingface.co/VAGOsolutions/SauerkrautLM-SOLAR-Instruct)
 - [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0)
 
-## EXL2 Quants
+## Quantizations
 Measured using ExLlamaV2 and 4096 max_seq_len with [Oobabooga's Text Generation WebUI](https://github.com/oobabooga/text-generation-webui/tree/main).
 
 Use [TheBloke's 4bit-32g quants](https://huggingface.co/TheBloke/Sakura-SOLAR-Instruct-GPTQ/tree/gptq-4bit-32g-actorder_True) (7.4GB VRAM usage) if you have 8GB cards.
-### Quantizations
 | Branch | BPW | Folder Size | VRAM Usage | Description |
 | ------ | --- | ----------- | ---------- | ----------- |
 [3.0bpw](https://huggingface.co/hgloow/Sakura-SOLAR-Instruct-EXL2/tree/3.0bpw)|3.0BPW|4.01GB|5.1 GB|For >=6GB VRAM cards with idle VRAM atleast or below 500MB (headroom for other things)
@@ -31,7 +30,7 @@ Use [TheBloke's 4bit-32g quants](https://huggingface.co/TheBloke/Sakura-SOLAR-In
 [7.0bpw](https://huggingface.co/hgloow/Sakura-SOLAR-Instruct-EXL2/tree/7.0bpw)|7.0BPW|8.89GB|10.2 GB|For >=11GB VRAM cards with idle VRAM atleast or below 500MB (headroom for other things)
 [8.0bpw](https://huggingface.co/hgloow/Sakura-SOLAR-Instruct-EXL2/tree/8.0bpw)|8.0BPW|10.1GB|11.3 GB|For >=12GB VRAM cards with idle VRAM atleast or below 500MB (headroom for other things)
 
-### Zipped Quantizations (if you want to download a single file, smaller to download)
+## Zipped Quantizations (if you want to download a single file, smaller to download)
 | Branch | File Size |
 | ------ | --------- |
 [3.0bpw-zip](https://huggingface.co/hgloow/Sakura-SOLAR-Instruct-EXL2/tree/3.0bpw-zip)|3.72GB
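As a rough sanity check on the folder sizes in the table above: an EXL2 quant stores roughly `params × bpw / 8` bytes of weights, plus overhead for embeddings and metadata, so measured folder sizes deviate somewhat from this first-order estimate. A minimal sketch, assuming ~10.7e9 parameters for SOLAR-10.7B (the exact count is not stated in this diff):

```python
def approx_quant_size_gb(n_params: float, bpw: float) -> float:
    """First-order EXL2 quant size: n_params weights at bpw bits each,
    reported in GB (1e9 bytes). Ignores embeddings/metadata overhead."""
    return n_params * bpw / 8 / 1e9

# Assumed parameter count for SOLAR-10.7B; real quant folders differ
# by a few percent from these estimates.
N_PARAMS = 10.7e9
for bpw in (3.0, 7.0, 8.0):
    print(f"{bpw:.1f} bpw -> ~{approx_quant_size_gb(N_PARAMS, bpw):.2f} GB")
```

The 3.0 bpw estimate (~4.01 GB) lands on the listed folder size; the higher-bpw branches drift further from the naive formula, which is expected since bpw in ExLlamaV2 is an average over mixed-precision layers.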