TheBloke committed on
Commit 85afc56 · 1 Parent(s): b926638

Update README.md

  1. README.md +48 -0
README.md CHANGED
@@ -1,6 +1,54 @@
  ---
  license: gpl
  ---
+
+ # gpt4-x-vicuna-13B-GPTQ
+
+ This repo contains 4bit GPTQ format quantised models of [NousResearch's gpt4-x-vicuna-13b](https://huggingface.co/NousResearch/gpt4-x-vicuna-13b).
+
+ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+
+ ## Repositories available
+
+ * [4bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/gpt4-x-vicuna-13B-GPTQ).
+
+ ## How to easily download and use this model in text-generation-webui
+
+ Open the text-generation-webui UI as normal.
+
+ 1. Click the **Model tab**.
+ 2. Under **Download custom model or LoRA**, enter `TheBloke/gpt4-x-vicuna-13B-GPTQ`.
+ 3. Click **Download**.
+ 4. Wait until it says it's finished downloading.
+ 5. Click the **Refresh** icon next to **Model** in the top left.
+ 6. In the **Model drop-down**: choose the model you just downloaded, `gpt4-x-vicuna-13B-GPTQ`.
+ 7. If you see an error in the bottom right, ignore it - it's temporary.
+ 8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`
+ 9. Click **Save settings for this model** in the top right.
+ 10. Click **Reload the Model** in the top right.
+ 11. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
+
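The download in steps 2 to 4 can also be scripted outside the UI. The following is a minimal sketch, not part of the README: it assumes the `huggingface_hub` package is installed and that text-generation-webui looks for models in a `models/` directory, in a folder named `<user>_<model>`. That folder convention and the helper names `local_model_dir` and `download_model` are assumptions made for illustration.

```python
from pathlib import Path

# Repository ID taken from step 2 of the instructions above.
REPO_ID = "TheBloke/gpt4-x-vicuna-13B-GPTQ"


def local_model_dir(repo_id: str, models_root: str = "models") -> Path:
    """Return the assumed target folder for a repo (hypothetical helper).

    Assumption: the UI stores a download as <models_root>/<user>_<model>.
    """
    return Path(models_root) / repo_id.replace("/", "_")


def download_model(repo_id: str = REPO_ID, models_root: str = "models") -> Path:
    """Download every file in the repo into the assumed target folder."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, models_root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    # Only print the target path here; call download_model() explicitly
    # when you actually want to fetch the multi-gigabyte model files.
    print(local_model_dir(REPO_ID))
```

After a scripted download, steps 5 to 11 still apply: refresh the model list, select the model, and set the GPTQ parameters.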
+ ## Provided files
+
+ **Compatible file - GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors**
+
+ In the `main` branch - the default one - you will find `GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors`.
+
+ This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.
+
+ It was created without the `--act-order` parameter. It may have slightly lower inference quality than a file created with `--act-order`, but is guaranteed to work on all versions of GPTQ-for-LLaMa and text-generation-webui.
+
+ * `GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors`
+ * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
+ * Works with text-generation-webui one-click-installers
+ * Parameters: Groupsize = 128. No act-order.
+ * Command used to create the GPTQ:
+ ```
+ CUDA_VISIBLE_DEVICES=0 python3 llama.py GPT4All-13B-snoozy c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors
+ ```
+
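To make the relationship between the listed parameters and the command explicit, here is a small sketch that assembles the same `llama.py` invocation from its parts. The function name `build_gptq_command` and its keyword arguments are hypothetical; only the flags themselves come from the command above, plus `--act-order`, which the README says was not used for this file.

```python
import shlex


def build_gptq_command(model_dir, out_file, wbits=4, groupsize=128, act_order=False):
    """Assemble a GPTQ-for-LLaMa llama.py command line (illustrative helper)."""
    argv = [
        "python3", "llama.py", model_dir, "c4",  # c4 is the calibration dataset
        "--wbits", str(wbits),
        "--true-sequential",
        "--groupsize", str(groupsize),
    ]
    if act_order:
        # Omitted for the compatibility file described in this README.
        argv.append("--act-order")
    argv += ["--save_safetensors", out_file]
    return shlex.join(argv)


if __name__ == "__main__":
    print(build_gptq_command(
        "GPT4All-13B-snoozy",
        "GPT4-x-Vicuna-13B-GPTQ-4bit-128g.compat.act-order.safetensors",
    ))
```

With the default arguments the result matches the README command, apart from the `CUDA_VISIBLE_DEVICES=0` environment prefix, which selects the GPU and is not part of the argument list.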
+ # Original model card
+
  As a base model used https://huggingface.co/eachadea/vicuna-13b-1.1
 
  Finetuned on Teknium's GPTeacher dataset, unreleased Roleplay v2 dataset, GPT-4-LLM dataset, and Nous Research Instruct Dataset