Commit feeda79 by andrijdavid
Parent(s): 7450d23

Upload folder using huggingface_hub

Files changed:
- .gitattributes (+18 -0)
- README.md (+6 -6)
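
The commit message indicates the files were pushed with `huggingface_hub`. For reference, a minimal sketch of how such an upload is typically done with the library's `upload_folder` helper; only the repo id comes from this page, while the local folder path and token setup are assumptions:

```python
# Hypothetical re-creation of the upload step named in the commit message.
# Only the repo id is taken from this page; the local path is assumed.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` by default
api.upload_folder(
    folder_path="./tinyfrank-1.4B-GGUF",        # assumed local folder holding the .gguf files
    repo_id="andrijdavid/tinyfrank-1.4B-GGUF",  # target model repo
    repo_type="model",
)
```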
.gitattributes
CHANGED
@@ -37,3 +37,21 @@ tinyfrank-f16.gguf filter=lfs diff=lfs merge=lfs -text
 tinyfrank-q2L.gguf filter=lfs diff=lfs merge=lfs -text
 tinyfrank-q4L.gguf filter=lfs diff=lfs merge=lfs -text
 tinyfrank-q6L.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q3_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+tinyfrank-1.4B-f16.gguf filter=lfs diff=lfs merge=lfs -text
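
Each added line tells Git LFS to store the matching GGUF binary as an LFS object instead of keeping it in regular Git history. A minimal sketch of fetching one of the newly tracked files with `huggingface_hub`; the choice of the Q4_K_M quant and the target directory are illustrative, not prescribed by the commit:

```python
# Minimal sketch: download one of the GGUF files added in this commit.
# The filename and local_dir are illustrative choices, not part of the commit.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="andrijdavid/tinyfrank-1.4B-GGUF",
    filename="tinyfrank-1.4B-Q4_K_M.gguf",  # any of the added quant files works
    local_dir=".",
)
print(path)
```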
README.md
CHANGED
@@ -58,7 +58,7 @@ The following clients/libraries will automatically download models for you, prov
 
 ### In `text-generation-webui`
 
-Under Download Model, you can enter the model repo: andrijdavid/tinyfrank-1.4B-GGUF and below it, a specific filename to download, such as: tinyfrank-1.4B.gguf.
+Under Download Model, you can enter the model repo: andrijdavid/tinyfrank-1.4B-GGUF and below it, a specific filename to download, such as: tinyfrank-1.4B-f16.gguf.
 
 Then click Download.
 
@@ -73,7 +73,7 @@ pip3 install huggingface-hub
 Then you can download any individual model file to the current directory, at high speed, with a command like this:
 
 ```shell
-huggingface-cli download andrijdavid/tinyfrank-1.4B-GGUF tinyfrank-1.4B.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download andrijdavid/tinyfrank-1.4B-GGUF tinyfrank-1.4B-f16.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 <details>
@@ -96,7 +96,7 @@ pip3 install hf_transfer
 And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
 
 ```shell
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download andrijdavid/tinyfrank-1.4B-GGUF tinyfrank-1.4B.gguf --local-dir . --local-dir-use-symlinks False
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download andrijdavid/tinyfrank-1.4B-GGUF tinyfrank-1.4B-f16.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
@@ -108,7 +108,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m tinyfrank-1.4B.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<PROMPT>"
+./main -ngl 35 -m tinyfrank-1.4B-f16.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<PROMPT>"
 ```
 
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
@@ -159,7 +159,7 @@ pip install llama-cpp-python
 from llama_cpp import Llama
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-  model_path="./tinyfrank-1.4B.gguf",  # Download the model file first
+  model_path="./tinyfrank-1.4B-f16.gguf",  # Download the model file first
   n_ctx=32768,  # The max sequence length to use - note that longer sequence lengths require much more resources
   n_threads=8,  # The number of CPU threads to use, tailor to your system and the resulting performance
   n_gpu_layers=35  # The number of layers to offload to GPU, if you have GPU acceleration available
@@ -172,7 +172,7 @@ output = llm(
   echo=True  # Whether to echo the prompt
 )
 # Chat Completion API
-llm = Llama(model_path="./tinyfrank-1.4B.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
+llm = Llama(model_path="./tinyfrank-1.4B-f16.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
 llm.create_chat_completion(
   messages = [
     {"role": "system", "content": "You are a story writing assistant."},
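
The README edits above only swap the example filename to the f16 build; pointing the same `llama-cpp-python` snippet at one of the quantised files added in this commit is just a matter of changing `model_path`. A sketch under that assumption (the Q4_K_M choice and the generation settings are illustrative, not taken from the README):

```python
# Same llama-cpp-python usage as in the README, but with a quantised file
# from this commit; the Q4_K_M filename is an illustrative choice.
from llama_cpp import Llama

llm = Llama(
    model_path="./tinyfrank-1.4B-Q4_K_M.gguf",  # downloaded beforehand
    n_ctx=4096,       # smaller context than the README example to keep memory modest
    n_gpu_layers=35,  # set to 0 if no GPU acceleration is available
)
output = llm("<PROMPT>", max_tokens=256)
print(output["choices"][0]["text"])
```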