Update README.md

This is the OpenNMT-py converted version of Mistral 7b Instruct v0.1, 4-bit AWQ quantized.
The safetensors file is 4.2GB, hence it runs smoothly on any RTX card.

The command line to run is:

```
python onmt/bin/translate.py --config /pathto/mistral-instruct-inference-awq.yaml --src /pathto/input-vicuna.txt --output /pathto/mistral-output.txt
```
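The YAML file itself is not shown in this README. As a rough sketch only, an inference config of this kind might look like the following — the keys are standard OpenNMT-py translate options, but every path and value here is an assumption, not the actual file:

```yaml
# Hypothetical sketch of mistral-instruct-inference-awq.yaml.
# Adjust paths and values to your setup; the real file is not shown here.
model: /pathto/mistral-onmt-awq.pt   # converted AWQ checkpoint (assumed path)
gpu: 0                # first CUDA device
beam_size: 1          # greedy decoding, typical for chat inference
max_length: 256       # cap on generated tokens (assumed value)
batch_size: 60        # the batch size used in the benchmark below
batch_type: sents
seed: 42
report_time: true     # prints the timing lines shown below
```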
Where, for instance, input-vicuna.txt contains:

```
USER:⦅newline⦆Show me some attractions in Boston.⦅newline⦆⦅newline⦆ASSISTANT:⦅newline⦆
```
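Because translate.py reads one example per line from the src file, real line breaks inside a prompt are encoded with the ⦅newline⦆ placeholder, as in the example above. A minimal sketch of preparing such a file (the helper name is made up):

```python
# Build a one-example-per-line src file for translate.py.
# Real "\n" characters inside a prompt are replaced with the special
# placeholder token so the whole prompt stays on a single line.
NEWLINE_TOKEN = "\u2985newline\u2986"  # the ⦅newline⦆ marker

def encode_prompt(prompt: str) -> str:
    """Replace real newlines so the prompt fits on one src line."""
    return prompt.replace("\n", NEWLINE_TOKEN)

prompt = "USER:\nShow me some attractions in Boston.\n\nASSISTANT:\n"
line = encode_prompt(prompt)
print(line)

# One encoded prompt per line; write as many lines as you have prompts.
with open("input-vicuna.txt", "w", encoding="utf-8") as f:
    f.write(line + "\n")
```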
The model's reply begins: "Boston is a great city with many attractions to visit. Here are some popular ones..."

If you run with a batch size of 60, you can get a nice throughput even with GEMV:

```
[2023-12-20 08:41:50,556 INFO] Loading checkpoint from /mnt/InternalCrucial4/dataAI/mistral-7B/mistral-instruct/mistral-onmt-awq.pt
[2023-12-20 08:41:50,647 INFO] aawq_gemv compression of layer ['w_1', 'w_2', 'w_3', 'linear_values', 'linear_query', 'linear_keys', 'final_linear']
[2023-12-20 08:41:54,655 INFO] Loading data into the model
step0 time: 1.2817533016204834
[2023-12-20 08:42:01,746 INFO] PRED SCORE: -0.2969, PRED PPL: 1.35 NB SENTENCES: 59
[2023-12-20 08:42:01,746 INFO] Total translation time (s): 6.1
[2023-12-20 08:42:01,746 INFO] Average translation time (ms): 104.2
[2023-12-20 08:42:01,746 INFO] Tokens per second: 1923.2
Time w/o python interpreter load/terminate: 11.200659036636353
```
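The reported figures are internally consistent; a quick sanity check (pure arithmetic on the logged values, not part of the run):

```python
# Cross-check the benchmark log: 59 sentences at ~104.2 ms each should
# roughly match the reported 6.1 s total, and tokens/s times total time
# gives the approximate number of generated tokens in the batch.
n_sentences = 59
avg_ms = 104.2        # average translation time per sentence (ms)
total_s = 6.1         # total translation time (s)
tok_per_s = 1923.2

# 59 * 104.2 ms ≈ 6.15 s, i.e. the logged 6.1 s total (rounding aside).
approx_total = n_sentences * avg_ms / 1000
print(round(approx_total, 2))  # 6.15

# ~1923 tok/s over ~6.1 s ≈ 11,700 tokens processed in the batch.
approx_tokens = tok_per_s * total_s
print(round(approx_tokens))  # 11732
```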