vince62s committed
Commit 2867d65 · Parent: 5cdc95b

Update README.md

Files changed (1):
  1. README.md +13 -19
README.md CHANGED
@@ -6,9 +6,9 @@ This is the OpenNMT-py converted version of Mistral 7b Instruct v0.1, 4-bit AWQ
 The safetensors file is 4.2GB hence runs smoothly on any RTX card.
 
 Command line to run is:
-
+```
 python onmt/bin/translate.py --config /pathto/mistral-instruct-inference-awq.yaml --src /pathto/input-vicuna.txt --output /pathto/mistral-output.txt
-
+```
 Where for instance, input-vicuna.txt contains:
 
 USER:⦅newline⦆Show me some attractions in Boston.⦅newline⦆⦅newline⦆ASSISTANT:⦅newline⦆
@@ -22,23 +22,17 @@ Boston is a great city with many attractions to visit. Here are some popular one
 
 If you run with a batch size of 60 you can get a nice throughput even with GEMV:
 
-[2023-12-20 08:27:03,293 INFO] Loading checkpoint from /mnt/InternalCrucial4/dataAI/mistral-7B/mistral-instruct/mistral-onmt-awq.pt
-
-[2023-12-20 08:27:03,394 INFO] aawq_gemv compression of layer ['w_1', 'w_2', 'w_3', 'linear_values', 'linear_query', 'linear_keys', 'final_linear']
-
-[2023-12-20 08:27:08,346 INFO] Loading data into the model
-
-step0 time: 1.3734617233276367
-
-[2023-12-20 08:27:28,197 INFO] PRED SCORE: -0.2994, PRED PPL: 1.35 NB SENTENCES: 59
-
-[2023-12-20 08:27:28,197 INFO] Total translation time (s): 6.4
-
-[2023-12-20 08:27:28,197 INFO] Average translation time (ms): 109.1
-
-[2023-12-20 08:27:28,197 INFO] Tokens per second: 1835.8
-
-Time w/o python interpreter load/terminate: 24.914613008499146
+```
+[2023-12-20 08:41:50,556 INFO] Loading checkpoint from /mnt/InternalCrucial4/dataAI/mistral-7B/mistral-instruct/mistral-onmt-awq.pt
+[2023-12-20 08:41:50,647 INFO] aawq_gemv compression of layer ['w_1', 'w_2', 'w_3', 'linear_values', 'linear_query', 'linear_keys', 'final_linear']
+[2023-12-20 08:41:54,655 INFO] Loading data into the model
+step0 time: 1.2817533016204834
+[2023-12-20 08:42:01,746 INFO] PRED SCORE: -0.2969, PRED PPL: 1.35 NB SENTENCES: 59
+[2023-12-20 08:42:01,746 INFO] Total translation time (s): 6.1
+[2023-12-20 08:42:01,746 INFO] Average translation time (ms): 104.2
+[2023-12-20 08:42:01,746 INFO] Tokens per second: 1923.2
+Time w/o python interpreter load/terminate: 11.200659036636353
+```
 
 
 
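Note: the ⦅newline⦆ markers in input-vicuna.txt stand in for real newlines, so that each multi-line prompt occupies a single line of the --src file. A minimal sketch of how such a prompt file could be generated, assuming only that convention (the filename and prompt text simply mirror the example above):

```python
# Minimal sketch: write a one-prompt-per-line source file for translate.py.
# Assumption: real newlines are replaced by the ⦅newline⦆ placeholder so
# each prompt stays on a single line of the --src file.
prompt = "USER:\nShow me some attractions in Boston.\n\nASSISTANT:\n"

with open("input-vicuna.txt", "w", encoding="utf-8") as f:
    f.write(prompt.replace("\n", "⦅newline⦆") + "\n")
```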
 
 
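As a quick sanity check, the throughput figures in the new log are mutually consistent; the sketch below just re-derives them from the values quoted above:

```python
# Re-derive the throughput numbers reported in the log above.
total_time_s = 6.1      # "Total translation time (s): 6.1"
n_sentences = 59        # "NB SENTENCES: 59"
tokens_per_s = 1923.2   # "Tokens per second: 1923.2"

# Average per-sentence latency; the log's 104.2 ms comes from unrounded
# timings, so expect a close but not exact match.
print(f"avg per sentence: {total_time_s / n_sentences * 1000:.1f} ms")

# Rough total token count implied by the throughput figure.
print(f"approx tokens processed: {tokens_per_s * total_time_s:.0f}")
```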