vince62s committed
Commit 2867d65 · Parent: 5cdc95b

Update README.md

Files changed (1):
  1. README.md +13 -19
README.md CHANGED
@@ -6,9 +6,9 @@ This is the OpenNMT-py converted version of Mistral 7b Instruct v0.1, 4-bit AWQ
 The safetensors file is 4.2GB hence runs smoothly on any RTX card.
 
 Command line to run is:
-
+```
 python onmt/bin/translate.py --config /pathto/mistral-instruct-inference-awq.yaml --src /pathto/input-vicuna.txt --output /pathto/mistral-output.txt
-
+```
 Where for instance, input-vicuna.txt contains:
 
 USER:⦅newline⦆Show me some attractions in Boston.⦅newline⦆⦅newline⦆ASSISTANT:⦅newline⦆
@@ -22,23 +22,17 @@ Boston is a great city with many attractions to visit. Here are some popular one
 
 If you run with a batch size of 60 you can get a nice throughput even with GEMV:
 
-[2023-12-20 08:27:03,293 INFO] Loading checkpoint from /mnt/InternalCrucial4/dataAI/mistral-7B/mistral-instruct/mistral-onmt-awq.pt
-
-[2023-12-20 08:27:03,394 INFO] aawq_gemv compression of layer ['w_1', 'w_2', 'w_3', 'linear_values', 'linear_query', 'linear_keys', 'final_linear']
-
-[2023-12-20 08:27:08,346 INFO] Loading data into the model
-
-step0 time: 1.3734617233276367
-
-[2023-12-20 08:27:28,197 INFO] PRED SCORE: -0.2994, PRED PPL: 1.35 NB SENTENCES: 59
-
-[2023-12-20 08:27:28,197 INFO] Total translation time (s): 6.4
-
-[2023-12-20 08:27:28,197 INFO] Average translation time (ms): 109.1
-
-[2023-12-20 08:27:28,197 INFO] Tokens per second: 1835.8
-
-Time w/o python interpreter load/terminate: 24.914613008499146
+```
+[2023-12-20 08:41:50,556 INFO] Loading checkpoint from /mnt/InternalCrucial4/dataAI/mistral-7B/mistral-instruct/mistral-onmt-awq.pt
+[2023-12-20 08:41:50,647 INFO] aawq_gemv compression of layer ['w_1', 'w_2', 'w_3', 'linear_values', 'linear_query', 'linear_keys', 'final_linear']
+[2023-12-20 08:41:54,655 INFO] Loading data into the model
+step0 time: 1.2817533016204834
+[2023-12-20 08:42:01,746 INFO] PRED SCORE: -0.2969, PRED PPL: 1.35 NB SENTENCES: 59
+[2023-12-20 08:42:01,746 INFO] Total translation time (s): 6.1
+[2023-12-20 08:42:01,746 INFO] Average translation time (ms): 104.2
+[2023-12-20 08:42:01,746 INFO] Tokens per second: 1923.2
+Time w/o python interpreter load/terminate: 11.200659036636353
+```
 
 
 
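Note: the ⦅newline⦆ markers in input-vicuna.txt stand in for real newlines, so that each multi-line prompt occupies a single line of the --src file. A minimal sketch of how such a prompt file could be generated, assuming only that convention (the filename and prompt text simply mirror the example above):

```python
# Minimal sketch: write a one-prompt-per-line source file for translate.py.
# Assumption: real newlines are replaced by the ⦅newline⦆ placeholder so
# each prompt stays on a single line of the --src file.
prompt = "USER:\nShow me some attractions in Boston.\n\nASSISTANT:\n"

with open("input-vicuna.txt", "w", encoding="utf-8") as f:
    f.write(prompt.replace("\n", "⦅newline⦆") + "\n")
```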
 
 
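As a quick sanity check, the throughput figures in the new log are mutually consistent; the sketch below just re-derives them from the values quoted above:

```python
# Re-derive the throughput numbers reported in the log above.
total_time_s = 6.1      # "Total translation time (s): 6.1"
n_sentences = 59        # "NB SENTENCES: 59"
tokens_per_s = 1923.2   # "Tokens per second: 1923.2"

# Average per-sentence latency; the log's 104.2 ms comes from unrounded
# timings, so expect a close but not exact match.
print(f"avg per sentence: {total_time_s / n_sentences * 1000:.1f} ms")

# Rough total token count implied by the throughput figure.
print(f"approx tokens processed: {tokens_per_s * total_time_s:.0f}")
```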