Pedram Rostami committed
Commit ae8cd0d · 1 parent: 35d38de

Update README.md

Files changed (1): README.md (+48, −0)

README.md CHANGED
@@ -73,6 +73,54 @@ model_output = model_output.replace(model_input, "")
  print(model_output)
  ```
 
+ ## How to Get Started with the Quantized Models
+
+ Quantized models can run on resource-constrained devices.
+ To use the quantized models, you must install the `bitsandbytes` library.
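The `bitsandbytes` dependency can typically be installed with pip (a minimal sketch; a CUDA-enabled PyTorch environment is assumed, and exact version requirements may vary):

```shell
# Install the quantization backend used by the 8-bit and 4-bit examples below
pip install bitsandbytes
```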
+ To get started with the 8-bit quantized model, use the code below to load it.
+
+ ```python
+ from transformers import LlamaForCausalLM
+
+ model = LlamaForCausalLM.from_pretrained(
+     "universitytehran/PersianMind-v1.0",
+     device_map="auto",
+     low_cpu_mem_usage=True,
+     load_in_8bit=True,
+ )
+ ```
+
+ To get started with the 4-bit quantized model, use the code below to load it.
+
+ ```python
+ from transformers import BitsAndBytesConfig, LlamaForCausalLM
+
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_quant_type="nf4",
+ )
+ model = LlamaForCausalLM.from_pretrained(
+     "universitytehran/PersianMind-v1.0",
+     quantization_config=quantization_config,
+     device_map="auto",
+ )
+ ```
+
+ ## Evaluating Quantized Models
+
+ | Model            | Belebele (Persian) | Translation Fa→En (Comet) | Translation En→Fa (Comet) | Model Size (GB) | Words/sec |
+ | :--------------- | :----------------: | :-----------------------: | :-----------------------: | :-------------: | :-------: |
+ | PersianMind      |        73.9        |           83.61           |           79.44           |      13.66      |   25.35   |
+ | PersianMind-8bit |        73.7        |           82.32           |           78.61           |       7.2       |   11.36   |
+ | PersianMind-4bit |        70.2        |           82.07           |           80.36           |       3.9       |   24.36   |
+
+ We evaluated the quantized models against the original model on several tasks.
+ Specifically, we evaluated all models on the reading-comprehension multiple-choice
+ question-answering benchmark Belebele (Persian subset) and report each model's accuracy.
+ Additionally, we evaluated the models on Persian-to-English and English-to-Persian translation,
+ using the Persian-English subset of the Flores-200 dataset and reporting results with the Comet metric.
+ Furthermore, we measured the average number of words each model generates per second while running the translation tasks.
+ To assess resource efficiency, we measured each model's memory usage with the `get_memory_footprint` function.
+
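The model sizes in the table are roughly what the weight precision alone predicts. A back-of-envelope check (assuming ~6.8B parameters, a figure typical of the LLaMa2-7B class and not stated in this README; real footprints are slightly larger because some layers stay unquantized):

```python
# Rough weight-memory estimate: parameters × bytes per parameter
params = 6.8e9  # assumed parameter count (LLaMa2-7B class); not from this README

for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gigabytes = params * bits / 8 / 1e9
    print(f"{name}: ~{gigabytes:.1f} GB")
```

This prints ~13.6, ~6.8, and ~3.4 GB, consistent with the measured 13.66, 7.2, and 3.9 GB once quantization overhead and unquantized layers are counted. In a live session, the exact figure comes from `model.get_memory_footprint()`.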
  ## License
  PersianMind is subject to Meta's [LLaMa2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE).
  It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.