Pedram Rostami committed
Commit · ae8cd0d
1 Parent(s): 35d38de
Update README.md

README.md CHANGED
@@ -73,6 +73,54 @@ model_output = model_output.replace(model_input, "")
print(model_output)
```

## How to Get Started with the Quantized Model

Quantized models can be run on resource-constrained devices.
To use quantized models, you should first install the `bitsandbytes` library (`pip install bitsandbytes`).
To get started with the 8-bit quantized model, use the code below to define the model.

```python
from transformers import LlamaForCausalLM

# Load PersianMind with its weights quantized to 8-bit via bitsandbytes.
model = LlamaForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    device_map="auto",
    low_cpu_mem_usage=True,
    load_in_8bit=True,
)
```
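
The quantized model is used exactly like the full-precision one.
A minimal generation sketch follows; it assumes the `LlamaTokenizer` from the quick-start example above, and the prompt and generation settings are purely illustrative:

```python
from transformers import LlamaTokenizer

# Assumes `model` was defined with one of the snippets above.
tokenizer = LlamaTokenizer.from_pretrained("universitytehran/PersianMind-v1.0")

model_input = "What is the capital of Iran?"  # illustrative prompt
input_ids = tokenizer(model_input, return_tensors="pt").input_ids.to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
model_output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(model_output.replace(model_input, ""))
```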

To get started with the 4-bit quantized model, use the code below to define the model.

```python
from transformers import BitsAndBytesConfig, LlamaForCausalLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
# Load PersianMind with NF4 4-bit quantization.
model = LlamaForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    quantization_config=quantization_config,
    device_map="auto",
)
```
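
Here `bnb_4bit_quant_type="nf4"` selects the NormalFloat4 data type, and `bnb_4bit_use_double_quant=True` additionally quantizes the quantization constants themselves for a further reduction in memory use.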

## Evaluating Quantized Models

| Model            | Belebele (Persian) | Translation Fa2En (COMET) | Translation En2Fa (COMET) | Model Size | Words/sec |
| :--------------- | :----------------: | :-----------------------: | :-----------------------: | :--------: | :-------: |
| PersianMind      |        73.9        |           83.61           |           79.44           |  13.66 GB  |   25.35   |
| PersianMind-8bit |        73.7        |           82.32           |           78.61           |   7.2 GB   |   11.36   |
| PersianMind-4bit |        70.2        |           82.07           |           80.36           |   3.9 GB   |   24.36   |
116 |
+
We evaluated quantized models in various tasks against the original model.
|
117 |
+
Specifically, we evaluated all models using the reading comprehension multiple-choice
|
118 |
+
question-answering benchmark of Belebele (Persian subset) and reported the accuracy of each model.
|
119 |
+
Additionally, we evaluated our models for Persian-to-English and English-to-Persian translation tasks.
|
120 |
+
For this, we utilized the Persian-English subset of the Flores-200 dataset and reported our results using the Comet metric.
|
121 |
+
Furthermore, we calculated the average number of words generated by each model per second during running the translation tasks.
|
122 |
+
To understand resource efficiency, we measured the memory usage of each model by employing the `get_memory_footprint` function.
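
The two efficiency numbers can be reproduced along these lines; this is a rough sketch in which the prompt, token budget, and timing loop are illustrative rather than the exact evaluation procedure, and it reuses `tokenizer` and `model_input` from the earlier generation sketch:

```python
import time

# Memory footprint in GB, as reported in the "Model Size" column above.
print(f"Model size: {model.get_memory_footprint() / 1024**3:.2f} GB")

# Illustrative words-per-second measurement over a single prompt.
input_ids = tokenizer(model_input, return_tensors="pt").input_ids.to(model.device)
start = time.time()
output_ids = model.generate(input_ids, max_new_tokens=128)
elapsed = time.time() - start
new_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
print(f"Words/sec: {len(new_text.split()) / elapsed:.2f}")
```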

## License
PersianMind is subject to Meta's [LLaMa2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE).
It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.