Pedram Rostami committed
Commit ae8cd0d · 1 parent: 35d38de

Update README.md

Files changed (1): README.md (+48, −0)

README.md CHANGED
@@ -73,6 +73,54 @@ model_output = model_output.replace(model_input, "")
  print(model_output)
  ```
 
+ ## How to Get Started with the Quantized Models
+
+ Quantized models can run on resource-constrained devices.
+ To use the quantized models, you must install the `bitsandbytes` library.
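The `bitsandbytes` dependency can typically be installed with pip (a minimal sketch; a CUDA-enabled PyTorch environment is assumed, and exact version requirements may vary):

```shell
# Install the quantization backend used by the 8-bit and 4-bit examples below
pip install bitsandbytes
```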
+ To get started with the 8-bit quantized model, use the code below to load it.
+
+ ```python
+ from transformers import LlamaForCausalLM
+
+ model = LlamaForCausalLM.from_pretrained(
+     "universitytehran/PersianMind-v1.0",
+     device_map="auto",
+     low_cpu_mem_usage=True,
+     load_in_8bit=True,
+ )
+ ```
+
+ To get started with the 4-bit quantized model, use the code below to load it.
+
+ ```python
+ from transformers import BitsAndBytesConfig, LlamaForCausalLM
+
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_quant_type="nf4",
+ )
+ model = LlamaForCausalLM.from_pretrained(
+     "universitytehran/PersianMind-v1.0",
+     quantization_config=quantization_config,
+     device_map="auto",
+ )
+ ```
+
+ ## Evaluating Quantized Models
+
+ | Model            | Belebele (Persian) | Translation Fa→En (Comet) | Translation En→Fa (Comet) | Model Size (GB) | Words/sec |
+ | :--------------- | :----------------: | :-----------------------: | :-----------------------: | :-------------: | :-------: |
+ | PersianMind      |        73.9        |           83.61           |           79.44           |      13.66      |   25.35   |
+ | PersianMind-8bit |        73.7        |           82.32           |           78.61           |       7.2       |   11.36   |
+ | PersianMind-4bit |        70.2        |           82.07           |           80.36           |       3.9       |   24.36   |
+
+ We evaluated the quantized models against the original model on several tasks.
+ Specifically, we evaluated all models on the reading-comprehension multiple-choice
+ question-answering benchmark Belebele (Persian subset) and report each model's accuracy.
+ Additionally, we evaluated the models on Persian-to-English and English-to-Persian translation,
+ using the Persian-English subset of the Flores-200 dataset and reporting results with the Comet metric.
+ Furthermore, we measured the average number of words each model generates per second while running the translation tasks.
+ To assess resource efficiency, we measured each model's memory usage with the `get_memory_footprint` function.
+
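The model sizes in the table are roughly what the weight precision alone predicts. A back-of-envelope check (assuming ~6.8B parameters, a figure typical of the LLaMa2-7B class and not stated in this README; real footprints are slightly larger because some layers stay unquantized):

```python
# Rough weight-memory estimate: parameters × bytes per parameter
params = 6.8e9  # assumed parameter count (LLaMa2-7B class); not from this README

for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gigabytes = params * bits / 8 / 1e9
    print(f"{name}: ~{gigabytes:.1f} GB")
```

This prints ~13.6, ~6.8, and ~3.4 GB, consistent with the measured 13.66, 7.2, and 3.9 GB once quantization overhead and unquantized layers are counted. In a live session, the exact figure comes from `model.get_memory_footprint()`.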
  ## License
  PersianMind is subject to Meta's [LLaMa2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE).
  It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.