---
license: cc-by-nc-sa-4.0
language:
- multilingual
- fa
- en
library_name: transformers
tags:
- text-generation-inference
inference: false
metrics:
- bleu
- comet
- accuracy
- perplexity
- spearmanr
pipeline_tag: text-generation
co2_eq_emissions:
emissions: 232380
source: "PersianMind: A Cross-Lingual Persian-English Large Language Model. https://arxiv.org/abs/2401.06466"
training_type: "fine-tuning"
hardware_used: "4 RTX3090 24GB GPUs"
geographical_location: "Tehran, Iran"
---
<p align="center">
<img src="PersianMind.jpg" alt="PersianMind logo" width=200/>
</p>
# <span style="font-variant:small-caps;">PersianMind</span>
<span style="font-variant:small-caps;">PersianMind</span> is a cross-lingual Persian-English large language model.
The model achieves state-of-the-art results on the Persian subset of the [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) benchmark
and the [ParsiNLU multiple-choice QA](https://github.com/persiannlp/parsinlu) task.
It also attains performance comparable to GPT-3.5-turbo in a Persian reading comprehension task.
## Model Description
- **Developed by:** [Pedram Rostami](mailto:[email protected]), [Ali Salemi](mailto:[email protected]), and [Mohammad Javad Dousti](mailto:[email protected])
- **Model type:** Language model
- **Languages:** English and Persian
- **License:** [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) (non-commercial use only)
## How to Get Started with the Model
Use the code below to get started with the model.
Note that to run this code, you need to install the <code><b>sentencepiece</b></code> and <code><b>accelerate</b></code> libraries in addition to <code><b>PyTorch</b></code> and <code><b>🤗Transformers</b></code>.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
"universitytehran/PersianMind-v1.0",
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=True,
device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained(
"universitytehran/PersianMind-v1.0",
)
TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "
CONTEXT = "This is a conversation with PersianMind. It is an artificial intelligence model designed by a team of " \
"NLP experts at the University of Tehran to help you with various tasks such as answering questions, " \
"providing recommendations, and helping with decision making. You can ask it anything you want and " \
"it will do its best to give you accurate and relevant information."
PROMPT = "در مورد هوش مصنوعی توضیح بده."  # "Explain artificial intelligence."
model_input = TEMPLATE.format(context=CONTEXT, prompt=PROMPT)
input_tokens = tokenizer(model_input, return_tensors="pt")
input_tokens = input_tokens.to(device)
generate_ids = model.generate(**input_tokens, max_new_tokens=512, do_sample=False, repetition_penalty=1.1)
model_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(model_output[len(model_input):])
```
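The template above frames a single user turn; to keep a conversation going, you can append earlier turns to the prompt before generating again. The sketch below is one way to do this, reusing `CONTEXT`, `TEMPLATE`, `tokenizer`, `model`, and `device` from the snippet above; the multi-turn format with repeated `You:`/`PersianMind:` markers is an assumption rather than an officially documented chat template.
```python
def chat(transcript, user_message, max_new_tokens=512):
    # First turn: build the prompt from the template.
    # Later turns: append to the running transcript using the same
    # "You:"/"PersianMind:" markers (assumed format, not documented).
    if transcript:
        prompt = transcript + "\nYou: " + user_message + "\nPersianMind: "
    else:
        prompt = TEMPLATE.format(context=CONTEXT, prompt=user_message)
    input_tokens = tokenizer(prompt, return_tensors="pt").to(device)
    generate_ids = model.generate(**input_tokens, max_new_tokens=max_new_tokens,
                                  do_sample=False, repetition_penalty=1.1)
    output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True,
                                    clean_up_tokenization_spaces=False)[0]
    return output, output[len(prompt):]

transcript, reply = chat("", "Explain artificial intelligence.")
transcript, reply = chat(transcript, "Give a short example of its applications.")
print(reply)
```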
### How to Quantize the Model
Quantized models can be run on resource-constrained devices.
To quantize the model, install the <code><b>bitsandbytes</b></code> library.
To load the model in 8-bit (`INT8`), use the code below.
```python
model = AutoModelForCausalLM.from_pretrained(
"universitytehran/PersianMind-v1.0",
device_map="auto",
low_cpu_mem_usage=True,
load_in_8bit=True
)
```
Alternatively, you can quantize the model in 4-bit (`NormalFloat4`) with the following code.
```python
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
"universitytehran/PersianMind-v1.0",
quantization_config=quantization_config,
device_map="auto"
)
```
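A quantized model is a drop-in replacement in the generation code shown earlier; only the `from_pretrained` call changes. A minimal check, reusing `tokenizer`, `TEMPLATE`, `CONTEXT`, and `PROMPT` from the first snippet:
```python
# Generation works unchanged with the quantized weights; with device_map="auto",
# model.device points at the device holding the first model shard.
model_input = TEMPLATE.format(context=CONTEXT, prompt=PROMPT)
input_tokens = tokenizer(model_input, return_tensors="pt").to(model.device)
generate_ids = model.generate(**input_tokens, max_new_tokens=512, do_sample=False, repetition_penalty=1.1)
model_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(model_output[len(model_input):])
```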
### Evaluating Quantized Models
| Model | <span style="font-variant:small-caps;">Belebele</span> (Persian) | Fa→En Translation<br>(<span style="font-variant:small-caps;">Comet</span>) | En→Fa Translation<br>(<span style="font-variant:small-caps;">Comet</span>) | Model Size | Tokens/sec |
| :----------------------------------------------------------------: | :--------------------------------------------------------------: | :------------------------------------------------------------------------: | :------------------------------------------------------------------------: | :--------: | :--------: |
| <span style="font-variant:small-caps;">PersianMind</span> (`BF16`) | 73.9 | 83.61 | 79.44 | 13.7G | 25.35 |
| <span style="font-variant:small-caps;">PersianMind</span> (`INT8`) | 73.7 | 82.32 | 78.61 | 7.2G | 11.36 |
| <span style="font-variant:small-caps;">PersianMind</span> (`NF4`) | 70.2 | 82.07 | 80.36 | 3.9G | 24.36 |
We evaluated the quantized models against the original model on various tasks.
Specifically, we evaluated all models using the reading comprehension multiple-choice
question-answering benchmark of [<span style="font-variant:small-caps;">Belebele</span>](https://github.com/facebookresearch/belebele) (Persian subset) and reported the accuracy of each model.
Additionally, we evaluated our models for Persian-to-English and English-to-Persian translation tasks.
For this, we utilized the Persian-English subset of the [<span style="font-variant:small-caps;">Flores</span>-200](https://github.com/facebookresearch/flores/tree/main/flores200) dataset and
reported our results using the <span style="font-variant:small-caps;">Comet</span> metric.
Furthermore, we calculated the average number of tokens generated per second by each model while running the translation tasks.
To assess resource efficiency, we measured the memory usage of each model with the `get_memory_footprint()` function.
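For reference, the memory figures can be reproduced with the built-in accounting in 🤗Transformers. The snippet below is an illustrative sketch, not the exact evaluation script behind the table, and reuses `model`, `tokenizer`, and `model_input` from the earlier snippets.
```python
import time

# Model size as reported in the table (in GiB).
print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.1f} G")

# Illustrative tokens-per-second measurement: time one greedy generation and
# divide the number of newly generated tokens by the elapsed wall-clock time.
input_tokens = tokenizer(model_input, return_tensors="pt").to(model.device)
start = time.time()
generate_ids = model.generate(**input_tokens, max_new_tokens=256, do_sample=False)
elapsed = time.time() - start
new_tokens = generate_ids.shape[1] - input_tokens["input_ids"].shape[1]
print(f"Tokens/sec: {new_tokens / elapsed:.2f}")
```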
## License
<span style="font-variant:small-caps;">PersianMind</span> is subject to Meta's [LLaMa2 Community License](https://raw.githubusercontent.com/facebookresearch/llama/main/LICENSE).
It is further licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows non-commercial use of the model.
Commercial use of this model requires a written agreement, which must be obtained from the copyright holders listed as developers on this page.
If you suspect any violations, please reach out to us.
## Citation
If you find this model helpful, please cite the following paper.
**BibTeX:**
```bibtex
@misc{persianmind,
title={{PersianMind: A Cross-Lingual Persian-English Large Language Model}},
author={Rostami, Pedram and Salemi, Ali and Dousti, Mohammad Javad},
year={2024},
eprint={2401.06466},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
``` |