--- license: cc-by-nc-sa-4.0 language: - multilingual - fa - en library_name: transformers tags: - text-generation-inference inference: false metrics: - bleu - comet - accuracy - perplexity - spearmanr pipeline_tag: text-generation co2_eq_emissions: emissions: 232380 source: "PersianMind: A Cross-Lingual Persian-English Large Language Model. https://arxiv.org/abs/2401.06466" training_type: "fine-tuning" hardware_used: "4 RTX3090 24GB GPUs" geographical_location: "Tehran, Iran" ---
# PersianMind PersianMind is a cross-lingual Persian-English large language model. The model achieves state-of-the-art results on Persian subset of the [Belebele](https://github.com/facebookresearch/belebele) benchmark and the [ParsiNLU multiple-choice QA](https://github.com/persiannlp/parsinlu) task. It also attains performance comparable to GPT-3.5-turbo in a Persian reading comprehension task. ## Model Description - **Developed by:** [Pedram Rostami](mailto:pedram.rostami@ut.ac.ir), [Ali Salemi](mailto:alisalemi@ut.ac.ir), and [Mohammad Javad Dousti](mailto:mjdousti@ut.ac.ir) - **Model type:** Language model - **Languages:** English and Persian - **License:** [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) (non-commercial use only.) ## How to Get Started with the Model Use the code below to get started with the model. Note that you need to install
sentencepiece
and accelerate
libraries along with PyTorch
and 🤗Transformers
to run this code.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
"universitytehran/PersianMind-v1.0",
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=True,
device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained(
"universitytehran/PersianMind-v1.0",
)
TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "
CONTEXT = "This is a conversation with PersianMind. It is an artificial intelligence model designed by a team of " \
"NLP experts at the University of Tehran to help you with various tasks such as answering questions, " \
"providing recommendations, and helping with decision making. You can ask it anything you want and " \
"it will do its best to give you accurate and relevant information."
PROMPT = "در مورد هوش مصنوعی ØªÙˆØ¶ÛŒØ Ø¨Ø¯Ù‡."
model_input = TEMPLATE.format(context=CONTEXT, prompt=PROMPT)
input_tokens = tokenizer(model_input, return_tensors="pt")
input_tokens = input_tokens.to(device)
generate_ids = model.generate(**input_tokens, max_new_tokens=512, do_sample=False, repetition_penalty=1.1)
model_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(model_output[len(model_input):])
```
### How to Quantize the Model
Quantized models can be run on resource-constrained devices.
To quantize the model, you should install the bitsandbytes
library.
In order to quantize the model in 8-bit (`INT8`), use the code below.
```python
model = AutoModelForCausalLM.from_pretrained(
"universitytehran/PersianMind-v1.0",
device_map="auto",
low_cpu_mem_usage=True,
load_in_8bit=True
)
```
Alternatively, you can quantize the model in 4-bit (`NormalFloat4`) with the following code.
```python
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
"universitytehran/PersianMind-v1.0",
quantization_config=quantization_config,
device_map="auto"
)
```
### Evaluating Quantized Models
| Model | Belebele (Persian) | Fa→En Translation