	---
	license: cc-by-nc-sa-4.0
	datasets:
	- wi_locness
	- matejklemen/falko_merlin
	- paws
	- paws-x
	- asset
	language:
	- en
	- de
	- es
	- ar
	- ja
	- ko
	- zh
	metrics:
	- bleu
	- rouge
	- sari
	- accuracy
	library_name: transformers
	---

	# Model Card for mEdIT-xxl

	The `medit-xxl` model was obtained by fine-tuning the `MBZUAI/bactrian-x-llama-13b-lora` model on the mEdIT dataset.

	Paper: mEdIT: Multilingual Text Editing via Instruction Tuning

	Authors: Vipul Raheja, Dimitris Alikaniotis, Vivek Kulkarni, Bashar Alhafni, Dhruv Kumar

	## Model Details

	### Model Description

	- Language(s) (NLP): Arabic, Chinese, English, German, Japanese, Korean, Spanish
	- Finetuned from model: `MBZUAI/bactrian-x-llama-13b-lora`

	### Model Sources

	- Repository: https://github.com/vipulraheja/medit
	- Paper: TBA

	## How to use

	### Instruction format

	Follow the instruction format below. Prompts that deviate from it may produce lower-quality output.


	```python
	instruction_tokens = [
	    "Instruction",
	    "Anweisung",
	    ...
	]

	input_tokens = [
	    "Input",
	    "Aporte",
	    ...
	]

	output_tokens = [
	    "Output",
	    "Produzione",
	    ...
	]

	task_descriptions = [
	    "Fix grammatical errors in this sentence",  # <-- GEC task
	    "Umschreiben Sie den Satz",  # <-- Paraphrasing
	    ...
	]
	```

	The full list of instruction tokens, input tokens, output tokens, and task descriptions can be found in the Appendix of our paper.

	```python
	prompt_template = """### <instruction_token>:\n<task description>\n### <input_token>:\n<input>\n### <output_token>:\n\n"""
	```

	Note that the tokens and the task description need not be in the language of the input.
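
The template above can be filled in with a small helper. This is a minimal sketch, not part of the released code: the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the English defaults are simply the first entries of the token lists shown above.

```python
# Hypothetical helper: fills the prompt template from this model card.
# The placeholder names are illustrative; only the layout comes from the card.
PROMPT_TEMPLATE = (
    "### {instruction_token}:\n{task_description}\n"
    "### {input_token}:\n{text}\n"
    "### {output_token}:\n\n"
)


def build_prompt(
    text: str,
    task_description: str,
    instruction_token: str = "Instruction",
    input_token: str = "Input",
    output_token: str = "Output",
) -> str:
    """Return a prompt string in the instruction format described above."""
    return PROMPT_TEMPLATE.format(
        instruction_token=instruction_token,
        task_description=task_description,
        input_token=input_token,
        text=text,
        output_token=output_token,
    )


# English input with English tokens and task description.
prompt = build_prompt("I has small cat ,", "Fix grammatical errors in this sentence")
print(repr(prompt))
```

Because the tokens and task description need not match the input language, the same helper can be called with, for example, Japanese tokens and an English input sentence.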

	### Run the model

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_id = "grammarly/medit-xxl"
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	model = AutoModelForCausalLM.from_pretrained(model_id)

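	# English GEC. The instruction tokens and task description are in Japanese
	# ("Make the sentence grammatical"); they need not match the input language.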
	prompt = '### 命令:\n文章を文法的にする\n### 入力:\nI has small cat ,\n### 出力:\n\n'

	inputs = tokenizer(prompt, return_tensors='pt')

	outputs = model.generate(**inputs, max_new_tokens=20)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))

	# --> I have a small cat ,

	# German GEC

	prompt = '### 命令:\n文章を文法的にする\n### 入力:\nIch haben eines kleines Katze ,\n### 出力:\n\n'

	# ...
	# --> Ich habe eine kleine Katze ,
	```
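
With a causal language model, the sequence returned by `generate` typically starts with the prompt tokens, so the decoded string may contain the prompt followed by the edited sentence. The sketch below shows one way to keep only the edit, assuming the decoded text still contains the final `### <output_token>:` header and that the edit itself does not contain `### `. The function name `extract_edit` is hypothetical.

```python
def extract_edit(decoded: str) -> str:
    """Return the text that follows the last '### ...' header line.

    Assumption: the last header in `decoded` is the output-token header
    from the prompt, and everything after it is the model's edit.
    """
    marker = decoded.rfind("### ")
    if marker == -1:
        # No header found: assume the string is already just the edit.
        return decoded.strip()
    # Drop the header line itself and keep what comes after it.
    _header, _sep, body = decoded[marker:].partition("\n")
    return body.strip()


# Simulated decoded output: the prompt from the example above plus an edit.
decoded = (
    "### 命令:\n文章を文法的にする\n### 入力:\nI has small cat ,\n### 出力:\n\n"
    "I have a small cat ,"
)
print(extract_edit(decoded))
```

An alternative is to slice the generated token IDs at the prompt length before decoding, which avoids string matching altogether.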