---
language:
- fr
- en
tags:
- Question generation
datasets:
- FQuAD
- PIAF
- SQuAD
metrics:
- rouge
- BERTScore (cammenbert)
widget:
- text: "The period from the death of Alexander the Great in 323 until the death of Cleopatra VII, the last Macedonian ruler of Egypt, is known as the Hellenistic period. In the early part of this period, a new form of kingship developed based on Macedonian and Near Eastern traditions. The first Hellenistic kings were previously Alexander's generals, and took power in the period following his death, though they were not part of existing royal lineages and lacked historic claims to the territories they controlled."
---

# A Model for multiple questions generation for french and english languages

## Training

The model has been trained on different french and english corpus (FQuAD, PIAF and SQuAD) where for each paragraph the objective is to predict all the possible questions.
We are using the mbart model `facebook/mbart-large-50-many-to-many-mmt`, notice that it would works better with MBarthez like models for the french generation (MBarthez finetuning will be uploaded later).
For all dataset we translate queries (in french or in english) but we always preserve paragraph in its original language, thus the model can create english questions with french paragraph.
We trained the model during 12 epochs considering an epochs as 2000 batches of 128 (considering gradient accumulation).
 ## Generate
 ```python 
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

access_token = "hf_......"

# Loading the model weights and the tokenizer
tokenizer = AutoTokenizer.from_pretrained("ThomasGerald/mbart-multi-question-generation", use_auth_token=access_token)
model = AutoModelForSeq2SeqLM.from_pretrained("ThomasGerald/mbart-multi-question-generation", use_auth_token=access_token)

# For the exemple we give the following text talking about origin of the grec language 
text = ("La recherche moderne considère généralement que la langue grecque n'est pas née en Grèce," +
    "mais elle n'est pas arrivée à un consensus quant à la date d'arrivée des groupes parlant un "+
    "« proto-grec », qui s'est produite durant des phases préhistoriques pour lesquelles il n'y a"+
    "pas de texte indiquant quelles langues étaient parlées. Les premiers textes écrits en grec sont"+
    "les tablettes en linéaire B de l'époque mycénienne, au XIVe siècle av. J.-C., ce qui indique que"+
    "des personnes parlant un dialecte grec sont présentes en Grèce au plus tard durant cette période."+
    " La linguistique n'est pas en mesure de trancher, pas plus que l'archéologie.")

# We specify the input languages and tokenize
tokenizer.set_src_lang_special_tokens("fr_XX")
tokenized_text = tokenizer([text], return_tensors="pt")
########## FRENCH DECODING
# We generate given the output language code
output = model.generate(**tokenized_text, forced_bos_token_id=tokenizer.lang_code_to_id['fr_XX'])

# We show the outpu of the model (sequence of questions separated by the token <question_sep>)
tokenizer.batch_decode(output, skip_special_tokens=False)
#### output : 
'''["</s>fr_XX Quelle est l'origine de la langue grecque selon la recherche moderne?<question_sep>
Quelle est la date d'arrivée des grecs proto-grecs?<question_sep> Où sont les premiers textes
écrits en grec?<question_sep> De quand date l'époque mycénienne?<question_sep>
Qu'est ce que la linguistique n'est pas en mesure de faire?</s>"]
'''
######### ENGLISH DECODING
# We generate given the output language code
output = model.generate(**tokenized_text, forced_bos_token_id=tokenizer.lang_code_to_id['en_XX'])

# We show the outpu of the model (sequence of questions separated by the token <question_sep>)
tokenizer.batch_decode(output, skip_special_tokens=False)
#### output : 
'''["</s>en_XX What is the origin of the Greek language according to modern research?
<question_sep> When did the prehistoric phases take place?<question_sep> What are the
first texts written in Greek?<question_sep> When do the first written texts in Greek
date back?<question_sep> What is the only science that can't decide the dates of the
first Greek texts?</s>"]
'''
```