metadata
datasets:
  - squad
  - squad_v2
widget:
  - text: Which name is also used to describe the Amazon rainforest in English?
    context: >-
      The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia;
      Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt
      amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia
      or the Amazon Jungle, is a moist broadleaf forest that covers most of the
      Amazon basin of South America. This basin encompasses 7,000,000 square
      kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres
      (2,100,000 sq mi) are covered by the rainforest. This region includes
      territory belonging to nine nations. The majority of the forest is
      contained within Brazil, with 60% of the rainforest, followed by Peru with
      13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador,
      Bolivia, Guyana, Suriname and French Guiana. States or departments in four
      nations contain "Amazonas" in their names. The Amazon represents over half
      of the planet's remaining rainforests, and comprises the largest and most
      biodiverse tract of tropical rainforest in the world, with an estimated
      390 billion individual trees divided into 16,000 species.
  - text: How many square kilometers of rainforest is covered in the basin?
    context: >-
      The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia;
      Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt
      amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia
      or the Amazon Jungle, is a moist broadleaf forest that covers most of the
      Amazon basin of South America. This basin encompasses 7,000,000 square
      kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres
      (2,100,000 sq mi) are covered by the rainforest. This region includes
      territory belonging to nine nations. The majority of the forest is
      contained within Brazil, with 60% of the rainforest, followed by Peru with
      13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador,
      Bolivia, Guyana, Suriname and French Guiana. States or departments in four
      nations contain "Amazonas" in their names. The Amazon represents over half
      of the planet's remaining rainforests, and comprises the largest and most
      biodiverse tract of tropical rainforest in the world, with an estimated
      390 billion individual trees divided into 16,000 species.
language:
  - en
  - hi
metrics:
  - accuracy
pipeline_tag: question-answering

avishkaarak-ekta-hindi

This is the avishkaarak-ekta-hindi model, fine-tuned on the SQuAD 2.0 dataset. It was trained on question-answer pairs, including unanswerable questions, for the task of extractive question answering.

Overview

Language model: avishkaarak-ekta-hindi
Language: English, Hindi (upcoming)
Downstream-task: Extractive QA
Training data: SQuAD 2.0
Eval data: SQuAD 2.0
Code: See an example QA pipeline on Haystack
Infrastructure: 4x Tesla V100

Hyperparameters

batch_size = 4
n_epochs = 50
base_LM_model = "roberta-base"
max_seq_len = 512
learning_rate = 9e-5
lr_schedule = LinearWarmup
warmup_proportion = 0.2
doc_stride = 128
max_query_length = 64
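
To make the schedule and windowing settings above more concrete, here is a minimal sketch in plain Python. It rests on assumptions the card does not confirm: that `LinearWarmup` ramps the learning rate linearly up to `learning_rate` over the first `warmup_proportion` of steps and then decays it linearly to zero, and that `doc_stride` is the step between the starts of consecutive passage windows (as in the original BERT SQuAD script). The function names and the `n_special_tokens=4` default (the special tokens RoBERTa adds around a question/passage pair) are illustrative, not part of any library.

```python
from typing import List, Tuple


def linear_warmup_lr(step: int, total_steps: int,
                     peak_lr: float = 9e-5,
                     warmup_proportion: float = 0.2) -> float:
    """Learning rate at `step`, assuming linear warmup then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_proportion)
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr during warmup.
        return peak_lr * step / warmup_steps
    # Decay linearly from peak_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


def passage_windows(doc_len: int, query_len: int,
                    max_seq_len: int = 512,
                    max_query_length: int = 64,
                    doc_stride: int = 128,
                    n_special_tokens: int = 4) -> List[Tuple[int, int]]:
    """Token ranges (start, end) of the passage windows for one document.

    Assumes the query is truncated to max_query_length and that each new
    window starts doc_stride tokens after the previous one.
    """
    query_len = min(query_len, max_query_length)
    passage_len = max_seq_len - query_len - n_special_tokens
    windows = []
    start = 0
    while True:
        end = min(start + passage_len, doc_len)
        windows.append((start, end))
        if end == doc_len:
            break
        start += doc_stride
    return windows


if __name__ == "__main__":
    print(linear_warmup_lr(100, 1000))   # halfway through warmup
    print(passage_windows(1000, 20))     # 1000-token document, 20-token query
```

Under these assumptions, a 1000-token document with a 20-token question is split into five overlapping windows of up to 488 passage tokens each, so an answer near a window boundary is still fully contained in at least one window.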

Usage

In Haystack

Haystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). To load the model in Haystack:

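# Assumes Haystack 1.x, where the readers are exposed in haystack.nodes
from haystack.nodes import FARMReader, TransformersReader

reader = FARMReader(model_name_or_path="AVISHKAARAM/avishkaarak-ekta-hindi")
# or
reader = TransformersReader(model_name_or_path="AVISHKAARAM/avishkaarak-ekta-hindi", tokenizer="deepset/roberta-base-squad2")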

For a complete example of AVISHKAARAM/avishkaarak-ekta-hindi being used for question answering, check out the tutorials in the Haystack documentation.

In Transformers

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "AVISHKAARAM/avishkaarak-ekta-hindi"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and lets people easily switch between frameworks.'
}
res = nlp(QA_input)

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
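
When the model is loaded directly as in (b), it returns one start logit and one end logit per token, and the answer span still has to be decoded from them. The following is a simplified sketch of that step, not the exact post-processing used by the `pipeline` helper: it assumes position 0 (the `<s>` token) stands for "no answer", as is conventional for SQuAD 2.0 style readers, and the function name `best_span` and its `max_answer_len` parameter are hypothetical. Real implementations additionally mask question tokens and keep an n-best list.

```python
from typing import List, Optional, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the best span, or None for no answer."""
    # Score of predicting "no answer" (both pointers on position 0).
    null_score = start_logits[0] + end_logits[0]

    best_score = float("-inf")
    best = None
    for start in range(1, len(start_logits)):
        # Only consider spans with start <= end and at most max_answer_len tokens.
        last_end = min(start + max_answer_len, len(end_logits))
        for end in range(start, last_end):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best = score, (start, end)

    # Prefer "no answer" when it scores at least as high as the best span.
    if best is None or null_score >= best_score:
        return None
    return best


if __name__ == "__main__":
    # Toy logits: the span covering tokens 2..3 clearly wins.
    print(best_span([0.1, 0.2, 5.0, 0.3, 0.1], [0.1, 0.1, 0.2, 4.0, 0.3]))
    # Toy logits where the no-answer position dominates.
    print(best_span([6.0, 0.2, 1.0], [6.0, 0.1, 1.0]))
```

The returned indices refer to token positions, so they would still need to be mapped back to characters in the context (for example via the tokenizer's offset mapping) to produce the answer text.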

Performance

Evaluated on the SQuAD 2.0 dev set with the official eval script.
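
For readers unfamiliar with the two headline metrics, the sketch below follows the normalization steps used by the SQuAD 2.0 evaluation script (lowercasing, stripping punctuation and English articles, collapsing whitespace) to compute exact match and token-level F1 for a single prediction. It is illustrative only; reported numbers should come from the official script itself.

```python
import collections
import re
import string


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def compute_exact(gold: str, pred: str) -> int:
    """1 if the normalized strings match exactly, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


def compute_f1(gold: str, pred: str) -> float:
    """Token-level F1 between the normalized gold and predicted answers."""
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    if not gold_toks or not pred_toks:
        # For unanswerable questions, F1 is 1 only if both are empty.
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "the Amazon Jungle"
    print(compute_exact(gold, "Amazon jungle."))
    print(compute_f1(gold, "Amazonia or the Amazon Jungle"))
```

A prediction that contains the gold answer plus extra words scores 0 on exact match but receives partial credit on F1, which is why F1 is never lower than exact match.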

"exact": 79.87029394424324,
"f1": 82.91251169582613,

"total": 11873,
"HasAns_exact": 77.93522267206478,
"HasAns_f1": 84.02838248389763,
"HasAns_total": 5928,
"NoAns_exact": 81.79983179142137,
"NoAns_f1": 81.79983179142137,
"NoAns_total": 5945
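
As a quick sanity check on how these figures relate, the overall scores should be the example-count-weighted average of the answerable (`HasAns`) and unanswerable (`NoAns`) subsets. The short sketch below recomputes them from the numbers above; the dictionary layout and the `combine` helper are just illustrative.

```python
# Per-subset scores copied from the results above.
has_ans = {"exact": 77.93522267206478, "f1": 84.02838248389763, "total": 5928}
no_ans = {"exact": 81.79983179142137, "f1": 81.79983179142137, "total": 5945}


def combine(metric: str) -> float:
    """Weighted average of a metric over the HasAns and NoAns subsets."""
    total = has_ans["total"] + no_ans["total"]
    weighted = (has_ans[metric] * has_ans["total"]
                + no_ans[metric] * no_ans["total"])
    return weighted / total


overall_total = has_ans["total"] + no_ans["total"]
overall_exact = combine("exact")
overall_f1 = combine("f1")

if __name__ == "__main__":
    print(overall_total, overall_exact, overall_f1)
```

The recomputed values agree with the reported "exact" and "f1" figures. `NoAns_exact` and `NoAns_f1` are identical because, for an unanswerable question, a prediction scores either 1 (empty prediction) or 0 (any non-empty prediction) on both metrics.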

Authors

Shashwat Bindal: optimus.coders.@ai

Sanoj: optimus.coders.@ai