julien-c HF staff committed on
Commit
75c8efc
·
1 Parent(s): 25ca108

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/joeddav/bart-large-mnli-yahoo-answers/README.md

Files changed (1)
  1. README.md +71 -0
README.md ADDED
@@ -0,0 +1,71 @@
1
+ ---
2
+ language: en
3
+ tags:
4
+ - text-classification
5
+ - pytorch
6
+ datasets:
7
+ - yahoo-answers
8
+ pipeline_tag: zero-shot-classification
9
+ ---
10
+
11
+ # bart-large-mnli-yahoo-answers
12
+
13
+ ## Model Description
14
+
15
+ This model takes [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) and fine-tunes it on Yahoo Answers topic classification. It can be used to predict whether a topic label can be assigned to a given sequence, whether or not the label has been seen before.
16
+
17
+ You can play with an interactive demo of this zero-shot technique with this model, as well as the non-finetuned [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli), [here](https://huggingface.co/zero-shot/).
18
+
19
+ ## Intended Usage
20
+
21
+ This model was fine-tuned on topic classification and will perform best at zero-shot topic classification. Use `hypothesis_template="This text is about {}."` as this is the template used during fine-tuning.
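
As a minimal sketch of what the template does, each candidate label is substituted into the template to form one NLI hypothesis. The template string is the one given above; the label list is illustrative only.

```python
# Template from the model card; the candidate labels are illustrative.
hypothesis_template = "This text is about {}."
candidate_labels = ["politics", "public health"]

# One NLI hypothesis is built per candidate label.
hypotheses = [hypothesis_template.format(label) for label in candidate_labels]

for hypothesis in hypotheses:
    print(hypothesis)
```

Each hypothesis is then paired with the input text (the premise) and scored by the NLI model.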
22
+
23
+ For settings other than topic classification, you can use any model pre-trained on MNLI such as [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) or [roberta-large-mnli](https://huggingface.co/roberta-large-mnli) with the same code as written below.
24
+
25
+ #### With the zero-shot classification pipeline
26
+
27
+ The model can be used with the `zero-shot-classification` pipeline like so:
28
+
29
+ ```python
30
+ from transformers import pipeline
31
+ nlp = pipeline("zero-shot-classification", model="joeddav/bart-large-mnli-yahoo-answers")
32
+
33
+ sequence_to_classify = "Who are you voting for in 2020?"
34
+ candidate_labels = ["Europe", "public health", "politics", "elections"]
35
+ hypothesis_template = "This text is about {}."
36
+ nlp(sequence_to_classify, candidate_labels, multi_label=True, hypothesis_template=hypothesis_template)
37
+ ```
38
+
39
+ #### With manual PyTorch
40
+
41
+ ```python
42
+ # pose sequence as an NLI premise and label as a hypothesis
43
+ from transformers import BartForSequenceClassification, BartTokenizer
44
+ nli_model = BartForSequenceClassification.from_pretrained('joeddav/bart-large-mnli-yahoo-answers')
45
+ tokenizer = BartTokenizer.from_pretrained('joeddav/bart-large-mnli-yahoo-answers')
46
+
47
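+ premise = sequence  # `sequence` is the input text, e.g. "Who are you voting for in 2020?"
48
+ hypothesis = f'This text is about {label}.'  # `label` is one candidate label, e.g. "politics"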
49
+
50
+ # run through model pre-trained on MNLI
51
+ x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
52
+ max_length=tokenizer.model_max_length,
53
+ truncation='only_first')
54
+ logits = nli_model(x.to(nli_model.device))[0]
55
+
56
+ # we throw away "neutral" (dim 1) and take the probability of
57
+ # "entailment" (2) as the probability of the label being true
58
+ entail_contradiction_logits = logits[:,[0,2]]
59
+ probs = entail_contradiction_logits.softmax(dim=1)
60
+ prob_label_is_true = probs[:,1]
61
+ ```
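
To make the last step concrete, here is a small worked example in plain Python (no model download needed). It assumes the logit order described in the comments above, with "neutral" at index 1 and "entailment" at index 2, so index 0 is taken to be "contradiction". The logit values are made up for illustration.

```python
import math


def prob_label_is_true(logits):
    """Probability of entailment after dropping the neutral logit.

    `logits` is [contradiction, neutral, entailment] for one
    premise/hypothesis pair (order assumed from the snippet above).
    """
    contradiction, _neutral, entailment = logits
    # Two-way softmax over (contradiction, entailment), shifted for stability.
    shift = max(contradiction, entailment)
    exp_c = math.exp(contradiction - shift)
    exp_e = math.exp(entailment - shift)
    return exp_e / (exp_c + exp_e)


example_logits = [-1.0, 0.5, 2.0]  # made-up values, not model output
p = prob_label_is_true(example_logits)
print(round(p, 4))
```

Because the neutral logit is discarded, a pair with equal contradiction and entailment logits scores exactly 0.5 regardless of how large the neutral logit is.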
62
+
63
+ ## Training
64
+
65
+ The model is a pre-trained MNLI classifier further fine-tuned on Yahoo Answers topic classification in the manner originally described in [Yin et al. 2019](https://arxiv.org/abs/1909.00161) and [this blog post](https://joeddav.github.io/blog/2020/05/29/ZSL.html). That is, each sequence is fed to the pre-trained NLI model as the premise and each candidate label as the hypothesis, formatted like so: `This text is about {class name}.` For each example in the training set, one true and one randomly selected false label hypothesis are fed to the model, which must predict which labels are valid and which are false.
66
+
67
+ Since this method studies the ability to classify unseen labels after being trained on a different set of labels, the model is only trained on 5 out of the 10 labels in Yahoo Answers. These are "Society & Culture", "Health", "Computers & Internet", "Business & Finance", and "Family & Relationships".
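
The pair construction described in this section could be sketched roughly as follows. This is not the actual training script: the helper name `make_training_pairs`, the string targets `"entailment"` and `"contradiction"`, and the choice to sample the false label from the other seen labels are all assumptions made for illustration. The label names and the template come from the text above.

```python
import random

# The five labels seen during fine-tuning, as listed above.
SEEN_LABELS = [
    "Society & Culture",
    "Health",
    "Computers & Internet",
    "Business & Finance",
    "Family & Relationships",
]
TEMPLATE = "This text is about {}."


def make_training_pairs(text, true_label, rng):
    """Hypothetical helper: one true and one false hypothesis per example."""
    # Assumption: the false label is drawn from the remaining seen labels.
    false_label = rng.choice([l for l in SEEN_LABELS if l != true_label])
    return [
        (text, TEMPLATE.format(true_label), "entailment"),
        (text, TEMPLATE.format(false_label), "contradiction"),
    ]


rng = random.Random(0)  # seeded only so the sketch is repeatable
pairs = make_training_pairs("How do I reset my router?", "Computers & Internet", rng)
for premise, hypothesis, target in pairs:
    print(target, "|", hypothesis)
```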
68
+
69
+ ## Evaluation Results
70
+
71
+ This model was evaluated with the label-weighted F1 of the _seen_ and _unseen_ labels. That is, for each example the model must predict one of the 10 corpus labels. The F1 is reported separately for the labels seen during training and the labels unseen during training. We found an F1 score of `.68` and `.72` for the unseen and seen labels, respectively. To adjust for in- vs. out-of-distribution labels, we subtract a fixed amount of 30% from the normalized probabilities of the _seen_ labels, as described in [Yin et al. 2019](https://arxiv.org/abs/1909.00161) and [our blog post](https://joeddav.github.io/blog/2020/05/29/ZSL.html).
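
The seen-label adjustment can be illustrated with a short sketch. It assumes that "normalized" means the per-label scores are divided by their sum, and that the 30% is subtracted as an absolute 0.30 before taking the argmax. The function name and the example scores are hypothetical and are not evaluation data.

```python
def predict_with_seen_penalty(scores, seen_labels, penalty=0.30):
    """Hypothetical helper: penalize seen labels, then pick the best label.

    `scores` maps each candidate label to a non-negative score.
    """
    total = sum(scores.values())
    adjusted = {}
    for label, score in scores.items():
        normalized = score / total  # assumed normalization: divide by the sum
        if label in seen_labels:
            normalized -= penalty   # fixed 30% handicap for seen labels
        adjusted[label] = normalized
    best = max(adjusted, key=adjusted.get)
    return best, adjusted


seen = {"Health"}  # "Health" is one of the five training labels
example_scores = {"Health": 0.5, "Sports": 0.3, "Politics & Government": 0.2}  # made up
predicted, adjusted = predict_with_seen_penalty(example_scores, seen)
print(predicted)
```

In this made-up case the seen label has the highest raw score, but after the penalty an unseen label is predicted instead, which is the intended effect of the adjustment.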