Hello.
Working with Hugging Face transformers on a masked language modeling task, I expected the prediction to return the same input string, with the masked tokens filled in:
from transformers import BertTokenizer, BertForMaskedLM

model1 = BertForMaskedLM.from_pretrained("bert-base-uncased")
tokenizer1 = BertTokenizer.from_pretrained("bert-base-uncased")

text = ["Read the rest of this [MASK] to understand things in more detail"]
encoding1 = tokenizer1(text, return_tensors="pt")

# forward pass
outputs1 = model1(**encoding1)
outputs1.logits.argmax(-1)
The output is:
tensor([[1012, 3191, 1996, 2717, 1997, 2023, 2338, 2000, 3305, 2477, 1999, 2062,
         1012, 1012]])
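Note that the tokenizer adds the special tokens [CLS] and [SEP], which is why there are 14 positions; the tokenized input lines up like this:

tokenizer1.convert_ids_to_tokens(encoding1["input_ids"][0].tolist())
# ['[CLS]', 'read', 'the', 'rest', 'of', 'this', '[MASK]', 'to',
#  'understand', 'things', 'in', 'more', 'detail', '[SEP]']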
But when I decoded the output, I could not find the last input token, "detail":
tokenizer1.convert_ids_to_tokens([1012, 3191, 1996, 2717, 1997, 2023, 2338, 2000, 3305, 2477, 1999, 2062, 1012, 1012])
['.',
'read',
'the',
'rest',
'of',
'this',
'book',
'to',
'understand',
'things',
'in',
'more',
'.',
'.']
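For what it's worth, here is a minimal sketch of how I isolate just the [MASK] position (locating it via tokenizer1.mask_token_id); the prediction there is a sensible "book", matching the decoded output above, so my confusion is specifically about the unmasked positions like "detail":

# find the index of [MASK] in the input sequence
mask_index = (encoding1["input_ids"][0] == tokenizer1.mask_token_id).nonzero(as_tuple=True)[0]

# take the argmax of the logits only at that position
predicted_id = outputs1.logits[0, mask_index].argmax(-1)
tokenizer1.convert_ids_to_tokens(predicted_id.tolist())
# ['book']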
Am I using the model incorrectly? Or is there some other reason for this?