TL;DR
A BertModel fine-tuned on an NER task behaves differently each time the saved .bin file is loaded.
Example:
When the model finishes training:
- Input
I am John and I work at Hugging-Face
- Output
[(I, O), (am, O), (John, PER), (and, O), (I, O), (work, O), (at, O), (Hugging-Face, ORG)]
After stopping the notebook session and loading the model:
- Input
I am John and I work at Hugging-Face
- Output
[(I, PER), (am, PER), (John, PER), (and, PER), (I, ORG), (work, PER), (at, O), (Hugging-Face, PER)]
Environment:
- Colab Pro +
- Transformers == 4.23.1
- Torch == 1.12.1
Description
I am currently facing an issue with my NER model, which is based on BertModel
from the Transformers library and inspired by the BertForTokenClassification
code base.
After training and evaluating, I end up with a well-performing model with a validation accuracy greater than 96%. The problem is that when I save the model and then load it for inference, it gives different (and bad) predictions each time it is loaded. Note that right after training finishes the predictions are good; but when I stop the notebook session, start another one, and load my best saved model, it behaves differently.
Model Architecture:
from typing import Optional

import torch
import torch.nn as nn

class NerBertModel(nn.Module):
    def __init__(self, id2label, label2id, num_labels):
        super(NerBertModel, self).__init__()
        self.id2label = id2label
        self.label2id = label2id
        self.num_labels = num_labels
        self.bert = Config.MODEL  # pretrained BertModel, set up elsewhere in the notebook
        classifier_dropout = (
            Config.CONFIG.classifier_dropout
            if Config.CONFIG.classifier_dropout is not None
            else Config.CONFIG.hidden_dropout_prob
        )
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(Config.CONFIG.hidden_size, num_labels)

    def forward(self,
                input_ids: Optional[torch.Tensor] = None,
                attention_mask: Optional[torch.Tensor] = None,
                token_type_ids: Optional[torch.Tensor] = None,
                labels: Optional[torch.Tensor] = None):
        outputs = self.bert(input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        sequence_output = outputs[0]  # last hidden state, shape (batch, seq_len, hidden_size)
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)
        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
        return loss, logits
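For context, NerBertModel depends on a Config object that lives elsewhere in the notebook. Below is a minimal sketch of what it is assumed to hold, based only on the attributes referenced above; the checkpoint name, file path, and device logic are illustrative, not the originals:

import torch
from transformers import BertConfig, BertModel

class Config:
    # Hypothetical values; only the attribute names come from the model code above.
    MODEL_NAME = "bert-base-cased"
    CONFIG = BertConfig.from_pretrained(MODEL_NAME)
    MODEL = BertModel.from_pretrained(MODEL_NAME)
    MODEL_PATH = "ner_bert_model.bin"
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")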
Saved the model using:
torch.save(model.state_dict(), Config.MODEL_PATH)
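To rule out the serialization step itself, one quick check (a minimal sketch, assuming the trained model from above is still in memory) is to reload the saved file into a fresh instance within the same session and compare every tensor:

import torch

# Reload into a fresh, randomly initialized instance and diff the weights.
reloaded = NerBertModel(id2label, label2id, num_labels=len(id2label))
reloaded.load_state_dict(torch.load(Config.MODEL_PATH, map_location="cpu"))

for name, tensor in model.state_dict().items():
    if not torch.equal(tensor.cpu(), reloaded.state_dict()[name].cpu()):
        print("Mismatch in", name)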
Loaded the model using:
model = NerBertModel(id2label, label2id, num_labels=len(id2label))
model.load_state_dict(
    torch.load(
        Config.MODEL_PATH,  # model.bin file
        map_location=torch.device(Config.DEVICE),
    )
)
model.to(Config.DEVICE)
model.eval()
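A related check worth doing right here (a hedged suggestion, not part of my original code): load_state_dict reports the keys it could not match, and any entry in missing_keys means that submodule stayed at its random initialization, which would explain unstable predictions across sessions:

state_dict = torch.load(Config.MODEL_PATH, map_location=torch.device(Config.DEVICE))
result = model.load_state_dict(state_dict, strict=False)
print("Missing keys:", result.missing_keys)        # parameters left at random init
print("Unexpected keys:", result.unexpected_keys)  # entries in the file with no matching parameter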
The same problem also occurs when using the standard BertForTokenClassification model
from the Transformers library directly, saving and loading it as follows:
# Save the best model
model.save_pretrained("model_path")
# Load the model
model = BertForTokenClassification.from_pretrained("model_path")
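When from_pretrained cannot match some weights, it logs a warning listing which layers were newly (randomly) initialized; raising the library's verbosity in the new session makes sure that warning is visible (a small sketch, assuming nothing beyond the model path above):

from transformers import BertForTokenClassification, logging

logging.set_verbosity_info()  # surfaces the "newly initialized" weights warning, if any
model = BertForTokenClassification.from_pretrained("model_path")
model.eval()  # disable dropout before inference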
The seed function I am using:
import numpy as np
import torch

def seed_torch(seed=42):
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

seed_torch()
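Finally, one sanity check that separates the two likely failure modes (a sketch, assuming a tokenizer for the same checkpoint is loaded in the fresh session): if two forward passes on the same input disagree, dropout is still active, i.e. model.eval() was not applied; if they agree but are still wrong, the loaded weights themselves are the problem:

import torch

model.eval()  # with dropout off, the two passes below should be identical
inputs = tokenizer("I am John and I work at Hugging-Face", return_tensors="pt").to(Config.DEVICE)

with torch.no_grad():
    _, logits_a = model(**inputs)
    _, logits_b = model(**inputs)

print("Deterministic:", torch.equal(logits_a, logits_b))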