So I am trying to train an automated essay scoring system that combines the loss for predicting essay scores with the loss for predicting whether each sentence is grammatically correct. To do this I have wrapped each sentence in the essay with its own [CLS] and [SEP] token, so that an essay is fed into BERT like this:
essay 1 : [cls] … sent 1 … [sep][cls] … sent 2 … [sep][cls] … sent 3 … [sep][cls] … etc
essay 2 : [cls] … sent 1 … [sep][cls] … sent 2 … [sep][cls] … sent 3 … [sep][cls] … etc
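For context, this is roughly how I build that input. The helper below is only a sketch (sentence splitting, truncation and the function name are my own simplifications):

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('distilroberta-base')

def encode_essay(sentences, max_length=512):
    input_ids = []
    for sent in sentences:
        # each sentence gets its own special tokens, i.e. <s> ... </s>
        # (the roberta equivalents of [CLS] ... [SEP])
        input_ids += tokenizer(sent, add_special_tokens=True)['input_ids']
    input_ids = input_ids[:max_length]
    attention_mask = [1] * len(input_ids)
    # pad up to max_length (roberta's pad token id is 1)
    pad_len = max_length - len(input_ids)
    input_ids += [tokenizer.pad_token_id] * pad_len
    attention_mask += [0] * pad_len
    return {'input_ids': torch.tensor(input_ids),
            'attention_mask': torch.tensor(attention_mask)}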
Alongside this I have a list of labels for each sentence (whether or not it contains a grammatical error) and a score for the essay, i.e.
essay 1 : labels: [1,0,0,1,etc…],score:38
essay 2 : labels: [1,1,0,1,etc…],score:24
So that the list of labels can be passed into a dataset, it has to have the same length as the input_ids, attention_mask, etc., so I have padded it with 2s at every token that is not a [CLS] token. The labels therefore actually look something like this:
[1,2,2,0,2,2,2,0,2,2,2,2,1,etc…]
which would correspond to a token sequence like
[cls,w,w,cls,w,w,w,cls,w,w,w,w,cls…]
(w=word)
I then use this to get the indices of the [CLS] tokens in each essay.
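For reference, the index extraction itself is just something like this (using the toy label list above; the variable names are mine):

import torch

labels = torch.tensor([1, 2, 2, 0, 2, 2, 2, 0, 2, 2, 2, 2, 1])
# every position with a label < 2 is a [CLS] position, 2 marks all other tokens
cls_index = torch.nonzero(labels < 2, as_tuple=True)[0]   # tensor([ 0,  3,  7, 12])
cls_labels = labels[cls_index]                            # tensor([1, 0, 0, 1])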
The performance of this setup on grammatical error detection is comparable to feeding each sentence into BERT normally:
sent 1 : [cls] … sent 1 … [sep]
sent 2 : [cls] … sent 2 … [sep]
However, unsurprisingly, the additional [SEP] and [CLS] tokens decrease the model's performance compared to feeding an essay into BERT normally:
essay 1 : [cls] … essay 1 … [sep]
essay 2 : [cls] … essay 2 … [sep]
To combat this, I am trying to use the output vector representation of each [CLS] token in each essay as input to another, smaller transformer model, as done in this paper: https://arxiv.org/pdf/1903.10318.pdf. However, I cannot figure out how to do this. I have tried feeding the output into an embedding layer with inputs_embeds=True, but this does not work.
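Conceptually, what I am trying to end up with is something like the sketch below. The small encoder here is just a stand-in nn.TransformerEncoder rather than the exact setup from the paper, and all the names and heads are my own:

import torch
import torch.nn as nn
from transformers import AutoModel

class EssayScorer(nn.Module):
    def __init__(self, hidden_size=768, n_layers=2, n_heads=8):
        super().__init__()
        self.bert = AutoModel.from_pretrained('distilroberta-base')
        # small transformer over the per-sentence [CLS] vectors
        sent_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=n_heads, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers=n_layers)
        self.grammar_head = nn.Linear(hidden_size, 2)  # per-sentence error / no error
        self.score_head = nn.Linear(hidden_size, 1)    # essay score

    def forward(self, input_ids, attention_mask, cls_mask):
        # cls_mask: [bs, seq_len], 1 at [CLS] positions, 0 everywhere else
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # pull out only the [CLS] vectors; this assumes every essay in the batch
        # has the same number of sentences, otherwise they would need padding
        n_sents = int(cls_mask[0].sum())
        cls_vecs = hidden[cls_mask.bool()].view(hidden.size(0), n_sents, -1)
        sent_repr = self.sent_encoder(cls_vecs)                # [bs, n_sents, hidden]
        grammar_logits = self.grammar_head(sent_repr)          # [bs, n_sents, 2]
        score = self.score_head(sent_repr.mean(dim=1)).squeeze(-1)  # [bs]
        return grammar_logits, score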
Below is a simplified version of my code so far, using just a mini batch as a test:
# encoded_dataset_train is my training dataset type = (datasets.arrow_dataset.Dataset)
mini_batch = encoded_dataset_train[:2]
print(mini_batch)
'''{'attention_mask': tensor([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]]),
'input_ids': tensor([[ 0, 23314, 5348, ..., 1, 1, 1],
[ 0, 1360, 73, ..., 1, 1, 1]]),
'label': tensor([[1, 2, 2, ..., 2, 2, 2],
[1, 2, 2, ..., 2, 2, 2]]),
'score': tensor([31, 23])}'''
import torch
from transformers import AutoModel
model = AutoModel.from_pretrained('distilroberta-base')
# get the hidden states from the final layer of the model
input_ids = mini_batch['input_ids']
attention_mask = mini_batch['attention_mask']
output = model(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
print(output.shape)
'''
torch.Size([2, 512, 768])
'''
# get a mask where each cls token in each batch is 1 and 0 for all other tokens
bs, tok_len = mini_batch['label'].shape
active_labels = torch.where(mini_batch['label'] < 2, 1, 0).reshape(bs, tok_len, 1).expand(bs, tok_len, 768)
print(active_labels.shape)
'''
torch.Size([2, 512, 768])
'''
# multiply the output by the mask
active_loss = output * active_labels
print(active_loss.shape)
'''
torch.Size([2, 512, 768])
'''
# Create my smaller model (so far I have only extracted the layers, I have not combined them yet)
embeds = model.embeddings
layers = model.encoder.layer[:2]
classifier = torch.nn.Linear(model.config.hidden_size, 2)  # the base AutoModel has no classifier head, so I make my own
# Result for trying to pass the output into the embedding layer
embeds.forward(input_ids=active_loss,inputs_embeds=True)
'''
RuntimeError: The size of tensor a (512) must match the size of tensor b (768) at non-singleton dimension 2
'''
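My guess is that the embedding layer is still expecting token ids rather than 768-dimensional vectors, so presumably I should be passing the vectors straight into the extracted encoder layers instead, something like the sketch below (just my guess at the right call, attention-mask handling omitted):

# push the (already-embedded) masked vectors straight through the extracted layers
hidden = active_loss                  # [2, 512, 768]
for layer in layers:
    hidden = layer(hidden)[0]         # each layer returns a tuple, [0] is the hidden states
print(hidden.shape)                   # hopefully torch.Size([2, 512, 768])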
Cheers in advance