RuntimeError: The size of tensor a (16) must match the size of tensor b (13008) at non-singleton dimension 3
Sorry to bother you again! I know you are busy. I am having trouble running your code: there seems to be a tensor size mismatch between the key/query attention scores and the additive causal mask. My chunk size is 16 and my max_positions is 13008, which is a multiple of 16. What am I doing wrong here?
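In case it helps, here is a minimal sketch of how I am configuring the model. It is not my exact script: apart from chunk_size=16 and max_positions=13008, everything (model size, data, sequence length) is a placeholder or library default.

```python
import torch
from transformers import MegaConfig, MegaForCausalLM

# Only chunk_size and max_positions reflect my actual run;
# all other config values are left at the library defaults.
config = MegaConfig(
    use_chunking=True,    # chunked attention
    chunk_size=16,
    max_positions=13008,  # a multiple of chunk_size
    is_decoder=True,      # causal LM, so the additive causal mask is applied
)
model = MegaForCausalLM(config)

# Dummy full-length batch (placeholder data); a forward pass like this one
# is what produces the traceback below for me.
input_ids = torch.randint(0, config.vocab_size, (1, config.max_positions))
outputs = model(input_ids=input_ids)
```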
/usr/local/lib/python3.10/dist-packages/transformers/models/mega/modeling_mega.py:1098 in forward

   1095         if padding_mask is not None and padding_mask.dim() == 0:
   1096             padding_mask = None
   1097
 ❱ 1098         attn_weights = self.attention_function(query, key, padding_mask=padding_mask, ca
   1099
   1100         value = self.hidden_dropout(value, batch_first=True)
   1101         kernel = self.attention_dropout(attn_weights)

/usr/local/lib/python3.10/dist-packages/transformers/models/mega/modeling_mega.py:901 in softmax_attention

    898         if causal_mask is not None:
    899             additive_causal_mask = torch.zeros_like(causal_mask, dtype=qk.dtype)
    900             additive_causal_mask = additive_causal_mask.masked_fill((1 - causal_mask).bo
 ❱  901             qk = qk + additive_causal_mask
    902
    903         if padding_mask is not None:
    904             # 1 for tokens which are not masked
RuntimeError: The size of tensor a (16) must match the size of tensor b (13008) at non-singleton dimension 3
Hi @Tylersuard. As I mentioned in your other post, I'm not sure what is causing this. This would make a good GitHub issue so the Hugging Face folks can look into it.