RuntimeError: The size of tensor a (16) must match the size of tensor b (13008) at non-singleton dimension 3

#3 opened by Tylersuard

Sorry to bother you again! I am sure you are very busy. I am having trouble running your code. It looks like there is a tensor size mismatch between the query/key attention scores (qk) and the additive causal mask. My chunk size is 16, and my max_positions is 13008, a multiple of 16. What am I doing wrong here?

/usr/local/lib/python3.10/dist-packages/transformers/models/mega/modeling_mega.py:1098 in forward

   1095 │         if padding_mask is not None and padding_mask.dim() == 0:
   1096 │             padding_mask = None
   1097 │
❱  1098 │         attn_weights = self.attention_function(query, key, padding_mask=padding_mask, ca
   1099 │
   1100 │         value = self.hidden_dropout(value, batch_first=True)
   1101 │         kernel = self.attention_dropout(attn_weights)

/usr/local/lib/python3.10/dist-packages/transformers/models/mega/modeling_mega.py:901 in softmax_attention

    898 │         if causal_mask is not None:
    899 │             additive_causal_mask = torch.zeros_like(causal_mask, dtype=qk.dtype)
    900 │             additive_causal_mask = additive_causal_mask.masked_fill((1 - causal_mask).bo
❱   901 │             qk = qk + additive_causal_mask
    902 │
    903 │         if padding_mask is not None:
    904 │             # 1 for tokens which are not masked
RuntimeError: The size of tensor a (16) must match the size of tensor b (13008) at non-singleton dimension 3
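For reference, here is a minimal standalone sketch of the same broadcast failure. This is not the transformers code itself, and the shapes (chunked scores as batch x num_chunks x chunk x chunk) and sizes are my own guesses for illustration: chunked attention scores end in a chunk_size dimension, while a causal mask built over the full sequence ends in a max_positions dimension, so the add in softmax_attention cannot broadcast.

```python
import torch

# Hypothetical small sizes for illustration; my actual run used chunk_size=16
# and max_positions=13008 (a multiple of 16), giving the (16) vs (13008) error.
chunk_size = 4
max_positions = 32

# Assumed chunked attention scores: (batch, num_chunks, chunk_size, chunk_size)
qk = torch.randn(1, max_positions // chunk_size, chunk_size, chunk_size)

# A causal mask over the full sequence: (max_positions, max_positions)
causal_mask = torch.tril(torch.ones(max_positions, max_positions, dtype=torch.bool))
additive_causal_mask = torch.zeros(max_positions, max_positions)
additive_causal_mask = additive_causal_mask.masked_fill(~causal_mask, float("-inf"))

# RuntimeError: The size of tensor a (4) must match the size of tensor b (32)
# at non-singleton dimension 3
qk = qk + additive_causal_mask
```

If that matches what softmax_attention is actually receiving, it looks like the causal mask would need to be sliced per chunk before the add, but I don't know what the intended behavior is here.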

Hi @Tylersuard. As I mentioned in your other post, I'm not sure what is causing this. This would be a good GitHub issue so the Hugging Face folks can look into it.

mnaylor changed discussion status to closed
