RuntimeError: The size of tensor a (16) must match the size of tensor b (13008) at non-singleton dimension 3
Sorry to bother you again! I know you are busy. I am having trouble running your code: there seems to be a tensor size mismatch between the key/query attention scores and the additive causal mask. My chunk size is 16 and my max_positions is 13008, which is a multiple of 16. What am I doing wrong here?
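In case it helps, here is a minimal sketch of how I am configuring the model. It is not my exact script: apart from chunk_size=16 and max_positions=13008, everything (model size, data, sequence length) is a placeholder or library default.

```python
import torch
from transformers import MegaConfig, MegaForCausalLM

# Only chunk_size and max_positions reflect my actual run;
# all other config values are left at the library defaults.
config = MegaConfig(
    use_chunking=True,    # chunked attention
    chunk_size=16,
    max_positions=13008,  # a multiple of chunk_size
    is_decoder=True,      # causal LM, so the additive causal mask is applied
)
model = MegaForCausalLM(config)

# Dummy full-length batch (placeholder data); a forward pass like this one
# is what produces the traceback below for me.
input_ids = torch.randint(0, config.vocab_size, (1, config.max_positions))
outputs = model(input_ids=input_ids)
```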
/usr/local/lib/python3.10/dist-packages/transformers/models/mega/modeling_mega.py:1098 in forward

   1095         if padding_mask is not None and padding_mask.dim() == 0:
   1096             padding_mask = None
   1097
 ❱ 1098         attn_weights = self.attention_function(query, key, padding_mask=padding_mask, ca
   1099
   1100         value = self.hidden_dropout(value, batch_first=True)
   1101         kernel = self.attention_dropout(attn_weights)

/usr/local/lib/python3.10/dist-packages/transformers/models/mega/modeling_mega.py:901 in softmax_attention

    898         if causal_mask is not None:
    899             additive_causal_mask = torch.zeros_like(causal_mask, dtype=qk.dtype)
    900             additive_causal_mask = additive_causal_mask.masked_fill((1 - causal_mask).bo
 ❱  901             qk = qk + additive_causal_mask
    902
    903         if padding_mask is not None:
    904             # 1 for tokens which are not masked
RuntimeError: The size of tensor a (16) must match the size of tensor b (13008) at non-singleton dimension 3
Hi @Tylersuard. As I mentioned in your other post, I'm not sure what is causing this. This would make a good GitHub issue so the Hugging Face folks can look into it.