Abnormally Large Memory Footprint?

#2 · opened by RylanSchaeffer

I'm loading the model in torch_dtype=torch.float16, but I'm finding that the memory footprint is 2-4x larger than comparable 7B and 8B language models. I also noticed that the return type is float32. Is something converting the outputs into float32 and maybe causing the model to run in float32?
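For what it's worth, a float32 return type doesn't necessarily mean the whole forward pass ran in float32; many causal LM implementations upcast just the logits to float32 before returning them. A quick check like the sketch below separates the weight dtype from the output dtype (the model name here is a placeholder, not the actual repo):

```python
# Minimal sketch for separating weight dtype from output dtype;
# the model name below is a placeholder, not the repo in question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/your-7b-model"  # placeholder
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to(device)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(device)
with torch.no_grad():
    out = model(**inputs)

print(model.dtype)       # torch.float16 if the weights really loaded in half precision
print(out.logits.dtype)  # may be torch.float32: many LMs upcast the logits before returning
```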

I found the problem: `"padding": 'max_length'`. The other 7B and 8B models were being padded to the longest sequence in the batch, not to the tokenizer's max length.
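To illustrate the difference, here's a minimal sketch (the tokenizer name and `max_length` value are placeholders, not from my actual setup): `padding="max_length"` pads every sequence out to the given or tokenizer max length, while `padding=True` pads only to the longest sequence in the batch, which keeps the input and activation tensors much smaller.

```python
# Sketch of the padding difference; tokenizer name and max_length are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

batch = ["short prompt", "a slightly longer prompt than the first one"]

# padding="max_length" pads every sequence to max_length
# (defaults to tokenizer.model_max_length if not given),
# so every layer's activations are sized for the full context window.
to_max = tokenizer(batch, padding="max_length", max_length=1024,
                   return_tensors="pt")

# padding=True (equivalently "longest") pads only to the longest
# sequence in the batch, which is what the other 7B/8B setups did.
to_longest = tokenizer(batch, padding=True, return_tensors="pt")

print(to_max["input_ids"].shape)      # torch.Size([2, 1024])
print(to_longest["input_ids"].shape)  # torch.Size([2, <longest in batch>])
```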

Is your problem solved?
