Abnormally Large Memory Footprint?

#2 · opened by RylanSchaeffer

I'm loading the model in torch_dtype=torch.float16, but I'm finding that the memory footprint is 2-4x larger than comparable 7B and 8B language models. I also noticed that the return type is float32. Is something converting the outputs into float32 and maybe causing the model to run in float32?
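For what it's worth, a float32 return type doesn't necessarily mean the whole forward pass ran in float32; many causal LM implementations upcast just the logits to float32 before returning them. A quick check like the sketch below separates the weight dtype from the output dtype (the model name here is a placeholder, not the actual repo):

```python
# Minimal sketch for separating weight dtype from output dtype;
# the model name below is a placeholder, not the repo in question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-org/your-7b-model"  # placeholder
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to(device)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(device)
with torch.no_grad():
    out = model(**inputs)

print(model.dtype)       # torch.float16 if the weights really loaded in half precision
print(out.logits.dtype)  # may be torch.float32: many LMs upcast the logits before returning
```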

I found the problem: `"padding": 'max_length'`. The other 7B and 8B models were being padded to the longest sequence in the batch, not to the tokenizer's max length.
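To illustrate the difference, here's a minimal sketch (the tokenizer name and `max_length` value are placeholders, not from my actual setup): `padding="max_length"` pads every sequence out to the given or tokenizer max length, while `padding=True` pads only to the longest sequence in the batch, which keeps the input and activation tensors much smaller.

```python
# Sketch of the padding difference; tokenizer name and max_length are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

batch = ["short prompt", "a slightly longer prompt than the first one"]

# padding="max_length" pads every sequence to max_length
# (defaults to tokenizer.model_max_length if not given),
# so every layer's activations are sized for the full context window.
to_max = tokenizer(batch, padding="max_length", max_length=1024,
                   return_tensors="pt")

# padding=True (equivalently "longest") pads only to the longest
# sequence in the batch, which is what the other 7B/8B setups did.
to_longest = tokenizer(batch, padding=True, return_tensors="pt")

print(to_max["input_ids"].shape)      # torch.Size([2, 1024])
print(to_longest["input_ids"].shape)  # torch.Size([2, <longest in batch>])
```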

Is your problem solved?
