Hello, I’ve spent days trying to figure out an issue I’ve been having with GPT-J while running the example code:
```python
from transformers import GPTJForCausalLM, AutoTokenizer
import torch

# Load the fp16 branch of GPT-J-6B to roughly halve memory use
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

prompt = (
    "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
    "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
    "researchers was the fact that the unicorns spoke perfect English."
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Sample a continuation of the prompt
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=50,
)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
```
The error:

```
  return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
```
This one error has tormented me for so long. I have a GTX 1060 GPU with CUDA 11.7 installed. I’ve looked everywhere and have yet to find a fix, and I can’t simply run the model on my CPU since it ends up running out of memory.
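The closest explanation I’ve come across is that this error shows up when float16 ops land on CPU kernels, which apparently don’t implement LayerNorm for half precision, so the model and inputs would need to be moved onto the GPU first. Here is a minimal sketch of what I understand that to mean (my own guess, untested, and I suspect GPT-J-6B won’t even fit in the 1060’s 6 GB of VRAM):

```python
# Sketch of the suggested workaround (my assumption, not verified):
# fp16 LayerNorm seems to only have CUDA kernels, so the model and the
# inputs would both have to live on the GPU for half-precision generation.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
input_ids = input_ids.to(device)
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=50)
```

As far as I can tell, though, this would just trade the 'Half' error for a CUDA out-of-memory error on a 6 GB card, so I’m stuck either way.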
Thanks