How do I run this on cpu?

#3
by ARMcPro - opened

I am a bit new to this; any help is appreciated, thank you!

Mobius Labs GmbH org

Hi! Can you try this? It might be a bit slow on CPU:

# Load model and tokenizer (CPU, float32)
import transformers, torch
compute_dtype = torch.float32
cache_path    = ''
device        = 'cpu'
model_id      = "mobiuslabsgmbh/aanaphi2-v0.1"
model         = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype,
                                                                  cache_dir=cache_path,
                                                                  device_map=device)
tokenizer     = transformers.AutoTokenizer.from_pretrained(model_id, cache_dir=cache_path)

# Set prompt format: "### Human: <prompt>\n### Assistant: "
instruction_template = "### Human: "
response_template    = "### Assistant: "
def prompt_format(prompt):
    out = instruction_template + prompt + '\n' + response_template
    return out
model.eval();


def generate(prompt, max_length=1024):
    prompt_chat = prompt_format(prompt)
    inputs      = tokenizer(prompt_chat, return_tensors="pt", return_attention_mask=True).to(device)
    outputs     = model.generate(**inputs, max_length=max_length, eos_token_id=tokenizer.eos_token_id)
    # Drop the trailing end-of-sequence token before decoding
    text        = tokenizer.batch_decode(outputs[:,:-1])[0]
    return text

# Generate
print(generate('If A+B=C and B=C, what would be the value of A?'))
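One optional tweak, not in the original snippet: max_length also counts the prompt tokens, so on CPU it can be easier to cap only the newly generated tokens with max_new_tokens. The generate_short name below is just for illustration:

# Optional variant (illustrative): cap only the newly generated tokens, which keeps CPU runs shorter
def generate_short(prompt, max_new_tokens=256):
    prompt_chat = prompt_format(prompt)
    inputs      = tokenizer(prompt_chat, return_tensors="pt", return_attention_mask=True).to(device)
    outputs     = model.generate(**inputs, max_new_tokens=max_new_tokens, eos_token_id=tokenizer.eos_token_id)
    text        = tokenizer.batch_decode(outputs[:,:-1])[0]
    return text

print(generate_short('If A+B=C and B=C, what would be the value of A?'))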

Thank you for your answer! I tried this, but I get a Killed signal. I checked my CPU and memory usage while it was executing and neither was close to full. I wonder what the issue might be!

Mobius Labs GmbH org
edited Mar 4, 2024

Strange, it shouldn't be using that much RAM, so it's odd that the process is getting killed!
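If it is the system's out-of-memory killer, one option worth trying (a minimal sketch, not something from the model card) is loading the weights in bfloat16 instead of float32, which roughly halves the RAM they need; bfloat16 inference on CPU can be slower, but it keeps the footprint down:

# Sketch: lower-memory CPU load in bfloat16 (~2 bytes per weight instead of 4)
import transformers, torch

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mobiuslabsgmbh/aanaphi2-v0.1",
    torch_dtype=torch.bfloat16,
    device_map='cpu',
)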

You can also just run it on Google Colab with the free GPU; I have just tried it there and it works fine. You'll need to install the following before you run the code:
pip install transformers accelerate
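For the Colab route, the only changes to the script above should be the dtype and the device. A sketch, assuming a CUDA runtime (float16 is just a common choice on GPU, float32 also works):

# Sketch: same loading code as above, but targeting the Colab GPU
import transformers, torch

compute_dtype = torch.float16   # half precision on GPU; an assumption, not a hard requirement
device        = 'cuda'
model_id      = "mobiuslabsgmbh/aanaphi2-v0.1"
model         = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype,
                                                                  device_map=device)
tokenizer     = transformers.AutoTokenizer.from_pretrained(model_id)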

Thank you so much for all your help, I really appreciate it!

Mobius Labs GmbH org

Happy to help!

mobicham changed discussion status to closed
