How do I run this on cpu?

#3
by ARMcPro - opened

I am a bit new to this; any help is appreciated, thank you!

Mobius Labs GmbH org

Hi! Can you try this? It might be a bit slow on CPU:

# Load model and tokenizer (CPU, float32)
import transformers, torch
compute_dtype = torch.float32
cache_path    = ''
device        = 'cpu'
model_id      = "mobiuslabsgmbh/aanaphi2-v0.1"
model         = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype,
                                                                  cache_dir=cache_path,
                                                                  device_map=device)
tokenizer     = transformers.AutoTokenizer.from_pretrained(model_id, cache_dir=cache_path)

# Set prompt format: "### Human: <prompt>\n### Assistant: "
instruction_template = "### Human: "
response_template    = "### Assistant: "
def prompt_format(prompt):
    out = instruction_template + prompt + '\n' + response_template
    return out
model.eval();


def generate(prompt, max_length=1024):
    prompt_chat = prompt_format(prompt)
    inputs      = tokenizer(prompt_chat, return_tensors="pt", return_attention_mask=True).to(device)
    outputs     = model.generate(**inputs, max_length=max_length, eos_token_id=tokenizer.eos_token_id)
    # Drop the trailing end-of-sequence token before decoding
    text        = tokenizer.batch_decode(outputs[:,:-1])[0]
    return text

# Generate
print(generate('If A+B=C and B=C, what would be the value of A?'))
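One optional tweak, not in the original snippet: max_length also counts the prompt tokens, so on CPU it can be easier to cap only the newly generated tokens with max_new_tokens. The generate_short name below is just for illustration:

# Optional variant (illustrative): cap only the newly generated tokens, which keeps CPU runs shorter
def generate_short(prompt, max_new_tokens=256):
    prompt_chat = prompt_format(prompt)
    inputs      = tokenizer(prompt_chat, return_tensors="pt", return_attention_mask=True).to(device)
    outputs     = model.generate(**inputs, max_new_tokens=max_new_tokens, eos_token_id=tokenizer.eos_token_id)
    text        = tokenizer.batch_decode(outputs[:,:-1])[0]
    return text

print(generate_short('If A+B=C and B=C, what would be the value of A?'))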

Thank you for your answer! I tried this, but I get a Killed signal. I checked my CPU and memory usage while it was executing and neither was close to full. I wonder what the issue might be!

Mobius Labs GmbH org
edited Mar 4, 2024

Strange, it shouldn't be using that much RAM, so it's odd that the process is getting killed!
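If it is the system's out-of-memory killer, one option worth trying (a minimal sketch, not something from the model card) is loading the weights in bfloat16 instead of float32, which roughly halves the RAM they need; bfloat16 inference on CPU can be slower, but it keeps the footprint down:

# Sketch: lower-memory CPU load in bfloat16 (~2 bytes per weight instead of 4)
import transformers, torch

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mobiuslabsgmbh/aanaphi2-v0.1",
    torch_dtype=torch.bfloat16,
    device_map='cpu',
)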

You can also just run it on Google Colab with the free GPU; I have just tried it there and it works fine. You'll need to install the following before you run the code:
pip install transformers accelerate
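For the Colab route, the only changes to the script above should be the dtype and the device. A sketch, assuming a CUDA runtime (float16 is just a common choice on GPU, float32 also works):

# Sketch: same loading code as above, but targeting the Colab GPU
import transformers, torch

compute_dtype = torch.float16   # half precision on GPU; an assumption, not a hard requirement
device        = 'cuda'
model_id      = "mobiuslabsgmbh/aanaphi2-v0.1"
model         = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype,
                                                                  device_map=device)
tokenizer     = transformers.AutoTokenizer.from_pretrained(model_id)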

Thank you so much for all your help, I really appreciate it!

Mobius Labs GmbH org

Happy to help!

mobicham changed discussion status to closed
