Loading ./pipeline/ requires you to execute the configuration file in that repo on your local machine

#21
by kycrowe - opened

Hello, I want to avoid re-downloading the model every time I run the code below:

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

So I saved the pipeline with:

pipeline.save_pretrained("./pipeline_path/")

However, I am unable to simply reload the pipeline with

pipe_load = transformers.pipeline("text-generation", model = "./pipeline_path/")

Getting

ValueError: Loading ./pipeline_path/ requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

I added trust_remote_code=True to avoid this error, but then my Jupyter kernel dies.

My questions are:

  1. Any idea how to resolve this? Should I just not run this in a Jupyter notebook?
  2. I also don't quite understand what "execute the configuration file in that repo on your local machine" means.
  3. Is there a better way to avoid re-downloading the model?

Any help would be greatly appreciated!

ChatGPT suggests this:

To save and load models using Hugging Face's Transformers library, you might want to save the model and the tokenizer rather than the pipeline, as they are the primary components. Here's how to do it:

To save:

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)

tokenizer.save_pretrained("./model_path/")
model.save_pretrained("./model_path/")

To load:

tokenizer = AutoTokenizer.from_pretrained("./model_path/")
model = AutoModelForCausalLM.from_pretrained("./model_path/")

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

In this way, you don't need to download the model each time you run your script, and it should resolve the issues you are encountering with the trust_remote_code=True setting.
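A minimal sketch of the save-then-reload flow above, with one assumption added: because this checkpoint relies on custom code from the repo, `trust_remote_code=True` is passed to every `from_pretrained` call, including the reload from disk. The helper names (`has_local_copy`, `save_locally`, `load_pipeline`) are hypothetical and not part of the library, and `device_map="auto"` assumes `accelerate` is installed. Note also that `from_pretrained` already caches hub downloads on disk by default, so an explicit copy like this is mainly useful when you want the files in a known directory.

```python
from pathlib import Path

MODEL_ID = "tiiuae/falcon-7b-instruct"
LOCAL_DIR = Path("./model_path/")


def has_local_copy(path):
    """Return True if `path` looks like a saved checkpoint (contains config.json)."""
    return (Path(path) / "config.json").is_file()


def save_locally(model_id=MODEL_ID, path=LOCAL_DIR):
    """Load model and tokenizer from the hub (or its cache) and write them to `path`."""
    # Imported lazily so has_local_copy() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    tokenizer.save_pretrained(path)
    model.save_pretrained(path)


def load_pipeline(path=LOCAL_DIR):
    """Rebuild the text-generation pipeline from the local directory only."""
    import torch
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
    # Assumption: the saved config still refers to custom code, so the flag
    # is needed here too, as the original error message suggests.
    model = AutoModelForCausalLM.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )
    return transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)


# Typical use (not run here, since it downloads several GB on first call):
# if not has_local_copy(LOCAL_DIR):
#     save_locally()
# pipe = load_pipeline()
```

Loading the model first and handing the object to `transformers.pipeline` keeps the dtype and device settings in one place. If the kernel still dies during loading, it may simply be running out of memory, which this sketch does not address.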

#designfailure

Thank you @designfailure
I tried loading the model and tokenizer separately, as below:

tokenizer = AutoTokenizer.from_pretrained(model)
model = AutoModelForCausalLM.from_pretrained(model)

But I get the "requires you to execute the configuration file" error again on the line that loads the model:

ValueError: Loading tiiuae/falcon-7b-instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

https://huggingface.co/tiiuae/falcon-7b-instruct/discussions/10 flags the same issue when running on AWS Sagemaker.
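On what the message means: as far as I understand it, the checkpoint's configuration points at Python files shipped in the repo instead of classes built into the installed transformers, so loading it means running that code locally, and `trust_remote_code=True` is the opt-in. A rough way to check whether a saved checkpoint is in that situation is to look in its `config.json`. The sketch below assumes such checkpoints list their custom classes under an `auto_map` key; the function name `needs_remote_code` is made up for illustration.

```python
import json
from pathlib import Path


def needs_remote_code(checkpoint_dir):
    """Return True if config.json declares custom classes under `auto_map`."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    # Assumption: a non-empty `auto_map` means the Auto* classes will ask
    # for trust_remote_code=True before loading this checkpoint.
    return bool(config.get("auto_map"))


# Example (path is whatever directory you passed to save_pretrained):
# print(needs_remote_code("./pipeline_path/"))
```

If this returns True for the saved directory, the reload needs the same `trust_remote_code=True` flag as the original download did.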

What was the fix in the end?

I am encountering the same problem. What was the fix plz?

Hello! Same problem here. Does anyone have a fix? Thanks

Hello! Same problem here too. What was the fix in the end? Thanks

I had the same issue and resolved it by updating to transformers==4.34.0

I'm on Transformers 4.39.3 and I still have the same problem :(
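Since the version reports above disagree, it may be worth confirming which transformers version the running kernel actually sees, because a notebook can be attached to a different environment than the one where `pip install` was run. A small sketch using only the standard library; the helper names are hypothetical, and the 4.34.0 threshold is just the version mentioned in this thread, not a documented requirement.

```python
import re
from importlib import metadata


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '4.34.0' into (4, 34, 0), keeping only the first three numbers."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def at_least(version, minimum="4.34.0"):
    """Compare two dotted version strings numerically."""
    return version_tuple(version) >= version_tuple(minimum)


# Example, run inside the same notebook kernel that shows the error:
# v = installed_version()
# print(v, None if v is None else at_least(v))
```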
