Unable to Deploy to Amazon SageMaker using Supplied Deploy Code

#48
by garystafford - opened

This is probably a simple mistake on my part with this gated model. When I deploy the model to Amazon SageMaker using the code supplied in the Deploy tab, I get the following error. I have accepted the model's terms in the UI and can use it in the playground.

huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-647cad52-09f2293b191bde6a363fbafd)
Repository Not Found for url: https://huggingface.co/api/models/bigcode/starcoder.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

I have no experience with this error, but in general it would help to see the full Python traceback, which shows the line on which it failed.

Got the same problem, even when I try to log in with the CLI.

Maybe try this page first?
I'm also struggling with the same issue.

@dzh1121 which page?

Same here....

Same here, any updates?

Friends, at long last I figured out what on earth you need to do to authenticate. For some reason, methods like notebook_login do not work for this.

You actually need to supply the HF token in the env, i.e. the hub dict. Something like this:

import json

hub = {
    'HF_MODEL_ID': 'bigcode/starcoder',  # gated model to deploy
    'SM_NUM_GPUS': json.dumps(1),  # number of GPUs, passed as a string
    'HF_API_TOKEN': "<YOUR HF TOKEN>",
    'HUGGING_FACE_HUB_TOKEN': "<YOUR HF TOKEN>"  # your Hub access token
}

After that it worked for me. I do not know whether you need both entries, so I'm keeping both. Experiment on your own.

Thanks, @grohj! Turns out you actually only need HUGGING_FACE_HUB_TOKEN.
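
Putting the two replies above together, a deployment might look roughly like the sketch below. It assumes the SageMaker Python SDK (`sagemaker`) is installed and AWS credentials are configured. The helper names `build_hub_env` and `deploy`, the environment variable `HF_TOKEN_FOR_DEPLOY`, and the choice of instance type are all hypothetical and not part of the supplied Deploy code.

```python
import json
import os


def build_hub_env(model_id: str, token: str, num_gpus: int = 1) -> dict:
    """Build the container environment for a gated Hub model."""
    if not token:
        raise ValueError("A Hugging Face access token is required for gated models")
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        # Per the thread, this is the only token entry that is actually needed.
        "HUGGING_FACE_HUB_TOKEN": token,
    }


def deploy(model_id: str, instance_type: str):
    """Deploy with the Hugging Face LLM container (needs the sagemaker SDK)."""
    # Imported lazily so build_hub_env works without the SDK installed.
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # HF_TOKEN_FOR_DEPLOY is a hypothetical variable name chosen for this sketch.
    hub = build_hub_env(model_id, os.environ.get("HF_TOKEN_FOR_DEPLOY", ""))
    model = HuggingFaceModel(
        image_uri=get_huggingface_llm_image_uri("huggingface"),
        env=hub,
        role=sagemaker.get_execution_role(),
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

Reading the token from the environment rather than pasting it into the notebook keeps it out of saved cell output; that is a design preference, not something the thread requires.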

I got past the HUGGING_FACE_HUB_TOKEN issue.
But it seems SageMaker expects a single weights file, "model.pth", whereas this repo has many bin files such as "pytorch_model-00003-of-00007.bin".
I don't think I can simply concatenate those bin files.
Has anyone encountered this issue?
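
For context on why concatenating would not work: as far as I know, sharded checkpoints saved in this layout usually ship with an index file (commonly `pytorch_model.bin.index.json`) whose `weight_map` maps each parameter name to the shard that holds it, and each shard is a separate saved state dict. The sketch below lists the shards an index refers to. The example index is hypothetical and heavily truncated, and I am assuming this repo follows the same layout.

```python
def shard_files(index: dict) -> list:
    """Return the distinct shard filenames referenced by a checkpoint index."""
    return sorted(set(index["weight_map"].values()))


# Hypothetical, heavily truncated index for illustration only.
example_index = {
    "weight_map": {
        "layer_a.weight": "pytorch_model-00001-of-00007.bin",
        "layer_a.bias": "pytorch_model-00001-of-00007.bin",
        "layer_b.weight": "pytorch_model-00003-of-00007.bin",
    }
}
print(shard_files(example_index))
```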

From what I gathered during deployments, if you are using the Hugging Face LLM DLC container, it should be able to convert the model from the split bin format into PyTorch / safetensors format on the fly.

Thanks @grohj, I had been stuck on this for a day and really couldn't find it documented anywhere.

I am deploying Llama 2, and supplying HUGGING_FACE_HUB_TOKEN resolved the issue.

Debugging this was really misleading, given my stack trace:

2023-07-27T12:43:39.691+10:00	requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/meta-llama/Llama-2-70b-chat-hf
2023-07-27T12:43:39.691+10:00	The above exception was the direct cause of the following exception:
2023-07-27T12:43:39.691+10:00	Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 96, in download_weights
    utils.weight_files(model_id, revision, extension)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 92, in weight_files
    filenames = weight_hub_files(model_id, revision, extension)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 25, in weight_hub_files
    info = api.model_info(model_id, revision=revision)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1604, in model_info
    hf_raise_for_status(r)
  File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
2023-07-27T12:43:39.691+10:00	huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64c1d9da-52b66c1e1538aa5917cf0e33)
2023-07-27T12:43:39.691+10:00	Repository Not Found for url: https://huggingface.co/api/models/meta-llama/Llama-2-70b-chat-hf.
2023-07-27T12:43:39.691+10:00	Please make sure you specified the correct `repo_id` and `repo_type`.
2023-07-27T12:43:39.691+10:00	If you are trying to access a private or gated repo, make sure you are authenticated.
2023-07-27T12:43:41.446+10:00	Invalid username or password.

Since the URL
https://huggingface.co/api/models/meta-llama/Llama-2-70b-chat-hf
appeared to require basic auth, I naturally looked for a config option in the SDK for supplying basic auth credentials, so that the URL would become
https://{username}:{password}@huggingface.co/api/models/meta-llama/Llama-2-70b-chat-hf.
But that configuration does not exist anywhere.
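
For anyone else misled by the "Invalid username or password" message: the Hub API authenticates with an access token sent as a Bearer header rather than with basic auth. The sketch below checks whether a token can read a gated repo before you start a deployment. The helper names `model_info_request` and `can_access` are hypothetical, and the check assumes the model-info endpoint shown in the stack trace above.

```python
import urllib.error
import urllib.request


def model_info_request(repo_id: str, token: str) -> urllib.request.Request:
    """Build an authenticated request for the Hub model-info endpoint."""
    url = f"https://huggingface.co/api/models/{repo_id}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def can_access(repo_id: str, token: str) -> bool:
    """Return True if the token can read the repo, False on 401/403."""
    try:
        with urllib.request.urlopen(model_info_request(repo_id, token), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return False
        raise  # other HTTP errors are unrelated to authentication
```

Calling `can_access("meta-llama/Llama-2-70b-chat-hf", "<YOUR HF TOKEN>")` from your own machine tells you whether the token and the accepted terms are in order before you pay for an endpoint that fails at startup.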

I hope they update the SageMaker script generation for gated models, as it would be a big time saver for many people.
