Serverless deploy troubles

Hi, I’m trying to deploy a serverless endpoint from model_data. I’m doing it the same way I deployed a similar model to an EC2 instance, but it fails.

I do
huggingface_model = HuggingFaceModel(**model_params)
where
model_params = {'role': <exec_role>, 'transformers_version': '4.6', 'sagemaker_session': <sagemaker.session.Session object at 0x158528e50>, 'pytorch_version': '1.7', 'py_version': 'py36', 'model_data': <path_to_S3>}

then
serverless_config = ServerlessInferenceConfig(
memory_size_in_mb=memory_size_in_mb, max_concurrency=max_concurrency
)
huggingface_model.deploy(
serverless_inference_config=serverless_config, endpoint_name=model_name, wait=wait)

Everything seems to deploy fine, but when I invoke the endpoint, I get:
“(“You need to define one of the following [\u0027feature-extraction\u0027, \u0027text-classification\u0027, \u0027token-classification\u0027, \u0027question-answering\u0027, \u0027table-question-answering\u0027, \u0027fill-mask\u0027, \u0027summarization\u0027, \u0027translation\u0027, \u0027text2text-generation\u0027, \u0027text-generation\u0027, \u0027zero-shot-classification\u0027, \u0027conversational\u0027, \u0027image-classification\u0027] as env \u0027TASK\u0027.”, 403)”

I tried adding env={"HF_TASK": "feature-extraction"} to the model creation, but I then get a different error (which makes sense, since I’m not really specifying a model from the Hub):
“Can\u0027t load config for \u0027/.sagemaker/mms/models/model\u0027. Make sure that:\n\n- \u0027/.sagemaker/mms/models/model\u0027 is a correct model identifier listed on \u0027https://huggingface.co/models\u0027\n\n- or \u0027/.sagemaker/mms/models/model\u0027 is the correct path to a directory containing a config.json file\n\n”
}

Does anyone have an idea that could help?

Thank you,
Alex

Hi Alex

The way you instantiate the HuggingFaceModel class looks a bit unusual to me. I usually go about it this way:

huggingface_model = HuggingFaceModel(
   model_data="s3://hf-sagemaker-inference/model.tar.gz",  # path to your trained sagemaker model
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.17", # transformers version used
   pytorch_version="1.10", # pytorch version used
   py_version="py38", # python version of the DLC
)

As for the serverless deployment, you seem to be doing everything right, as far as I can tell. Just make sure you use the latest DLC (i.e. specify the latest supported Transformers and PyTorch versions, 4.17 and 1.10 respectively in this case). You can find the latest versions here: Reference

You can also check out these two sample notebooks and mix and match to fit your use case:

Hope that helps!

Cheers
Heiko

Thank you, Heiko, for the response and references. They’re really helpful. I tried the examples you shared, and what actually worked was both updating the library versions and specifying the task through the HF_TASK environment variable:

huggingface_model = HuggingFaceModel(
   model_data="s3://smbdata-development/models/MiniLM-L6-H384-uncased/model.tar.gz",  # path to your trained sagemaker model
   role=get_role(), # iam role with permissions to create an Endpoint
   sagemaker_session=session,
   transformers_version="4.17.0", # transformers version used
   pytorch_version="1.10.2", # pytorch version used
   env={"HF_TASK": "feature-extraction"},
   py_version="py38" # python version of the DLC
)

However, when I try the same thing for another model, where I’ve overridden some functions, it doesn’t work:

huggingface_model = HuggingFaceModel(
   model_data="s3://smbdata-development/models/all-MiniLM-L6-v2/model.tar.gz",  # path to your trained sagemaker model
   role=get_role(), # iam role with permissions to create an Endpoint
   sagemaker_session=session,
   transformers_version="4.17.0", # transformers version used
   pytorch_version="1.10.2", # pytorch version used
   env={"HF_TASK": "feature-extraction"},
   py_version="py38" # python version of the DLC
)

gives me: "message": "Can\u0027t load config for \u0027/.sagemaker/mms/models/model\u0027. If you were trying to load it from \u0027https://huggingface.co/models\u0027, make sure you don\u0027t have a local directory with the same name. Otherwise, make sure \u0027/.sagemaker/mms/models/model\u0027 is the correct path to a directory containing a config.json file"
}

Hey @AlexG,

You most likely have some small issue in your model.tar.gz. You can follow this example on how to create a custom inference.py: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub
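Since the error complains that the model directory does not contain a config.json, one quick thing to rule out is the layout of the archive itself. Below is a minimal sketch that lists the members of a model.tar.gz and flags a config.json that is nested in a subfolder instead of sitting at the archive root. The function name `check_model_archive` and the `require_inference_script` flag are made up for this illustration, and the expected layout (config.json at the root, custom code under code/inference.py) is an assumption based on the error message and the notebook linked above.

```python
import tarfile


def check_model_archive(path, require_inference_script=False):
    """Return a list of layout problems found in a model.tar.gz (hypothetical helper)."""
    with tarfile.open(path, "r:gz") as tar:
        # Normalize names such as "./config.json" to "config.json".
        names = {n[2:] if n.startswith("./") else n for n in tar.getnames()}

    problems = []
    if "config.json" not in names:
        # A common cause: the model folder itself was archived, not its contents.
        nested = sorted(n for n in names if n.endswith("/config.json"))
        hint = f" (found nested copy: {nested[0]})" if nested else ""
        problems.append("config.json is not at the archive root" + hint)
    if require_inference_script and "code/inference.py" not in names:
        problems.append("code/inference.py is missing")
    return problems
```

Calling `check_model_archive("model.tar.gz", require_inference_script=True)` on a local copy of the archive returns an empty list when the layout matches these assumptions. It only checks file placement, so it cannot tell whether the config.json itself is valid for the model.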

@philschmid Thanks! I actually copied the exact implementation from your notebook for this test. I then placed it under the “code” directory, as you show…
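For anyone repackaging their own archive, here is a rough sketch of building a model.tar.gz so that the model files and the code folder end up at the archive root rather than inside a parent folder. The helper name `build_model_archive` is invented for this example, and the layout it produces (config.json and weights at the root, inference.py under code/) is an assumption taken from the notebook referenced above, not something confirmed for every container version.

```python
import tarfile
from pathlib import Path


def build_model_archive(model_dir, out_path="model.tar.gz"):
    """Pack the contents of model_dir so files sit at the archive root (hypothetical helper)."""
    model_dir = Path(model_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        for item in sorted(model_dir.iterdir()):
            # arcname drops the parent folder, so config.json is stored as
            # "config.json" rather than "<model_dir>/config.json".
            # Directories such as "code" are added recursively.
            tar.add(item, arcname=item.name)
    return out_path
```

Write the output somewhere outside `model_dir`, otherwise the archive may try to include itself. The resulting file can then be uploaded to S3 and passed as model_data.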

I got suspicious after @philschmid's comment, so I ran his deployment code as is and got the same error. I then changed the model to msmarco-distilbert-dot-v5 and used the same inference.py file to override the functions.
Everything deployed this time, with no errors. Possibly there is some issue with the MiniLM configuration.

Thanks for the help everyone! The references were extremely helpful