Unable to deploy on SageMaker ml.g5.4xlarge
#127
opened by arisin
I am facing a challenge deploying this model to a SageMaker endpoint. Please advise.
It would be helpful to know the required infrastructure specs, framework versions, compatible DLCs, etc.
Here is my code:
```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role with permissions to create an endpoint
role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'stabilityai/stable-diffusion-3-medium',  # model_id from hf.co/models
    'HF_TASK': 'document-question-answering'                 # task to use for predictions
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # transformers version of the DLC
    pytorch_version="1.13",       # pytorch version of the DLC
    py_version="py39",            # python version of the DLC
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge"
)
```
I get the following error:
```
File /opt/conda/lib/python3.10/site-packages/sagemaker/session.py:5354, in Session.wait_for_endpoint(self, endpoint, poll, live_logging)
   5348 if "CapacityError" in str(reason):
   5349     raise exceptions.CapacityError(
   5350         message=message,
   5351         allowed_statuses=["InService"],
   5352         actual_status=status,
   5353     )
-> 5354 raise exceptions.UnexpectedStatusException(
   5355     message=message,
   5356     allowed_statuses=["InService"],
   5357     actual_status=status,
   5358 )
   5359 return desc

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-inference-2024-06-18-01-45-44-422: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.. Try changing the instance type or reference the troubleshooting page https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html
```
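The error message points at CloudWatch. SageMaker writes each endpoint's container logs to a log group named `/aws/sagemaker/Endpoints/<endpoint-name>`, so a quick way to find the real failure is to tail that group. A minimal sketch (the helper function name is my own; the AWS CLI command in the comment assumes you have credentials configured):

```python
def endpoint_log_group(endpoint_name: str) -> str:
    # SageMaker container logs for an endpoint live under this log group.
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"

group = endpoint_log_group("huggingface-pytorch-inference-2024-06-18-01-45-44-422")
# Inspect the group, e.g. with the AWS CLI:
#   aws logs tail "/aws/sagemaker/Endpoints/<endpoint-name>" --follow
```

The ping health check failing usually means the container crashed on startup, and the CloudWatch log stream shows the actual exception (here, a model-download authentication failure).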
arisin changed discussion status to closed
I had to pass the HF_TOKEN parameter with my API token in the hub configuration to get past the error, since I am using a model that requires authentication.
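A sketch of the hub configuration with the token added. Reading the token from an environment variable is my assumption (`"hf_xxx"` is just a placeholder); use whatever secret management fits your setup, and avoid hard-coding the token:

```python
import os

# Pass a Hugging Face access token so the container can download the gated model.
hub = {
    'HF_MODEL_ID': 'stabilityai/stable-diffusion-3-medium',
    'HF_TASK': 'document-question-answering',
    'HF_TOKEN': os.environ.get('HF_TOKEN', 'hf_xxx'),  # placeholder default
}
```

The rest of the deployment code stays the same; the `hub` dict is passed as `env=hub` to `HuggingFaceModel`, so `HF_TOKEN` becomes an environment variable inside the container.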