Unable to deploy on SageMaker ml.g5.4xlarge

#127
by arisin - opened

I am facing a challenge deploying this model to a SageMaker endpoint.
Please advise.
It would be good to know the recommended infrastructure specs, framework versions, compatible DLCs (Deep Learning Containers), etc.

Here is my code:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM role with permissions to create an endpoint

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'stabilityai/stable-diffusion-3-medium',  # model_id from hf.co/models
    'HF_TASK': 'document-question-answering'  # NLP task you want to use for predictions
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # Transformers version of the DLC
    pytorch_version="1.13",       # PyTorch version of the DLC
    py_version="py39",            # Python version of the DLC
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge"
)

I get the following error:

File /opt/conda/lib/python3.10/site-packages/sagemaker/session.py:5354, in Session.wait_for_endpoint(self, endpoint, poll, live_logging)
   5348     if "CapacityError" in str(reason):
   5349         raise exceptions.CapacityError(
   5350             message=message,
   5351             allowed_statuses=["InService"],
   5352             actual_status=status,
   5353         )
-> 5354     raise exceptions.UnexpectedStatusException(
   5355         message=message,
   5356         allowed_statuses=["InService"],
   5357         actual_status=status,
   5358     )
   5359 return desc

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-inference-2024-06-18-01-45-44-422: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint. Try changing the instance type or reference the troubleshooting page https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html
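
For anyone hitting the same failure: the actual container error only shows up in CloudWatch. Below is a minimal boto3 sketch for pulling the endpoint logs. SageMaker writes them to the log group /aws/sagemaker/Endpoints/<endpoint-name>; the endpoint name below is taken from the error above.

import boto3

logs = boto3.client("logs")
log_group = "/aws/sagemaker/Endpoints/huggingface-pytorch-inference-2024-06-18-01-45-44-422"

# Newest streams first; each container writes its own log stream.
streams = logs.describe_log_streams(
    logGroupName=log_group, orderBy="LastEventTime", descending=True
)

for stream in streams["logStreams"]:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream["logStreamName"],
        startFromHead=True,
    )
    for event in events["events"]:
        print(event["message"])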
arisin changed discussion status to closed

I had to pass the HF_TOKEN parameter with my API token in the hub configuration to get past the error, since the model requires authentication (the repository is gated).
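
For completeness, here is a sketch of the working configuration — the same deployment code with the token added to the hub env. The token value below is a placeholder; use your own Hugging Face access token.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

hub = {
    'HF_MODEL_ID': 'stabilityai/stable-diffusion-3-medium',
    'HF_TASK': 'document-question-answering',
    'HF_TOKEN': '<your-hf-access-token>'  # required to download a gated model
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge"
)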
