I want to put this code in my inference.py without using the pipeline, but I don't know how to write the inference code. Can someone help me? Thanks!
Which versions of transformers and PyTorch are you using on SageMaker, and which versions locally?
Do you use a GPU on your local machine as well?
@philschmid There is a difference in latency because I have two different GPUs locally and remotely, but it is not significant. Using the same versions, I still see differences when using the pipeline() method. There is also a difference between using a .pt and a .bin model; I have now switched to the .bin model because I get fewer errors than with .pt.
@GenV I might have misunderstood your question. Sorry
Since you have now switched to pytorch_model.bin, you should be able to deploy without creating an inference.py; just provide the env variables when creating the endpoint, similar to the snippet below.
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 't5-base',
    'HF_TASK': 'text2text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.12',
    pytorch_version='1.9',
    py_version='py38',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,   # number of instances
    instance_type='ml.g4dn.xlarge'  # ec2 instance type
)

predictor.predict({
    'inputs': "Меня зовут Вольфганг и я живу в Берлине"
})
@philschmid thank you for the answer. I’m using this code, but with my own model. So my code is:
role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_data_url,  # path to your trained sagemaker model
    role=role,  # iam role with permissions to create an Endpoint
    transformers_version=transformers_version,  # "4.12.3"
    pytorch_version=pytorch_version,  # "1.9.1"
    py_version=py_version  # "py38"
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge"
)
My question is how to create my own inference.py, i.e. how to implement the model_fn and transform_fn methods, because I want to use my own implementation instead of the pipeline() method.
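As a starting point, here is a minimal sketch of an inference.py with custom model_fn and transform_fn handlers for the SageMaker Hugging Face Inference Toolkit. It assumes a T5-style seq2seq model and a JSON request body of the form {"inputs": "..."}; adjust the Auto classes and pre/post-processing to your own model.

```python
# inference.py -- sketch of custom handlers, no pipeline() involved.
import json


def model_fn(model_dir):
    """Called once at container startup; model_dir is the unpacked model.tar.gz."""
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    model.eval()
    if torch.cuda.is_available():
        model.to("cuda")
    # whatever this returns is passed as the first argument to transform_fn
    return model, tokenizer


def transform_fn(model_and_tokenizer, input_data, content_type, accept):
    """Handles one request end to end (replaces input_fn/predict_fn/output_fn)."""
    model, tokenizer = model_and_tokenizer
    data = json.loads(input_data)

    # tokenize and move tensors to the model's device
    inputs = tokenizer(data["inputs"], return_tensors="pt")
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    output_ids = model.generate(**inputs)
    result = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return json.dumps({"generated_text": result})
```

To have the endpoint pick this up, either place inference.py in a code/ folder inside your model.tar.gz, or pass entry_point="inference.py" (with source_dir) to HuggingFaceModel when you create it.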