Hey,
I’m trying to deploy a Hugging Face model (GPT-Neo) on a SageMaker endpoint. I followed the official example and this forum, but it seems the generate function is completely ignoring my parameters (it generates just one word despite setting min_length to 10000!). Any idea what is wrong?
My code:
```python
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'EleutherAI/gpt-neo-1.3B',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role,
)

# deploy model to a SageMaker inference endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,       # number of instances
    instance_type='ml.g4dn.xlarge'  # EC2 instance type
)

prompt = "Some prompt"
gen_text = predictor.predict({
    "inputs": prompt,
    "parameters": {"min_length": 10}
})
print(gen_text[0]['generated_text'])
```
I updated to the latest version, but it still ignores the parameter:
```python
predictor.predict({
    'inputs': "Can you please let us know more details about your",
    'parameters': {"min_length": 1000}
})
```
output:
```
[{'generated_text': 'Can you please let us know more details about your account?\n\nHello,\nI am interested in the above-mentioned company and I have read some very interesting articles about it. I am interested in starting the work. Please let me know if'}]
```
Hi Ali, what happens if you set the min_length and max_length parameters explicitly? I’m asking because I believe the text generator falls back to the max_length value from the model configuration if you don’t set it explicitly, and if that configured max_length is smaller than your min_length, the output gets truncated. So, just wondering if setting both (e.g. min_length=1000 and max_length=2000) helps? Something like the snippet below.
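To be concrete, here is a minimal sketch using the same predictor and example prompt from your snippet above; I haven’t verified the exact defaults for your container version, so treat the values as illustrative:

```python
# pass both bounds explicitly so generation doesn't fall back
# to whatever max_length is baked into the model config
predictor.predict({
    'inputs': "Can you please let us know more details about your",
    'parameters': {"min_length": 1000, "max_length": 2000}
})
```

And if you want to see which default the model ships with, you can inspect the hub config locally (this assumes the pipeline falls back to config.max_length, which I believe is the case but haven’t confirmed for transformers 4.6.1):

```python
from transformers import AutoConfig

# max_length here is the generation default stored in the config
config = AutoConfig.from_pretrained("EleutherAI/gpt-neo-1.3B")
print(config.max_length)
```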
Awesome, glad it worked! It’d be great if you could mark this thread as Answered/Solved - it would make it easier and quicker for other users with the same problem to find the solution in the future.