Not working in the Inference API: times out after 120 seconds.

#2
by Badilator - opened

import sys
import requests
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
YOUR_API_KEY = user_secrets.get_secret("YOUR_API_KEY")

if YOUR_API_KEY == "":
    sys.exit("API key not found in secrets.")

API_URL = "https://api-inference.huggingface.co/models/Salesforce/codegen-16B-mono"
headers = {"Authorization": f"Bearer {YOUR_API_KEY}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

prompt="""def download_file(url, directory):
"""
This function downloads a file from a URL and saves it to a user-specified directory using the filename from the end of the URL.
Args:
url (str): The URL of the file to be downloaded.
directory (str): The directory where the file should be saved. If the directory does not exist, it will be created.
Raises:
ValueError: If the URL is not valid.
Returns:
None
""""""
pre_prompt="""Q:\n\nComplete the code of the following function:\n\n"""
post_prompt="\n\nA:\n\n"
output = query({
    "inputs": pre_prompt + prompt + post_prompt,
    "parameters": {
        "temperature": 0.1,
        "repetition_penalty": 1.1,
        "max_new_tokens": 250,
        "max_time": 120,
        "return_full_text": False,
        "num_return_sequences": 1,
        "do_sample": True,
    },
    "options": {
        "use_cache": False,
        "wait_for_model": True,
    },
})

if isinstance(output, list):
    generated_text = output[0]['generated_text']
else:
    sys.exit(output['error'])

stop_seq = '\n\n\n'
stop_idx = generated_text.find(stop_seq)
if stop_idx != -1:
    generated_text = generated_text[:stop_idx].strip()
else:
    generated_text = generated_text.strip()
print(post_prompt + generated_text)

Setting the formatting aside, there may be an error where you define prompt: right after """def download_file(url, directory): there is an additional """ on the next line, which closes the string. The lines that follow are therefore interpreted as Python code rather than as part of the prompt.
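One way to avoid the premature close is to delimit the outer string with ''' so that the docstring's """ can appear inside it. This is only a sketch of what the prompt was probably meant to contain: the indentation of the docstring body is an assumption, since the post lost its formatting, and full_prompt is a name introduced here for illustration.

```python
# The outer string uses ''' so the inner """ docstring delimiters are plain text.
# Assumption: the docstring body was indented like this in the original prompt.
prompt = '''def download_file(url, directory):
    """
    This function downloads a file from a URL and saves it to a user-specified directory using the filename from the end of the URL.
    Args:
        url (str): The URL of the file to be downloaded.
        directory (str): The directory where the file should be saved. If the directory does not exist, it will be created.
    Raises:
        ValueError: If the URL is not valid.
    Returns:
        None
    """
'''

pre_prompt = "Q:\n\nComplete the code of the following function:\n\n"
post_prompt = "\n\nA:\n\n"

# full_prompt is a hypothetical name for the string sent as "inputs".
full_prompt = pre_prompt + prompt + post_prompt
print(full_prompt)
```

Escaping the inner quotes (\"\"\") inside a """ string would work as well; switching the outer delimiter just keeps the prompt easier to read.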

Other than that, I also get a timeout. I specified "options": {"wait_for_model": True} in the API request; after some time the function returns, but response.json()[0]['generated_text'] contains the following output:

'Error:M o d e l   S a l e s f o r c e / c o d e g e n - 1 6 B - m o n o   t i m e   o u t'

I suppose the model is too large for the Inference API; see https://discuss.huggingface.co/t/cannot-run-large-models-using-api-token/31844/2
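Because the timeout above came back inside generated_text rather than as an error field, the isinstance check in the original script would not catch it. Below is a minimal sketch of a stricter check. extract_generated_text is a hypothetical helper, not part of the Inference API, and the "Error:" prefix test is an assumption based only on the single response quoted in this thread.

```python
def extract_generated_text(output):
    """Return the generated text, or raise RuntimeError if the response looks like an error.

    Hypothetical helper: the "Error:" prefix check is an assumption based on
    the one timeout response quoted in this thread.
    """
    # Error responses may arrive as a dict with an "error" key.
    if isinstance(output, dict):
        raise RuntimeError(str(output.get("error", output)))
    # Anything other than a non-empty list is treated as unexpected.
    if not isinstance(output, list) or not output:
        raise RuntimeError(f"Unexpected response: {output!r}")

    text = output[0].get("generated_text", "")
    # The timeout quoted above was returned inside generated_text, so inspect it too.
    if text.lstrip().startswith("Error:"):
        raise RuntimeError(text)
    return text
```

With this in place, the script could call extract_generated_text(output) and report the raised message instead of printing the error text as if it were generated code. A completion that legitimately begins with "Error:" would be rejected too, so the prefix test may need adjusting.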