Hello everyone, I have been trying to speed up the GPT-Neo 1.3B model using ONNX, and have been running into significant issues.
I first exported the GPT-Neo 1.3B model using the causal-lm feature. The export produced a folder containing model.onnx along with a number of other files.
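For reference, the export command was roughly along these lines (reconstructed from memory, so the exact invocation may differ slightly; the onnx/ output directory matches the path used below):

python -m transformers.onnx --model=EleutherAI/gpt-neo-1.3B --feature=causal-lm onnx/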
I then tried running the ONNX model with ONNX Runtime, as shown on this page.
Here is the code I used.
from transformers import GPT2Tokenizer
import onnxruntime as rt

model_name = "EleutherAI/gpt-neo-1.3B"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Prefer TensorRT, then CUDA, and fall back to CPU
ONNX_PROVIDERS = ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
session = rt.InferenceSession("onnx/model.onnx", providers=ONNX_PROVIDERS)

inputs = tokenizer("Using gpt-neo with ONNX Runtime and ", return_tensors="np")
outputs = session.run(output_names=["logits"], input_feed=dict(inputs))
I used the %%time magic in the Jupyter cell, and the code above took more than 5 minutes to execute.
After that I used a longer sentence and tried running inference again, but the cell never finished executing (I waited for about an hour).
%%time
inputs = tokenizer("Using gpt-neo with ONNX Runtime again and this time with many more words which will put considerable load on the GPU as well as the CPU ", return_tensors="np")
outputs = session.run(output_names=["logits"], input_feed=dict(inputs))
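One thing I have not verified yet is which execution provider the session actually ended up using; as far as I know, something like the following should list the providers registered for the session, but I have not checked its output:

print(session.get_providers())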
I seem to be missing something, as I am certain this shouldn’t take so long. Could anyone please help me?