How to set the batch size for inference

I am a beginner, and it seems that Transformers only processes one request at a time. Is there a way to increase parallelism and handle multiple requests in one batch, i.e. batch_size=n?


hi @allenwang37
I don’t know if this answers your question:
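One option is the pipeline API, which takes a `batch_size` argument so several inputs are packed into a single forward pass. Here is a minimal sketch, assuming a recent `transformers` release; `gpt2`, the prompts, and the generation settings are just placeholders:

```python
from transformers import pipeline

# Placeholder model; swap in whatever checkpoint you actually use
generator = pipeline("text-generation", model="gpt2", device=0)  # device=0 -> first GPU

# GPT-2-style models ship without a pad token, so reuse EOS for padding the batch
generator.tokenizer.pad_token_id = generator.model.config.eos_token_id

prompts = [
    "The quick brown fox",
    "Once upon a time",
    "In a galaxy far away",
    "The meaning of life is",
]

# batch_size groups the prompts so the GPU runs several requests per forward pass
outputs = generator(prompts, batch_size=4, max_new_tokens=20)
for out in outputs:
    print(out[0]["generated_text"])
```

Batching only pays off when the GPU has headroom; on CPU, or with very uneven prompt lengths, the padding can waste enough compute to make it slower, which is why pipelines default to batch_size=1.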

If you have multiple GPUs:
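One common pattern (again only a sketch, with `gpt2` as a stand-in checkpoint) is to load the model with `device_map="auto"`, which requires the `accelerate` package and shards the weights across all visible GPUs, then feed it a padded batch:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute your own checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token   # no pad token by default in GPT-2-style models
tokenizer.padding_side = "left"             # left-pad so generation continues right after each prompt

# device_map="auto" (needs `pip install accelerate`) spreads the weights
# over every visible GPU so larger models fit in memory
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompts = ["Hello, my name is", "The capital of France is"]

# Pad the prompts to a common length so they fit in one tensor
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    generated = model.generate(
        **inputs,
        max_new_tokens=20,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

Note that `device_map="auto"` splits the weights, not the requests: the whole batch still flows through one sharded copy of the model.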
