Hi,
Just an aside on ONNX inference (cross-post from Supporting ONNX optimized models).
I’d be interested in an `nlp = pipeline("sentiment-analysis", onnx=True)` pipeline like the one @valhalla created, where the ONNX files are hosted on the model hub and stored in the transformers cache.
My use case is fast inference of pre-trained models in embedded applications (no network connection).
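
For context, here is a minimal sketch of the offline flow I’d like such a pipeline to wrap, using onnxruntime directly. The `./model` path is illustrative, and I’m assuming the ONNX export and tokenizer download already happened online ahead of deployment (e.g. with transformers’ `convert_graph_to_onnx` helper):

```python
import numpy as np
from onnxruntime import InferenceSession
from transformers import AutoTokenizer

# Everything below reads from local disk only — the tokenizer files and
# the exported .onnx graph are assumed to have been fetched ahead of time
# and shipped with the device ("./model" is an illustrative path).
tokenizer = AutoTokenizer.from_pretrained("./model")
session = InferenceSession("./model/model.onnx")

encoded = tokenizer("This works offline!", return_tensors="np")

# Only feed the inputs the exported graph actually declares (some exports
# drop token_type_ids); most exported graphs expect int64 tensors.
graph_inputs = {i.name for i in session.get_inputs()}
onnx_inputs = {k: v.astype(np.int64) for k, v in encoded.items() if k in graph_inputs}

logits = session.run(None, onnx_inputs)[0]
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax
print(probs)  # e.g. [[0.01, 0.99]] for a binary sentiment model
```

An `onnx=True` flag on `pipeline()` could hide exactly this boilerplate while keeping the familiar API.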
Cheers,
Alex