Portable, lightweight Hugging Face model inference locally

Hi there, I am researching ways to run a Hugging Face model that I trained locally, without having to manually install any dependencies such as Python and its roughly 4 GB of libraries (CUDA, Transformers, Torch, and so on).

One obvious option is PyInstaller, which creates a fully portable executable with all dependencies included. The problem is that one of those dependencies is Torch, which is 3.5 GB in size, and my model itself already weighs 890 MB. Is there a way for Python to interface with / run inference on the model without Torch? Or to somehow make Torch smaller when bundling with PyInstaller?
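(One lead on the "make Torch smaller" front: most of the stock Torch wheel appears to be the bundled CUDA libraries, so if inference only needs the CPU, installing the CPU-only build from PyTorch's CPU wheel index, `pip install torch --index-url https://download.pytorch.org/whl/cpu`, before freezing might shrink the PyInstaller bundle considerably. A quick sanity check that the CPU-only build is the one that would get frozen:)

```python
# Sanity check that the CPU-only Torch build is installed before freezing.
import torch

print(torch.__version__)          # CPU wheels are typically tagged like "2.x.x+cpu"
print(torch.cuda.is_available())  # expected: False for the CPU-only build
print(torch.version.cuda)         # expected: None when no CUDA runtime is bundled
```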

Or perhaps I could export the model to ONNX format and talk to it from a different code base.
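(If the ONNX route works, the runtime side could stay tiny: just `onnxruntime` and NumPy. A minimal sketch of what I have in mind, assuming the model is exported once on a machine with the full stack, e.g. `optimum-cli export onnx --model ./my_model ./onnx_model`; the paths and the input shape below are placeholders:)

```python
# Lightweight inference with only onnxruntime + numpy installed.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "onnx_model/model.onnx",            # hypothetical path from the export step
    providers=["CPUExecutionProvider"],
)

# Inspect the graph to learn the expected input names and shapes.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

# Hypothetical single image input; real preprocessing depends on the model.
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
print([o.shape for o in outputs])
```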

Please advise.


Or TFLite?

Thank you. The model I have is VisionEncoderDecoderModel (Donut), fine tuned, and DonutProcessor. How would I start to convert this model to tflite? I have tried onnx conversion just now, so far the inference results in random texts rather than anything meaningful.
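(For reference, with a raw onnxruntime export of an encoder-decoder model like Donut, you have to drive the generation loop yourself; a wrong or missing task prompt is one plausible cause of random output. The sketch below makes several assumptions: the file names and the `pixel_values` / `input_ids` / `encoder_hidden_states` names follow the Optimum export convention (check `session.get_inputs()` on your files), the `<s_cord-v2>` task token and image path are hypothetical, and decoding is plain greedy:)

```python
# Greedy decoding over a two-session (encoder + decoder) ONNX export of Donut.
import numpy as np
import onnxruntime as ort
from PIL import Image
from transformers import DonutProcessor  # tokenizer/image processing only, no Torch

processor = DonutProcessor.from_pretrained("./my_donut_model")  # hypothetical path
encoder = ort.InferenceSession("onnx/encoder_model.onnx", providers=["CPUExecutionProvider"])
decoder = ort.InferenceSession("onnx/decoder_model.onnx", providers=["CPUExecutionProvider"])

image = Image.open("sample.png").convert("RGB")  # placeholder input image
pixel_values = processor(image, return_tensors="np").pixel_values  # (1, 3, H, W) float32

# Run the vision encoder once; its hidden states feed every decoder step.
encoder_hidden = encoder.run(None, {"pixel_values": pixel_values})[0]

# Donut starts generation from a task prompt, not from BOS alone;
# use the exact prompt token you fine-tuned with.
prompt = "<s_cord-v2>"  # hypothetical task token
input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="np"
).input_ids.astype(np.int64)

for _ in range(512):  # max generation length
    logits = decoder.run(
        None,
        {"input_ids": input_ids, "encoder_hidden_states": encoder_hidden},
    )[0]
    next_id = int(logits[0, -1].argmax())
    input_ids = np.concatenate(
        [input_ids, np.array([[next_id]], dtype=np.int64)], axis=1
    )
    if next_id == processor.tokenizer.eos_token_id:
        break

print(processor.tokenizer.decode(input_ids[0], skip_special_tokens=True))
```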


ONNX is quite broadly compatible, so if the model doesn't work with ONNX, TFLite may be even less compatible…
There is an ONNX community on HF, so it's a good idea to ask them. Also, for relatively new architectures, using the GitHub version of the ONNX exporter may work. There used to be one.
I can't post the link right now, so this is a bit vague…