Portable, lightweight Hugging Face model inference locally

Hi there, I am researching ways to run a Hugging Face model that I trained locally, without having to manually install any dependencies such as Python and its roughly 4 GB of libraries (CUDA, Transformers, Torch, and so on).

One obvious option is PyInstaller, which creates a fully portable executable with all dependencies included. The problem is that one of those dependencies is Torch, which is 3.5 GB in size, and my model itself already weighs 890 MB. Is there a way for Python to interface with / run inference on the model without Torch? Or to somehow make Torch smaller when bundling with PyInstaller?
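(One lead on the "make Torch smaller" front: most of the stock Torch wheel appears to be the bundled CUDA libraries, so if inference only needs the CPU, installing the CPU-only build from PyTorch's CPU wheel index, `pip install torch --index-url https://download.pytorch.org/whl/cpu`, before freezing might shrink the PyInstaller bundle considerably. A quick sanity check that the CPU-only build is the one that would get frozen:)

```python
# Sanity check that the CPU-only Torch build is installed before freezing.
import torch

print(torch.__version__)          # CPU wheels are typically tagged like "2.x.x+cpu"
print(torch.cuda.is_available())  # expected: False for the CPU-only build
print(torch.version.cuda)         # expected: None when no CUDA runtime is bundled
```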

Or perhaps I could export the model to ONNX format and talk to it from a different code base.
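(If the ONNX route works, the runtime side could stay tiny: just `onnxruntime` and NumPy. A minimal sketch of what I have in mind, assuming the model is exported once on a machine with the full stack, e.g. `optimum-cli export onnx --model ./my_model ./onnx_model`; the paths and the input shape below are placeholders:)

```python
# Lightweight inference with only onnxruntime + numpy installed.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "onnx_model/model.onnx",            # hypothetical path from the export step
    providers=["CPUExecutionProvider"],
)

# Inspect the graph to learn the expected input names and shapes.
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

# Hypothetical single image input; real preprocessing depends on the model.
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {session.get_inputs()[0].name: dummy})
print([o.shape for o in outputs])
```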

Please advise.


Or TFLite?

Thank you. The model I have is VisionEncoderDecoderModel (Donut), fine tuned, and DonutProcessor. How would I start to convert this model to tflite? I have tried onnx conversion just now, so far the inference results in random texts rather than anything meaningful.
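(For reference, with a raw onnxruntime export of an encoder-decoder model like Donut, you have to drive the generation loop yourself; a wrong or missing task prompt is one plausible cause of random output. The sketch below makes several assumptions: the file names and the `pixel_values` / `input_ids` / `encoder_hidden_states` names follow the Optimum export convention (check `session.get_inputs()` on your files), the `<s_cord-v2>` task token and image path are hypothetical, and decoding is plain greedy:)

```python
# Greedy decoding over a two-session (encoder + decoder) ONNX export of Donut.
import numpy as np
import onnxruntime as ort
from PIL import Image
from transformers import DonutProcessor  # tokenizer/image processing only, no Torch

processor = DonutProcessor.from_pretrained("./my_donut_model")  # hypothetical path
encoder = ort.InferenceSession("onnx/encoder_model.onnx", providers=["CPUExecutionProvider"])
decoder = ort.InferenceSession("onnx/decoder_model.onnx", providers=["CPUExecutionProvider"])

image = Image.open("sample.png").convert("RGB")  # placeholder input image
pixel_values = processor(image, return_tensors="np").pixel_values  # (1, 3, H, W) float32

# Run the vision encoder once; its hidden states feed every decoder step.
encoder_hidden = encoder.run(None, {"pixel_values": pixel_values})[0]

# Donut starts generation from a task prompt, not from BOS alone;
# use the exact prompt token you fine-tuned with.
prompt = "<s_cord-v2>"  # hypothetical task token
input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="np"
).input_ids.astype(np.int64)

for _ in range(512):  # max generation length
    logits = decoder.run(
        None,
        {"input_ids": input_ids, "encoder_hidden_states": encoder_hidden},
    )[0]
    next_id = int(logits[0, -1].argmax())
    input_ids = np.concatenate(
        [input_ids, np.array([[next_id]], dtype=np.int64)], axis=1
    )
    if next_id == processor.tokenizer.eos_token_id:
        break

print(processor.tokenizer.decode(input_ids[0], skip_special_tokens=True))
```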


ONNX is quite broadly compatible, so if the model doesn't work with ONNX, TFLite may be even less compatible…
There is an ONNX community on HF, so it's a good idea to ask them. Also, for relatively new architectures, using the GitHub version of the ONNX exporter may work. There used to be one.
I can't post the link right now, so this is a bit vague…