Hi there, I am researching ways to run a Hugging Face model that I trained locally, without requiring the user to manually install dependencies such as Python and roughly 4 GB of libraries (CUDA, Transformers, Torch, etc.).
One obvious option is PyInstaller, which creates a fully portable executable with the dependencies bundled in. The problem is that one of those dependencies is Torch, which is 3.5 GB in size, and my model itself already weighs 890 MB. Is there a way for Python to interface with / do inference on the model without Torch? Or to somehow shrink Torch when bundling with PyInstaller?
Or perhaps I could export the model to ONNX format and run it from a different, lighter code base?
Please advise.