I would like to optimize AND quantize a NER model I have fine-tuned. I see that with the Optimum library it is easy to do each step separately, but I still haven’t managed to optimize and quantize the same model. With ONNX Runtime directly, this was possible and easy to do. How can this be done with Optimum?
Thank you for your answer. The first blog post seems to answer my question. However, I think it uses an old version of Optimum with some attributes that current versions no longer have. When exporting the optimized/quantized model, the blog post uses the export attribute. But when I do the same (using version 1.4.0), I get an error saying 'ORTOptimizer' object has no attribute 'export'. I have searched the documentation, but I haven’t found an equivalent of the article’s approach for more recent versions.
The problem I see with the latest version (1.4.0), compared to older versions, is that it has neither an export attribute nor anything similar that could optimize a quantized model or quantize an optimized model.
As far as I understand, version 1.4.0 only lets you go from model.onnx to model-optimized.onnx, or from model.onnx to model-quantized.onnx. You can’t then quantize model-optimized.onnx or optimize model-quantized.onnx to get a model-optimized-quantized.onnx. In older versions you could do this with export (as in the blog post you linked).
Will it be possible to do this in future versions? I think having a model both quantized and optimized was a very good solution, as it reduced the model size considerably while achieving results close to those of the original model.
@jorgealro, that is already possible and supported: you can provide the file_name when either loading an ORTModel or creating an Optimizer/Quantizer. This is explained and documented in our documentation.
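For anyone reading along, here is a minimal sketch of how the file_name argument could be used to quantize a model that has already been optimized. It assumes a directory that already contains an optimized file and its config.json. The function names, the default file name model_optimized.onnx, and the choice of a dynamic AVX512-VNNI quantization config are my own assumptions for illustration, and the exact keyword arguments may differ between Optimum versions.

```python
from pathlib import Path


def quantized_file_name(optimized_name):
    # Assumption: the quantizer appends "_quantized" to the input file stem
    # (its default file suffix). Adjust if your version names files differently.
    return f"{Path(optimized_name).stem}_quantized.onnx"


def quantize_optimized_model(onnx_dir, optimized_name="model_optimized.onnx"):
    # Hypothetical helper; Optimum is imported lazily so the module loads without it.
    from optimum.onnxruntime import ORTModelForTokenClassification, ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # Point the quantizer at the optimized file instead of the default model.onnx.
    quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name=optimized_name)

    # Dynamic quantization; pick the config that matches your target hardware.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
    quantizer.quantize(save_dir=onnx_dir, quantization_config=qconfig)

    # Load the optimized and quantized file for inference, again via file_name.
    return ORTModelForTokenClassification.from_pretrained(
        onnx_dir, file_name=quantized_file_name(optimized_name)
    )
```

Calling quantize_optimized_model("path/to/onnx_dir") should then return a model that can be passed to a transformers pipeline, provided the directory layout matches the assumptions above.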
Hi @jorgealro,
could you please explain how you solved the 'ORTOptimizer' object has no attribute 'export' issue? I’m facing the same problem trying to optimize a CrossEncoder with optimum 1.7.3.
Many thanks in advance!
I found the solution. It might be useful to someone until the documentation gets updated (the version currently available is for v1.3.0).
As of optimum==1.7.3, you should use the optimize method, instead of the export one:
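A minimal sketch of such a call, assuming onnx_dir is a directory that already holds an exported model.onnx together with its config.json. The helper name optimize_onnx_model and the optimization level are placeholders of mine, not something taken from the documentation, so check the signature against your installed version.

```python
def optimize_onnx_model(onnx_dir, save_dir, optimization_level=99):
    # Hypothetical helper; Optimum is imported lazily so the module loads without it.
    from optimum.onnxruntime import ORTOptimizer
    from optimum.onnxruntime.configuration import OptimizationConfig

    # Build the optimizer from the directory containing the ONNX model.
    optimizer = ORTOptimizer.from_pretrained(onnx_dir)

    # Level 99 enables all available graph optimizations; lower it if needed.
    config = OptimizationConfig(optimization_level=optimization_level)

    # optimize() takes the place of the older export() call and writes the
    # optimized model into save_dir.
    optimizer.optimize(save_dir=save_dir, optimization_config=config)
    return save_dir
```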