Quantization
🤗 Optimum provides an optimum.onnxruntime package that enables you to apply quantization to many models hosted on the 🤗 Hub using the ONNX Runtime quantization tool.
ORTQuantizer
class optimum.onnxruntime.ORTQuantizer
( preprocessor: typing.Union[transformers.models.auto.tokenization_auto.AutoTokenizer, transformers.models.auto.feature_extraction_auto.AutoFeatureExtractor], model: PreTrainedModel, feature: str = 'default', opset: typing.Optional[int] = None )
Handles the ONNX Runtime quantization process for models shared on huggingface.co/models.
export
( onnx_model_path: typing.Union[str, os.PathLike], onnx_quantized_model_output_path: typing.Union[str, os.PathLike], quantization_config: QuantizationConfig, calibration_tensors_range: typing.Union[typing.Dict[str, typing.Tuple[float, float]], NoneType] = None, use_external_data_format: bool = False, preprocessor: typing.Optional[optimum.onnxruntime.preprocessors.quantization.QuantizationPreprocessor] = None )
Parameters
- onnx_model_path (Union[str, os.PathLike]): The path used to save the model exported to an ONNX Intermediate Representation (IR).
- onnx_quantized_model_output_path (Union[str, os.PathLike]): The path used to save the quantized model exported to an ONNX Intermediate Representation (IR).
- quantization_config (QuantizationConfig): The configuration containing the parameters related to quantization.
- calibration_tensors_range (Dict[NodeName, Tuple[float, float]], optional): The dictionary mapping node names to their quantization ranges. Used, and required, only when applying static quantization.
- use_external_data_format (bool, defaults to False): Whether to use the external data format to store a model whose size is >= 2GB.
- preprocessor (QuantizationPreprocessor, optional): The preprocessor to use to collect the nodes to include in or exclude from quantization.
Quantize a model given the optimization specifications defined in quantization_config.
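A minimal sketch of dynamic quantization with the signatures listed on this page. The `AutoQuantizationConfig.arm64` helper, the `"sequence-classification"` feature name, and the output file names are assumptions that do not appear in this section; the sketch also assumes the Optimum version documented here, since later releases may expose a different interface.

```python
def quantize_dynamically(
    model_checkpoint: str,
    feature: str = "sequence-classification",  # assumed feature name
) -> None:
    """Export a Hub model to ONNX and write a dynamically quantized copy."""
    # Imported lazily so the module can be loaded without Optimum installed.
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # Assumption: a ready-made dynamic quantization configuration.
    qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)

    # Load the PyTorch model and its preprocessor from the Hub.
    quantizer = ORTQuantizer.from_pretrained(model_checkpoint, feature=feature)

    # Dynamic quantization needs no calibration ranges,
    # so calibration_tensors_range keeps its default of None.
    quantizer.export(
        onnx_model_path="model.onnx",
        onnx_quantized_model_output_path="model-quantized.onnx",
        quantization_config=qconfig,
    )
```

Calling `quantize_dynamically` with the name of a checkpoint hosted on the Hub would write both ONNX files to the current directory.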
fit
( dataset: Dataset, calibration_config: CalibrationConfig, onnx_model_path: typing.Union[str, os.PathLike, pathlib.Path], onnx_augmented_model_name: str = 'augmented_model.onnx', operators_to_quantize: typing.Optional[typing.List[str]] = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )
Parameters
- dataset (Dataset): The dataset to use when performing the calibration step.
- calibration_config (CalibrationConfig): The configuration containing the parameters related to the calibration step.
- onnx_model_path (Union[str, os.PathLike]): The path used to save the model exported to an ONNX Intermediate Representation (IR).
- onnx_augmented_model_name (str, defaults to 'augmented_model.onnx'): The path used to save the augmented model used to collect the quantization ranges.
- operators_to_quantize (list, optional): List of the operator types to quantize.
- batch_size (int, defaults to 1): The batch size to use when collecting the quantization range values.
- use_external_data_format (bool, defaults to False): Whether to use the external data format to store a model whose size is >= 2GB.
- use_gpu (bool, defaults to False): Whether to use the GPU when collecting the quantization range values.
- force_symmetric_range (bool, defaults to False): Whether to make the quantization ranges symmetric.
Perform the calibration step and collect the quantization ranges.
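To illustrate what "collecting the quantization ranges" can mean, here is a pure-Python sketch of min-max calibration. It is not the library's implementation: the function name, the node names, and the recorded values are hypothetical, and the symmetric widening to (-m, m) is one common reading of `force_symmetric_range`.

```python
from typing import Dict, Iterable, List, Tuple

Range = Tuple[float, float]


def collect_minmax_ranges(
    batches: Iterable[Dict[str, List[float]]],
    force_symmetric_range: bool = False,
) -> Dict[str, Range]:
    """Track the smallest and largest value seen for each named node."""
    ranges: Dict[str, Range] = {}
    for batch in batches:
        for node_name, values in batch.items():
            lo, hi = min(values), max(values)
            if node_name in ranges:
                lo = min(lo, ranges[node_name][0])
                hi = max(hi, ranges[node_name][1])
            ranges[node_name] = (lo, hi)
    if force_symmetric_range:
        # Widen each range to (-m, m), where m is the largest absolute bound.
        ranges = {
            name: (-max(abs(lo), abs(hi)), max(abs(lo), abs(hi)))
            for name, (lo, hi) in ranges.items()
        }
    return ranges


# Hypothetical values recorded for two nodes over two calibration batches.
batches = [
    {"MatMul_0": [-0.5, 1.0, 2.0], "Add_1": [0.1, 0.4]},
    {"MatMul_0": [-1.5, 0.25], "Add_1": [-0.2, 0.3]},
]
ranges = collect_minmax_ranges(batches)
symmetric = collect_minmax_ranges(batches, force_symmetric_range=True)
```

The resulting dictionary has the same shape as the `calibration_tensors_range` argument of `export`: node names mapped to (min, max) pairs.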
from_pretrained
( model_name_or_path: typing.Union[str, os.PathLike], feature: str, opset: typing.Optional[int] = None )
Instantiate an ORTQuantizer from a pretrained PyTorch model and preprocessor.
get_calibration_dataset
( dataset_name: str, num_samples: int = 100, dataset_config_name: typing.Optional[str] = None, dataset_split: typing.Optional[str] = None, preprocess_function: typing.Optional[typing.Callable] = None, preprocess_batch: bool = True, seed: int = 2016 )
Parameters
- dataset_name (str): The dataset repository name on the Hugging Face Hub, or the path to a local directory containing the data files to load for the calibration step.
- num_samples (int, defaults to 100): The maximum number of samples composing the calibration dataset.
- dataset_config_name (str, optional): The name of the dataset configuration.
- dataset_split (str, optional): Which split of the dataset to use to perform the calibration step.
- preprocess_function (Callable, optional): Processing function to apply to each example after loading the dataset.
- preprocess_batch (bool, defaults to True): Whether the preprocess_function should be batched.
- seed (int, defaults to 2016): The random seed to use when shuffling the calibration dataset.
Create the calibration datasets.Dataset to use for the post-training static quantization calibration step.
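The parameters above suggest a shuffle, subsample, and preprocess pipeline. The following pure-Python sketch mimics that idea on an in-memory list. The helper name and the order of the steps are assumptions made for illustration; the real method works on a datasets.Dataset and may behave differently.

```python
import random
from typing import Callable, Dict, List, Optional

Example = Dict[str, object]


def build_calibration_subset(
    examples: List[Example],
    num_samples: int = 100,
    preprocess_function: Optional[Callable[[Example], Example]] = None,
    seed: int = 2016,
) -> List[Example]:
    """Shuffle reproducibly, keep at most num_samples, then preprocess."""
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    subset = shuffled[:num_samples]
    if preprocess_function is not None:
        subset = [preprocess_function(example) for example in subset]
    return subset


def add_length(example: Example) -> Example:
    # Hypothetical preprocessing step standing in for tokenization.
    return {**example, "length": len(str(example["sentence"]))}


examples = [{"sentence": f"example {i}"} for i in range(10)]
subset = build_calibration_subset(
    examples, num_samples=4, preprocess_function=add_length
)
```

Fixing the seed makes the selected subset reproducible, which keeps the collected quantization ranges comparable across runs.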
partial_fit
( dataset: Dataset, calibration_config: CalibrationConfig, onnx_model_path: typing.Union[str, os.PathLike], onnx_augmented_model_name: str = 'augmented_model.onnx', operators_to_quantize: typing.Optional[typing.List[str]] = None, batch_size: int = 1, use_external_data_format: bool = False, use_gpu: bool = False, force_symmetric_range: bool = False )
Parameters
- dataset (Dataset): The dataset to use when performing the calibration step.
- calibration_config (CalibrationConfig): The configuration containing the parameters related to the calibration step.
- onnx_model_path (Union[str, os.PathLike]): The path used to save the model exported to an ONNX Intermediate Representation (IR).
- onnx_augmented_model_name (str, defaults to 'augmented_model.onnx'): The path used to save the augmented model used to collect the quantization ranges.
- operators_to_quantize (list, optional): List of the operator types to quantize.
- batch_size (int, defaults to 1): The batch size to use when collecting the quantization range values.
- use_external_data_format (bool, defaults to False): Whether to use the external data format to store a model whose size is >= 2GB.
- use_gpu (bool, defaults to False): Whether to use the GPU when collecting the quantization range values.
- force_symmetric_range (bool, defaults to False): Whether to make the quantization ranges symmetric.
Perform the calibration step and collect the quantization ranges.
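This page does not say how partial_fit differs from fit. Assuming, from its name, that it processes one portion of the calibration data at a time, the ranges from successive portions would need to be combined. A pure-Python sketch of such a merge follows, with a hypothetical helper name and made-up node names and values:

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]


def merge_ranges(
    running: Optional[Dict[str, Range]],
    new: Dict[str, Range],
) -> Dict[str, Range]:
    """Combine previously collected ranges with ranges from a new shard."""
    merged = dict(running or {})
    for name, (lo, hi) in new.items():
        if name in merged:
            old_lo, old_hi = merged[name]
            # Keep the widest interval observed so far for this node.
            merged[name] = (min(old_lo, lo), max(old_hi, hi))
        else:
            merged[name] = (lo, hi)
    return merged


# Hypothetical ranges obtained from two separate shards of calibration data.
shard_ranges = [
    {"MatMul_0": (-0.5, 2.0)},
    {"MatMul_0": (-1.5, 0.25), "Add_1": (-0.2, 0.4)},
]
running: Optional[Dict[str, Range]] = None
for shard in shard_ranges:
    running = merge_ranges(running, shard)
```

Processing the data in shards bounds memory use, because only the running minimum and maximum per node have to be kept between calls.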