(custom_qdq_case)=
# **Add Custom QDQ Insertion Case**

This toolkit's default quantization behavior for each supported layer is described in the [Add New Layer Support](new_layer_support) section. For the most part, it quantizes (adds Q/DQ nodes to) all inputs and weights (if the layer is weighted) of supported layers.

However, the default behavior might not always lead to optimal INT8 fusions in TensorRT(TM). For example, Q/DQ nodes need to be added to residual connections in ResNet models. We provide a more in-depth explanation of this case in the "Custom Quantization with 'Custom Q/DQ Insertion Case'" section later on this page.

To tackle such scenarios, we added the `Custom Q/DQ Insertion Case` library feature, which allows users to programmatically decide how a specific layer should be quantized differently in specific situations. Note that providing an object of the `QuantizationSpec` class is a hard-coded way of achieving the same goal.

Let's discuss the library-provided `ResNetV1QDQCase` class to understand how passing custom Q/DQ insertion case objects affects Q/DQ insertion for the `Add` layer.

## **Why is this needed?**

The main goal of the `Custom Q/DQ Insertion Case` feature is to tweak the framework's behavior to meet network-specific quantization requirements. Let's illustrate this with an example.

**Goal**: Perform custom quantization on a ResNet-like model. More specifically, we aim to quantize a model's residual connections. We show three quantization scenarios: 1) default, 2) custom with `QuantizationSpec` (suboptimal), and 3) custom with `Custom Q/DQ Insertion Case` (optimal).

### **Default Quantization**

````{note}
Refer to **`Full Default Quantization`** [mode](basic).
````

The default quantization of the model is done with the following code snippet:

```python
# Quantize model
q_nn_model = quantize_model(model=nn_model_original)
```

Figure 1, below, shows the baseline ResNet residual block and its corresponding quantized block with the default quantization scheme.
![resnet_base](./assets/special_qdq_base.png) ![resnet_default](./assets/special_qdq_default.png)

Figure 1. ResNet residual block (left), and default quantized block (right).
Notice that the default quantization behavior is to not add Q/DQ nodes before `Add` layers. Since `AddQuantizeWrapper` is already implemented in the toolkit, and just disabled by default, the simplest way to quantize that layer would be to enable quantization of layers of class type `Add`.

### **Custom Quantization with 'QuantizationSpec' (suboptimal)**

````{note}
Refer to **`Full Custom Quantization`** [mode](basic).
````

The following code snippet enables quantization of all layers of class type `Add`:

```python
# 1. Enable `Add` layer quantization
qspec = QuantizationSpec()
qspec.add(name='Add', is_keras_class=True)

# 2. Quantize model
q_nn_model = quantize_model(
    model=nn_model_original,
    quantization_spec=qspec
)
```

Figure 2, below, shows the standard ResNet residual block and its corresponding quantized block with the suggested custom quantization.
![resnet_base](./assets/special_qdq_base.png) ![resnet_default](./assets/special_qdq_qspec.png)

Figure 2. ResNet residual block (left), and Q/DQ node insertion for `Add` layer passed via `QuantizationSpec` (right).
Notice that all inputs of the `Add` layer were quantized. However, that still does not enable optimal [layer fusions in TensorRT(TM)](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#enable-fusion), where a Convolution layer followed by an ElementWise layer (such as `Add`) can be fused into a single Convolution kernel. The [recommendation](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#qdq-placement-recs__xxx), in this case, is to add Q/DQ nodes in the residual connection only (not between `Add` and `Conv`).

### **Custom Quantization with 'Custom Q/DQ Insertion Case' (optimal)**

````{note}
Refer to **`Full Custom Quantization`** [mode](basic).
````

The library-provided `ResNetV1QDQCase` class solves this issue by programming the `Add` layer to skip Q/DQ insertion in a path if that path connects to a `Conv` layer. This time, we pass an object of the `ResNetV1QDQCase` class to the `quantize_model` function:

```python
# 1. Indicate one or more custom QDQ cases
custom_qdq_case = ResNetV1QDQCase()

# 2. Quantize model
q_nn_model = quantize_model(
    model=nn_model_original,
    custom_qdq_cases=[custom_qdq_case]
)
```

Figure 3, below, shows the standard ResNet residual block and its corresponding quantized block with the suggested custom quantization.
![resnet_base](./assets/special_qdq_base.png) ![resnet_special](./assets/special_qdq_customqdqcase.png)

Figure 3. ResNet residual block (left), and Q/DQ node insertion for `Add` layer passed via `ResNetV1QDQCase` (right).
Notice that Q/DQ nodes are not added to the path coming from the `Conv` layer. Additionally, since both outputs of the first `Relu` layer were quantized, it was possible to perform a horizontal fusion with them, resulting in only one pair of Q/DQ nodes at that location. This quantization approach leads to an optimal graph for TensorRT INT8 fusions.

## **Library provided custom Q/DQ insertion cases**

We provide custom Q/DQ insertion cases for the models available in the model zoo. The library-provided custom Q/DQ insertion case classes can be imported from the `tensorflow_quantization.custom_qdq_cases` module and passed to the `quantize_model` function.

```{note}
Refer to the [tensorflow_quantization.custom_qdq_cases](https://gitlab-master.nvidia.com/TensorRT/Tools/tensorflow-quantization/-/blob/main/tensorflow_quantization/custom_qdq_cases.py) module for more details.
```

## **How to add a new custom Q/DQ insertion case?**

```{eval-rst}
#. Create a new class by inheriting from the ``tensorflow_quantization.CustomQDQInsertionCase`` class.
#. Override two methods:

   1. ``case`` (compulsory)

      This method has a fixed signature, shown below. The library automatically calls the ``case`` method of every member of the ``custom_qdq_cases`` parameter inside the ``quantize_model`` function. The logic for changing the default layer behavior should be encoded in this method, and an object of the ``QuantizationSpec`` class must be returned.

      .. code-block:: python

         (function) CustomQDQInsertionCase.case(
             self,
             keras_model: 'tf.keras.Model',
             qspec: 'QuantizationSpec'
         ) -> 'QuantizationSpec'

   2. ``info`` (optional)

      This is just a helper method explaining the logic inside the ``case`` method.

#. Add an object of this new class to a list and pass it to the ``custom_qdq_cases`` parameter of the ``quantize_model`` function (a usage sketch is shown at the end of this page).
```

```{eval-rst}
.. ATTENTION:: If a ``CustomQDQInsertionCase`` is written, its ``case`` method MUST return a ``QuantizationSpec`` object.
```

Example:

```python
class MaxPoolQDQCase(CustomQDQInsertionCase):
    def __init__(self) -> None:
        super().__init__()

    def info(self) -> str:
        return "Enables quantization of MaxPool layers."

    def case(
        self, keras_model: tf.keras.Model, qspec: QuantizationSpec
    ) -> QuantizationSpec:
        mp_qspec = QuantizationSpec()
        for layer in keras_model.layers:
            if isinstance(layer, tf.keras.layers.MaxPooling2D):
                if check_is_quantizable_by_layer_name(qspec, layer.name):
                    mp_qspec.add(
                        name=layer.name,
                        quantize_input=True,
                        quantize_weight=False
                    )
        return mp_qspec
```

As shown in the `MaxPoolQDQCase` class above, the `case` method needs to be overridden. The optional `info` method returns a short description string. The logic written in the `case` method might or might not use the user-provided `QuantizationSpec` object, but it MUST return a new `QuantizationSpec` that holds information on the updated layer behavior. In `MaxPoolQDQCase`, the custom Q/DQ insertion logic depends on the user-provided `QuantizationSpec` object: `check_is_quantizable_by_layer_name` checks whether the layer name is in the user-provided object and gives priority to that specification.
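To close the loop, below is a minimal usage sketch showing how such a new case object would be passed to `quantize_model`, assuming the same `nn_model_original` model used in the earlier snippets; combining it with the library-provided `ResNetV1QDQCase` is purely illustrative.

```python
# Usage sketch (illustrative): instantiate the new case, optionally combine it
# with a library-provided case, and pass the list via `custom_qdq_cases`.
from tensorflow_quantization.custom_qdq_cases import ResNetV1QDQCase

custom_cases = [MaxPoolQDQCase(), ResNetV1QDQCase()]

q_nn_model = quantize_model(
    model=nn_model_original,
    custom_qdq_cases=custom_cases
)
```

Each case's `case` method is called internally by `quantize_model`, and the returned `QuantizationSpec` objects update the default quantization behavior of the corresponding layers.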