Quantize

Quantize a float input tensor into an integer output tensor. The quantization computation is as follows: \(output_{i_0,..,i_n} = \text{clamp}(\text{round}(\frac{input_{i_0,..,i_n}}{scale}) + \text{zero\_point})\).
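The formula can be sketched in NumPy as a reference (an illustrative sketch only, not the TensorRT implementation; the int8 clamp range [-128, 127] is an assumption here):

```python
import numpy as np

def quantize_ref(x, scale, zero_point=0):
    """Reference per-tensor quantization: clamp(round(x / scale) + zero_point)."""
    q = np.round(x / scale) + zero_point
    # Clamp to the representable int8 range (assumed [-128, 127]).
    return np.clip(q, -128, 127).astype(np.int8)

print(quantize_ref(np.array([0.56, 1.4, -3.6], dtype=np.float32), 1 / 127))
```

With scale = 1/127, inputs outside roughly [-1, 1] saturate at the ends of the int8 range.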

Attributes

axis The axis along which quantization is performed.

scale The scale to use for the quantization: a scalar for per-tensor quantization, or a 1-D tensor of per-axis scales.

zero_point The zero point to use for the quantization.
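When scale is a 1-D tensor, each slice along axis is quantized with its own scale. A hedged NumPy sketch of the broadcasting involved (the helper name and the int8 clamp range [-128, 127] are assumptions, not part of the TensorRT API):

```python
import numpy as np

def quantize_per_axis(x, scale, axis):
    """Per-axis quantization sketch: slice i along `axis` is divided by scale[i]."""
    # Reshape scale so it broadcasts against x along the chosen axis.
    bshape = [1] * x.ndim
    bshape[axis] = -1
    q = np.round(x / scale.reshape(bshape))
    # Clamp to the int8 range (assumed [-128, 127]).
    return np.clip(q, -128, 127).astype(np.int8)

x = np.ones((2, 3), dtype=np.float32)
print(quantize_per_axis(x, np.array([0.5, 0.25, 0.125]), axis=1))
```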

Inputs

input: tensor of type T1.

Outputs

output: tensor of type T2.

Data Types

T1: float16, float32

T2: int8

Shape Information

input and output are tensors with a shape of \([a_0,...,a_n]\).

Examples

Quantize
in1 = network.add_input("input1", dtype=trt.float32, shape=(1, 1, 3, 3))
scale = network.add_constant(shape=(1,), weights=np.array([1 / 127], dtype=np.float32))
quantize = network.add_quantize(in1, scale.get_output(0))
quantize.axis = 3
dequantize = network.add_dequantize(quantize.get_output(0), scale.get_output(0))
dequantize.axis = 3
network.mark_output(dequantize.get_output(0))

inputs[in1.name] = np.array(
    [
        [
            [
                [0.56, 0.89, 1.4],
                [-0.56, 0.39, 6.0],
                [0.67, 0.11, -3.6],
            ]
        ]
    ]
)

outputs[dequantize.get_output(0).name] = dequantize.get_output(0).shape
expected[dequantize.get_output(0).name] = np.array(
    [
        [
            [
                [0.56, 0.89, 1.0],
                [-0.56, 0.39, 1.0],
                [0.67, 0.11, -1.0],
            ]
        ]
    ]
)

C++ API

For more information about the C++ IQuantizeLayer operator, refer to the C++ IQuantizeLayer documentation.

Python API

For more information about the Python IQuantizeLayer operator, refer to the Python IQuantizeLayer documentation.