TensorRT
8.0.1

A Quantize layer in a network definition. More...
#include <NvInfer.h>
Public Member Functions  
int32_t  getAxis () const noexcept 
Get the quantization axis. More...  
void  setAxis (int32_t axis) noexcept 
Set the quantization axis. More...  
Public Member Functions inherited from nvinfer1::ILayer  
LayerType  getType () const noexcept 
Return the type of a layer. More...  
void  setName (const char *name) noexcept 
Set the name of a layer. More...  
const char *  getName () const noexcept 
Return the name of a layer. More...  
int32_t  getNbInputs () const noexcept 
Get the number of inputs of a layer.  
ITensor *  getInput (int32_t index) const noexcept 
Get the layer input corresponding to the given index. More...  
int32_t  getNbOutputs () const noexcept 
Get the number of outputs of a layer.  
ITensor *  getOutput (int32_t index) const noexcept 
Get the layer output corresponding to the given index. More...  
void  setInput (int32_t index, ITensor &tensor) noexcept 
Replace an input of this layer with a specific tensor. More...  
void  setPrecision (DataType dataType) noexcept 
Set the computational precision of this layer. More...  
DataType  getPrecision () const noexcept 
get the computational precision of this layer More...  
bool  precisionIsSet () const noexcept 
whether the computational precision has been set for this layer More...  
void  resetPrecision () noexcept 
reset the computational precision for this layer More...  
void  setOutputType (int32_t index, DataType dataType) noexcept 
Set the output type of this layer. More...  
DataType  getOutputType (int32_t index) const noexcept 
get the output type of this layer More...  
bool  outputTypeIsSet (int32_t index) const noexcept 
whether the output type has been set for this layer More...  
void  resetOutputType (int32_t index) noexcept 
reset the output type for this layer More...  
Protected Attributes  
apiv::VQuantizeLayer *  mImpl 
Protected Attributes inherited from nvinfer1::ILayer  
apiv::VLayer *  mLayer 
Additional Inherited Members  
Protected Member Functions inherited from nvinfer1::INoCopy  
INoCopy (const INoCopy &other)=delete  
INoCopy &  operator= (const INoCopy &other)=delete 
INoCopy (INoCopy &&other)=delete  
INoCopy &  operator= (INoCopy &&other)=delete 
A Quantize layer in a network definition.
This layer accepts a floating-point data input tensor and uses the scale and zeroPt inputs to quantize the data to an 8-bit signed integer according to:

output = clamp(round(input / scale) + zeroPt)

The rounding type is round-to-nearest with ties-to-even (https://en.wikipedia.org/wiki/Rounding#Round_half_to_even). Clamping is in the range [-128, 127].
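The formula above can be sketched as a reference implementation in standard C++. This is an illustrative sketch, not TensorRT code; `std::nearbyint` under the default floating-point environment (FE_TONEAREST) performs exactly the round-to-nearest ties-to-even rounding this layer specifies.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Reference implementation of the quantization formula:
//   output = clamp(round(input / scale) + zeroPt)
// std::nearbyint honors the current rounding mode; the default mode
// (FE_TONEAREST) is round-to-nearest ties-to-even, matching this layer.
// zeroPt defaults to 0 because only symmetric quantization is supported.
int8_t quantize(float input, float scale, int32_t zeroPt = 0)
{
    int32_t q = static_cast<int32_t>(std::nearbyint(input / scale)) + zeroPt;
    return static_cast<int8_t>(std::clamp(q, -128, 127)); // kINT8 range
}
```

Note how ties-to-even differs from the more familiar round-half-up: an input of 0.25 with scale 0.5 yields 0.5, which rounds to 0 (the nearest even integer), not 1.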
The first input (index 0) is the tensor to be quantized. The second input (index 1) and third input (index 2) are the scale and zero point respectively. Each of scale and zeroPt must be either a scalar or a 1D tensor.

The zeroPt tensor is optional; if not set, it is assumed to be zero. Its data type must be DataType::kINT8. zeroPt must contain only zero-valued coefficients, because only symmetric quantization is supported. The scale value must be either a scalar for per-tensor quantization or a 1D tensor for per-channel quantization. All scale coefficients must be positive. The size of the 1D scale tensor must match the size of the quantization axis, and the size of scale must match the size of zeroPt.
The subgraph which terminates with the scale tensor must be a build-time constant. The same restriction applies to the zeroPt tensor. The output type, if constrained, must be constrained to DataType::kINT8. The input type, if constrained, must be constrained to DataType::kFLOAT (FP16 input is not supported). The output size is the same as the input size. The quantization axis is in reference to the input tensor's dimensions.

IQuantizeLayer supports only DataType::kFLOAT precision and will default to this precision during instantiation. IQuantizeLayer supports only DataType::kINT8 output.
As an example of the operation of this layer, imagine a 4D NCHW activation input which can be quantized using a single scale coefficient (referred to as per-tensor quantization):

For each n in N:
    For each c in C:
        For each h in H:
            For each w in W:
                output[n,c,h,w] = clamp(round(input[n,c,h,w] / scale) + zeroPt)
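The per-tensor loop above can be sketched over an NCHW activation stored as a flat array. The function name and layout are illustrative assumptions, not part of the TensorRT API; the single scale coefficient is shared by every element.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Per-tensor quantization of a flat NCHW activation buffer: one scale
// coefficient applies to every element. Symmetric quantization (zeroPt = 0).
std::vector<int8_t> quantizePerTensor(const std::vector<float>& input, float scale)
{
    std::vector<int8_t> output(input.size());
    for (size_t i = 0; i < input.size(); ++i)
    {
        int32_t q = static_cast<int32_t>(std::nearbyint(input[i] / scale));
        output[i] = static_cast<int8_t>(std::clamp(q, -128, 127)); // kINT8 range
    }
    return output;
}
```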
Per-channel quantization is supported only for weight inputs, so activations cannot be quantized per-channel. As an example of per-channel operation, imagine a 4D KCRS weights input with K (dimension 0) as the quantization axis. The scale is an array of coefficients and must have the same size as the quantization axis.

For each k in K:
    For each c in C:
        For each r in R:
            For each s in S:
                output[k,c,r,s] = clamp(round(input[k,c,r,s] / scale[k]) + zeroPt[k])
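The per-channel loops above can likewise be sketched in standard C++. This is an illustrative sketch assuming a flat KCRS buffer, not the TensorRT API; scale[k] applies to every element of output channel k, and the scale vector size must equal K.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Per-channel quantization of flat KCRS weights with K (dimension 0) as the
// quantization axis: each output channel k uses its own scale[k].
// Symmetric quantization (zeroPt = 0). Names are illustrative.
std::vector<int8_t> quantizePerChannel(const std::vector<float>& weights,
                                       const std::vector<float>& scale,
                                       int K, int C, int R, int S)
{
    std::vector<int8_t> output(weights.size());
    int const channelSize = C * R * S; // elements per output channel
    for (int k = 0; k < K; ++k)
    {
        for (int i = 0; i < channelSize; ++i)
        {
            int const idx = k * channelSize + i;
            int32_t q = static_cast<int32_t>(std::nearbyint(weights[idx] / scale[k]));
            output[idx] = static_cast<int8_t>(std::clamp(q, -128, 127));
        }
    }
    return output;
}
```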
Typical scale and zeroPt subgraphs are: (figure of build-time constant subgraphs omitted)

Get the quantization axis.

Set the quantization axis.
Set the index of the quantization axis (with reference to the input tensor's dimensions). The axis must be a valid axis if the scale tensor has more than one coefficient. The axis value is ignored if the scale tensor has exactly one coefficient (per-tensor quantization).