TensorRT 10.3.0
nvinfer1::IQuantizeLayer Class Reference

A Quantize layer in a network definition.

#include <NvInfer.h>

Inheritance diagram for nvinfer1::IQuantizeLayer:
nvinfer1::IQuantizeLayer inherits nvinfer1::ILayer, which inherits nvinfer1::INoCopy.

Public Member Functions

int32_t getAxis () const noexcept
 Get the quantization axis.

void setAxis (int32_t axis) noexcept
 Set the quantization axis.

void setToType (DataType toType) noexcept
 Set the Quantize layer output type.

DataType getToType () const noexcept
 Return the Quantize layer output type.
 
- Public Member Functions inherited from nvinfer1::ILayer
LayerType getType () const noexcept
 Return the type of a layer.

void setName (char const *name) noexcept
 Set the name of a layer.

char const * getName () const noexcept
 Return the name of a layer.

int32_t getNbInputs () const noexcept
 Get the number of inputs of a layer.

ITensor * getInput (int32_t index) const noexcept
 Get the layer input corresponding to the given index.

int32_t getNbOutputs () const noexcept
 Get the number of outputs of a layer.

ITensor * getOutput (int32_t index) const noexcept
 Get the layer output corresponding to the given index.

void setInput (int32_t index, ITensor &tensor) noexcept
 Replace an input of this layer with a specific tensor.

void setPrecision (DataType dataType) noexcept
 Set the preferred or required computational precision of this layer in a weakly-typed network.

DataType getPrecision () const noexcept
 Get the computational precision of this layer.

bool precisionIsSet () const noexcept
 Whether the computational precision has been set for this layer.

void resetPrecision () noexcept
 Reset the computational precision for this layer.

void setOutputType (int32_t index, DataType dataType) noexcept
 Set the output type of this layer in a weakly-typed network.

DataType getOutputType (int32_t index) const noexcept
 Get the output type of this layer.

bool outputTypeIsSet (int32_t index) const noexcept
 Whether the output type has been set for this layer.

void resetOutputType (int32_t index) noexcept
 Reset the output type for this layer.

void setMetadata (char const *metadata) noexcept
 Set the metadata for this layer.

char const * getMetadata () const noexcept
 Get the metadata of the layer.
 

Protected Member Functions

virtual ~IQuantizeLayer () noexcept=default
 
- Protected Member Functions inherited from nvinfer1::ILayer
virtual ~ILayer () noexcept=default
 
- Protected Member Functions inherited from nvinfer1::INoCopy
 INoCopy ()=default
 
virtual ~INoCopy ()=default
 
 INoCopy (INoCopy const &other)=delete
 
INoCopy & operator= (INoCopy const &other)=delete
 
 INoCopy (INoCopy &&other)=delete
 
INoCopy & operator= (INoCopy &&other)=delete
 

Protected Attributes

apiv::VQuantizeLayer * mImpl
 
- Protected Attributes inherited from nvinfer1::ILayer
apiv::VLayer * mLayer
 

Detailed Description

A Quantize layer in a network definition.

This layer accepts a floating-point data input tensor, and uses the scale and zeroPt inputs to quantize the data according to:

    output = clamp(round(input / scale) + zeroPt)

Rounding is round-to-nearest, ties-to-even (https://en.wikipedia.org/wiki/Rounding#Round_half_to_even). The clamping range depends on the output data type:

  • FP8: [-448, 448]
  • INT4: [-8, 7]
  • INT8: [-128, 127]
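
The formula and the integer clamping ranges above can be sketched in plain C++. This is only a reference for the arithmetic, not TensorRT code: quantizeRef, quantizeInt8Ref and quantizeInt4Ref are hypothetical helper names, zeroPt is taken to be zero, and FP8 is left out because it is not an integer rounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of the TensorRT API: reference arithmetic
// for an integer output type, assuming zeroPt == 0.
inline int32_t quantizeRef(float input, float scale, int32_t lo, int32_t hi)
{
    assert(scale > 0.0F); // all scale coefficients must be positive
    // std::nearbyint follows the current rounding mode; the default mode
    // (FE_TONEAREST) rounds ties to even.
    float const rounded = std::nearbyint(input / scale);
    float const clamped = std::clamp(rounded, static_cast<float>(lo), static_cast<float>(hi));
    return static_cast<int32_t>(clamped);
}

// INT8 clamping range: [-128, 127]
inline int32_t quantizeInt8Ref(float input, float scale)
{
    return quantizeRef(input, scale, -128, 127);
}

// INT4 clamping range: [-8, 7]
inline int32_t quantizeInt4Ref(float input, float scale)
{
    return quantizeRef(input, scale, -8, 7);
}
```

With a scale of 1, an input of 2.5 quantizes to 2 and an input of 3.5 quantizes to 4, which is the ties-to-even behavior; values outside the range saturate at the range limits.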

The first input (index 0) is the tensor to be quantized. The second (index 1) and third (index 2) are the scale and zero point respectively. scale and zeroPt must have identical dimensions and a rank less than or equal to 2.

The zeroPt tensor is optional, and if not set, will be assumed to be zero. Its data type must match the output data type. zeroPt must only contain zero-valued coefficients, because only symmetric quantization is supported. The size of zeroPt must match the size of scale.

The scale value must be a scalar for per-tensor quantization, a 1-D tensor for per-channel quantization, or a 2-D tensor for block quantization (supported for DataType::kINT4 only). All scale coefficients must have positive values. The size of the 1-D scale tensor must match the size of the quantization axis. For block quantization, the shape of the scale tensor must match the shape of the input, except for one dimension in which blocking occurs.
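
The scale shape rules can be illustrated with a small classifier. This is a sketch under stated assumptions, not the validation TensorRT performs: QuantMode and classifyScale are hypothetical names, a one-element 1-D scale is treated as per-tensor, and the blocked input dimension is assumed to be an exact multiple of the corresponding scale dimension.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical enum and helper, not part of the TensorRT API.
enum class QuantMode { kPerTensor, kPerChannel, kBlock, kInvalid };

inline QuantMode classifyScale(
    std::vector<int64_t> const& inputDims, std::vector<int64_t> const& scaleDims, int32_t axis)
{
    // Scalar scale (rank 0): per-tensor quantization.
    if (scaleDims.empty())
    {
        return QuantMode::kPerTensor;
    }
    if (scaleDims.size() == 1)
    {
        // Assumption: a single coefficient means per-tensor, and the axis is ignored.
        if (scaleDims[0] == 1)
        {
            return QuantMode::kPerTensor;
        }
        // Per-channel: the scale size must match the size of the quantization axis.
        bool const axisValid = axis >= 0 && static_cast<std::size_t>(axis) < inputDims.size();
        return (axisValid && inputDims[axis] == scaleDims[0]) ? QuantMode::kPerChannel : QuantMode::kInvalid;
    }
    if (scaleDims.size() == 2 && inputDims.size() == 2)
    {
        // Block: shapes match except for exactly one (blocked) dimension.
        int32_t differing = 0;
        for (std::size_t d = 0; d < 2; ++d)
        {
            if (scaleDims[d] != inputDims[d])
            {
                ++differing;
                // Assumption: the input size is an exact multiple of the scale size.
                if (scaleDims[d] <= 0 || inputDims[d] % scaleDims[d] != 0)
                {
                    return QuantMode::kInvalid;
                }
            }
        }
        return differing == 1 ? QuantMode::kBlock : QuantMode::kInvalid;
    }
    return QuantMode::kInvalid;
}
```

For example, a (128, 32) input with a (2, 32) scale is classified as block quantization along dimension 0 with a block size of 64.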

The subgraph which terminates with the scale tensor must be a build-time constant. The same restrictions apply to the zeroPt. The output type, if constrained, must be constrained to DataType::kINT8, DataType::kFP8 or DataType::kINT4. The input type, if constrained, must be constrained to DataType::kFLOAT, DataType::kHALF, or DataType::kBF16. The output size is the same as the input size. The quantization axis is in reference to the input tensor's dimensions.

IQuantizeLayer supports DataType::kFLOAT, DataType::kHALF, or DataType::kBF16 precision and will default to DataType::kFLOAT precision during instantiation. For strongly typed networks, input data type must match the scale data type.

IQuantizeLayer supports DataType::kINT8, DataType::kFP8, or DataType::kINT4 output.

As an example of the operation of this layer, imagine a 4D NCHW activation input which can be quantized using a single scale coefficient (referred to as per-tensor quantization):

    For each n in N:
        For each c in C:
            For each h in H:
                For each w in W:
                    output[n,c,h,w] = clamp(round(input[n,c,h,w] / scale) + zeroPt)

Per-channel quantization is supported only for weight inputs. Thus, activations cannot be quantized per-channel. As an example of per-channel operation, imagine a 4D KCRS weights input and K (dimension 0) as the quantization axis. The scale is an array of coefficients, and must have the same size as the quantization axis.

    For each k in K:
        For each c in C:
            For each r in R:
                For each s in S:
                    output[k,c,r,s] = clamp(round(input[k,c,r,s] / scale[k]) + zeroPt[k])
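
The per-channel loop above can be written as a plain C++ reference. This is a sketch rather than TensorRT code: quantizePerChannelInt8 is a hypothetical helper, the weights are assumed to be stored row-major in KCRS order, the output type is assumed to be INT8, and zeroPt is taken to be zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the TensorRT API.
// weights: row-major KCRS data; scale: one coefficient per k (axis 0).
inline std::vector<int8_t> quantizePerChannelInt8(
    std::vector<float> const& weights, std::vector<float> const& scale, std::size_t K)
{
    assert(K > 0 && scale.size() == K && weights.size() % K == 0);
    std::size_t const perChannel = weights.size() / K; // C * R * S elements per k
    std::vector<int8_t> output(weights.size());
    for (std::size_t k = 0; k < K; ++k)
    {
        assert(scale[k] > 0.0F); // scale coefficients must be positive
        for (std::size_t i = 0; i < perChannel; ++i)
        {
            std::size_t const idx = k * perChannel + i;
            // Round to nearest, ties to even (default rounding mode), then clamp to INT8.
            float const q = std::clamp(std::nearbyint(weights[idx] / scale[k]), -128.0F, 127.0F);
            output[idx] = static_cast<int8_t>(q);
        }
    }
    return output;
}
```

Because K is the outermost dimension, the C, R and S loops collapse into a single contiguous run of C * R * S elements that share scale[k].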

Block quantization is supported only for 2-D weight inputs of DataType::kINT4. As an example of blocked operation, imagine a 2-D RS weights input, R (dimension 0) as the blocking axis and B as the block size. The scale is a 2-D array of coefficients, with dimensions (R//B, S).

    For each r in R:
        For each s in S:
            output[r,s] = clamp(round(input[r,s] / scale[r//B, s]) + zeroPt[r//B, s])
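
The blocked indexing can likewise be sketched in plain C++. This is an illustration only: quantizeBlockInt4 is a hypothetical helper, both arrays are assumed to be row-major, R is assumed to be an exact multiple of B, zeroPt is taken to be zero, and each INT4 result is stored unpacked in its own int8_t.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the TensorRT API.
// weights: row-major (R, S); scale: row-major (R / B, S); blocking along dimension 0.
inline std::vector<int8_t> quantizeBlockInt4(std::vector<float> const& weights,
    std::vector<float> const& scale, std::size_t R, std::size_t S, std::size_t B)
{
    assert(B > 0 && R % B == 0); // assumption: R is an exact multiple of B
    assert(weights.size() == R * S && scale.size() == (R / B) * S);
    std::vector<int8_t> output(weights.size()); // one unpacked INT4 value per element
    for (std::size_t r = 0; r < R; ++r)
    {
        for (std::size_t s = 0; s < S; ++s)
        {
            // All rows within the same block of B rows share one coefficient per column.
            float const sc = scale[(r / B) * S + s];
            assert(sc > 0.0F); // scale coefficients must be positive
            float const q = std::clamp(std::nearbyint(weights[r * S + s] / sc), -8.0F, 7.0F);
            output[r * S + s] = static_cast<int8_t>(q);
        }
    }
    return output;
}
```

With R = 4, S = 1 and B = 2, rows 0 and 1 use scale[0] while rows 2 and 3 use scale[1].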

Note
Only symmetric quantization is supported.
Currently the only allowed build-time constant scale and zeroPt subgraphs are:
  1. Constant -> Quantize
  2. Constant -> Cast -> Quantize
The input tensor for this layer must not be a scalar.
Warning
Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.

Constructor & Destructor Documentation

◆ ~IQuantizeLayer()

virtual nvinfer1::IQuantizeLayer::~IQuantizeLayer ( )
protected virtual default noexcept

Member Function Documentation

◆ getAxis()

int32_t nvinfer1::IQuantizeLayer::getAxis ( ) const
inline noexcept

Get the quantization axis.

Returns
axis parameter set by setAxis(). The return value is the index of the quantization axis in the input tensor's dimensions. A value of -1 indicates per-tensor quantization. The default value is -1.

◆ getToType()

DataType nvinfer1::IQuantizeLayer::getToType ( ) const
inline noexcept

Return the Quantize layer output type.

Returns
toType parameter set during layer creation or by setToType(). The return value is the output type of the quantize layer. The default value is DataType::kINT8.

◆ setAxis()

void nvinfer1::IQuantizeLayer::setAxis ( int32_t  axis)
inline noexcept

Set the quantization axis.

Set the index of the quantization axis (with reference to the input tensor's dimensions). The axis must be a valid axis if the scale tensor has more than one coefficient. The axis value will be ignored if the scale tensor has exactly one coefficient (per-tensor quantization).

◆ setToType()

void nvinfer1::IQuantizeLayer::setToType ( DataType  toType)
inline noexcept

Set the Quantize layer output type.

Parameters
toType  The DataType of the output tensor.

Set the output type of the quantize layer. Valid values are DataType::kINT8, DataType::kFP8 and DataType::kINT4. If the network is strongly typed, setToType must be used to set the output type, and use of setOutputType is an error. Otherwise, types passed to setOutputType and setToType must be the same.

See also
NetworkDefinitionCreationFlag::kSTRONGLY_TYPED

Member Data Documentation

◆ mImpl

apiv::VQuantizeLayer* nvinfer1::IQuantizeLayer::mImpl
protected

The documentation for this class was generated from the following file:

  • NvInfer.h

  Copyright © 2024 NVIDIA Corporation