A Quantize layer in a network definition.

#include <NvInfer.h>

Inheritance: nvinfer1::IQuantizeLayer inherits from nvinfer1::ILayer, which inherits from nvinfer1::INoCopy.

Public Member Functions
int32_t	getAxis () const noexcept
	Get the quantization axis.

void	setAxis (int32_t axis) noexcept
	Set the quantization axis.

Public Member Functions inherited from nvinfer1::ILayer
LayerType	getType () const noexcept
	Return the type of a layer.

void	setName (char const *name) noexcept
	Set the name of a layer.

char const *	getName () const noexcept
	Return the name of a layer.

int32_t	getNbInputs () const noexcept
	Get the number of inputs of a layer.

ITensor *	getInput (int32_t index) const noexcept
	Get the layer input corresponding to the given index.

int32_t	getNbOutputs () const noexcept
	Get the number of outputs of a layer.

ITensor *	getOutput (int32_t index) const noexcept
	Get the layer output corresponding to the given index.

void	setInput (int32_t index, ITensor &tensor) noexcept
	Replace an input of this layer with a specific tensor.

void	setPrecision (DataType dataType) noexcept
	Set the computational precision of this layer.

DataType	getPrecision () const noexcept
	Get the computational precision of this layer.

bool	precisionIsSet () const noexcept
	Whether the computational precision has been set for this layer.

void	resetPrecision () noexcept
	Reset the computational precision for this layer.

void	setOutputType (int32_t index, DataType dataType) noexcept
	Set the output type of this layer.

DataType	getOutputType (int32_t index) const noexcept
	Get the output type of this layer.

bool	outputTypeIsSet (int32_t index) const noexcept
	Whether the output type has been set for this layer.

void	resetOutputType (int32_t index) noexcept
	Reset the output type for this layer.

Protected Member Functions
virtual	~IQuantizeLayer () noexcept=default

Protected Member Functions inherited from nvinfer1::ILayer
virtual	~ILayer () noexcept=default

Protected Member Functions inherited from nvinfer1::INoCopy
	INoCopy ()=default

virtual	~INoCopy ()=default

	INoCopy (INoCopy const &other)=delete

INoCopy &	operator= (INoCopy const &other)=delete

	INoCopy (INoCopy &&other)=delete

INoCopy &	operator= (INoCopy &&other)=delete

Protected Attributes
apiv::VQuantizeLayer *	mImpl

Protected Attributes inherited from nvinfer1::ILayer
apiv::VLayer *	mLayer

Detailed Description

A Quantize layer in a network definition.

This layer accepts a floating-point data input tensor, and uses the scale and zeroPt inputs to quantize the data to an 8-bit signed integer according to: output = clamp(round(input / scale) + zeroPt)

Rounding type is rounding-to-nearest ties-to-even (https://en.wikipedia.org/wiki/Rounding#Round_half_to_even). Clamping is in the range [-128, 127].
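
The formula, rounding mode, and clamping range above can be illustrated on a single value. The following is a minimal sketch, not TensorRT code: quantizeValue is a hypothetical helper, and it assumes the default floating-point rounding mode (FE_TONEAREST), under which std::nearbyint rounds ties to even.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of TensorRT): quantize one value as
// clamp(round(input / scale) + zeroPt), clamped to [-128, 127].
inline int8_t quantizeValue(float input, float scale, int8_t zeroPt = 0)
{
    // std::nearbyint honors the current rounding mode; the default mode
    // (FE_TONEAREST) rounds halfway cases to the nearest even integer.
    float const rounded = std::nearbyint(input / scale);
    float const shifted = rounded + static_cast<float>(zeroPt);
    // Clamp in floating point before narrowing, so the cast is always in range.
    float const clamped = std::min(127.0F, std::max(-128.0F, shifted));
    return static_cast<int8_t>(clamped);
}
```

With a scale of 1, the halfway inputs 2.5 and 3.5 map to 2 and 4 respectively, and any input whose scaled value falls outside [-128, 127] saturates at the nearest bound.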

The first input (index 0) is the tensor to be quantized. The second (index 1) and third (index 2) are the scale and zero point respectively. Each of scale and zeroPt must be either a scalar, or a 1D tensor.

The zeroPt tensor is optional; if not set, it is assumed to be zero. Its data type must be DataType::kINT8. zeroPt must contain only zero-valued coefficients, because only symmetric quantization is supported. The scale must be either a scalar for per-tensor quantization, or a 1D tensor for per-channel quantization. All scale coefficients must be positive. The size of the 1D scale tensor must match the size of the quantization axis. The size of the scale must match the size of the zeroPt.
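
The coefficient constraints in the paragraph above can be expressed as a small check. This is a sketch under stated assumptions: areCoefficientsValid is a hypothetical helper, not a TensorRT API, and it models the scale and zeroPt tensors as plain vectors, with an empty zeroPt vector standing for "zeroPt not set".

```cpp
#include <cstdint>
#include <vector>

// Hypothetical check (not a TensorRT API): returns true when the scale and
// zeroPt coefficients satisfy the constraints described above.
inline bool areCoefficientsValid(std::vector<float> const& scale, std::vector<int8_t> const& zeroPt)
{
    // A scale is mandatory: one coefficient (per-tensor) or several (per-channel).
    if (scale.empty())
    {
        return false;
    }
    // All scale coefficients must be positive. Written as !(s > 0) so NaN is rejected too.
    for (float const s : scale)
    {
        if (!(s > 0.0F))
        {
            return false;
        }
    }
    // zeroPt is optional (empty here means "not set").
    if (zeroPt.empty())
    {
        return true;
    }
    // When present, zeroPt must match the scale in size and hold only zeros.
    if (zeroPt.size() != scale.size())
    {
        return false;
    }
    for (int8_t const z : zeroPt)
    {
        if (z != 0)
        {
            return false;
        }
    }
    return true;
}
```

The requirement that a 1D scale match the size of the quantization axis depends on the input dimensions and is therefore not covered by this check.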

The subgraph which terminates with the scale tensor must be a build-time constant. The same restrictions apply to the zeroPt. The output type, if constrained, must be constrained to DataType::kINT8. The input type, if constrained, must be constrained to DataType::kFLOAT or DataType::kHALF. The output size is the same as the input size. The quantization axis is in reference to the input tensor's dimensions.

IQuantizeLayer only supports DataType::kFLOAT precision and will default to this precision during instantiation. IQuantizeLayer only supports DataType::kINT8 output.

As an example of the operation of this layer, imagine a 4D NCHW activation input which can be quantized using a single scale coefficient (referred to as per-tensor quantization):

    For each n in N:
        For each c in C:
            For each h in H:
                For each w in W:
                    output[n,c,h,w] = clamp(round(input[n,c,h,w] / scale) + zeroPt)

Per-channel quantization is supported only for weight inputs. Thus, activations cannot be quantized per-channel. As an example of per-channel operation, imagine a 4D KCRS weights input and K (dimension 0) as the quantization axis. The scale is an array of coefficients, and must have the same size as the quantization axis.

    For each k in K:
        For each c in C:
            For each r in R:
                For each s in S:
                    output[k,c,r,s] = clamp(round(input[k,c,r,s] / scale[k]) + zeroPt[k])
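
The per-channel loop nest above can be sketched in plain C++ as a reference computation. This is an illustration, not TensorRT code: quantizePerChannelK is a hypothetical function. It assumes row-major KCRS storage, weights.size() == K * C * R * S, scale.size() == K, a zero zeroPt (symmetric quantization), and the default rounding mode so that std::nearbyint rounds ties to even.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference routine (not part of TensorRT): quantize row-major
// KCRS weights per channel along K (dimension 0). zeroPt is taken as zero.
inline std::vector<int8_t> quantizePerChannelK(std::vector<float> const& weights, std::size_t K,
    std::size_t C, std::size_t R, std::size_t S, std::vector<float> const& scale)
{
    // In row-major KCRS layout, each k owns a contiguous slice of C * R * S values.
    std::size_t const sliceSize = C * R * S;
    std::vector<int8_t> output(weights.size());
    for (std::size_t k = 0; k < K; ++k)
    {
        for (std::size_t i = 0; i < sliceSize; ++i)
        {
            std::size_t const idx = k * sliceSize + i;
            // Every element of slice k is divided by the same coefficient scale[k].
            float const rounded = std::nearbyint(weights[idx] / scale[k]);
            float const clamped = std::min(127.0F, std::max(-128.0F, rounded));
            output[idx] = static_cast<int8_t>(clamped);
        }
    }
    return output;
}
```

Collapsing the C, R, and S loops into one index is valid only because K is the outermost dimension; a different quantization axis would need a strided index calculation instead.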

Note

Only symmetric quantization is supported.

Currently the only allowed build-time constant scale and zeroPt subgraphs are:

Constant -> Quantize
Constant -> Cast -> Quantize

Warning: Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.

Constructor & Destructor Documentation

◆ ~IQuantizeLayer()

virtual nvinfer1::IQuantizeLayer::~IQuantizeLayer ( )

protected virtual default noexcept

Member Function Documentation

◆ getAxis()

int32_t nvinfer1::IQuantizeLayer::getAxis ( ) const

inline noexcept

Get the quantization axis.

Returns: axis parameter set by setAxis(). The return value is the index of the quantization axis in the input tensor's dimensions. A value of -1 indicates per-tensor quantization. The default value is -1.

◆ setAxis()

void nvinfer1::IQuantizeLayer::setAxis ( int32_t axis )

inline noexcept

Set the quantization axis.

Set the index of the quantization axis (with reference to the input tensor's dimensions). The axis must be a valid axis if the scale tensor has more than one coefficient. The axis value will be ignored if the scale tensor has exactly one coefficient (per-tensor quantization).
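
The axis rule can be illustrated with a short check. This is a sketch, not TensorRT's own validation: isAxisAcceptable is a hypothetical helper, the input dimensions are modeled as a plain vector, and the size comparison reflects the requirement from the detailed description that a 1D scale match the size of the quantization axis.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a TensorRT API): decide whether `axis` is acceptable
// for an input with dimensions `inputDims` and a scale with `scaleCount` coefficients.
inline bool isAxisAcceptable(std::vector<int64_t> const& inputDims, int32_t axis, std::size_t scaleCount)
{
    if (scaleCount == 0)
    {
        return false; // a scale is required
    }
    if (scaleCount == 1)
    {
        return true; // per-tensor quantization: the axis value is ignored
    }
    // Per-channel quantization: the axis must index an existing input dimension.
    if (axis < 0 || static_cast<std::size_t>(axis) >= inputDims.size())
    {
        return false;
    }
    // The scale must have one coefficient per slice along the axis.
    return inputDims[static_cast<std::size_t>(axis)] == static_cast<int64_t>(scaleCount);
}
```

For example, with KCRS weights of dimensions {64, 3, 3, 3} and 64 scale coefficients, axis 0 is accepted while the default axis of -1 is not.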

Member Data Documentation

◆ mImpl

apiv::VQuantizeLayer* nvinfer1::IQuantizeLayer::mImpl

protected

The documentation for this class was generated from the following file:

NvInfer.h
