TensorRT 10.8.0
nvinfer1::IDynamicQuantizeLayer Class Reference

A network layer to perform dynamic quantization.
#include <NvInfer.h>
Public Member Functions
void setToType (DataType toType) noexcept
    Set DynamicQuantizeLayer’s quantized output type.
DataType getToType () const noexcept
    Return DynamicQuantizeLayer’s quantized output type.
void setScaleType (DataType scaleType) noexcept
    Set the data type of the scale factors used to quantize the data.
DataType getScaleType () const noexcept
    Return the scale factors data type.
void setAxis (int32_t axis) noexcept
    Set the axis along which block quantization occurs.
int32_t getAxis () const noexcept
    Get the axis along which blocking occurs.
void setBlockSize (int32_t size) noexcept
    Set the size of the quantization block.
int32_t getBlockSize () const noexcept
    Get the size of the quantization block.
void setInput (int32_t index, ITensor &tensor) noexcept
    Append or replace an input of this layer with a specific tensor.
Public Member Functions inherited from nvinfer1::ILayer
LayerType getType () const noexcept
    Return the type of a layer.
void setName (char const *name) noexcept
    Set the name of a layer.
char const * getName () const noexcept
    Return the name of a layer.
int32_t getNbInputs () const noexcept
    Get the number of inputs of a layer.
ITensor * getInput (int32_t index) const noexcept
    Get the layer input corresponding to the given index.
int32_t getNbOutputs () const noexcept
    Get the number of outputs of a layer.
ITensor * getOutput (int32_t index) const noexcept
    Get the layer output corresponding to the given index.
void setInput (int32_t index, ITensor &tensor) noexcept
    Replace an input of this layer with a specific tensor.
void setPrecision (DataType dataType) noexcept
    Set the preferred or required computational precision of this layer in a weakly-typed network.
DataType getPrecision () const noexcept
    Get the computational precision of this layer.
bool precisionIsSet () const noexcept
    Whether the computational precision has been set for this layer.
void resetPrecision () noexcept
    Reset the computational precision for this layer.
void setOutputType (int32_t index, DataType dataType) noexcept
    Set the output type of this layer in a weakly-typed network.
DataType getOutputType (int32_t index) const noexcept
    Get the output type of this layer.
bool outputTypeIsSet (int32_t index) const noexcept
    Whether the output type has been set for this layer.
void resetOutputType (int32_t index) noexcept
    Reset the output type for this layer.
void setMetadata (char const *metadata) noexcept
    Set the metadata for this layer.
char const * getMetadata () const noexcept
    Get the metadata of the layer.
Protected Member Functions
virtual ~IDynamicQuantizeLayer () noexcept=default
Protected Member Functions inherited from nvinfer1::ILayer
virtual ~ILayer () noexcept=default
Protected Member Functions inherited from nvinfer1::INoCopy
INoCopy ()=default
virtual ~INoCopy ()=default
INoCopy (INoCopy const &other)=delete
INoCopy & operator= (INoCopy const &other)=delete
INoCopy (INoCopy &&other)=delete
INoCopy & operator= (INoCopy &&other)=delete
Protected Attributes
apiv::VDynamicQuantizeLayer * mImpl
Protected Attributes inherited from nvinfer1::ILayer
apiv::VLayer * mLayer
Detailed Description

A network layer to perform dynamic quantization.
This layer accepts a floating-point input tensor and computes the block scale factors needed to quantize the input’s data. It outputs the quantized tensor as its first output and the scale factors as its second output.
Use ILayer::setInput to add an input for the double-quantization scale factor.
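The sketch below shows one way the pieces fit together. The factory method INetworkDefinition::addDynamicQuantize and its argument order are assumptions made for this sketch and should be verified against NvInfer.h for your release; the configuration and input/output calls are the members documented on this page.

    #include <NvInfer.h>

    using namespace nvinfer1;

    // Illustrative wiring of a dynamic-quantization layer over an activation tensor.
    void addBlockQuantizedActivation(INetworkDefinition& network, ITensor& activations)
    {
        int32_t const lastAxis = activations.getDimensions().nbDims - 1;

        // Assumed factory signature: addDynamicQuantize(input, axis, blockSize, toType, scaleType).
        IDynamicQuantizeLayer* dq = network.addDynamicQuantize(
            activations, lastAxis, /*blockSize=*/16, DataType::kFP4, DataType::kFP8);

        dq->setAxis(lastAxis);             // last or second-to-last dimension only
        dq->setBlockSize(16);              // only 16-element blocks are currently supported
        dq->setToType(DataType::kFP4);     // quantized output type
        dq->setScaleType(DataType::kFP8);  // per-block scale factor type

        // Input 1 is the double-quantization scale: a positive scalar (0D) constant.
        static float const kDoubleQuantScale = 1.0F;  // placeholder value for this sketch
        Dims scalarDims{};
        scalarDims.nbDims = 0;
        ITensor* dqScale = network.addConstant(
            scalarDims, Weights{DataType::kFLOAT, &kDoubleQuantScale, 1})->getOutput(0);
        dq->setInput(1, *dqScale);

        // Output 0 is the quantized tensor; output 1 holds the per-block scale factors.
        ITensor* quantized = dq->getOutput(0);
        ITensor* scales = dq->getOutput(1);
        (void) quantized;
        (void) scales;
    }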
Constructor & Destructor Documentation

virtual ~IDynamicQuantizeLayer () noexcept=default [protected]
Member Function Documentation

int32_t getAxis () const noexcept [inline]
Get the axis along which blocking occurs.
int32_t getBlockSize () const noexcept [inline]
Get the size of the quantization block.
DataType getScaleType () const noexcept [inline]
Return the scale factors data type.
The return value is the type of the scale factors used to quantize the dynamic data. The default value is DataType::kFP8.
DataType getToType () const noexcept [inline]
Return DynamicQuantizeLayer’s quantized output type.
The return value is the type of the quantized output tensor. The default value is DataType::kFP4.
void setAxis (int32_t axis) noexcept [inline]
Set the axis along which block quantization occurs.
The axis must be the last or second-to-last dimension. The input's extent along this axis must be constant.
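As an illustration (the shape is assumed, not taken from the header): for a 3-D activation tensor of shape [B, M, K], either the K axis (last) or the M axis (second to last) may be blocked, and the chosen dimension must have a constant extent.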
void setBlockSize (int32_t size) noexcept [inline]
Set the size of the quantization block.
Note: The block size must evenly divide the input's extent along the blocked axis. Currently only 16-element blocks are supported.
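As a worked example with an assumed blocked-axis extent of 4096: a block size of 16 divides it into 4096 / 16 = 256 blocks, so the layer's second output carries 256 scale factors along that axis; an extent of 4100 would be invalid because 16 does not divide it evenly.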
void setInput (int32_t index, ITensor &tensor) noexcept [inline]
Append or replace an input of this layer with a specific tensor.
Parameters
    index     the index of the input to modify.
    tensor    the new input tensor.
Input 0 is the input activation tensor. Input 1 is the double-quantization scale factor. This scale is used to quantize the dynamically computed high-precision scale factors that are used to quantize the activation data. Currently this input must be a positive scalar (a 0D tensor).
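A minimal sketch of wiring input 1, assuming the double-quantization scale is supplied as a network constant; the scalar value 1.0F and the helper name are placeholders, not part of the TensorRT API.

    #include <NvInfer.h>

    using namespace nvinfer1;

    // Attach the double-quantization scale as input 1 of an existing IDynamicQuantizeLayer.
    void attachDoubleQuantScale(INetworkDefinition& network, IDynamicQuantizeLayer& dq)
    {
        static float const kScale = 1.0F;  // must stay valid until the engine is built
        Dims scalarDims{};
        scalarDims.nbDims = 0;             // a positive scalar (0D tensor) is required
        IConstantLayer* scaleConst =
            network.addConstant(scalarDims, Weights{DataType::kFLOAT, &kScale, 1});
        dq.setInput(1, *scaleConst->getOutput(0));
    }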
void setScaleType (DataType scaleType) noexcept [inline]
Set the data type of the scale factors used to quantize the data.
Parameters
    scaleType    The scale factors data type.
Set the scale-factors type. Currently the only valid value is DataType::kFP8.
void setToType (DataType toType) noexcept [inline]
Set DynamicQuantizeLayer’s quantized output type.
Parameters
    toType    The data type of the quantized output tensor.
Set the type of the dynamic quantization layer’s quantized output. Currently the only valid value is DataType::kFP4. If the network is strongly typed, setToType must be used to set the output type, and use of setOutputType is an error. Otherwise, types passed to setOutputType and setToType must be the same.
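A short sketch of the type rules described above; the helper names are illustrative, and only calls documented here or in the core builder API are used.

    #include <NvInfer.h>

    using namespace nvinfer1;

    // Create a strongly typed network; in that mode the quantized output type of this
    // layer must be chosen with setToType, and calling setOutputType would be an error.
    INetworkDefinition* makeStronglyTypedNetwork(IBuilder& builder)
    {
        auto const flags =
            1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kSTRONGLY_TYPED);
        return builder.createNetworkV2(flags);
    }

    void chooseQuantizedOutputType(IDynamicQuantizeLayer& dq)
    {
        dq.setToType(DataType::kFP4);            // currently the only valid value
        // dq.setOutputType(0, DataType::kFP4);  // error in a strongly typed network;
                                                 // in a weakly typed one it must match setToType
    }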
Member Data Documentation

apiv::VDynamicQuantizeLayer *mImpl [protected]
Copyright © 2024 NVIDIA Corporation