TensorRT 10.8.0
nvinfer1::IDynamicQuantizeLayer Class Reference

A network layer to perform dynamic quantization. More...

#include <NvInfer.h>

Inheritance diagram for nvinfer1::IDynamicQuantizeLayer:
nvinfer1::IDynamicQuantizeLayer inherits nvinfer1::ILayer, which inherits nvinfer1::INoCopy.

Public Member Functions

void setToType (DataType toType) noexcept
 Set DynamicQuantizeLayer’s quantized output type. More...
 
DataType getToType () const noexcept
 Return DynamicQuantizeLayer’s quantized output type. More...
 
void setScaleType (DataType scaleType) noexcept
 Set the data type of the scale factors used to quantize the data. More...
 
DataType getScaleType () const noexcept
 Return the scale factors data type. More...
 
void setAxis (int32_t axis) noexcept
 Set the axis along which block quantization occurs. More...
 
int32_t getAxis () const noexcept
 Get the axis along which blocking occurs. More...
 
void setBlockSize (int32_t size) noexcept
 Set the size of the quantization block. More...
 
int32_t getBlockSize () const noexcept
 Get the size of the quantization block. More...
 
void setInput (int32_t index, ITensor &tensor) noexcept
 Append or replace an input of this layer with a specific tensor. More...
 
- Public Member Functions inherited from nvinfer1::ILayer
LayerType getType () const noexcept
 Return the type of a layer. More...
 
void setName (char const *name) noexcept
 Set the name of a layer. More...
 
char const * getName () const noexcept
 Return the name of a layer. More...
 
int32_t getNbInputs () const noexcept
 Get the number of inputs of a layer. More...
 
ITensor * getInput (int32_t index) const noexcept
 Get the layer input corresponding to the given index. More...
 
int32_t getNbOutputs () const noexcept
 Get the number of outputs of a layer. More...
 
ITensor * getOutput (int32_t index) const noexcept
 Get the layer output corresponding to the given index. More...
 
void setInput (int32_t index, ITensor &tensor) noexcept
 Replace an input of this layer with a specific tensor. More...
 
void setPrecision (DataType dataType) noexcept
 Set the preferred or required computational precision of this layer in a weakly-typed network. More...
 
DataType getPrecision () const noexcept
 Get the computational precision of this layer. More...
 
bool precisionIsSet () const noexcept
 Whether the computational precision has been set for this layer. More...
 
void resetPrecision () noexcept
 Reset the computational precision for this layer. More...
 
void setOutputType (int32_t index, DataType dataType) noexcept
 Set the output type of this layer in a weakly-typed network. More...
 
DataType getOutputType (int32_t index) const noexcept
 Get the output type of this layer. More...
 
bool outputTypeIsSet (int32_t index) const noexcept
 Whether the output type has been set for this layer. More...
 
void resetOutputType (int32_t index) noexcept
 Reset the output type for this layer. More...
 
void setMetadata (char const *metadata) noexcept
 Set the metadata for this layer. More...
 
char const * getMetadata () const noexcept
 Get the metadata of the layer. More...
 

Protected Member Functions

virtual ~IDynamicQuantizeLayer () noexcept=default
 
- Protected Member Functions inherited from nvinfer1::ILayer
virtual ~ILayer () noexcept=default
 
- Protected Member Functions inherited from nvinfer1::INoCopy
 INoCopy ()=default
 
virtual ~INoCopy ()=default
 
 INoCopy (INoCopy const &other)=delete
 
INoCopy & operator= (INoCopy const &other)=delete
 
 INoCopy (INoCopy &&other)=delete
 
INoCopy & operator= (INoCopy &&other)=delete
 

Protected Attributes

apiv::VDynamicQuantizeLayer * mImpl
 
- Protected Attributes inherited from nvinfer1::ILayer
apiv::VLayer * mLayer
 

Detailed Description

A network layer to perform dynamic quantization.

This layer accepts a floating-point input tensor and computes the block scale factors needed to quantize the input’s data. It outputs the quantized tensor as its first output and the scale factors as its second output.

Use ILayer::setInput to add an input for the double-quantization scale factor.

Note
Only symmetric quantization is supported.
The input tensor for this layer must not be a scalar.
Warning
Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.
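
For orientation, here is a minimal sketch of how such a layer might be created and configured. It assumes a factory method INetworkDefinition::addDynamicQuantize(input, axis, blockSize, toType, scaleType); verify the exact signature in NvInfer.h for your TensorRT version. The axis and activation shape are illustrative.

// Sketch: block-quantizing a floating-point activation tensor.
// Assumption: INetworkDefinition::addDynamicQuantize(input, axis, blockSize, toType, scaleType)
// exists with this parameter order; check NvInfer.h for your release.
#include <NvInfer.h>
using namespace nvinfer1;

IDynamicQuantizeLayer* addBlockQuantize(INetworkDefinition& network,
                                        ITensor& activations,      // floating-point input tensor
                                        ITensor& doubleQuantScale) // positive scalar (0-D) tensor
{
    int32_t const axis = 1;       // last dimension of a 2-D activation tensor (illustrative)
    int32_t const blockSize = 16; // only 16-element blocks are currently supported

    IDynamicQuantizeLayer* dq = network.addDynamicQuantize(
        activations, axis, blockSize, DataType::kFP4, DataType::kFP8);

    // The members documented below can also (re)configure the layer after creation.
    dq->setToType(DataType::kFP4);    // quantized output type
    dq->setScaleType(DataType::kFP8); // type of the computed block scale factors

    // Input 1 is the double-quantization scale factor (see setInput() below).
    dq->setInput(1, doubleQuantScale);

    // Output 0 is the quantized tensor; output 1 holds the block scale factors.
    return dq;
}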

Constructor & Destructor Documentation

◆ ~IDynamicQuantizeLayer()

virtual nvinfer1::IDynamicQuantizeLayer::~IDynamicQuantizeLayer ( )
protected virtual default noexcept

Member Function Documentation

◆ getAxis()

int32_t nvinfer1::IDynamicQuantizeLayer::getAxis ( ) const
inline noexcept

Get the axis along which blocking occurs.

See also
setAxis()

◆ getBlockSize()

int32_t nvinfer1::IDynamicQuantizeLayer::getBlockSize ( ) const
inline noexcept

Get the size of the quantization block.

See also
setBlockSize()

◆ getScaleType()

DataType nvinfer1::IDynamicQuantizeLayer::getScaleType ( ) const
inline noexcept

Return the scale factors data type.

Returns
scaleType parameter set during layer creation or by setScaleType().

The return value is the type of the scale factors used to quantize the dynamic data. The default value is DataType::kFP8.

◆ getToType()

DataType nvinfer1::IDynamicQuantizeLayer::getToType ( ) const
inline noexcept

Return DynamicQuantizeLayer’s quantized output type.

Returns
toType parameter set during layer creation or by setToType().

The return value is the type of the quantized output tensor. The default value is DataType::kFP4.

◆ setAxis()

void nvinfer1::IDynamicQuantizeLayer::setAxis ( int32_t  axis)
inline noexcept

Set the axis along which block quantization occurs.

The axis must be the last or second-to-last dimension. The input's extent along this axis must be constant.

See also
getAxis()

◆ setBlockSize()

void nvinfer1::IDynamicQuantizeLayer::setBlockSize ( int32_t  size)
inline noexcept

Set the size of the quantization block.

Note: The block size must evenly divide the extent of the input along the blocked axis. Currently only 16-element blocks are supported.

See also
getBlockSize()

◆ setInput()

void nvinfer1::ILayer::setInput ( int32_t  index,
ITensor &  tensor 
)
inline noexcept

Append or replace an input of this layer with a specific tensor.

Parameters
index    the index of the input to modify.
tensor   the new input tensor.

Input 0 is the input activation tensor. Input 1 is the double-quantization scale factor. This scale is used to quantize the dynamically computed high-precision scale factors that are used to quantize the activation data. Currently this input must be a positive scalar (a 0D tensor).
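
As a hedged illustration, the double-quantization scale can be supplied as a build-time constant; the names network and dynamicQuantize below stand for an existing INetworkDefinition and IDynamicQuantizeLayer, and the value 1.0F is purely a placeholder.

// Sketch: supplying the double-quantization scale (input 1) as a 0-D constant.
// The scale value is illustrative only; take it from your quantization recipe.
static float const kDoubleQuantScale = 1.0F;
Weights scaleWeights{DataType::kFLOAT, &kDoubleQuantScale, 1};
IConstantLayer* scaleConst = network.addConstant(Dims{0, {}}, scaleWeights); // scalar (0-D) tensor
dynamicQuantize->setInput(1, *scaleConst->getOutput(0));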

◆ setScaleType()

void nvinfer1::IDynamicQuantizeLayer::setScaleType ( DataType  scaleType)
inline noexcept

Set the data type of the scale factors used to quantize the data.

Parameters
scaleType    The scale factors data type.

Set the scale-factors type. Currently the only valid value is DataType::kFP8.

◆ setToType()

void nvinfer1::IDynamicQuantizeLayer::setToType ( DataType  toType)
inline noexcept

Set DynamicQuantizeLayer’s quantized output type.

Parameters
toType    The data type of the quantized output tensor.

Set the type of the dynamic quantization layer’s quantized output. Currently the only valid value is DataType::kFP4. If the network is strongly typed, setToType must be used to set the output type, and use of setOutputType is an error. Otherwise, types passed to setOutputType and setToType must be the same.

See also
NetworkDefinitionCreationFlag::kSTRONGLY_TYPED
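
For illustration, in a strongly typed network setToType() is the only way to choose this layer's quantized output type; the builder handle and the layer creation below are assumed to exist already.

// Sketch: strongly typed network, so use setToType(), not setOutputType().
auto const flags = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kSTRONGLY_TYPED);
INetworkDefinition* network = builder->createNetworkV2(flags);
// ... add inputs and the dynamic-quantize layer dq ...
dq->setToType(DataType::kFP4);           // OK: selects the quantized output type
// dq->setOutputType(0, DataType::kFP4); // would be an error in a strongly typed network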

Member Data Documentation

◆ mImpl

apiv::VDynamicQuantizeLayer* nvinfer1::IDynamicQuantizeLayer::mImpl
protected

The documentation for this class was generated from the following file:

  NvInfer.h
