TensorRT 10.8.0
nvinfer1::IDynamicQuantizeLayer Class Reference

A network layer to perform dynamic quantization.
#include <NvInfer.h>
Public Member Functions
void setToType (DataType toType) noexcept
    Set DynamicQuantizeLayer’s quantized output type.
DataType getToType () const noexcept
    Return DynamicQuantizeLayer’s quantized output type.
void setScaleType (DataType scaleType) noexcept
    Set the data type of the scale factors used to quantize the data.
DataType getScaleType () const noexcept
    Return the scale factors data type.
void setAxis (int32_t axis) noexcept
    Set the axis along which block quantization occurs.
int32_t getAxis () const noexcept
    Get the axis along which blocking occurs.
void setBlockSize (int32_t size) noexcept
    Set the size of the quantization block.
int32_t getBlockSize () const noexcept
    Get the size of the quantization block.
void setInput (int32_t index, ITensor &tensor) noexcept
    Append or replace an input of this layer with a specific tensor.
Public Member Functions inherited from nvinfer1::ILayer
LayerType getType () const noexcept
    Return the type of a layer.
void setName (char const *name) noexcept
    Set the name of a layer.
char const * getName () const noexcept
    Return the name of a layer.
int32_t getNbInputs () const noexcept
    Get the number of inputs of a layer.
ITensor * getInput (int32_t index) const noexcept
    Get the layer input corresponding to the given index.
int32_t getNbOutputs () const noexcept
    Get the number of outputs of a layer.
ITensor * getOutput (int32_t index) const noexcept
    Get the layer output corresponding to the given index.
void setInput (int32_t index, ITensor &tensor) noexcept
    Replace an input of this layer with a specific tensor.
void setPrecision (DataType dataType) noexcept
    Set the preferred or required computational precision of this layer in a weakly-typed network.
DataType getPrecision () const noexcept
    Get the computational precision of this layer.
bool precisionIsSet () const noexcept
    Whether the computational precision has been set for this layer.
void resetPrecision () noexcept
    Reset the computational precision for this layer.
void setOutputType (int32_t index, DataType dataType) noexcept
    Set the output type of this layer in a weakly-typed network.
DataType getOutputType (int32_t index) const noexcept
    Get the output type of this layer.
bool outputTypeIsSet (int32_t index) const noexcept
    Whether the output type has been set for this layer.
void resetOutputType (int32_t index) noexcept
    Reset the output type for this layer.
void setMetadata (char const *metadata) noexcept
    Set the metadata for this layer.
char const * getMetadata () const noexcept
    Get the metadata of the layer.
Protected Member Functions
virtual ~IDynamicQuantizeLayer () noexcept=default
Protected Member Functions inherited from nvinfer1::ILayer
virtual ~ILayer () noexcept=default
Protected Member Functions inherited from nvinfer1::INoCopy
INoCopy ()=default
virtual ~INoCopy ()=default
INoCopy (INoCopy const &other)=delete
INoCopy & operator= (INoCopy const &other)=delete
INoCopy (INoCopy &&other)=delete
INoCopy & operator= (INoCopy &&other)=delete
Protected Attributes
apiv::VDynamicQuantizeLayer * mImpl
Protected Attributes inherited from nvinfer1::ILayer
apiv::VLayer * mLayer
Detailed Description

A network layer to perform dynamic quantization.
This layer accepts a floating-point input tensor and computes the block scale factors needed to quantize the input’s data. It outputs the quantized tensor as its first output and the scale factors as its second output.
Use ILayer::setInput to add an input for the double-quantization scale factor.
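The sketch below shows one way the pieces fit together. The factory method INetworkDefinition::addDynamicQuantize and its argument order are assumptions made for this sketch and should be verified against NvInfer.h for your release; the configuration and input/output calls are the members documented on this page.

    #include <NvInfer.h>

    using namespace nvinfer1;

    // Illustrative wiring of a dynamic-quantization layer over an activation tensor.
    void addBlockQuantizedActivation(INetworkDefinition& network, ITensor& activations)
    {
        int32_t const lastAxis = activations.getDimensions().nbDims - 1;

        // Assumed factory signature: addDynamicQuantize(input, axis, blockSize, toType, scaleType).
        IDynamicQuantizeLayer* dq = network.addDynamicQuantize(
            activations, lastAxis, /*blockSize=*/16, DataType::kFP4, DataType::kFP8);

        dq->setAxis(lastAxis);             // last or second-to-last dimension only
        dq->setBlockSize(16);              // only 16-element blocks are currently supported
        dq->setToType(DataType::kFP4);     // quantized output type
        dq->setScaleType(DataType::kFP8);  // per-block scale factor type

        // Input 1 is the double-quantization scale: a positive scalar (0D) constant.
        static float const kDoubleQuantScale = 1.0F;  // placeholder value for this sketch
        Dims scalarDims{};
        scalarDims.nbDims = 0;
        ITensor* dqScale = network.addConstant(
            scalarDims, Weights{DataType::kFLOAT, &kDoubleQuantScale, 1})->getOutput(0);
        dq->setInput(1, *dqScale);

        // Output 0 is the quantized tensor; output 1 holds the per-block scale factors.
        ITensor* quantized = dq->getOutput(0);
        ITensor* scales = dq->getOutput(1);
        (void) quantized;
        (void) scales;
    }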
Constructor & Destructor Documentation

virtual ~IDynamicQuantizeLayer () noexcept=default [protected]
Member Function Documentation

int32_t getAxis () const noexcept [inline]
Get the axis along which blocking occurs.
int32_t getBlockSize () const noexcept [inline]
Get the size of the quantization block.
DataType getScaleType () const noexcept [inline]
Return the scale factors data type.
The return value is the type of the scale factors used to quantize the dynamic data. The default value is DataType::kFP8.
DataType getToType () const noexcept [inline]
Return DynamicQuantizeLayer’s quantized output type.
The return value is the type of the quantized output tensor. The default value is DataType::kFP4.
void setAxis (int32_t axis) noexcept [inline]
Set the axis along which block quantization occurs.
The axis must be the last or second-to-last dimension. The input's extent along this axis must be constant.
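As an illustration (the shape is assumed, not taken from the header): for a 3-D activation tensor of shape [B, M, K], either the K axis (last) or the M axis (second to last) may be blocked, and the chosen dimension must have a constant extent.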
void setBlockSize (int32_t size) noexcept [inline]
Set the size of the quantization block.
Note: The block size must evenly divide the input's extent along the blocked axis. Currently only 16-element blocks are supported.
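As a worked example with an assumed blocked-axis extent of 4096: a block size of 16 divides it into 4096 / 16 = 256 blocks, so the layer's second output carries 256 scale factors along that axis; an extent of 4100 would be invalid because 16 does not divide it evenly.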
void setInput (int32_t index, ITensor &tensor) noexcept [inline]
Append or replace an input of this layer with a specific tensor.
Parameters
    index     the index of the input to modify.
    tensor    the new input tensor.
Input 0 is the input activation tensor. Input 1 is the double-quantization scale factor. This scale is used to quantize the dynamically computed high-precision scale factors that are used to quantize the activation data. Currently this input must be a positive scalar (a 0D tensor).
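A minimal sketch of wiring input 1, assuming the double-quantization scale is supplied as a network constant; the scalar value 1.0F and the helper name are placeholders, not part of the TensorRT API.

    #include <NvInfer.h>

    using namespace nvinfer1;

    // Attach the double-quantization scale as input 1 of an existing IDynamicQuantizeLayer.
    void attachDoubleQuantScale(INetworkDefinition& network, IDynamicQuantizeLayer& dq)
    {
        static float const kScale = 1.0F;  // must stay valid until the engine is built
        Dims scalarDims{};
        scalarDims.nbDims = 0;             // a positive scalar (0D tensor) is required
        IConstantLayer* scaleConst =
            network.addConstant(scalarDims, Weights{DataType::kFLOAT, &kScale, 1});
        dq.setInput(1, *scaleConst->getOutput(0));
    }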
void setScaleType (DataType scaleType) noexcept [inline]
Set the data type of the scale factors used to quantize the data.
Parameters
    scaleType    The scale factors data type.
Set the scale-factors type. Currently the only valid value is DataType::kFP8.
void setToType (DataType toType) noexcept [inline]
Set DynamicQuantizeLayer’s quantized output type.
Parameters
    toType    The data type of the quantized output tensor.
Set the type of the dynamic quantization layer’s quantized output. Currently the only valid value is DataType::kFP4. If the network is strongly typed, setToType must be used to set the output type, and use of setOutputType is an error. Otherwise, types passed to setOutputType and setToType must be the same.
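A short sketch of the type rules described above; the helper names are illustrative, and only calls documented here or in the core builder API are used.

    #include <NvInfer.h>

    using namespace nvinfer1;

    // Create a strongly typed network; in that mode the quantized output type of this
    // layer must be chosen with setToType, and calling setOutputType would be an error.
    INetworkDefinition* makeStronglyTypedNetwork(IBuilder& builder)
    {
        auto const flags =
            1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kSTRONGLY_TYPED);
        return builder.createNetworkV2(flags);
    }

    void chooseQuantizedOutputType(IDynamicQuantizeLayer& dq)
    {
        dq.setToType(DataType::kFP4);            // currently the only valid value
        // dq.setOutputType(0, DataType::kFP4);  // error in a strongly typed network;
                                                 // in a weakly typed one it must match setToType
    }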
Member Data Documentation

apiv::VDynamicQuantizeLayer *mImpl [protected]
Copyright © 2024 NVIDIA Corporation