TensorRT 8.2.5
nvinfer1::IDequantizeLayer Class Reference

A Dequantize layer in a network definition.
#include <NvInfer.h>
Public Member Functions

int32_t getAxis () const noexcept
    Get the quantization axis.
void setAxis (int32_t axis) noexcept
    Set the quantization axis.

Public Member Functions inherited from nvinfer1::ILayer

LayerType getType () const noexcept
    Return the type of a layer.
void setName (const char *name) noexcept
    Set the name of a layer.
const char * getName () const noexcept
    Return the name of a layer.
int32_t getNbInputs () const noexcept
    Get the number of inputs of a layer.
ITensor * getInput (int32_t index) const noexcept
    Get the layer input corresponding to the given index.
int32_t getNbOutputs () const noexcept
    Get the number of outputs of a layer.
ITensor * getOutput (int32_t index) const noexcept
    Get the layer output corresponding to the given index.
void setInput (int32_t index, ITensor &tensor) noexcept
    Replace an input of this layer with a specific tensor.
void setPrecision (DataType dataType) noexcept
    Set the computational precision of this layer.
DataType getPrecision () const noexcept
    Get the computational precision of this layer.
bool precisionIsSet () const noexcept
    Whether the computational precision has been set for this layer.
void resetPrecision () noexcept
    Reset the computational precision for this layer.
void setOutputType (int32_t index, DataType dataType) noexcept
    Set the output type of this layer.
DataType getOutputType (int32_t index) const noexcept
    Get the output type of this layer.
bool outputTypeIsSet (int32_t index) const noexcept
    Whether the output type has been set for this layer.
void resetOutputType (int32_t index) noexcept
    Reset the output type for this layer.

Protected Attributes

apiv::VDequantizeLayer * mImpl

Protected Attributes inherited from nvinfer1::ILayer

apiv::VLayer * mLayer

Additional Inherited Members

Protected Member Functions inherited from nvinfer1::INoCopy

INoCopy (const INoCopy &other)=delete
INoCopy & operator= (const INoCopy &other)=delete
INoCopy (INoCopy &&other)=delete
INoCopy & operator= (INoCopy &&other)=delete
Detailed Description

A Dequantize layer in a network definition.
This layer accepts a signed 8-bit integer input tensor, and uses the configured scale and zeroPt inputs to dequantize the input according to:

    output = (input - zeroPt) * scale
The first input (index 0) is the tensor to be dequantized. The second (index 1) and third (index 2) are the scale and the zero point respectively. Each of scale and zeroPt must be either a scalar, or a 1D tensor.

The zeroPt tensor is optional, and if not set, will be assumed to be zero. Its data type must be DataType::kINT8. zeroPt must only contain zero-valued coefficients, because only symmetric quantization is supported. The scale value must be either a scalar for per-tensor quantization, or a 1D tensor for per-channel quantization. All scale coefficients must have positive values. The size of the 1-D scale tensor must match the size of the quantization axis. The size of the scale must match the size of the zeroPt.
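For instance, with scale = 0.1f and zeroPt left unset (assumed zero), an INT8 input value of 50 dequantizes to 5.0f. Below is a minimal sketch of wiring this up with the C++ builder API, assuming a hypothetical INetworkDefinition* network and an INT8 ITensor* input that already exist; the scale value is illustrative:

    // Sketch only: per-tensor dequantization. `network` and `input` are
    // assumed to exist; the 0.1f scale value is illustrative.
    static float scaleValue{0.1f}; // must stay valid until the engine is built
    nvinfer1::Weights scaleWeights{nvinfer1::DataType::kFLOAT, &scaleValue, 1};

    // A 1-D scale with exactly one coefficient selects per-tensor mode.
    nvinfer1::IConstantLayer* scaleConst =
        network->addConstant(nvinfer1::Dims{1, {1}}, scaleWeights);

    // zeroPt (input index 2) is left unset, so it is assumed to be zero.
    nvinfer1::IDequantizeLayer* dequant =
        network->addDequantize(*input, *scaleConst->getOutput(0));

    nvinfer1::ITensor* dequantized = dequant->getOutput(0); // DataType::kFLOAT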
The subgraph which terminates with the scale tensor must be a build-time constant. The same restrictions apply to the zeroPt. The output type, if constrained, must be constrained to DataType::kFLOAT (FP16 output is not supported). The input type, if constrained, must be constrained to DataType::kINT8. The output size is the same as the input size. The quantization axis is in reference to the input tensor's dimensions.
IDequantizeLayer only supports DataType::kINT8 precision and will default to this precision during instantiation. IDequantizeLayer only supports DataType::kFLOAT output.
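If you want to pin these choices down explicitly, the standard ILayer type controls apply; only the values shown below are legal for this layer (continuing the hypothetical dequant layer from the sketch above):

    // Optional explicit constraints; other values are invalid for this layer.
    dequant->setPrecision(nvinfer1::DataType::kINT8);      // compute precision
    dequant->setOutputType(0, nvinfer1::DataType::kFLOAT); // sole output's type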
As an example of the operation of this layer, imagine a 4D NCHW activation input which can be dequantized using a single scale coefficient (referred to as per-tensor quantization):

    For each n in N:
        For each c in C:
            For each h in H:
                For each w in W:
                    output[n,c,h,w] = (input[n,c,h,w] - zeroPt) * scale

Per-channel dequantization is supported only for input that is rooted at an IConstantLayer (i.e. weights). Activations cannot be quantized per-channel. As an example of per-channel operation, imagine a 4D KCRS weights input and K (dimension 0) as the quantization axis. The scale is an array of coefficients, which is the same size as the quantization axis.

    For each k in K:
        For each c in C:
            For each r in R:
                For each s in S:
                    output[k,c,r,s] = (input[k,c,r,s] - zeroPt[k]) * scale[k]
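A hedged sketch of the per-channel case, assuming the INT8 KCRS weights were already added to the network as a hypothetical IConstantLayer* quantizedWeights with K = 8 output channels; the per-channel scale values are illustrative:

    // Sketch only: per-channel dequantization of constant weights.
    constexpr int32_t K = 8; // illustrative channel count (dim 0 of KCRS)
    static const std::vector<float> scaleValues(K, 0.05f); // illustrative
    nvinfer1::Weights scaleWeights{nvinfer1::DataType::kFLOAT,
                                   scaleValues.data(), K};

    // The 1-D scale size must match the size of the quantization axis (K).
    nvinfer1::IConstantLayer* scaleConst =
        network->addConstant(nvinfer1::Dims{1, {K}}, scaleWeights);

    nvinfer1::IDequantizeLayer* dequant = network->addDequantize(
        *quantizedWeights->getOutput(0), *scaleConst->getOutput(0));
    dequant->setAxis(0); // quantization axis K is dimension 0 of the input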
Member Function Documentation

getAxis()

int32_t getAxis () const noexcept    [inline]

Get the quantization axis.
setAxis()

void setAxis (int32_t axis) noexcept    [inline]

Set the quantization axis.

Set the index of the quantization axis (with reference to the input tensor's dimensions). The axis must be a valid axis if the scale tensor has more than one coefficient. The axis value will be ignored if the scale tensor has exactly one coefficient (per-tensor quantization).
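Continuing the hypothetical per-channel sketch above, the axis round-trips as expected; for a one-coefficient (per-tensor) scale the stored value is simply ignored:

    dequant->setAxis(0);                     // K axis of the KCRS weights
    int32_t const axis = dequant->getAxis(); // returns 0
    // With a scale tensor of exactly one coefficient, this value is ignored.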