TensorRT-RTX 1.1.1
Helper for constructing an attention that consumes query, key and value tensors. More...
#include <NvInfer.h>
Public Member Functions

bool setNormalizationOperation (AttentionNormalizationOp op) noexcept
    Set the normalization operation for the attention.

AttentionNormalizationOp getNormalizationOperation () const noexcept
    Get the normalization operation for the attention.

bool setMask (ITensor &mask) noexcept
    Set whether a mask will be used for the normalization operation.

ITensor * getMask () noexcept
    Get the optional mask in attention.

bool setCausal (bool isCausal) noexcept
    Set whether the attention will run a causal inference. Cannot be used together with setMask().

bool getCausal () const noexcept
    Get whether the attention will run a causal inference.

bool setDecomposable (bool decomposable) noexcept
    Set whether the attention can be decomposed to use multiple kernels if no fused kernel is found.

bool getDecomposable () const noexcept
    Get whether the attention can be decomposed to use multiple kernels if no fused kernel is found.

bool setInput (int32_t index, ITensor &input) noexcept
    Append or replace an input of this layer with a specific tensor.

int32_t getNbInputs () const noexcept
    Get the number of inputs of IAttention. IAttention has three inputs.

ITensor * getInput (int32_t index) const noexcept
    Get the IAttention input corresponding to the given index.

int32_t getNbOutputs () const noexcept
    Get the number of outputs of the layer. IAttention has one output.

ITensor * getOutput (int32_t index) const noexcept
    Get the IAttention output corresponding to the given index. IAttention has only one output.

bool setName (char const *name) noexcept
    Set the name of the attention.

char const * getName () const noexcept
    Return the name of the attention.

Protected Member Functions

virtual ~IAttention () noexcept = default

Inherited from INoCopy:

INoCopy () = default
virtual ~INoCopy () = default
INoCopy (INoCopy const &other) = delete
INoCopy & operator= (INoCopy const &other) = delete
INoCopy (INoCopy &&other) = delete
INoCopy & operator= (INoCopy &&other) = delete

Protected Attributes

apiv::VAttention * mImpl
Detailed Description

Helper for constructing an attention that consumes query, key and value tensors.
An attention subgraph implicitly includes three main components: two MatrixMultiply layers, known as BMM1 and BMM2, and one normalization operation, which defaults to Softmax. By default, IAttention is not decomposable, and TensorRT will try to use a single fused kernel, which may be more efficient than expressing the subgraph without IAttention. Setting the IAttention to decomposable=true allows it to be decomposed into multiple kernels if no fused kernel is found.
Query    Key     Value   Mask (optional)
  |       |        |       |
  |   Transpose    |       |
  |       |        |       |
  +--BMM1-+        |       |
      |            |       |
      *------------|-------+
      |            |
Normalization      |
      |            |
      +----BMM2----+
            |
          Output
The attention has the following inputs, in order of input index:

Input 0 is the input query tensor.
Input 1 is the input key tensor.
Input 2 is the input value tensor.
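A minimal build sketch showing how the three inputs feed an attention. The factory method `INetworkDefinition::addAttention(query, key, value)` and the `AttentionNormalizationOp::kSOFTMAX` enumerator are assumptions here; verify both against the TensorRT-RTX `NvInfer.h` header before use.

```cpp
#include <NvInfer.h>

using namespace nvinfer1;

// Sketch: wire query/key/value network inputs into an IAttention.
// addAttention and kSOFTMAX are assumptions; verify against NvInfer.h.
IAttention* buildAttention(INetworkDefinition& network)
{
    // Illustrative [batch, heads, seqLen, headDim] half-precision shapes.
    Dims4 dims{1, 8, 128, 64};
    ITensor* q = network.addInput("query", DataType::kHALF, dims);
    ITensor* k = network.addInput("key",   DataType::kHALF, dims);
    ITensor* v = network.addInput("value", DataType::kHALF, dims);

    IAttention* attn = network.addAttention(*q, *k, *v);
    attn->setNormalizationOperation(AttentionNormalizationOp::kSOFTMAX); // the default
    attn->setDecomposable(true); // allow multi-kernel fallback if no fused kernel is found
    network.markOutput(*attn->getOutput(0));
    return attn;
}
```

The shapes and tensor names are illustrative only; any shapes acceptable to BMM1/BMM2 will do.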
~IAttention()

virtual ~IAttention () noexcept = default    (protected, virtual)
bool getCausal () const noexcept    (inline)
Get whether the attention will run a causal inference.
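For autoregressive decoding, causal inference replaces an explicit mask. A small sketch (the helper name is illustrative, not part of the API):

```cpp
#include <NvInfer.h>

// Enable causal attention: position i attends only to positions j <= i.
// setCausal is documented as mutually exclusive with setMask, so bail out
// if a mask is already attached. Helper name is illustrative.
bool enableCausal(nvinfer1::IAttention& attn)
{
    if (attn.getMask() != nullptr)
        return false; // a mask is set; setCausal cannot be combined with it
    return attn.setCausal(true) && attn.getCausal();
}
```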

bool getDecomposable () const noexcept    (inline)
Get whether the attention can be decomposed to use multiple kernels if no fused kernel is found.

ITensor * getInput (int32_t index) const noexcept    (inline)
Get the IAttention input corresponding to the given index.
Parameters:
    index: The index of the input tensor.
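A small sketch walking the layer's fixed inputs via getNbInputs() and getInput() (the helper name is illustrative):

```cpp
#include <NvInfer.h>
#include <cstdio>

// Walk the three fixed inputs (0 = query, 1 = key, 2 = value).
void printAttentionInputs(nvinfer1::IAttention const& attn)
{
    for (int32_t i = 0; i < attn.getNbInputs(); ++i) // always 3 for IAttention
    {
        nvinfer1::ITensor* t = attn.getInput(i);
        std::printf("input %d: %s\n", i, t ? t->getName() : "(null)");
    }
}
```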

ITensor * getMask () noexcept    (inline)
Get the optional mask in attention.

char const * getName () const noexcept    (inline)
Return the name of the attention.

int32_t getNbInputs () const noexcept    (inline)
Get the number of inputs of IAttention. IAttention has three inputs.

int32_t getNbOutputs () const noexcept    (inline)
Get the number of outputs of a layer. IAttention has one output.

AttentionNormalizationOp getNormalizationOperation () const noexcept    (inline)
Get the normalization operation for the attention.

ITensor * getOutput (int32_t index) const noexcept    (inline)
Get the IAttention output corresponding to the given index. IAttention has only one output.
Parameters:
    index: The index of the output tensor.

bool setCausal (bool isCausal) noexcept    (inline)
Set whether the attention will run a causal inference. Cannot be used together with setMask().

bool setDecomposable (bool decomposable) noexcept    (inline)
Set whether the attention can be decomposed to use multiple kernels if no fused kernel is found.

bool setInput (int32_t index, ITensor &input) noexcept    (inline)
Append or replace an input of this layer with a specific tensor.
Parameters:
    index: The index of the input to modify.
    input: The new input tensor.
The indices are as follows:
Input 0 is the input query tensor.
Input 1 is the input key tensor.
Input 2 is the input value tensor.
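For example, replacing the query input by index (the helper name and the origin of `newQuery` are illustrative):

```cpp
#include <NvInfer.h>

// Replace the query input (index 0) with another tensor of compatible shape.
// 'newQuery' is assumed to come from elsewhere in the same network.
bool swapQuery(nvinfer1::IAttention& attn, nvinfer1::ITensor& newQuery)
{
    return attn.setInput(0, newQuery); // 0 = query, 1 = key, 2 = value
}
```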

bool setMask (ITensor &mask) noexcept    (inline)
Set whether a mask will be used for the normalization operation.
Parameters:
    mask: The mask tensor, either of type kBOOL or of the same data type as the BMM1 output, with shape [batchSize, sequenceLengthQuery, sequenceLengthKeyValue]. For a kBOOL mask, a true value indicates that the corresponding position is allowed to attend. For other data types, the mask values are added to the BMM1 output (known as an add mask).
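A sketch attaching a boolean padding mask supplied as a network input (the helper name and dimension parameters are illustrative):

```cpp
#include <NvInfer.h>

// Attach a boolean padding mask; true marks positions that may be attended to.
// Shape must be [batchSize, sequenceLengthQuery, sequenceLengthKeyValue].
// Helper name and dimensions are illustrative.
bool attachMask(nvinfer1::INetworkDefinition& network, nvinfer1::IAttention& attn,
                int32_t batch, int32_t seqQ, int32_t seqKV)
{
    nvinfer1::ITensor* mask = network.addInput(
        "mask", nvinfer1::DataType::kBOOL, nvinfer1::Dims3{batch, seqQ, seqKV});
    // Note: a mask cannot be combined with setCausal(true).
    return mask != nullptr && attn.setMask(*mask);
}
```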

bool setName (char const *name) noexcept    (inline)
Set the name of the attention.
The name is used in error diagnostics. This method copies the name string.
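Because the name string is copied, the caller's buffer need not outlive the call; a small sketch (helper name is illustrative):

```cpp
#include <NvInfer.h>
#include <cassert>
#include <cstring>

// setName copies the string, so a short-lived buffer is safe to reuse.
void nameAttention(nvinfer1::IAttention& attn)
{
    char buf[] = "decoder_self_attention";
    attn.setName(buf);
    assert(std::strcmp(attn.getName(), buf) == 0); // name was copied, not aliased
}
```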

bool setNormalizationOperation (AttentionNormalizationOp op) noexcept    (inline)
Set the normalization operation for the attention.

apiv::VAttention * mImpl    (protected)
Copyright © 2024 NVIDIA Corporation