Context for executing inference using an engine, with functionally unsafe features.

#include <NvInferRuntime.h>

Public Member Functions
bool	execute (int32_t batchSize, void *const *bindings) noexcept
	Synchronously execute inference on a batch.

bool	enqueue (int32_t batchSize, void *const *bindings, cudaStream_t stream, cudaEvent_t *inputConsumed) noexcept
	Asynchronously execute inference on a batch.

void	setDebugSync (bool sync) noexcept
	Set the debug sync flag.

bool	getDebugSync () const noexcept
	Get the debug sync flag.

void	setProfiler (IProfiler *profiler) noexcept
	Set the profiler.

IProfiler *	getProfiler () const noexcept
	Get the profiler.

const ICudaEngine &	getEngine () const noexcept
	Get the associated engine.

TRT_DEPRECATED void	destroy () noexcept
	Destroy this object.

void	setName (const char *name) noexcept
	Set the name of the execution context.

const char *	getName () const noexcept
	Return the name of the execution context.

void	setDeviceMemory (void *memory) noexcept
	Set the device memory for use by this execution context.

Dims	getStrides (int32_t bindingIndex) const noexcept
	Return the strides of the buffer for the given binding.

TRT_DEPRECATED bool	setOptimizationProfile (int32_t profileIndex) noexcept
	Select an optimization profile for the current context.

int32_t	getOptimizationProfile () const noexcept
	Get the index of the currently selected optimization profile.

bool	setBindingDimensions (int32_t bindingIndex, Dims dimensions) noexcept
	Set the dynamic dimensions of a binding.

Dims	getBindingDimensions (int32_t bindingIndex) const noexcept
	Get the dynamic dimensions of a binding.

bool	setInputShapeBinding (int32_t bindingIndex, int32_t const *data) noexcept
	Set values of input tensor required by shape calculations.

bool	getShapeBinding (int32_t bindingIndex, int32_t *data) const noexcept
	Get values of an input tensor required for shape calculations or an output tensor produced by shape calculations.

bool	allInputDimensionsSpecified () const noexcept
	Whether all dynamic dimensions of input tensors have been specified.

bool	allInputShapesSpecified () const noexcept
	Whether all input shape bindings have been specified.

void	setErrorRecorder (IErrorRecorder *recorder) noexcept
	Set the ErrorRecorder for this interface.

IErrorRecorder *	getErrorRecorder () const noexcept
	Get the ErrorRecorder assigned to this interface.

bool	executeV2 (void *const *bindings) noexcept
	Synchronously execute inference on a network.

bool	enqueueV2 (void *const *bindings, cudaStream_t stream, cudaEvent_t *inputConsumed) noexcept
	Asynchronously execute inference.

bool	setOptimizationProfileAsync (int32_t profileIndex, cudaStream_t stream) noexcept
	Select an optimization profile for the current context with async semantics.

void	setEnqueueEmitsProfile (bool enqueueEmitsProfile) noexcept
	Set whether enqueue emits layer timing to the profiler.

bool	getEnqueueEmitsProfile () const noexcept
	Get the enqueueEmitsProfile state.

bool	reportToProfiler () const noexcept
	Calculate layer timing info for the current optimization profile in IExecutionContext and update the profiler after one iteration of inference launch.

Protected Attributes
apiv::VExecutionContext *	mImpl

Additional Inherited Members
Protected Member Functions inherited from nvinfer1::INoCopy
	INoCopy (const INoCopy &other)=delete

INoCopy &	operator= (const INoCopy &other)=delete

	INoCopy (INoCopy &&other)=delete

INoCopy &	operator= (INoCopy &&other)=delete

Detailed Description

Context for executing inference using an engine, with functionally unsafe features.

Multiple execution contexts may exist for one ICudaEngine instance, allowing the same engine to be used for the execution of multiple batches simultaneously. If the engine supports dynamic shapes, each execution context in concurrent use must use a separate optimization profile.

Warning: Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.

Member Function Documentation

◆ allInputDimensionsSpecified()

bool nvinfer1::IExecutionContext::allInputDimensionsSpecified ( ) const

inline noexcept

Whether all dynamic dimensions of input tensors have been specified.

Returns: True if all dynamic dimensions of input tensors have been specified by calling setBindingDimensions().

Trivially true if network has no dynamically shaped input tensors.

See also: setBindingDimensions(bindingIndex,dimensions)

◆ allInputShapesSpecified()

bool nvinfer1::IExecutionContext::allInputShapesSpecified ( ) const

inline noexcept

Whether all input shape bindings have been specified.

Returns: True if all input shape bindings have been specified by setInputShapeBinding().

Trivially true if network has no input shape bindings.

See also: isShapeBinding(bindingIndex)
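
As a rough illustration of how the two checks above combine, the sketch below refuses to run inference until both allInputDimensionsSpecified() and allInputShapesSpecified() return true. It is templated on the context type so that it compiles without the TensorRT headers; readyToRun and runIfReady are hypothetical helper names, and in real code Context would be nvinfer1::IExecutionContext.

```cpp
// Hypothetical helpers, not part of TensorRT. Context is assumed to expose
// the IExecutionContext methods used below.
template <typename Context>
bool readyToRun(const Context& ctx) {
    // Both checks are trivially true for networks without dynamic inputs.
    return ctx.allInputDimensionsSpecified() && ctx.allInputShapesSpecified();
}

template <typename Context>
bool runIfReady(Context& ctx, void* const* bindings) {
    if (!readyToRun(ctx)) {
        return false;  // a dynamic dimension or input shape binding is still unset
    }
    return ctx.executeV2(bindings);
}
```

Checking up front gives the caller one place to report an unset input instead of relying on the execution call to fail.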

◆ destroy()

TRT_DEPRECATED void nvinfer1::IExecutionContext::destroy ( )

inline noexcept

Destroy this object.

Deprecated: This interface will be removed in TensorRT 10.0.

Warning: Calling destroy on a managed pointer will result in a double-free error.

◆ enqueue()

bool nvinfer1::IExecutionContext::enqueue	(	int32_t	batchSize,
		void *const *	bindings,
		cudaStream_t	stream,
		cudaEvent_t *	inputConsumed
	)

inline noexcept

Asynchronously execute inference on a batch.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex().

Parameters

batchSize	The batch size. This is at most the value supplied when the engine was built.
bindings	An array of pointers to input and output buffers for the network.
stream	A CUDA stream on which the inference kernels will be enqueued.
inputConsumed	An optional event which will be signaled when the input buffers can be refilled with new data.

Returns: True if the kernels were enqueued successfully.

See also: ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()

Warning: Calling enqueue() from the same IExecutionContext object with different CUDA streams concurrently results in undefined behavior. To perform inference concurrently in multiple streams, use one execution context per stream.

Warning: This function will trigger layer resource updates if hasImplicitBatchDimension() returns true and batchSize changes between subsequent calls, possibly resulting in performance bottlenecks.

◆ enqueueV2()

bool nvinfer1::IExecutionContext::enqueueV2	(	void *const *	bindings,
		cudaStream_t	stream,
		cudaEvent_t *	inputConsumed
	)

inline noexcept

Asynchronously execute inference.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex(). This method only works for execution contexts built with full dimension networks.

Parameters

bindings	An array of pointers to input and output buffers for the network.
stream	A CUDA stream on which the inference kernels will be enqueued.
inputConsumed	An optional event which will be signaled when the input buffers can be refilled with new data.

Returns: True if the kernels were enqueued successfully.

See also: ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()

Note: Calling enqueueV2() with a stream in CUDA graph capture mode has a known issue. If dynamic shapes are used, the first enqueueV2() call after a setInputShapeBinding() call will cause failure in stream capture due to resource allocation. Please call enqueueV2() once before capturing the graph.

Warning: Calling enqueueV2() from the same IExecutionContext object with different CUDA streams concurrently results in undefined behavior. To perform inference concurrently in multiple streams, use one execution context per stream.
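
The note about CUDA graph capture can be sketched as a warm-up step: call enqueueV2() once on the stream, wait for it to finish, and only then begin capture. The helper below is a minimal sketch under stated assumptions. warmUpBeforeCapture is a hypothetical name, the Stream type stands in for cudaStream_t, and syncStream is a caller-supplied callable that in real code would typically wrap cudaStreamSynchronize(). Passing nullptr for inputConsumed relies on that event being optional.

```cpp
// Hypothetical helper, not part of TensorRT. Context is assumed to expose
// IExecutionContext::enqueueV2().
template <typename Context, typename Stream, typename SyncFn>
bool warmUpBeforeCapture(Context& ctx, void* const* bindings, Stream stream,
                         SyncFn syncStream) {
    // One ordinary enqueueV2() call lets any first-call resource allocation
    // happen outside of stream capture.
    if (!ctx.enqueueV2(bindings, stream, nullptr)) {
        return false;
    }
    // Wait for the warm-up run to finish before the caller begins capture.
    return syncStream(stream);
}
```

The same context and stream should then be used for the captured enqueueV2() call, in line with the one-context-per-stream warning above.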

◆ execute()

bool nvinfer1::IExecutionContext::execute	(	int32_t	batchSize,
		void *const *	bindings
	)

inline noexcept

Synchronously execute inference on a batch.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex().

Parameters

batchSize	The batch size. This is at most the value supplied when the engine was built.
bindings	An array of pointers to input and output buffers for the network.

Returns: True if execution succeeded.

Warning: This function will trigger layer resource updates if hasImplicitBatchDimension() returns true and batchSize changes between subsequent calls, possibly resulting in performance bottlenecks.

See also: ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()

◆ executeV2()

bool nvinfer1::IExecutionContext::executeV2 ( void *const * bindings )

inline noexcept

Synchronously execute inference on a network.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex(). This method only works for execution contexts built with full dimension networks.

Parameters

bindings An array of pointers to input and output buffers for the network.

Returns: True if execution succeeded.

See also: ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()
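
To illustrate the name-to-index mapping, the following sketch fills a bindings array from (tensor name, device pointer) pairs before it is handed to executeV2(). It is templated on the engine type so that it compiles without the TensorRT headers. fillBindings is a hypothetical helper, the engine is assumed to expose ICudaEngine::getNbBindings() and ICudaEngine::getBindingIndex(), and treating a negative index as "name not found" is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of TensorRT.
template <typename Engine>
bool fillBindings(const Engine& engine,
                  const std::vector<std::pair<std::string, void*>>& namedBuffers,
                  std::vector<void*>& bindings) {
    bindings.assign(static_cast<std::size_t>(engine.getNbBindings()), nullptr);
    for (const auto& entry : namedBuffers) {
        const int index = engine.getBindingIndex(entry.first.c_str());
        if (index < 0 || index >= static_cast<int>(bindings.size())) {
            return false;  // assumed to mean the name is unknown to this engine
        }
        bindings[static_cast<std::size_t>(index)] = entry.second;
    }
    // Every slot must hold a buffer before bindings.data() is used.
    for (void* buffer : bindings) {
        if (buffer == nullptr) {
            return false;
        }
    }
    return true;
}
```

On success, bindings.data() has the void *const * shape that executeV2() expects.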

◆ getBindingDimensions()

Dims nvinfer1::IExecutionContext::getBindingDimensions ( int32_t bindingIndex ) const

inline noexcept

Get the dynamic dimensions of a binding.

If the engine was built with an implicit batch dimension, this method behaves the same as ICudaEngine::getBindingDimensions().

If setBindingDimensions() has been called on this binding (or if there are no dynamic dimensions), all dimensions will be positive. Otherwise, it is necessary to call setBindingDimensions() before enqueue() or execute() may be called.

If the bindingIndex is out of range, an invalid Dims with nbDims == -1 is returned. The same invalid Dims will be returned if the engine was not built with an implicit batch dimension and if the execution context is not currently associated with a valid optimization profile (i.e. if getOptimizationProfile() returns -1).

If ICudaEngine::bindingIsInput(bindingIndex) is false, then both allInputDimensionsSpecified() and allInputShapesSpecified() must be true before calling this method.

Returns: Currently selected binding dimensions

For backwards compatibility with earlier versions of TensorRT, a bindingIndex that does not belong to the current profile is corrected as described for ICudaEngine::getProfileDimensions.

See also: ICudaEngine::getProfileDimensions
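
A caller can distinguish the cases above by inspecting the returned Dims. The sketch below treats nbDims == -1 as invalid and any remaining -1 extent as a dynamic dimension that has not been set. isFullySpecified is a hypothetical helper, and DimsT is assumed to look like nvinfer1::Dims, with an nbDims count and an array d of extents.

```cpp
// Hypothetical helper, not part of TensorRT.
template <typename DimsT>
bool isFullySpecified(const DimsT& dims) {
    if (dims.nbDims < 0) {
        return false;  // invalid Dims (nbDims == -1)
    }
    for (int i = 0; i < dims.nbDims; ++i) {
        if (dims.d[i] < 0) {
            return false;  // wildcard -1 has not been replaced yet
        }
    }
    return true;
}
```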

◆ getDebugSync()

bool nvinfer1::IExecutionContext::getDebugSync ( ) const

inline noexcept

Get the debug sync flag.

See also: setDebugSync()

◆ getEngine()

const ICudaEngine & nvinfer1::IExecutionContext::getEngine ( ) const

inline noexcept

Get the associated engine.

See also: ICudaEngine

◆ getEnqueueEmitsProfile()

bool nvinfer1::IExecutionContext::getEnqueueEmitsProfile ( ) const

inline noexcept

Get the enqueueEmitsProfile state.

Returns: The enqueueEmitsProfile state.

See also: IExecutionContext::setEnqueueEmitsProfile()

◆ getErrorRecorder()

IErrorRecorder * nvinfer1::IExecutionContext::getErrorRecorder ( ) const

inline noexcept

Get the ErrorRecorder assigned to this interface.

Retrieves the assigned error recorder object for the given class. A nullptr will be returned if an error handler has not been set.

Returns: A pointer to the IErrorRecorder object that has been registered.

See also: setErrorRecorder()

◆ getName()

const char * nvinfer1::IExecutionContext::getName ( ) const

inline noexcept

Return the name of the execution context.

See also: setName()

◆ getOptimizationProfile()

int32_t nvinfer1::IExecutionContext::getOptimizationProfile ( ) const

inline noexcept

Get the index of the currently selected optimization profile.

If the profile index has not been set yet (implicitly to 0 for the first execution context to be created, or explicitly for all subsequent contexts), an invalid value of -1 will be returned and all calls to enqueue() or execute() will fail until a valid profile index has been set.

◆ getProfiler()

IProfiler * nvinfer1::IExecutionContext::getProfiler ( ) const

inline noexcept

Get the profiler.

See also: IProfiler setProfiler()

◆ getShapeBinding()

bool nvinfer1::IExecutionContext::getShapeBinding	(	int32_t	bindingIndex,
		int32_t *	data
	)		const

inline noexcept

Get values of an input tensor required for shape calculations or an output tensor produced by shape calculations.

Parameters

bindingIndex	index of an input or output tensor for which ICudaEngine::isShapeBinding(bindingIndex) is true.
data	pointer to where values will be written. The number of values written is the product of the dimensions returned by getBindingDimensions(bindingIndex).

If ICudaEngine::bindingIsInput(bindingIndex) is false, then both allInputDimensionsSpecified() and allInputShapesSpecified() must be true before calling this method. The method will also fail if no valid optimization profile has been set for the current execution context, i.e. if getOptimizationProfile() returns -1.

See also: isShapeBinding(bindingIndex)
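
Because the number of values written is the product of the binding dimensions, the destination buffer can be sized from getBindingDimensions() first. The following is a minimal sketch, templated on the context type so that it compiles without the TensorRT headers. readShapeBinding is a hypothetical helper, and the returned Dims is assumed to expose nbDims and d as nvinfer1::Dims does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of TensorRT.
template <typename Context>
bool readShapeBinding(const Context& ctx, std::int32_t bindingIndex,
                      std::vector<std::int32_t>& values) {
    const auto dims = ctx.getBindingDimensions(bindingIndex);
    if (dims.nbDims < 0) {
        return false;  // invalid Dims
    }
    std::size_t count = 1;  // a 0-dimensional tensor holds a single value
    for (int i = 0; i < dims.nbDims; ++i) {
        if (dims.d[i] < 0) {
            return false;  // dimension not specified yet
        }
        count *= static_cast<std::size_t>(dims.d[i]);
    }
    values.assign(count, 0);
    return ctx.getShapeBinding(bindingIndex, values.data());
}
```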

◆ getStrides()

Dims nvinfer1::IExecutionContext::getStrides ( int32_t bindingIndex ) const

inline noexcept

Return the strides of the buffer for the given binding.

The strides are in units of elements, not components or bytes. For example, for TensorFormat::kHWC8, a stride of one spans 8 scalars.

Note that strides can be different for different execution contexts with dynamic shapes.

If the bindingIndex is invalid or there are dynamic dimensions that have not been set yet, returns Dims with Dims::nbDims = -1.

Parameters

bindingIndex The binding index.

See also: getBindingComponentsPerElement
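
Since strides are expressed in elements, converting one to bytes takes two further factors: the number of scalars per element and the size of one scalar. The arithmetic can be sketched as follows. strideInBytes is a hypothetical helper, and the caller is assumed to supply the scalars-per-element count (8 for TensorFormat::kHWC8 as stated above, 1 for a non-vectorized format) and the scalar size in bytes.

```cpp
#include <cstdint>

// Hypothetical helper, not part of TensorRT.
// elementStride:        one entry of the Dims returned by getStrides().
// componentsPerElement: scalars in one element, e.g. 8 for kHWC8.
// bytesPerComponent:    size of one scalar, e.g. 2 for a 16-bit float.
inline std::int64_t strideInBytes(std::int64_t elementStride,
                                  std::int64_t componentsPerElement,
                                  std::int64_t bytesPerComponent) {
    return elementStride * componentsPerElement * bytesPerComponent;
}
```

For instance, a stride of one element in a kHWC8 buffer of 16-bit floats corresponds to 1 * 8 * 2 = 16 bytes.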

◆ reportToProfiler()

bool nvinfer1::IExecutionContext::reportToProfiler ( ) const

inline noexcept

Calculate layer timing info for the current optimization profile in IExecutionContext and update the profiler after one iteration of inference launch.

If IExecutionContext::getEnqueueEmitsProfile() returns true, the enqueue function will calculate layer timing implicitly if a profiler is provided. There is no need to call this function.

If IExecutionContext::getEnqueueEmitsProfile() returns false, the enqueue function will record the CUDA event timers if a profiler is provided. But it will not perform the layer timing calculation. IExecutionContext::reportToProfiler() needs to be called explicitly to calculate layer timing for the previous inference launch.

In the CUDA graph launch scenario, it will record the same set of CUDA events as in regular enqueue functions if the graph is captured from an IExecutionContext with profiler enabled. This function needs to be called after graph launch to report the layer timing info to the profiler.

Warning: Profiling CUDA graphs is only available from CUDA 11.1 onwards.

Returns: true if the call succeeded, else false (e.g. profiler not provided, in CUDA graph capture mode, etc.)

See also: IExecutionContext::setEnqueueEmitsProfile(); IExecutionContext::getEnqueueEmitsProfile()
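
The explicit reporting path described above can be sketched as: disable implicit profiling, enqueue, wait for the launch, then call reportToProfiler(). The helper below is a minimal sketch that assumes a profiler has already been attached with setProfiler(). enqueueThenReport is a hypothetical name, Stream stands in for cudaStream_t, and syncStream is a caller-supplied callable that in real code would typically wrap cudaStreamSynchronize(). The explicit synchronization is a conservative assumption of this sketch, not something the text above requires.

```cpp
// Hypothetical helper, not part of TensorRT. Context is assumed to expose
// the IExecutionContext methods used below.
template <typename Context, typename Stream, typename SyncFn>
bool enqueueThenReport(Context& ctx, void* const* bindings, Stream stream,
                       SyncFn syncStream) {
    // With the flag off, enqueue only records the CUDA event timers.
    ctx.setEnqueueEmitsProfile(false);
    if (!ctx.enqueueV2(bindings, stream, nullptr)) {
        return false;
    }
    if (!syncStream(stream)) {
        return false;  // the launch did not complete
    }
    // Calculates layer timing for the previous launch and updates the profiler.
    return ctx.reportToProfiler();
}
```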

◆ setBindingDimensions()

bool nvinfer1::IExecutionContext::setBindingDimensions	(	int32_t	bindingIndex,
		Dims	dimensions
	)

inline noexcept

Set the dynamic dimensions of a binding.

Parameters

bindingIndex	index of an input tensor whose dimensions must be compatible with the network definition (i.e. only the wildcard dimension -1 can be replaced with a new dimension >= 0).
dimensions	specifies the dimensions of the input tensor. It must be in the valid range for the currently selected optimization profile, and the corresponding engine must not be safety-certified.

This method requires the engine to be built without an implicit batch dimension. This method will fail unless a valid optimization profile is defined for the current execution context (getOptimizationProfile() must not be -1).

For all dynamic non-output bindings (which have at least one wildcard dimension of -1), this method needs to be called before either enqueue() or execute() may be called. This can be checked using the method allInputDimensionsSpecified().

Warning: If the dimensions differ from the previously set dimensions, this function will trigger layer resource updates on the next call of enqueue[V2]()/execute[V2](), possibly resulting in performance bottlenecks.

Returns: false if an error occurs (e.g. bindingIndex is out of range for the currently selected optimization profile or binding dimension is inconsistent with min-max range of the optimization profile), else true. Note that the network can still be invalid for certain combinations of input shapes that lead to invalid output shapes. To confirm the correctness of the network input shapes, check whether the output binding has valid dimensions using getBindingDimensions() on the output bindingIndex.

See also: ICudaEngine::getBindingIndex
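
The recommended check on output dimensions can be sketched as follows: set the input dimensions, confirm that all inputs are specified, then read the output binding and reject it if it is invalid. setInputAndCheckOutput is a hypothetical helper, templated so that it compiles without the TensorRT headers. DimsT stands in for nvinfer1::Dims and is assumed to expose nbDims and d.

```cpp
// Hypothetical helper, not part of TensorRT. Context is assumed to expose
// the IExecutionContext methods used below.
template <typename Context, typename DimsT>
bool setInputAndCheckOutput(Context& ctx, int inputIndex, const DimsT& inputDims,
                            int outputIndex) {
    // Fails if the dimensions are rejected, e.g. outside the profile range.
    if (!ctx.setBindingDimensions(inputIndex, inputDims)) {
        return false;
    }
    // Output dimensions may only be queried once every input is specified.
    if (!ctx.allInputDimensionsSpecified() || !ctx.allInputShapesSpecified()) {
        return false;
    }
    const DimsT out = ctx.getBindingDimensions(outputIndex);
    if (out.nbDims < 0) {
        return false;  // invalid Dims
    }
    for (int i = 0; i < out.nbDims; ++i) {
        if (out.d[i] < 0) {
            return false;  // this input combination yields an invalid output shape
        }
    }
    return true;
}
```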

◆ setDebugSync()

void nvinfer1::IExecutionContext::setDebugSync ( bool sync )

inline noexcept

Set the debug sync flag.

If this flag is set to true, the engine will log the successful execution for each kernel during execute(). It has no effect when using enqueue().

See also: getDebugSync()

◆ setDeviceMemory()

void nvinfer1::IExecutionContext::setDeviceMemory ( void * memory )

inline noexcept

Set the device memory for use by this execution context.

The memory must satisfy the CUDA memory alignment property (as reported by cudaGetDeviceProperties()), and its size must be at least that returned by ICudaEngine::getDeviceMemorySize(). Setting memory to nullptr is acceptable if ICudaEngine::getDeviceMemorySize() returns 0. If using enqueue() to run the network, the memory is in use from the invocation of enqueue() until network execution is complete. If using execute(), it is in use until execute() returns. Releasing or otherwise using the memory for other purposes during this time will result in undefined behavior.

See also: ICudaEngine::getDeviceMemorySize() ICudaEngine::createExecutionContextWithoutDeviceMemory()

◆ setEnqueueEmitsProfile()

void nvinfer1::IExecutionContext::setEnqueueEmitsProfile ( bool enqueueEmitsProfile )

inline noexcept

Set whether enqueue emits layer timing to the profiler.

If set to true (default), enqueue is synchronous and does layer timing profiling implicitly if there is a profiler attached. If set to false, enqueue will be asynchronous if there is a profiler attached. An extra method reportToProfiler() needs to be called to obtain the profiling data and report to the profiler attached.

See also: IExecutionContext::getEnqueueEmitsProfile(); IExecutionContext::reportToProfiler()

◆ setErrorRecorder()

void nvinfer1::IExecutionContext::setErrorRecorder ( IErrorRecorder * recorder )

inline noexcept

Set the ErrorRecorder for this interface.

Assigns the ErrorRecorder to this interface. The ErrorRecorder will track all errors during execution. This function will call incRefCount of the registered ErrorRecorder at least once. Setting recorder to nullptr unregisters the recorder with the interface, resulting in a call to decRefCount if a recorder has been registered.

If an error recorder is not set, messages will be sent to the global log stream.

Parameters

recorder The error recorder to register with this interface.

See also: getErrorRecorder()

◆ setInputShapeBinding()

bool nvinfer1::IExecutionContext::setInputShapeBinding	(	int32_t	bindingIndex,
		int32_t const *	data
	)

inline noexcept

Set values of input tensor required by shape calculations.

Parameters

bindingIndex	index of an input tensor for which ICudaEngine::isShapeBinding(bindingIndex) and ICudaEngine::bindingIsInput(bindingIndex) are both true.
data	pointer to values of the input tensor. The number of values should be the product of the dimensions returned by getBindingDimensions(bindingIndex).

If ICudaEngine::isShapeBinding(bindingIndex) and ICudaEngine::bindingIsInput(bindingIndex) are both true, this method must be called before enqueue() or execute() may be called. This method will fail unless a valid optimization profile is defined for the current execution context (getOptimizationProfile() must not be -1).

Warning: If the shapes differ from the previously set shapes, this function will trigger layer resource updates on the next call of enqueue[V2]()/execute[V2](), possibly resulting in performance bottlenecks.

Returns: false if an error occurs (e.g. bindingIndex is out of range for the currently selected optimization profile or shape data is inconsistent with min-max range of the optimization profile), else true. Note that the network can still be invalid for certain combinations of input shapes that lead to invalid output shapes. To confirm the correctness of the network input shapes, check whether the output binding has valid dimensions using getBindingDimensions() on the output bindingIndex.

◆ setName()

void nvinfer1::IExecutionContext::setName ( const char * name )

inline noexcept

Set the name of the execution context.

This method copies the name string.

See also: getName()

◆ setOptimizationProfile()

TRT_DEPRECATED bool nvinfer1::IExecutionContext::setOptimizationProfile ( int32_t profileIndex )

inline noexcept

Select an optimization profile for the current context.

Parameters

profileIndex Index of the profile. It must lie between 0 and getEngine().getNbOptimizationProfiles() - 1

The selected profile will be used in subsequent calls to execute() or enqueue().

When an optimization profile is switched via this API, TensorRT may enqueue GPU memory copy operations required to set up the new profile during the subsequent enqueue() operations. To avoid these calls during enqueue(), use setOptimizationProfileAsync() instead.

If the associated CUDA engine has dynamic inputs, this method must be called at least once with a unique profileIndex before calling execute or enqueue (i.e. the profile index may not be in use by another execution context that has not been destroyed yet). For the first execution context that is created for an engine, setOptimizationProfile(0) is called implicitly.

If the associated CUDA engine does not have inputs with dynamic shapes, this method need not be called, in which case the default profile index of 0 will be used (this is particularly the case for all safe engines).

setOptimizationProfile() must be called before calling setBindingDimensions() and setInputShapeBinding() for all dynamic input tensors or input shape tensors, which in turn must be called before either execute() or enqueue().

Warning: This function will trigger layer resource updates on the next call of enqueue[V2]()/execute[V2](), possibly resulting in performance bottlenecks.

Returns: true if the call succeeded, else false (e.g. input out of range)

Deprecated: This API is superseded by setOptimizationProfileAsync() and will be removed in TensorRT 9.0.

See also: ICudaEngine::getNbOptimizationProfiles() IExecutionContext::setOptimizationProfileAsync()

◆ setOptimizationProfileAsync()

bool nvinfer1::IExecutionContext::setOptimizationProfileAsync	(	int32_t	profileIndex,
		cudaStream_t	stream
	)

inline noexcept

Select an optimization profile for the current context with async semantics.

Parameters

profileIndex	Index of the profile. The value must lie between 0 and getEngine().getNbOptimizationProfiles() - 1
stream	A cuda stream on which the cudaMemcpyAsyncs may be enqueued

When an optimization profile is switched via this API, TensorRT may require that data is copied via cudaMemcpyAsync. It is the application’s responsibility to guarantee that synchronization between the profile sync stream and the enqueue stream occurs.

The selected profile will be used in subsequent calls to execute() or enqueue(). If the associated CUDA engine has inputs with dynamic shapes, the optimization profile must be set with a unique profileIndex before calling execute or enqueue. For the first execution context that is created for an engine, setOptimizationProfile(0) is called implicitly.

If the associated CUDA engine does not have inputs with dynamic shapes, this method need not be called, in which case the default profile index of 0 will be used.

setOptimizationProfileAsync() must be called before calling setBindingDimensions() and setInputShapeBinding() for all dynamic input tensors or input shape tensors, which in turn must be called before either execute() or enqueue().

Warning: This function will trigger layer resource updates on the next call of enqueue[V2]()/execute[V2](), possibly resulting in performance bottlenecks.

Warning: Not synchronizing the stream used at enqueue with the stream used to set the optimization profile asynchronously using this API will result in undefined behavior.

Returns: true if the call succeeded, else false (e.g. input out of range)

See also: ICudaEngine::getNbOptimizationProfiles(); IExecutionContext::setOptimizationProfile()
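
A minimal sketch of profile selection is given below. It validates the index against getEngine().getNbOptimizationProfiles(), skips the call when the profile is already selected, and otherwise calls setOptimizationProfileAsync(). selectProfile is a hypothetical helper, templated so that it compiles without the TensorRT headers, and Stream stands in for cudaStream_t. Skipping the call for an already selected profile is a design choice of this sketch, meant to avoid an unnecessary resource update.

```cpp
#include <cstdint>

// Hypothetical helper, not part of TensorRT. Context is assumed to expose
// the IExecutionContext methods used below.
template <typename Context, typename Stream>
bool selectProfile(Context& ctx, std::int32_t profileIndex, Stream stream) {
    const std::int32_t count = ctx.getEngine().getNbOptimizationProfiles();
    if (profileIndex < 0 || profileIndex >= count) {
        return false;  // index must lie between 0 and count - 1
    }
    if (ctx.getOptimizationProfile() == profileIndex) {
        return true;  // already selected, nothing to do
    }
    return ctx.setOptimizationProfileAsync(profileIndex, stream);
}
```

Presumably the simplest way to meet the synchronization requirement is to pass the same stream to the later enqueueV2() call, since work on one CUDA stream executes in issue order. With two different streams, the application has to synchronize them itself.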

◆ setProfiler()

void nvinfer1::IExecutionContext::setProfiler ( IProfiler * profiler )

inline noexcept

Set the profiler.

See also: IProfiler getProfiler()

The documentation for this class was generated from the following file:

NvInferRuntime.h
