TensorRT 8.5.3
nvinfer1::IExecutionContext Class Reference

Context for executing inference using an engine, with functionally unsafe features. More...

#include <NvInferRuntime.h>

Inheritance diagram for nvinfer1::IExecutionContext:

Public Member Functions

virtual ~IExecutionContext () noexcept=default
TRT_DEPRECATED bool execute (int32_t batchSize, void *const *bindings) noexcept
 Synchronously execute inference on a batch. More...
TRT_DEPRECATED bool enqueue (int32_t batchSize, void *const *bindings, cudaStream_t stream, cudaEvent_t *inputConsumed) noexcept
 Asynchronously execute inference on a batch. More...
void setDebugSync (bool sync) noexcept
 Set the debug sync flag. More...
bool getDebugSync () const noexcept
 Get the debug sync flag. More...
void setProfiler (IProfiler *profiler) noexcept
 Set the profiler. More...
IProfiler * getProfiler () const noexcept
 Get the profiler. More...
ICudaEngine const & getEngine () const noexcept
 Get the associated engine. More...
TRT_DEPRECATED void destroy () noexcept
 Destroy this object. More...
void setName (char const *name) noexcept
 Set the name of the execution context. More...
char const * getName () const noexcept
 Return the name of the execution context. More...
void setDeviceMemory (void *memory) noexcept
 Set the device memory for use by this execution context. More...
TRT_DEPRECATED Dims getStrides (int32_t bindingIndex) const noexcept
 Return the strides of the buffer for the given binding. More...
Dims getTensorStrides (char const *tensorName) const noexcept
 Return the strides of the buffer for the given tensor name. More...
TRT_DEPRECATED bool setOptimizationProfile (int32_t profileIndex) noexcept
 Select an optimization profile for the current context. More...
int32_t getOptimizationProfile () const noexcept
 Get the index of the currently selected optimization profile. More...
TRT_DEPRECATED bool setBindingDimensions (int32_t bindingIndex, Dims dimensions) noexcept
 Set the dynamic dimensions of an input binding. More...
bool setInputShape (char const *tensorName, Dims const &dims) noexcept
 Set shape of given input. More...
TRT_DEPRECATED Dims getBindingDimensions (int32_t bindingIndex) const noexcept
 Get the dynamic dimensions of a binding. More...
Dims getTensorShape (char const *tensorName) const noexcept
 Return the shape of the given input or output. More...
TRT_DEPRECATED bool setInputShapeBinding (int32_t bindingIndex, int32_t const *data) noexcept
 Set values of input tensor required by shape calculations. More...
TRT_DEPRECATED bool getShapeBinding (int32_t bindingIndex, int32_t *data) const noexcept
 Get values of an input tensor required for shape calculations or an output tensor produced by shape calculations. More...
bool allInputDimensionsSpecified () const noexcept
 Whether all dynamic dimensions of input tensors have been specified. More...
bool allInputShapesSpecified () const noexcept
 Whether all input shape bindings have been specified. More...
void setErrorRecorder (IErrorRecorder *recorder) noexcept
 Set the ErrorRecorder for this interface. More...
IErrorRecorder * getErrorRecorder () const noexcept
 Get the ErrorRecorder assigned to this interface. More...
bool executeV2 (void *const *bindings) noexcept
 Synchronously execute inference on a network. More...
TRT_DEPRECATED bool enqueueV2 (void *const *bindings, cudaStream_t stream, cudaEvent_t *inputConsumed) noexcept
 Asynchronously execute inference. More...
bool setOptimizationProfileAsync (int32_t profileIndex, cudaStream_t stream) noexcept
 Select an optimization profile for the current context with async semantics. More...
void setEnqueueEmitsProfile (bool enqueueEmitsProfile) noexcept
 Set whether enqueue emits layer timing to the profiler. More...
bool getEnqueueEmitsProfile () const noexcept
 Get the enqueueEmitsProfile state. More...
bool reportToProfiler () const noexcept
 Calculate layer timing info for the current optimization profile in IExecutionContext and update the profiler after one iteration of inference launch. More...
bool setTensorAddress (char const *tensorName, void *data) noexcept
 Set memory address for given input or output tensor. More...
void const * getTensorAddress (char const *tensorName) const noexcept
 Get memory address bound to given input or output tensor, or nullptr if the provided name does not map to an input or output tensor. More...
bool setInputTensorAddress (char const *tensorName, void const *data) noexcept
 Set memory address for given input. More...
void * getOutputTensorAddress (char const *tensorName) const noexcept
 Get memory address for given output. More...
int32_t inferShapes (int32_t nbMaxNames, char const **tensorNames) noexcept
 Run shape calculations. More...
bool setInputConsumedEvent (cudaEvent_t event) noexcept
 Mark input as consumed. More...
cudaEvent_t getInputConsumedEvent () const noexcept
 The event associated with consuming the input. More...
bool setOutputAllocator (char const *tensorName, IOutputAllocator *outputAllocator) noexcept
 Set output allocator to use for output tensor of given name. Pass nullptr to outputAllocator to unset. The allocator is called by enqueueV3(). More...
IOutputAllocator * getOutputAllocator (char const *tensorName) const noexcept
 Get output allocator associated with output tensor of given name, or nullptr if the provided name does not map to an output tensor. More...
int64_t getMaxOutputSize (char const *tensorName) const noexcept
 Get upper bound on an output tensor's size, in bytes, based on the current optimization profile and input dimensions. More...
bool setTemporaryStorageAllocator (IGpuAllocator *allocator) noexcept
 Specify allocator to use for internal temporary storage. More...
IGpuAllocator * getTemporaryStorageAllocator () const noexcept
 Get allocator set by setTemporaryStorageAllocator. More...
bool enqueueV3 (cudaStream_t stream) noexcept
 Asynchronously execute inference. More...
void setPersistentCacheLimit (size_t size) noexcept
 Set the maximum size for persistent cache usage. More...
size_t getPersistentCacheLimit () const noexcept
 Get the maximum size for persistent cache usage. More...
bool setNvtxVerbosity (ProfilingVerbosity verbosity) noexcept
 Set the verbosity of the NVTX markers in the execution context. More...
ProfilingVerbosity getNvtxVerbosity () const noexcept
 Get the NVTX verbosity of the execution context. More...

Protected Attributes

apiv::VExecutionContext * mImpl

Additional Inherited Members

- Protected Member Functions inherited from nvinfer1::INoCopy
 INoCopy ()=default
virtual ~INoCopy ()=default
 INoCopy (INoCopy const &other)=delete
INoCopy & operator= (INoCopy const &other)=delete
 INoCopy (INoCopy &&other)=delete
INoCopy & operator= (INoCopy &&other)=delete

Detailed Description

Context for executing inference using an engine, with functionally unsafe features.

Multiple execution contexts may exist for one ICudaEngine instance, allowing the same engine to be used for the execution of multiple batches simultaneously. If the engine supports dynamic shapes, each execution context in concurrent use must use a separate optimization profile.

Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.

Constructor & Destructor Documentation

◆ ~IExecutionContext()

virtual nvinfer1::IExecutionContext::~IExecutionContext ( )

Member Function Documentation

◆ allInputDimensionsSpecified()

bool nvinfer1::IExecutionContext::allInputDimensionsSpecified ( ) const

Whether all dynamic dimensions of input tensors have been specified.

True if all dynamic dimensions of input tensors have been specified by calling setBindingDimensions().

Trivially true if network has no dynamically shaped input tensors.

Does not work with name-based interfaces, e.g. IExecutionContext::setInputShape(). Use IExecutionContext::inferShapes() instead.

◆ allInputShapesSpecified()

bool nvinfer1::IExecutionContext::allInputShapesSpecified ( ) const

Whether all input shape bindings have been specified.

True if all input shape bindings have been specified by setInputShapeBinding().

Trivially true if network has no input shape bindings.

Does not work with name-based interfaces, e.g. IExecutionContext::setInputShape(). Use IExecutionContext::inferShapes() instead.

◆ destroy()

TRT_DEPRECATED void nvinfer1::IExecutionContext::destroy ( )

Destroy this object.

Deprecated in TRT 8.0. Superseded by delete.
Calling destroy on a managed pointer will result in a double-free error.
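
As a minimal sketch of what "superseded by delete" can look like in practice, the snippet below holds an object in a std::unique_ptr with the default deleter and never calls destroy(). FakeContext and runScoped() are hypothetical stand-ins (not TensorRT names) so that the example builds without the library; with TensorRT 8.x the same ownership pattern would presumably apply to a pointer returned by ICudaEngine::createExecutionContext().

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an API object; it only counts live instances.
struct FakeContext
{
    static int& liveCount()
    {
        static int count = 0;
        return count;
    }
    FakeContext() { ++liveCount(); }
    ~FakeContext() { --liveCount(); }
};

// Creates one managed object, lets it go out of scope, and reports how many
// instances are still alive afterwards.
inline int runScoped()
{
    {
        std::unique_ptr<FakeContext> ctx(new FakeContext);
        assert(FakeContext::liveCount() == 1);
        // No destroy() call here: the unique_ptr releases the object exactly
        // once through delete when it leaves this scope.
    }
    return FakeContext::liveCount();
}
```

Because the smart pointer already owns the single delete, an additional explicit release on the same object is what would lead to the double free described above.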

◆ enqueue()

TRT_DEPRECATED bool nvinfer1::IExecutionContext::enqueue ( int32_t  batchSize,
void *const *  bindings,
cudaStream_t  stream,
cudaEvent_t *  inputConsumed )

Asynchronously execute inference on a batch.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex().

Parameters
batchSize  The batch size. This is at most the max batch size value supplied to the builder when the engine was built. If the network is created with the NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag, use enqueueV3() instead; in that case this batchSize argument has no effect.
bindings  An array of pointers to input and output buffers for the network.
stream  A CUDA stream on which the inference kernels will be enqueued.
inputConsumed  An optional event which will be signaled when the input buffers can be refilled with new data.
Returns
True if the kernels were enqueued successfully.
Deprecated in TensorRT 8.4. Superseded by enqueueV2() if the network is created with the NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag.
See also
ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()
Calling enqueue() from the same IExecutionContext object with different CUDA streams concurrently results in undefined behavior. To perform inference concurrently in multiple streams, use one execution context per stream.
This function will trigger layer resource updates if hasImplicitBatchDimension() returns true and batchSize changes between subsequent calls, possibly resulting in performance bottlenecks.

◆ enqueueV2()

TRT_DEPRECATED bool nvinfer1::IExecutionContext::enqueueV2 ( void *const *  bindings,
cudaStream_t  stream,
cudaEvent_t *  inputConsumed )

Asynchronously execute inference.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex(). This method only works for execution contexts built with full dimension networks.

Parameters
bindings  An array of pointers to input and output buffers for the network.
stream  A CUDA stream on which the inference kernels will be enqueued.
inputConsumed  An optional event which will be signaled when the input buffers can be refilled with new data.
Returns
True if the kernels were enqueued successfully.
Deprecated in TensorRT 8.5. Superseded by enqueueV3().
See also
ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize() IExecutionContext::enqueueV3()
Calling enqueueV2() with a stream in CUDA graph capture mode has a known issue. If dynamic shapes are used, the first enqueueV2() call after a setInputShapeBinding() call will cause failure in stream capture due to resource allocation. Please call enqueueV2() once before capturing the graph.
Calling enqueueV2() from the same IExecutionContext object with different CUDA streams concurrently results in undefined behavior. To perform inference concurrently in multiple streams, use one execution context per stream.

◆ enqueueV3()

bool nvinfer1::IExecutionContext::enqueueV3 ( cudaStream_t  stream)

Asynchronously execute inference.

Parameters
stream  A CUDA stream on which the inference kernels will be enqueued.
Returns
True if the kernels were enqueued successfully, false otherwise.

Modifying or releasing memory that has been registered for the tensors before stream synchronization, or before the event passed to setInputConsumedEvent() has been triggered, results in undefined behavior. Input tensors can be released once that event has been triggered, whereas output tensors require stream synchronization.
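
The name-based call sequence leading up to enqueueV3() can be sketched as follows. This is an illustrative outline, not the library's prescribed usage: FakeDims, FakeStream, FakeContext and launchV3() are hypothetical names introduced here so that the snippet builds without TensorRT or CUDA. The helper is a template, so it should also accept any real context type that exposes the same member functions documented on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch builds without TensorRT or CUDA.
struct FakeDims
{
    int32_t nbDims;
    int32_t d[8];
};
using FakeStream = int;

struct FakeContext
{
    int32_t profile = 0;            // -1 would mean "no profile selected"
    std::vector<std::string> calls; // order in which methods were invoked

    int32_t getOptimizationProfile() const { return profile; }
    bool setInputShape(char const*, FakeDims const&) { calls.push_back("setInputShape"); return true; }
    bool setInputTensorAddress(char const*, void const*) { calls.push_back("setInputTensorAddress"); return true; }
    bool setTensorAddress(char const*, void*) { calls.push_back("setTensorAddress"); return true; }
    bool enqueueV3(FakeStream) { calls.push_back("enqueueV3"); return true; }
};

// Binds one input and one output by name, then enqueues inference.
template <typename Context, typename DimsT, typename Stream>
bool launchV3(Context& ctx, char const* inputName, DimsT const& inputShape, void const* inputData,
              char const* outputName, void* outputData, Stream stream)
{
    // Enqueue calls fail until a valid optimization profile is selected.
    if (ctx.getOptimizationProfile() < 0)
    {
        return false;
    }
    // Short-circuit evaluation stops at the first failing step. On success
    // the caller still has to synchronize the stream before reading or
    // releasing the output buffer.
    return ctx.setInputShape(inputName, inputShape)
        && ctx.setInputTensorAddress(inputName, inputData)
        && ctx.setTensorAddress(outputName, outputData)
        && ctx.enqueueV3(stream);
}
```

With a real CUDA stream, a call such as cudaStreamSynchronize(stream) after launchV3() returns true would be one way to satisfy the output-buffer requirement stated above.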

◆ execute()

TRT_DEPRECATED bool nvinfer1::IExecutionContext::execute ( int32_t  batchSize,
void *const *  bindings )

Synchronously execute inference on a batch.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex().

Parameters
batchSize  The batch size. This is at most the max batch size value supplied to the builder when the engine was built. If the network is created with the NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag, use executeV2() instead; in that case this batchSize argument has no effect.
bindings  An array of pointers to input and output buffers for the network.
Returns
True if execution succeeded.
Deprecated in TensorRT 8.4. Superseded by executeV2() if the network is created with the NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag.
This function will trigger layer resource updates if hasImplicitBatchDimension() returns true and batchSize changes between subsequent calls, possibly resulting in performance bottlenecks.
See also
ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()

◆ executeV2()

bool nvinfer1::IExecutionContext::executeV2 ( void *const *  bindings)

Synchronously execute inference on a network.

This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex(). This method only works for execution contexts built with full dimension networks.

Parameters
bindings  An array of pointers to input and output buffers for the network.
Returns
True if execution succeeded.
See also
ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()

◆ getBindingDimensions()

TRT_DEPRECATED Dims nvinfer1::IExecutionContext::getBindingDimensions ( int32_t  bindingIndex) const

Get the dynamic dimensions of a binding.

If the engine was built with an implicit batch dimension, same as ICudaEngine::getBindingDimensions.

If setBindingDimensions() has been called on this binding (or if there are no dynamic dimensions), all dimensions will be positive. Otherwise, it is necessary to call setBindingDimensions() before enqueueV2() or executeV2() may be called.

If the bindingIndex is out of range, an invalid Dims with nbDims == -1 is returned. The same invalid Dims will be returned if the engine was not built with an implicit batch dimension and if the execution context is not currently associated with a valid optimization profile (i.e. if getOptimizationProfile() returns -1).

If ICudaEngine::bindingIsInput(bindingIndex) is false, then both allInputDimensionsSpecified() and allInputShapesSpecified() must be true before calling this method.

Returns
Currently selected binding dimensions.

For backwards compatibility with earlier versions of TensorRT, a bindingIndex that does not belong to the current profile is corrected as described for ICudaEngine::getProfileDimensions.

Deprecated in TensorRT 8.5. Superseded by getTensorShape().

◆ getDebugSync()

bool nvinfer1::IExecutionContext::getDebugSync ( ) const

Get the debug sync flag.


◆ getEngine()

ICudaEngine const & nvinfer1::IExecutionContext::getEngine ( ) const

Get the associated engine.


◆ getEnqueueEmitsProfile()

bool nvinfer1::IExecutionContext::getEnqueueEmitsProfile ( ) const

Get the enqueueEmitsProfile state.

Returns
The enqueueEmitsProfile state.

◆ getErrorRecorder()

IErrorRecorder * nvinfer1::IExecutionContext::getErrorRecorder ( ) const

Get the ErrorRecorder assigned to this interface.

Retrieves the assigned error recorder object for the given class. A nullptr will be returned if an error handler has not been set.

Returns
A pointer to the IErrorRecorder object that has been registered.

◆ getInputConsumedEvent()

cudaEvent_t nvinfer1::IExecutionContext::getInputConsumedEvent ( ) const

The event associated with consuming the input.

Returns
The CUDA event, or nullptr if the event has not been set yet.

◆ getMaxOutputSize()

int64_t nvinfer1::IExecutionContext::getMaxOutputSize ( char const *  tensorName) const

Get upper bound on an output tensor's size, in bytes, based on the current optimization profile and input dimensions.

If the profile or input dimensions are not yet set, or the provided name does not map to an output, returns -1.

Parameters
tensorName  The name of an output tensor.
Returns
Upper bound in bytes.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.

◆ getName()

char const * nvinfer1::IExecutionContext::getName ( ) const

Return the name of the execution context.


◆ getNvtxVerbosity()

ProfilingVerbosity nvinfer1::IExecutionContext::getNvtxVerbosity ( ) const

Get the NVTX verbosity of the execution context.

Returns
The current NVTX verbosity of the execution context.

◆ getOptimizationProfile()

int32_t nvinfer1::IExecutionContext::getOptimizationProfile ( ) const

Get the index of the currently selected optimization profile.

If the profile index has not been set yet (implicitly to 0 for the first execution context to be created, or explicitly for all subsequent contexts), an invalid value of -1 will be returned and all calls to enqueueV2()/enqueueV3()/executeV2() will fail until a valid profile index has been set.

◆ getOutputAllocator()

IOutputAllocator * nvinfer1::IExecutionContext::getOutputAllocator ( char const *  tensorName) const

Get output allocator associated with output tensor of given name, or nullptr if the provided name does not map to an output tensor.

The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.

◆ getOutputTensorAddress()

void * nvinfer1::IExecutionContext::getOutputTensorAddress ( char const *  tensorName) const

Get memory address for given output.

Parameters
tensorName  The name of an output tensor.
Returns
Raw output data pointer (void*) for given output tensor, or nullptr if the provided name does not map to an output tensor.

If only a (void const*) pointer is needed, an alternative is to call method getTensorAddress().

The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.

◆ getPersistentCacheLimit()

size_t nvinfer1::IExecutionContext::getPersistentCacheLimit ( ) const

Get the maximum size for persistent cache usage.

Returns
The size of the persistent cache limit.

◆ getProfiler()

IProfiler * nvinfer1::IExecutionContext::getProfiler ( ) const

Get the profiler.

See also
IProfiler setProfiler()

◆ getShapeBinding()

TRT_DEPRECATED bool nvinfer1::IExecutionContext::getShapeBinding ( int32_t  bindingIndex,
int32_t *  data 
) const

Get values of an input tensor required for shape calculations or an output tensor produced by shape calculations.

Parameters
bindingIndex  Index of an input or output tensor for which ICudaEngine::isShapeBinding(bindingIndex) is true.
data  Pointer to where values will be written. The number of values written is the product of the dimensions returned by getBindingDimensions(bindingIndex).

If ICudaEngine::bindingIsInput(bindingIndex) is false, then both allInputDimensionsSpecified() and allInputShapesSpecified() must be true before calling this method. The method will also fail if no valid optimization profile has been set for the current execution context, i.e. if getOptimizationProfile() returns -1.

Deprecated in TensorRT 8.5. Superseded by getTensorAddress() or getOutputTensorAddress().
See also
isShapeBinding() getTensorAddress() getOutputTensorAddress()
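
To size the destination buffer, the "product of the dimensions" rule can be sketched as below. FakeDims is a hypothetical stand-in that only mirrors the nbDims/d layout referred to on this page, and valueCount() is an illustrative helper rather than a TensorRT function. Treating an invalid or not yet specified shape as -1 is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the nbDims/d layout of a dimensions struct.
struct FakeDims
{
    int32_t nbDims;
    int32_t d[8];
};

// Number of values described by dims: the product of all dimensions.
// Returns -1 for an invalid Dims (nbDims == -1) or when any dimension is
// still a -1 wildcard. A zero-dimensional shape holds exactly one value.
inline int64_t valueCount(FakeDims const& dims)
{
    if (dims.nbDims < 0 || dims.nbDims > 8)
    {
        return -1;
    }
    int64_t count = 1;
    for (int32_t i = 0; i < dims.nbDims; ++i)
    {
        if (dims.d[i] < 0)
        {
            return -1;
        }
        count *= dims.d[i];
    }
    return count;
}
```

A shape tensor with dimensions [3] would therefore need room for three int32_t values.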

◆ getStrides()

TRT_DEPRECATED Dims nvinfer1::IExecutionContext::getStrides ( int32_t  bindingIndex) const

Return the strides of the buffer for the given binding.

The strides are in units of elements, not components or bytes. For example, for TensorFormat::kHWC8, a stride of one spans 8 scalars.

Note that strides can be different for different execution contexts with dynamic shapes.

If the bindingIndex is invalid or there are dynamic dimensions that have not been set yet, returns Dims with Dims::nbDims = -1.

Parameters
bindingIndex  The binding index.
Deprecated in TensorRT 8.5. Superseded by getTensorStrides().

◆ getTemporaryStorageAllocator()

IGpuAllocator * nvinfer1::IExecutionContext::getTemporaryStorageAllocator ( ) const

Get allocator set by setTemporaryStorageAllocator.

Returns a nullptr if a nullptr was passed with setTemporaryStorageAllocator().

◆ getTensorAddress()

void const * nvinfer1::IExecutionContext::getTensorAddress ( char const *  tensorName) const

Get memory address bound to given input or output tensor, or nullptr if the provided name does not map to an input or output tensor.

Parameters
tensorName  The name of an input or output tensor.

Use method getOutputTensorAddress() if a non-const pointer for an output tensor is required.

The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.

◆ getTensorShape()

Dims nvinfer1::IExecutionContext::getTensorShape ( char const *  tensorName) const

Return the shape of the given input or output.

Parameters
tensorName  The name of an input or output tensor.

Return Dims{-1, {}} if the provided name does not map to an input or output tensor. Otherwise return the shape of the input or output tensor.

A dimension in an input tensor will have a -1 wildcard value if all the following are true:

  • setInputShape() has not yet been called for this tensor
  • The dimension is a runtime dimension that is not implicitly constrained to be a single value.

A dimension in an output tensor will have a -1 wildcard value if the dimension depends on values of execution tensors OR if all the following are true:

  • It is a runtime dimension.
  • setInputShape() has NOT been called for some input tensor(s) with a runtime shape.
  • setTensorAddress() has NOT been called for some input tensor(s) with isShapeInferenceIO() = true.

An output tensor may also have -1 wildcard dimensions if its shape depends on values of tensors supplied to enqueueV3().

If the request is for the shape of an output tensor with runtime dimensions, all input tensors with isShapeInferenceIO() = true should have their value already set, since these values might be needed to compute the output shape.

Examples of an input dimension that is implicitly constrained to a single value:

  • The optimization profile specifies equal min and max values.
  • The dimension is named and only one value meets the optimization profile requirements for dimensions with that name.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.

◆ getTensorStrides()

Dims nvinfer1::IExecutionContext::getTensorStrides ( char const *  tensorName) const

Return the strides of the buffer for the given tensor name.

The strides are in units of elements, not components or bytes. For example, for TensorFormat::kHWC8, a stride of one spans 8 scalars.

Note that strides can be different for different execution contexts with dynamic shapes.

If the provided name does not map to an input or output tensor, or there are dynamic dimensions that have not been set yet, return Dims{-1, {}}.

Parameters
tensorName  The name of an input or output tensor.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
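
Because strides are counted in elements, turning one into a byte offset needs two extra factors: the number of scalars per element and the size of each scalar. The helper below is a hedged sketch; strideInBytes() is a hypothetical name, and the values used in the usage note (8 scalars per element for TensorFormat::kHWC8 as stated above, 2 bytes per scalar for FP16 data) are assumptions for illustration. In a real program these counts might come from engine queries such as ICudaEngine::getTensorComponentsPerElement() and ICudaEngine::getTensorBytesPerComponent(), which is likewise an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Converts a stride expressed in elements into a byte offset.
//   strideInElements     - value taken from the returned strides
//   componentsPerElement - scalars packed into one element (assumed input)
//   bytesPerComponent    - size of one scalar in bytes (assumed input)
inline int64_t strideInBytes(int64_t strideInElements, int32_t componentsPerElement, int32_t bytesPerComponent)
{
    assert(componentsPerElement > 0);
    assert(bytesPerComponent > 0);
    return strideInElements * componentsPerElement * bytesPerComponent;
}
```

Under those assumptions, strideInBytes(1, 8, 2) evaluates to 16: a stride of one element spans 8 two-byte scalars.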

◆ inferShapes()

int32_t nvinfer1::IExecutionContext::inferShapes ( int32_t  nbMaxNames,
char const **  tensorNames )

Run shape calculations.

Parameters
nbMaxNames  Maximum number of names to write to tensorNames. When the return value is a positive value n and tensorNames != nullptr, the names of min(n,nbMaxNames) insufficiently specified input tensors are written to tensorNames.
tensorNames  Buffer in which to place names of insufficiently specified input tensors.
Returns
0 on success. Positive value n if n input tensors were not sufficiently specified. -1 for other errors.

An input tensor is insufficiently specified if either of the following is true:

  • It has dynamic dimensions and its runtime dimensions have not yet been specified via IExecutionContext::setInputShape.
  • isShapeInferenceIO(t)=true and the tensor's address has not yet been set.

If an output tensor has isShapeInferenceIO(t)=true and its address has been specified, then its value is written.

Returns -1 if tensorNames == nullptr and nbMaxNames != 0. Returns -1 if nbMaxNames < 0. Returns -1 if a tensor's dimensions are invalid, e.g. a tensor ends up with a negative dimension.
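
The return convention above can be wrapped in a small helper that collects the reported names. The following is a sketch under stated assumptions: FakeContext is a hypothetical stand-in that only imitates the documented return values (it performs no shape calculation), and ShapeCheck and checkShapes() are illustrative names, not part of TensorRT. The helper is a template so that it should also accept a real context type with the same inferShapes() signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in that imitates the documented return convention.
struct FakeContext
{
    std::vector<char const*> unspecified; // inputs still missing a shape or address

    int32_t inferShapes(int32_t nbMaxNames, char const** tensorNames)
    {
        if (nbMaxNames < 0 || (tensorNames == nullptr && nbMaxNames != 0))
        {
            return -1;
        }
        int32_t const n = static_cast<int32_t>(unspecified.size());
        for (int32_t i = 0; i < std::min(n, nbMaxNames); ++i)
        {
            tensorNames[i] = unspecified[static_cast<std::size_t>(i)];
        }
        return n;
    }
};

struct ShapeCheck
{
    int32_t status;                 // raw return value of inferShapes()
    std::vector<std::string> names; // at most nbMaxNames reported names
};

// Calls inferShapes() and copies out the names that were actually written.
template <typename Context>
ShapeCheck checkShapes(Context& ctx, int32_t nbMaxNames)
{
    if (nbMaxNames < 0)
    {
        return ShapeCheck{-1, {}};
    }
    std::vector<char const*> buffer(static_cast<std::size_t>(nbMaxNames), nullptr);
    int32_t const status = ctx.inferShapes(nbMaxNames, buffer.data());
    ShapeCheck result{status, {}};
    // Only min(status, nbMaxNames) entries of the buffer are valid.
    for (int32_t i = 0; i < std::min(status, nbMaxNames); ++i)
    {
        result.names.emplace_back(buffer[static_cast<std::size_t>(i)]);
    }
    return result;
}
```

A status of 0 means every input was sufficiently specified; a positive status larger than names.size() means the buffer was too small to hold all reported names.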

◆ reportToProfiler()

bool nvinfer1::IExecutionContext::reportToProfiler ( ) const

Calculate layer timing info for the current optimization profile in IExecutionContext and update the profiler after one iteration of inference launch.

If IExecutionContext::getEnqueueEmitsProfile() returns true, the enqueue function will calculate layer timing implicitly if a profiler is provided. This function returns true and does nothing.

If IExecutionContext::getEnqueueEmitsProfile() returns false, the enqueue function will record the CUDA event timers if a profiler is provided. But it will not perform the layer timing calculation. IExecutionContext::reportToProfiler() needs to be called explicitly to calculate layer timing for the previous inference launch.

In the CUDA graph launch scenario, it will record the same set of CUDA events as in regular enqueue functions if the graph is captured from an IExecutionContext with profiler enabled. This function needs to be called after graph launch to report the layer timing info to the profiler.

Profiling CUDA graphs is only available from CUDA 11.1 onwards.
reportToProfiler() uses the stream of the previous enqueue call, so the stream must be live; otherwise behavior is undefined.
Returns
true if the call succeeded, else false (e.g. profiler not provided, in CUDA graph capture mode, etc.)

◆ setBindingDimensions()

TRT_DEPRECATED bool nvinfer1::IExecutionContext::setBindingDimensions ( int32_t  bindingIndex,
Dims  dimensions )

Set the dynamic dimensions of an input binding.

Parameters
bindingIndex  Index of an input tensor whose dimensions must be compatible with the network definition (i.e. only the wildcard dimension -1 can be replaced with a new dimension >= 0).
dimensions  Specifies the dimensions of the input tensor. It must be in the valid range for the currently selected optimization profile, and the corresponding engine must not be safety-certified.

This method requires the engine to be built without an implicit batch dimension. This method will fail unless a valid optimization profile is defined for the current execution context (getOptimizationProfile() must not be -1).

For all dynamic non-output bindings (which have at least one wildcard dimension of -1), this method needs to be called before either enqueueV2() or executeV2() may be called. This can be checked using the method allInputDimensionsSpecified().

This function will trigger layer resource updates on the next call of enqueueV2()/executeV2(), possibly resulting in performance bottlenecks, if the dimensions differ from the previously set dimensions.
Returns
false if an error occurs (e.g. bindingIndex is out of range for the currently selected optimization profile or binding dimension is inconsistent with min-max range of the optimization profile), else true. Note that the network can still be invalid for certain combinations of input shapes that lead to invalid output shapes. To confirm the correctness of the network input shapes, check whether the output binding has valid dimensions using getBindingDimensions() on the output bindingIndex.
Deprecated in TensorRT 8.5. Superseded by setInputShape().

◆ setDebugSync()

void nvinfer1::IExecutionContext::setDebugSync ( bool  sync)

Set the debug sync flag.

If this flag is set to true, the engine will log the successful execution for each kernel during executeV2(). It has no effect when using enqueueV2()/enqueueV3().


◆ setDeviceMemory()

void nvinfer1::IExecutionContext::setDeviceMemory ( void *  memory)

Set the device memory for use by this execution context.

The memory must be aligned with cuda memory alignment property (using cudaGetDeviceProperties()), and its size must be at least that returned by getDeviceMemorySize(). Setting memory to nullptr is acceptable if getDeviceMemorySize() returns 0. If using enqueueV2()/enqueueV3() to run the network, the memory is in use from the invocation of enqueueV2()/enqueueV3() until network execution is complete. If using executeV2(), it is in use until executeV2() returns. Releasing or otherwise using the memory for other purposes during this time will result in undefined behavior.

See also
ICudaEngine::getDeviceMemorySize() ICudaEngine::createExecutionContextWithoutDeviceMemory()

◆ setEnqueueEmitsProfile()

void nvinfer1::IExecutionContext::setEnqueueEmitsProfile ( bool  enqueueEmitsProfile)

Set whether enqueue emits layer timing to the profiler.

If set to true (default), enqueue is synchronous and does layer timing profiling implicitly if there is a profiler attached. If set to false, enqueue will be asynchronous if there is a profiler attached. An extra method reportToProfiler() needs to be called to obtain the profiling data and report to the profiler attached.


◆ setErrorRecorder()

void nvinfer1::IExecutionContext::setErrorRecorder ( IErrorRecorder *  recorder)

Set the ErrorRecorder for this interface.

Assigns the ErrorRecorder to this interface. The ErrorRecorder will track all errors during execution. This function will call incRefCount of the registered ErrorRecorder at least once. Setting recorder to nullptr unregisters the recorder with the interface, resulting in a call to decRefCount if a recorder has been registered.

If an error recorder is not set, messages will be sent to the global log stream.

Parameters
recorder  The error recorder to register with this interface.

◆ setInputConsumedEvent()

bool nvinfer1::IExecutionContext::setInputConsumedEvent ( cudaEvent_t  event)

Mark input as consumed.

Parameters
event  The CUDA event that is triggered after all input tensors have been consumed.
The set event must be valid during the inference.
Returns
True on success, false if an error occurred.

Passing event==nullptr removes whatever event was set, if any.

◆ setInputShape()

bool nvinfer1::IExecutionContext::setInputShape ( char const *  tensorName,
Dims const &  dims )

Set shape of given input.

Parameters
tensorName  The name of an input tensor.
dims  The shape of an input tensor.
Returns
True on success, false if the provided name does not map to an input tensor, or if some other error occurred.

Each dimension must agree with the network dimension unless the latter was -1.

The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.

◆ setInputShapeBinding()

TRT_DEPRECATED bool nvinfer1::IExecutionContext::setInputShapeBinding ( int32_t  bindingIndex,
int32_t const *  data )

Set values of input tensor required by shape calculations.

Parameters
bindingIndex  Index of an input tensor for which ICudaEngine::isShapeBinding(bindingIndex) and ICudaEngine::bindingIsInput(bindingIndex) are both true.
data  Pointer to values of the input tensor. The number of values should be the product of the dimensions returned by getBindingDimensions(bindingIndex).

If ICudaEngine::isShapeBinding(bindingIndex) and ICudaEngine::bindingIsInput(bindingIndex) are both true, this method must be called before enqueueV2() or executeV2() may be called. This method will fail unless a valid optimization profile is defined for the current execution context (getOptimizationProfile() must not be -1).

This function will trigger layer resource updates on the next call of enqueueV2()/executeV2(), possibly resulting in performance bottlenecks, if the shapes differ from the previously set shapes.
Returns
false if an error occurs (e.g. bindingIndex is out of range for the currently selected optimization profile or shape data is inconsistent with min-max range of the optimization profile), else true. Note that the network can still be invalid for certain combinations of input shapes that lead to invalid output shapes. To confirm the correctness of the network input shapes, check whether the output binding has valid dimensions using getBindingDimensions() on the output bindingIndex.
Deprecated in TensorRT 8.5. Superseded by setInputTensorAddress() or setTensorAddress().
See also
setInputTensorAddress() setTensorAddress()

◆ setInputTensorAddress()

bool nvinfer1::IExecutionContext::setInputTensorAddress ( char const *  tensorName, void const *  data )

Set memory address for given input.

Parameters
tensorName  The name of an input tensor.
data  The pointer (void const*) to the const data owned by the user.
Returns
True on success, false if the provided name does not map to an input tensor, does not meet alignment requirements, or some other error occurred.

Input addresses can also be set using setTensorAddress(), which takes a non-const pointer (void*).

See description of method setTensorAddress() for alignment and data type constraints.

The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.

◆ setName()

void nvinfer1::IExecutionContext::setName ( char const *  name)

Set the name of the execution context.

This method copies the name string.

The string name must be null-terminated, and be at most 4096 bytes including the terminator.
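The same 4096-byte limit applies to every name argument in this class. A minimal sketch of a pre-call check follows; `fitsNameLimit`, `kMaxNameBytes` and `exampleNameCheck` are hypothetical names introduced for illustration and are not part of the TensorRT API.

```cpp
#include <cassert>
#include <cstddef>

// Limit stated in the documentation: 4096 bytes including the terminator.
constexpr std::size_t kMaxNameBytes = 4096;

// Hypothetical helper, not part of TensorRT. Returns true when `name` is
// non-null and a terminating '\0' occurs within the first kMaxNameBytes bytes.
// The scan stops at the terminator, so it never reads past the end of a
// properly terminated string.
bool fitsNameLimit(char const* name)
{
    if (name == nullptr)
    {
        return false;
    }
    for (std::size_t i = 0; i < kMaxNameBytes; ++i)
    {
        if (name[i] == '\0')
        {
            return true;
        }
    }
    return false;
}

// Usage: check a name before handing it to setName() or a tensor-name API.
void exampleNameCheck()
{
    assert(fitsNameLimit("inference_context_0"));
    assert(!fitsNameLimit(nullptr));
}
```

Because the terminator counts toward the limit, the longest accepted name has 4095 visible characters.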

◆ setNvtxVerbosity()

bool nvinfer1::IExecutionContext::setNvtxVerbosity ( ProfilingVerbosity  verbosity)

Set the verbosity of the NVTX markers in the execution context.

Building with kDETAILED verbosity will generally increase latency in enqueueV2()/enqueueV3(). Call this method to select NVTX verbosity in this execution context at runtime.

The default is the verbosity with which the engine was built, and the verbosity may not be raised above that level.

This function does not affect how IEngineInspector interacts with the engine.

Parameters
verbosity  The verbosity of the NVTX markers.
Returns
True if the NVTX verbosity is set successfully. False if the provided verbosity level is higher than the profiling verbosity of the corresponding engine.

◆ setOptimizationProfile()

TRT_DEPRECATED bool nvinfer1::IExecutionContext::setOptimizationProfile ( int32_t  profileIndex)

Select an optimization profile for the current context.

Parameters
profileIndex  Index of the profile. It must lie between 0 and getEngine().getNbOptimizationProfiles() - 1.

The selected profile will be used in subsequent calls to executeV2()/enqueueV2()/enqueueV3().

When an optimization profile is switched via this API, TensorRT may enqueue GPU memory copy operations required to set up the new profile during the subsequent enqueueV2()/enqueueV3() operations. To avoid these calls during enqueueV2()/enqueueV3(), use setOptimizationProfileAsync() instead.

If the associated CUDA engine has dynamic inputs, this method must be called at least once with a unique profileIndex before calling execute or enqueue (i.e. the profile index may not be in use by another execution context that has not been destroyed yet). For the first execution context that is created for an engine, setOptimizationProfile(0) is called implicitly.

If the associated CUDA engine does not have inputs with dynamic shapes, this method need not be called, in which case the default profile index of 0 will be used (this is particularly the case for all safe engines).

setOptimizationProfile() must be called before calling setBindingDimensions() and setInputShapeBinding() for all dynamic input tensors or input shape tensors, which in turn must be called before executeV2()/enqueueV2()/enqueueV3().

This function will trigger layer resource updates on the next call of enqueueV2()/enqueueV3()/executeV2(), possibly resulting in performance bottlenecks.
Returns
true if the call succeeded, else false (e.g. input out of range).
Superseded by setOptimizationProfileAsync. Deprecated prior to TensorRT 8.0 and will be removed in 9.0.
See also
ICudaEngine::getNbOptimizationProfiles() IExecutionContext::setOptimizationProfileAsync()

◆ setOptimizationProfileAsync()

bool nvinfer1::IExecutionContext::setOptimizationProfileAsync ( int32_t  profileIndex, cudaStream_t  stream )

Select an optimization profile for the current context with async semantics.

Parameters
profileIndex  Index of the profile. The value must lie between 0 and getEngine().getNbOptimizationProfiles() - 1.
stream  A CUDA stream on which the cudaMemcpyAsync calls may be enqueued.

When an optimization profile is switched via this API, TensorRT may require that data is copied via cudaMemcpyAsync. It is the application’s responsibility to guarantee that synchronization between the profile sync stream and the enqueue stream occurs.

The selected profile will be used in subsequent calls to executeV2()/enqueueV2()/enqueueV3(). If the associated CUDA engine has inputs with dynamic shapes, the optimization profile must be set with a unique profileIndex before calling execute or enqueue. For the first execution context that is created for an engine, setOptimizationProfile(0) is called implicitly.

If the associated CUDA engine does not have inputs with dynamic shapes, this method need not be called, in which case the default profile index of 0 will be used.

setOptimizationProfileAsync() must be called before calling setBindingDimensions() and setInputShapeBinding() for all dynamic input tensors or input shape tensors, which in turn must be called before executeV2()/enqueueV2()/enqueueV3().

This function will trigger layer resource updates on the next call of enqueueV2()/executeV2()/enqueueV3(), possibly resulting in performance bottlenecks.
Not synchronizing the stream used at enqueue with the stream used to set optimization profile asynchronously using this API will result in undefined behavior.
Returns
true if the call succeeded, else false (e.g. input out of range).

◆ setOutputAllocator()

bool nvinfer1::IExecutionContext::setOutputAllocator ( char const *  tensorName, IOutputAllocator *  outputAllocator )

Set output allocator to use for output tensor of given name. Pass nullptr to outputAllocator to unset. The allocator is called by enqueueV3().

Parameters
tensorName  The name of an output tensor.
outputAllocator  IOutputAllocator for the tensors.
Returns
True on success, false if the provided name does not map to an output tensor or if some other error occurred.
The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See also
enqueueV3() IOutputAllocator

◆ setPersistentCacheLimit()

void nvinfer1::IExecutionContext::setPersistentCacheLimit ( size_t  size)

Set the maximum size for persistent cache usage.

This function sets the maximum persistent L2 cache that this execution context may use for activation caching. Activation caching is not supported on all architectures; see "How TensorRT uses Memory" in the developer guide for details.

Parameters
size  The persistent cache limit in bytes. The default is 0 bytes.

◆ setProfiler()

void nvinfer1::IExecutionContext::setProfiler ( IProfiler *  profiler)

Set the profiler.

See also
IProfiler getProfiler()

◆ setTemporaryStorageAllocator()

bool nvinfer1::IExecutionContext::setTemporaryStorageAllocator ( IGpuAllocator *  allocator)

Specify allocator to use for internal temporary storage.

This allocator is used only by enqueueV3() for temporary storage whose size cannot be predicted ahead of enqueueV3(). It is not used for output tensors, because memory for those is allocated by the allocator set by setOutputAllocator(). All memory allocated is freed by the time enqueueV3() returns.

Parameters
allocator  Pointer to the allocator to use. Pass nullptr to revert to using TensorRT's default allocator.
Returns
True on success, false if an error occurred.
See also
enqueueV3() setOutputAllocator()

◆ setTensorAddress()

bool nvinfer1::IExecutionContext::setTensorAddress ( char const *  tensorName, void *  data )

Set memory address for given input or output tensor.

Parameters
tensorName  The name of an input or output tensor.
data  The pointer (void*) to the data owned by the user.
Returns
True on success, false if an error occurred.

An address defaults to nullptr. Pass data=nullptr to reset to the default state.

Returns false if the provided name does not map to an input or output tensor.

If an input pointer has type (void const*), use setInputTensorAddress() instead.

Before calling enqueueV3(), each input must have a non-null address and each output must have a non-null address or an IOutputAllocator to set it later.

If the TensorLocation of the tensor is kHOST, the pointer must point to a host buffer of sufficient size. For shape tensors, the only supported data type is int32_t. If the TensorLocation of the tensor is kDEVICE, the pointer must point to a device buffer of sufficient size and alignment, or be nullptr if the tensor is an output tensor that will be allocated by IOutputAllocator.

If getTensorShape(name) reports a -1 for any dimension of an output after all input shapes have been set, then to find out the dimensions, use setOutputAllocator() to associate an IOutputAllocator to which the dimensions will be reported when known.

Calling both setTensorAddress and setOutputAllocator() for the same output is allowed, and can be useful for preallocating memory, and then reallocating if it's not big enough.
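The "preallocate, then reallocate if too small" pattern can be sketched without a GPU as follows. This is an illustration of the bookkeeping only, under loud assumptions: `GrowOnlyBuffer` is a hypothetical class, it uses host memory from `std::vector` as a stand-in for device memory, it does not implement the IOutputAllocator interface, and it makes no 256-byte alignment guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-memory stand-in, not part of TensorRT. A real output
// allocator would manage device memory and honor the requested alignment.
class GrowOnlyBuffer
{
public:
    explicit GrowOnlyBuffer(std::size_t initialBytes)
        : mStorage(initialBytes)
    {
    }

    // Returns storage of at least `bytes` bytes. The preallocated storage is
    // reused when it is large enough; otherwise it is replaced by a larger
    // allocation and previously returned pointers become invalid.
    void* request(std::size_t bytes)
    {
        if (bytes > mStorage.size())
        {
            mStorage.resize(bytes);
            ++mReallocations;
        }
        return mStorage.data();
    }

    std::size_t capacity() const { return mStorage.size(); }
    int reallocations() const { return mReallocations; }

private:
    std::vector<unsigned char> mStorage;
    int mReallocations{0};
};

// Usage: a request that fits reuses the preallocated block, a larger request
// triggers exactly one reallocation.
void exampleGrowOnlyBuffer()
{
    GrowOnlyBuffer buffer(1024);
    void* first = buffer.request(512);
    assert(first == buffer.request(1024));
    assert(buffer.reallocations() == 0);
    buffer.request(4096);
    assert(buffer.reallocations() == 1);
}
```

In an application, the preallocated pointer would be the one passed to setTensorAddress(), and the growth step would correspond to what the output allocator does when the required size exceeds the current allocation.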

The pointer must have at least 256-byte alignment.

The string tensorName must be null-terminated, and be at most 4096 bytes including the terminator.
See also
setInputTensorAddress() getTensorShape() setOutputAllocator() IOutputAllocator
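The 256-byte alignment requirement stated above can be verified before calling setTensorAddress(). The following is a minimal sketch; `isAligned256`, `kTensorAlignment` and `exampleAlignment` are hypothetical names, and the check uses a host buffer so it runs without CUDA. Converting a pointer to `std::uintptr_t` yields the numeric address on common platforms, which is what the sketch assumes.

```cpp
#include <cassert>
#include <cstdint>

// Alignment stated in the documentation for tensor addresses.
constexpr std::uintptr_t kTensorAlignment = 256;

// Hypothetical helper, not part of TensorRT. Returns true when the address
// is a multiple of 256. Note that nullptr also passes this check, which
// matches its role as the "unset" value.
bool isAligned256(void const* pointer)
{
    return reinterpret_cast<std::uintptr_t>(pointer) % kTensorAlignment == 0;
}

// Usage: an over-aligned buffer passes at its base and at 256-byte offsets,
// but not at an arbitrary offset.
void exampleAlignment()
{
    alignas(256) static unsigned char buffer[512];
    assert(isAligned256(buffer));
    assert(!isAligned256(buffer + 1));
    assert(isAligned256(buffer + 256));
}
```

When several tensors are carved out of one large allocation, each sub-buffer offset should be rounded up to a multiple of 256 so that every address handed to setTensorAddress() satisfies this check.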

Member Data Documentation

◆ mImpl

apiv::VExecutionContext* nvinfer1::IExecutionContext::mImpl

The documentation for this class was generated from the following file:

  Copyright © 2024 NVIDIA Corporation