Context for executing inference using an engine, with functionally unsafe features.
Public Member Functions

- virtual ~IExecutionContext() noexcept = default
- TRT_DEPRECATED bool execute(int32_t batchSize, void* const* bindings) noexcept
  Synchronously execute inference on a batch.
- TRT_DEPRECATED bool enqueue(int32_t batchSize, void* const* bindings, cudaStream_t stream, cudaEvent_t* inputConsumed) noexcept
  Asynchronously execute inference on a batch.
- void setDebugSync(bool sync) noexcept
  Set the debug sync flag.
- bool getDebugSync() const noexcept
  Get the debug sync flag.
- void setProfiler(IProfiler* profiler) noexcept
  Set the profiler.
- IProfiler* getProfiler() const noexcept
  Get the profiler.
- ICudaEngine const& getEngine() const noexcept
  Get the associated engine.
- TRT_DEPRECATED void destroy() noexcept
  Destroy this object.
- void setName(char const* name) noexcept
  Set the name of the execution context.
- char const* getName() const noexcept
  Return the name of the execution context.
- void setDeviceMemory(void* memory) noexcept
  Set the device memory for use by this execution context.
- Dims getStrides(int32_t bindingIndex) const noexcept
  Return the strides of the buffer for the given binding.
- TRT_DEPRECATED bool setOptimizationProfile(int32_t profileIndex) noexcept
  Select an optimization profile for the current context.
- int32_t getOptimizationProfile() const noexcept
  Get the index of the currently selected optimization profile.
- bool setBindingDimensions(int32_t bindingIndex, Dims dimensions) noexcept
  Set the dynamic dimensions of a binding.
- Dims getBindingDimensions(int32_t bindingIndex) const noexcept
  Get the dynamic dimensions of a binding.
- bool setInputShapeBinding(int32_t bindingIndex, int32_t const* data) noexcept
  Set values of an input tensor required by shape calculations.
- bool getShapeBinding(int32_t bindingIndex, int32_t* data) const noexcept
  Get values of an input tensor required for shape calculations, or of an output tensor produced by shape calculations.
- bool allInputDimensionsSpecified() const noexcept
  Whether all dynamic dimensions of input tensors have been specified.
- bool allInputShapesSpecified() const noexcept
  Whether all input shape bindings have been specified.
- void setErrorRecorder(IErrorRecorder* recorder) noexcept
  Set the ErrorRecorder for this interface.
- IErrorRecorder* getErrorRecorder() const noexcept
  Get the ErrorRecorder assigned to this interface.
- bool executeV2(void* const* bindings) noexcept
  Synchronously execute inference on a network.
- bool enqueueV2(void* const* bindings, cudaStream_t stream, cudaEvent_t* inputConsumed) noexcept
  Asynchronously execute inference.
- bool setOptimizationProfileAsync(int32_t profileIndex, cudaStream_t stream) noexcept
  Select an optimization profile for the current context with async semantics.
- void setEnqueueEmitsProfile(bool enqueueEmitsProfile) noexcept
  Set whether enqueue emits layer timing to the profiler.
- bool getEnqueueEmitsProfile() const noexcept
  Get the enqueueEmitsProfile state.
- bool reportToProfiler() const noexcept
  Calculate layer timing info for the current optimization profile and update the profiler after one iteration of inference launch.
|
Multiple execution contexts may exist for one ICudaEngine instance, allowing the same engine to be used for the execution of multiple batches simultaneously. If the engine supports dynamic shapes, each execution context in concurrent use must use a separate optimization profile.
- Warning
- Do not inherit from this class, as doing so will break forward-compatibility of the API and ABI.
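The per-context profile requirement can be sketched as follows. This is an illustrative fragment only (it needs a GPU, the TensorRT headers, and a deserialized engine built with at least two optimization profiles); `engine` and `stream1` are assumed to be created by the caller:

```cpp
// Two contexts share one ICudaEngine; each concurrently used context needs
// its own optimization profile when the engine has dynamic shapes.
nvinfer1::IExecutionContext* ctx0 = engine->createExecutionContext();
nvinfer1::IExecutionContext* ctx1 = engine->createExecutionContext();

// ctx0 implicitly uses profile 0 (first context created for the engine);
// move ctx1 to profile 1 so both contexts can run batches simultaneously.
bool const switched = ctx1->setOptimizationProfileAsync(1, stream1);
```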
Dims nvinfer1::IExecutionContext::getBindingDimensions(int32_t bindingIndex) const noexcept  [inline]
Get the dynamic dimensions of a binding.
Dims nvinfer1::IExecutionContext::getStrides(int32_t bindingIndex) const noexcept  [inline]
Return the strides of the buffer for the given binding.
The strides are in units of elements, not components or bytes. For example, for TensorFormat::kHWC8, a stride of one spans 8 scalars.
Note that strides can be different for different execution contexts with dynamic shapes.
If the bindingIndex is invalid or there are dynamic dimensions that have not been set yet, returns Dims with Dims::nbDims = -1.
- Parameters
  - bindingIndex: The binding index.
- See also
- getBindingComponentsPerElement
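To make the units concrete, here is a small self-contained helper (illustrative only, not part of the TensorRT API) that turns element strides such as those returned by getStrides() into a scalar offset for a channel-vectorized format. With vec = 8 it models TensorFormat::kHWC8, where one element packs 8 channel scalars:

```cpp
#include <array>
#include <cstdint>

// Element strides are in units of "elements"; for a vectorized format one
// element holds `vec` scalars (vec == 8 for kHWC8). The channel coordinate
// therefore splits into a vector index (c / vec) and a lane inside the
// vector (c % vec).
int64_t scalarOffset(std::array<int64_t, 4> const& strides, // {N, C, H, W} element strides
                     int64_t n, int64_t c, int64_t h, int64_t w, int64_t vec)
{
    int64_t const elementOffset =
        n * strides[0] + (c / vec) * strides[1] + h * strides[2] + w * strides[3];
    return elementOffset * vec + c % vec; // convert elements to scalars
}
```

For a hypothetical 1x16x2x2 HWC8 tensor (two channel vectors), element strides of {8, 1, 4, 2} place coordinate (0, 9, 1, 1) at scalar offset 57: one H step (4 elements) plus one W step (2 elements) plus channel-vector 1, times 8 scalars per element, plus lane 1.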
bool nvinfer1::IExecutionContext::reportToProfiler() const noexcept  [inline]
Calculate layer timing info for the current optimization profile in IExecutionContext and update the profiler after one iteration of inference launch.
If IExecutionContext::getEnqueueEmitsProfile() returns true, the enqueue function calculates layer timing implicitly when a profiler is provided; in that case, calling this function does nothing and returns true.
If IExecutionContext::getEnqueueEmitsProfile() returns false, the enqueue function records the CUDA event timers when a profiler is provided, but does not perform the layer timing calculation. IExecutionContext::reportToProfiler() must then be called explicitly to calculate layer timing for the previous inference launch.
In the CUDA graph launch scenario, it will record the same set of CUDA events as in regular enqueue functions if the graph is captured from an IExecutionContext with profiler enabled. This function needs to be called after graph launch to report the layer timing info to the profiler.
- Warning
- Profiling CUDA graphs is only available from CUDA 11.1 onwards.
-
reportToProfiler uses the stream of the previous enqueue call, so the stream must be live; otherwise, the behavior is undefined.
- Returns
- true if the call succeeded, else false (e.g. profiler not provided, in CUDA graph capture mode, etc.)
- See also
- IExecutionContext::setEnqueueEmitsProfile()
-
IExecutionContext::getEnqueueEmitsProfile()
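The explicit-reporting path described above can be sketched as follows; an illustrative fragment, assuming a caller-provided `context`, `bindings`, `stream`, and an IProfiler subclass instance `myProfiler`:

```cpp
context->setProfiler(&myProfiler);       // attach the profiler
context->setEnqueueEmitsProfile(false);  // enqueue records CUDA events only

if (context->enqueueV2(bindings, stream, nullptr))
{
    // Layer timing is computed only on request, after the launch; the
    // stream of the enqueue call must still be live at this point.
    if (!context->reportToProfiler())
    {
        // false: e.g. no profiler attached, or CUDA graph capture in progress
    }
}
```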
TRT_DEPRECATED bool nvinfer1::IExecutionContext::setOptimizationProfile(int32_t profileIndex) noexcept  [inline]
Select an optimization profile for the current context.
- Parameters
  - profileIndex: Index of the profile. It must lie between 0 and getEngine().getNbOptimizationProfiles() - 1.
The selected profile will be used in subsequent calls to executeV2() or enqueueV2().
When an optimization profile is switched via this API, TensorRT may enqueue GPU memory copy operations required to set up the new profile during the subsequent enqueueV2() operations. To avoid these calls during enqueueV2(), use setOptimizationProfileAsync() instead.
If the associated CUDA engine has dynamic inputs, this method must be called at least once with a unique profileIndex before calling execute or enqueue (i.e. the profile index may not be in use by another execution context that has not been destroyed yet). For the first execution context that is created for an engine, setOptimizationProfile(0) is called implicitly.
If the associated CUDA engine does not have inputs with dynamic shapes, this method need not be called, in which case the default profile index of 0 will be used (this is particularly the case for all safe engines).
setOptimizationProfile() must be called before calling setBindingDimensions() and setInputShapeBinding() for all dynamic input tensors or input shape tensors, which in turn must be called before either executeV2() or enqueueV2().
- Warning
- This function will trigger layer resource updates on the next call of enqueueV2()/executeV2(), possibly resulting in performance bottlenecks.
- Returns
- true if the call succeeded, else false (e.g. input out of range)
- Deprecated:
- Superseded by setOptimizationProfileAsync. Deprecated prior to TensorRT 8.0 and will be removed in 9.0.
- See also
- ICudaEngine::getNbOptimizationProfiles() IExecutionContext::setOptimizationProfileAsync()
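The required call order (profile, then dynamic dimensions, then execution) might look like this; an illustrative fragment with assumed `context`, `inputIndex`, and `bindings`, shown with the deprecated API this section documents:

```cpp
// 1. Select the profile before any shape calls (deprecated; prefer
//    setOptimizationProfileAsync in new code).
context->setOptimizationProfile(1);

// 2. Fix the dynamic input dimensions for this launch.
context->setBindingDimensions(inputIndex, nvinfer1::Dims4{8, 3, 224, 224});

// 3. Execute only once all dynamic dimensions are specified.
if (context->allInputDimensionsSpecified())
{
    context->executeV2(bindings);
}
```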
bool nvinfer1::IExecutionContext::setOptimizationProfileAsync(int32_t profileIndex, cudaStream_t stream) noexcept  [inline]
Select an optimization profile for the current context with async semantics.
- Parameters
  - profileIndex: Index of the profile. The value must lie between 0 and getEngine().getNbOptimizationProfiles() - 1.
  - stream: A CUDA stream on which the cudaMemcpyAsync calls may be enqueued.
When an optimization profile is switched via this API, TensorRT may require that data is copied via cudaMemcpyAsync. It is the application’s responsibility to guarantee that synchronization between the profile sync stream and the enqueue stream occurs.
The selected profile will be used in subsequent calls to executeV2() or enqueueV2(). If the associated CUDA engine has inputs with dynamic shapes, the optimization profile must be set with a unique profileIndex before calling execute or enqueue. For the first execution context that is created for an engine, setOptimizationProfile(0) is called implicitly.
If the associated CUDA engine does not have inputs with dynamic shapes, this method need not be called, in which case the default profile index of 0 will be used.
setOptimizationProfileAsync() must be called before calling setBindingDimensions() and setInputShapeBinding() for all dynamic input tensors or input shape tensors, which in turn must be called before either executeV2() or enqueueV2().
- Warning
- This function will trigger layer resource updates on the next call of enqueueV2()/executeV2(), possibly resulting in performance bottlenecks.
-
Not synchronizing the stream used at enqueue with the stream used to set optimization profile asynchronously using this API will result in undefined behavior.
- Returns
- true if the call succeeded, else false (e.g. input out of range)
- See also
- ICudaEngine::getNbOptimizationProfiles()
-
IExecutionContext::setOptimizationProfile()
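The synchronization requirement in the warning above can be satisfied with a CUDA event; an illustrative fragment, assuming caller-provided `context`, `profileStream`, `enqueueStream`, and `bindings`:

```cpp
// Switch profiles on profileStream; TensorRT may enqueue cudaMemcpyAsync
// work there to set up the new profile.
context->setOptimizationProfileAsync(1, profileStream);

// Order the enqueue stream after the profile switch before launching.
cudaEvent_t profileDone;
cudaEventCreate(&profileDone);
cudaEventRecord(profileDone, profileStream);
cudaStreamWaitEvent(enqueueStream, profileDone, 0);

context->enqueueV2(bindings, enqueueStream, nullptr);
cudaEventDestroy(profileDone);
```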