Migrating Safety Runtime Code from TensorRT 8.x to 10.x#
This page describes how to update safety runtime code when migrating from TensorRT 8.x to 10.x. The safety runtime targets safety-critical applications with clearer error handling and memory management. The numbered walkthrough below pairs 8.x and 10.x C++ snippets for the same tasks, and the final section lists removed safety APIs and replacements.
See also
- Migration Guide Overview
Landing page with links to all migration surfaces.
- C++ API Reference
Full C++ API documentation.
Warning
The safety runtime namespace has changed from `nvinfer1::safe` to `nvinfer2::safe` in TensorRT 10.x. The runtime object model has also changed: `IRuntime` / `ICudaEngine` / `IExecutionContext` are replaced by a single `ITRTGraph` object. All I/O tensor memory must be wrapped in `TypedArray` objects, and `enqueueV3` is replaced by `executeAsync` with `sync()`.
Migrating from TensorRT 8.x Safety Runtime to 10.x#
The numbered steps below show TensorRT 8.x first, then TensorRT 10.x, for loading the engine, querying I/O tensors, execution context setup, tensor addresses, and inference.
Load the engine from the user’s local file system into a memory buffer.
Create an `IRuntime` object with which to deserialize the CUDA engine. Use the `IRuntime` object's `deserializeCudaEngine` method to retrieve the `ICudaEngine` object from the memory buffer.

Before (TensorRT 8.x)
```cpp
using namespace nvinfer1::safe;
auto infer = createInferRuntime(logger);
infer->setErrorRecorder(recorder.get());
auto engine = infer->deserializeCudaEngine(enginebuffer.data(), enginebufferSize);
```
The `infer` object (an `IRuntime` instance) provides the primary C++ entry point for TensorRT. It deserializes a serialized engine buffer and produces an `ICudaEngine` object.

After (TensorRT 10.x)
```cpp
using namespace nvinfer2::safe;
ITRTGraph* graph = nullptr;
ErrorCode code = createTRTGraph(/* ITRTGraph*& */ graph,
                                /* engine buffer */ buffer.data(),
                                /* engine buffer size */ buffer.size(),
                                /* ISafeRecorder& */ recorder,
                                /* trtManagedScratch */ true,
                                /* ISafeMemAllocator* */ allocator);
```
`ITRTGraph` represents a neural network graph. The `createTRTGraph` function takes the engine data buffer and constructs the graph object from the serialized engine file.

- `recorder` — a reference to an `ISafeRecorder` object (a derived class of `IErrorRecorder`). A sample implementation is provided.
- `trtManagedScratch` — when `true` (the default), the memory allocator (user-supplied or built-in) allocates scratch memory, equivalent to `createExecutionContext` in TensorRT 8.x. When `false` (equivalent to `createExecutionContextWithoutDeviceMemory`), the user must supply and manage scratch memory directly.
- `allocator` — an optional custom memory allocator implementing the `ISafeMemAllocator` interface. If `nullptr` (the default), the built-in allocator is used.
Get the input/output tensor attributes from the engine.
Before (TensorRT 8.x)
```cpp
int32_t nb = engine->getNbIOTensors();
char const* inputName = engine->getIOTensorName(inputIndex);
char const* outputName = engine->getIOTensorName(outputIndex);
Dims inputDims = engine->getTensorShape(inputName);
Dims outputDims = engine->getTensorShape(outputName);
DataType inputType = engine->getTensorDataType(inputName);
DataType outputType = engine->getTensorDataType(outputName);
```
To calculate the tensor volume, you might also need `getTensorVectorizedDim`, `getTensorBytesPerComponent`, and `getTensorComponentsPerElement`.

After (TensorRT 10.x)
```cpp
int64_t nb;
ErrorCode code = graph->getNbIOTensors(nb);
char const* inputName;
code = graph->getIOTensorName(inputName, inputIndex);
char const* outputName;
code = graph->getIOTensorName(outputName, outputIndex);
TensorDescriptor inputDescriptor;
code = graph->getIOTensorDescriptor(inputDescriptor, inputName);
TensorDescriptor outputDescriptor;
code = graph->getIOTensorDescriptor(outputDescriptor, outputName);
```
In TensorRT 10.x, the `TensorDescriptor` convenience class contains all the information needed to construct an IO tensor outside of TensorRT and pass its address back to TensorRT.

```cpp
struct TensorDescriptor
{
    //! Name of the IO Tensor.
    AsciiChar const* tensorName{nullptr};
    //! Static shape of the IO Tensor.
    //! \note The static shape will depend on the IO Profile selected.
    PhysicalDims shape{-1, {}};
    //! Stride vector for each element of the IO Tensor.
    PhysicalDims stride{-1, {}};
    //! DataType enum for the IO Tensor.
    //! \warning The actual type of the tensor data must correspond to this DataType.
    DataType dataType{DataType::kFLOAT};
    //! The size of the tensor data type in bytes (4 for float and int32, 2 for half, 1 for int8).
    uint64_t bytesPerComponent{0U};
    //! The vector length (in scalars) for a vectorized tensor, 1 if the tensor is not vectorized.
    uint64_t componentsPerVector{1U};
    //! The dimension index along which the tensor is vectorized, -1 if the tensor is not vectorized.
    int64_t vectorizedDim{-1};
    //! Total size in bytes for the IO Tensor.
    uint64_t sizeInBytes{0U};
    //! Enum to denote whether the Tensor is for input or output.
    TensorIOMode ioMode{TensorIOMode::kNONE};
    //! Enum to denote whether the Tensor memory is allocated on the GPU, CPU, or CPU_PINNED.
    MemoryPlacement memPlacement{MemoryPlacement::kNONE};
    //! The order in which the dimensions are laid out in memory.
    PhysicalDims strideOrder{-1, {}};
    //! The original user-specified shape of the tensor, provided as a reference if the tensor is
    //! vectorized. When a tensor is vectorized by size T along some dimension with size K, the
    //! vectorized dimension is split into two dimensions of sizes ceil(K/T) and T.
    Dims userShape{-1, {}};
};
```
The `sizeInBytes` field directly provides the memory required for the tensor. The descriptor captures all properties needed to query, allocate, read, and write IO tensors. The physical tensor shape and layout chosen by the compiler can differ from the original user-specified shape due to padding or tiling for vectorized layouts. For example, a user-specified shape `[1,10,3,4]` with a `TensorFormat` of `NHWC8` tiles by 8 elements along the C dimension, producing the following descriptor fields:

User-specified properties (`NHWC8`):

- `userShape`: `[1,10,3,4]` (in NCHW order)
- `componentsPerVector`: `8`
- `vectorizedDim`: `1` (C dimension)

Compiler-determined layout:

- `shape`: `[1,2,8,3,4]` (C dimension split into two dimensions)
- `stride`: `[192,8,1,64,16]`
- `strideOrder`: `[4,1,0,3,2]` (tiled C dimensions change fastest)
Allocate device/host buffers for the input/output tensors of the engine.
Create an `IExecutionContext` from the engine with or without scratch memory. If created without scratch memory, the user must also allocate and set device memory for the context.

Before (TensorRT 8.x)
```cpp
auto context = engine->createExecutionContext();
```
If created without scratch memory:

```cpp
auto context = engine->createExecutionContextWithoutDeviceMemory();
size_t size = engine->getDeviceMemorySize();
void* mem;
cudaError_t status = cudaMalloc(&mem, size);
context->setDeviceMemory(mem);
```
After (TensorRT 10.x)
There is no need to call `createExecutionContext` because the `ITRTGraph` object carries its own context.

If `trtManagedScratch` was set to `false` when creating the graph, scratch memory must be supplied manually:

```cpp
size_t size;
code = graph->getScratchMemorySize(size);
void* mem;
cudaError_t status = cudaMalloc(&mem, size);
code = graph->setScratchMemory(mem);
```
Preprocess the input data with the host buffer and copy it to the device buffer.
Set the input/output tensor addresses of the context with the device buffer addresses accordingly.
Before (TensorRT 8.x)
```cpp
bool result = context->setInputTensorAddress(inputName, inDataPtr);
result = context->setOutputTensorAddress(outputName, outDataPtr);
```
After (TensorRT 10.x)
```cpp
TypedArray inputData(inDataPtr, inSize);
code = graph->setInputTensorAddress(inputName, inputData);
TypedArray outputData(outDataPtr, outSize);
code = graph->setOutputTensorAddress(outputName, outputData);
```
In TensorRT 10.x, IO tensor memory must be strongly typed and wrapped in a `TypedArray` object.

Call enqueue to start inference.
Before (TensorRT 8.x)
```cpp
result = context->enqueueV3(stream);
cudaStreamSynchronize(stream);
```
After (TensorRT 10.x)
```cpp
code = graph->executeAsync(stream);
code = graph->sync();
```
In TensorRT 10.x, CUDA stream synchronization is handled by the `sync()` API, which also performs additional finalization.

Postprocess the output device buffer to retrieve the inference output.
Removed Safety C++ APIs and Replacements#
Warning
The following Safety C++ APIs have been removed in TensorRT 10.x. The naming pattern follows the general TensorRT migration from binding-based APIs to name-based (tensor) APIs. Update your code to use the replacements listed below.
| Removed API | Replacement |
| --- | --- |
| `safe::ICudaEngine::bindingIsInput()` | `safe::ICudaEngine::tensorIOMode()` |
| `safe::ICudaEngine::getBindingBytesPerComponent()` | `safe::ICudaEngine::getTensorBytesPerComponent()` |
| `safe::ICudaEngine::getBindingComponentsPerElement()` | `safe::ICudaEngine::getTensorComponentsPerElement()` |
| `safe::ICudaEngine::getBindingDataType()` | `safe::ICudaEngine::getTensorDataType()` |
| `safe::ICudaEngine::getBindingDimensions()` | `safe::ICudaEngine::getTensorShape()` |
| `safe::ICudaEngine::getBindingIndex()` | `safe::` name-based methods |
| `safe::ICudaEngine::getBindingName()` | `safe::` name-based methods |
| `safe::ICudaEngine::getBindingVectorizedDim()` | `safe::ICudaEngine::getTensorVectorizedDim()` |
| `safe::ICudaEngine::getNbBindings()` | `safe::ICudaEngine::getNbIOTensors()` |
| `safe::ICudaEngine::getBindingFormat()` | `safe::ICudaEngine::getTensorFormat()` |