Migrating Safety Runtime Code from TensorRT 8.x to 10.x#
This page describes how to update safety runtime code when migrating from TensorRT 8.x to 10.x. The safety runtime targets safety-critical applications with clearer error handling and memory management. The numbered walkthrough below pairs 8.x and 10.x C++ snippets for the same tasks, and the final section lists removed safety APIs and replacements.
See also
- Migration Guide Overview
Landing page with links to all migration surfaces.
- C++ API Reference
Full C++ API documentation.
Warning
The safety runtime namespace has changed from `nvinfer1::safe` to `nvinfer2::safe` in TensorRT 10.x. The runtime object model has also changed: `IRuntime` / `ICudaEngine` / `IExecutionContext` are replaced by a single `ITRTGraph` object. All I/O tensor memory must be wrapped in `TypedArray` objects, and `enqueueV3` is replaced by `executeAsync` with `sync()`.
Migrating from TensorRT 8.x Safety Runtime to 10.x#
The numbered steps below show TensorRT 8.x first, then TensorRT 10.x, for loading the engine, querying I/O tensors, execution context setup, tensor addresses, and inference.
Load the engine from the user’s local file system into a memory buffer.
Create an `IRuntime` object with which to deserialize the CUDA engine. Use the `IRuntime` object's `deserializeCudaEngine` method to retrieve the `ICudaEngine` object from the memory buffer.

Before (TensorRT 8.x)
```cpp
using namespace nvinfer1::safe;
auto infer = createInferRuntime(logger);
infer->setErrorRecorder(recorder.get());
auto engine = infer->deserializeCudaEngine(enginebuffer.data(), enginebufferSize);
```
The `infer` object (an `IRuntime` instance) provides the primary C++ entry point for TensorRT. It deserializes a serialized engine buffer and produces an `ICudaEngine` object.

After (TensorRT 10.x)
```cpp
using namespace nvinfer2::safe;
ITRTGraph* graph = nullptr;
ErrorCode code = createTRTGraph(/* ITRTGraph*& */ graph,
                                /* engine buffer */ buffer.data(),
                                /* engine buffer size */ buffer.size(),
                                /* ISafeRecorder& */ recorder,
                                /* trtManagedScratch */ true,
                                /* ISafeMemAllocator* */ allocator);
```
`ITRTGraph` represents a neural network graph. The `createTRTGraph` function takes the engine data buffer and constructs the graph object from the serialized engine file.

- `recorder` — a reference to an `ISafeRecorder` object (a derived class of `IErrorRecorder`). A sample implementation is provided.
- `trtManagedScratch` — when `true` (the default), the memory allocator (user-supplied or built-in) allocates scratch memory, equivalent to `createExecutionContext` in TensorRT 8.x. When `false` (equivalent to `createExecutionContextWithoutDeviceMemory`), the user must supply and manage scratch memory directly.
- `allocator` — an optional custom memory allocator implementing the `ISafeMemAllocator` interface. If `nullptr` (the default), the built-in allocator is used.
Get the input/output tensor attributes from the engine.
Before (TensorRT 8.x)
```cpp
int32_t nb = engine->getNbIOTensors();
char const* inputName = engine->getIOTensorName(inputIndex);
char const* outputName = engine->getIOTensorName(outputIndex);
Dims inputDims = engine->getTensorShape(inputName);
Dims outputDims = engine->getTensorShape(outputName);
DataType inputType = engine->getTensorDataType(inputName);
DataType outputType = engine->getTensorDataType(outputName);
```
To calculate the tensor volume, you might also need `getTensorVectorizedDim`, `getTensorBytesPerComponent`, and `getTensorComponentsPerElement`.

After (TensorRT 10.x)
```cpp
int64_t nb;
ErrorCode code = graph->getNbIOTensors(nb);
char const* inputName;
code = graph->getIOTensorName(inputName, inputIndex);
char const* outputName;
code = graph->getIOTensorName(outputName, outputIndex);
TensorDescriptor inputDescriptor;
code = graph->getIOTensorDescriptor(inputDescriptor, inputName);
TensorDescriptor outputDescriptor;
code = graph->getIOTensorDescriptor(outputDescriptor, outputName);
```
In TensorRT 10.x, the `TensorDescriptor` convenience class contains all the information needed to construct an IO tensor outside of TensorRT and pass its address back to TensorRT.

```cpp
struct TensorDescriptor
{
    //! Name of the IO Tensor.
    AsciiChar const* tensorName{nullptr};
    //! Static shape of the IO Tensor.
    //! \note The static shape will depend on the IO Profile selected.
    PhysicalDims shape{-1, {}};
    //! Stride vector for each element of the IO Tensor.
    PhysicalDims stride{-1, {}};
    //! DataType enum for the IO Tensor.
    //! \warning The actual type of the tensor data must correspond to this DataType.
    DataType dataType{DataType::kFLOAT};
    //! The size of the tensor data type in bytes (4 for float and int32, 2 for half, 1 for int8).
    uint64_t bytesPerComponent{0U};
    //! The vector length (in scalars) for a vectorized tensor, 1 if the tensor is not vectorized.
    uint64_t componentsPerVector{1U};
    //! The dimension index along which the tensor is vectorized, -1 if the tensor is not vectorized.
    int64_t vectorizedDim{-1};
    //! Total size in bytes for the IO Tensor.
    uint64_t sizeInBytes{0U};
    //! Enum to denote whether the Tensor is for input or output.
    TensorIOMode ioMode{TensorIOMode::kNONE};
    //! Enum to denote whether the Tensor memory is allocated on the GPU, CPU, or CPU_PINNED.
    MemoryPlacement memPlacement{MemoryPlacement::kNONE};
    //! The order in which the dimensions are laid out in memory.
    PhysicalDims strideOrder{-1, {}};
    //! The original user-specified shape of the tensor, provided as a reference if the tensor is
    //! vectorized. When a tensor is vectorized by size T along some dimension with size K, the
    //! vectorized dimension is split into two dimensions of sizes ceil(K/T) and T.
    Dims userShape{-1, {}};
};
```
The `sizeInBytes` field directly provides the memory required for the tensor. The descriptor captures all properties needed to query, allocate, read, and write IO tensors. The physical tensor shape and layout chosen by the compiler can differ from the original user-specified shape due to padding or tiling for vectorized layouts. For example, a user-specified shape `[1,10,3,4]` with a `TensorFormat` of `NHWC8` tiles by 8 elements along the C dimension, producing the following descriptor fields:

User-specified properties (`NHWC8`):

- `userShape`: `[1,10,3,4]` (in NCHW order)
- `componentsPerVector`: `8`
- `vectorizedDim`: `1` (C dimension)

Compiler-determined layout:

- `shape`: `[1,2,8,3,4]` (C dimension split into two dimensions)
- `stride`: `[192,8,1,64,16]`
- `strideOrder`: `[4,1,0,3,2]` (tiled C dimensions change fastest)
Allocate device/host buffers for the input/output tensors of the engine.
Create an `IExecutionContext` from the engine with or without scratch memory. If created without scratch memory, the user must also allocate and set device memory for the context.

Before (TensorRT 8.x)
```cpp
auto context = engine->createExecutionContext();
```
If created without scratch memory:

```cpp
auto context = engine->createExecutionContextWithoutDeviceMemory();
size_t size = engine->getDeviceMemorySize();
void* mem;
cudaError_t status = cudaMalloc(&mem, size);
context->setDeviceMemory(mem);
```
After (TensorRT 10.x)
There is no need to call `createExecutionContext` because the `ITRTGraph` object carries its own context.

If `trtManagedScratch` was set to `false` when creating the graph, scratch memory must be supplied manually:

```cpp
size_t size;
code = graph->getScratchMemorySize(size);
void* mem;
cudaError_t status = cudaMalloc(&mem, size);
code = graph->setScratchMemory(mem);
```
Preprocess the input data with the host buffer and copy it to the device buffer.
Set the input/output tensor addresses of the context with the device buffer addresses accordingly.
Before (TensorRT 8.x)
```cpp
bool result = context->setInputTensorAddress(inputName, inDataPtr);
result = context->setOutputTensorAddress(outputName, outDataPtr);
```
After (TensorRT 10.x)
```cpp
TypedArray inputData(inDataPtr, inSize);
code = graph->setInputTensorAddress(inputName, inputData);
TypedArray outputData(outDataPtr, outSize);
code = graph->setOutputTensorAddress(outputName, outputData);
```
In TensorRT 10.x, IO tensor memory must be strongly typed and wrapped in a `TypedArray` object.

Call enqueue to start inference.
Before (TensorRT 8.x)
```cpp
result = context->enqueueV3(stream);
cudaStreamSynchronize(stream);
```
After (TensorRT 10.x)
```cpp
code = graph->executeAsync(stream);
code = graph->sync();
```
In TensorRT 10.x, CUDA stream synchronization is handled by the `sync()` API, which also performs additional finalization.

Postprocess the output device buffer to retrieve the inference output.
Removed Safety C++ APIs and Replacements#
Warning
The following Safety C++ APIs have been removed in TensorRT 10.x. The naming pattern follows the general TensorRT migration from binding-based APIs to name-based (tensor) APIs. Update your code to use the replacements listed below.
| Removed API | Replacement |
| --- | --- |
| `safe::ICudaEngine::bindingIsInput()` | `safe::ICudaEngine::tensorIOMode()` |
| `safe::ICudaEngine::getBindingBytesPerComponent()` | `safe::ICudaEngine::getTensorBytesPerComponent()` |
| `safe::ICudaEngine::getBindingComponentsPerElement()` | `safe::ICudaEngine::getTensorComponentsPerElement()` |
| `safe::ICudaEngine::getBindingDataType()` | `safe::ICudaEngine::getTensorDataType()` |
| `safe::ICudaEngine::getBindingDimensions()` | `safe::ICudaEngine::getTensorShape()` |
| `safe::ICudaEngine::getBindingIndex()` | `safe::` name-based methods |
| `safe::ICudaEngine::getBindingName()` | `safe::` name-based methods |
| `safe::ICudaEngine::getBindingVectorizedDim()` | `safe::ICudaEngine::getTensorVectorizedDim()` |
| `safe::ICudaEngine::getNbBindings()` | `safe::ICudaEngine::getNbIOTensors()` |
| `safe::ICudaEngine::getBindingFormat()` | `safe::ICudaEngine::getTensorFormat()` |