Migrating Safety Runtime Code from TensorRT 10.x to 11.x#
This page describes how to update safety runtime code when migrating from TensorRT 10.x to 11.x.
The most significant change is the removal of frontend safety scope validation. In 10.x, the builder performed a static pre-build check against a “Minimal Safety Scope” allowlist before engine building began. In 11.x, this check is removed and the build-time compiler is the sole authority for safety scope enforcement. As a result, safety build errors now appear as tactic failures with layer-level traceback rather than pre-build scope rejections.
This shift affects builder flag usage, the behavior of
isNetworkSupported(), error handling patterns, and
trtexec command-line options. The kernel checker tool also
gains MLIR validation in this release. Each section below pairs
10.x and 11.x C++ snippets where applicable.
Important
This section is only applicable when using the TensorRT 11.x safety runtime, which is only available on NVIDIA DriveOS 7.x.
Adapting to Build-Time Safety Scope Validation#
TensorRT 11.x removes pre-build safety scope validation and
relies exclusively on build-time enforcement. In 10.x, setting
BuilderFlag::kSAFETY_SCOPE triggered a pre-compile check
against a static “Minimal Safety Scope” allowlist. In 11.x, this
pre-build check is removed and the safety scope is now enforced
entirely during engine building.
Update your code in three areas: builder configuration, build
failure handling, and isNetworkSupported() usage.
BuilderFlag::kSAFETY_SCOPE is Deprecated#
BuilderFlag::kSAFETY_SCOPE is retained for API compatibility
but has no effect in TensorRT 11.x. Setting it no longer triggers
pre-build validation. You can leave existing calls in place or
remove them; either way, the flag is silently ignored.
Before (TensorRT 10.x)#
config->setFlag(BuilderFlag::kSAFETY_SCOPE);
// Triggers static per-layer supportsSafety() checks before engine building.
// Returns an error at network definition time for out-of-scope layers.
auto serialized = builder->buildSerializedNetwork(*network, *config);
After (TensorRT 11.x)#
// kSAFETY_SCOPE is now a no-op. Remove it or leave it - it has no effect.
// Safety validation occurs exclusively at build time.
auto serialized = builder->buildSerializedNetwork(*network, *config);
Summary of Changes#
BuilderFlag::kSAFETY_SCOPE is deprecated and ignored.
Safety build failures now surface as build-time tactic errors (for example, No tactic available) with layer-level traceback, rather than pre-build scope errors.
isNetworkSupported() No Longer Validates Safety Scope#
In TensorRT 10.x, IBuilder::isNetworkSupported() performed
static pre-build safety scope checks when kSAFETY_SCOPE was
set. In 11.x, the function is simplified: it checks only
architectural constraint violations (for example, hybrid DLA/GPU
mode conflicts and kSAFETY_SCOPE flag combinations). A true return value does not
guarantee a successful build.
Before (TensorRT 10.x)#
// isNetworkSupported() checked static safety scope rules
// and could be used as a fast pre-build safety gate.
if (!builder->isNetworkSupported(*network, *config)) {
    // Network was outside the static Minimal Safety Scope.
    return false;
}
auto serialized = builder->buildSerializedNetwork(*network, *config);
After (TensorRT 11.x)#
// isNetworkSupported() checks only architectural constraints (flag compatibility,
// hybrid DLA/GPU mode). It does NOT validate safety scope.
// Use buildSerializedNetwork() for a definitive safety determination.
if (!builder->isNetworkSupported(*network, *config)) {
    // Network has an invalid configuration (e.g., incompatible builder flags).
    return false;
}
auto serialized = builder->buildSerializedNetwork(*network, *config);
if (!serialized) {
    // Build failed; build-time safety checks rejected the network.
    // Inspect error log for layer-level traceback.
    return false;
}
Summary of Changes#
isNetworkSupported() no longer performs per-layer safety scope checks.
If you relied on isNetworkSupported() as a definitive safety gate, switch to buildSerializedNetwork() for a complete build-time determination.
Tensor volume limit and boolean tensor checks have been removed from isNetworkSupported() (these are now validated at build time).
Interpreting Build-Time Safety Build Failures#
Because all safety scope enforcement now occurs during engine
building, scope violations surface as build-time errors rather
than pre-build scope rejections. When a network is outside the
certified scope, the builder reports a
No tactic available error that traces back to the originating
layer.
Review your error handling code to process these build-time messages:
auto serialized = builder->buildSerializedNetwork(*network, *config);
if (!serialized) {
    // Examine the error log for messages such as:
    // "Autotuner: no tactics to implement operation: ... layers=[ONNX Layer: <LayerName>]"
    // Use this layer name to identify the unsupported operation and consult the
    // TensorRT Safety Delta Document for known limitations relative to standard scope.
}
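If your error handling collects log lines (for example, through a custom logger), the layer name can be pulled out programmatically. The following is a minimal sketch, not part of the TensorRT API: the helper name extractFailedLayer is hypothetical, and the regular expression assumes the message shape shown in the comment above, which may vary between releases.

```cpp
#include <regex>
#include <string>

// Returns the layer name embedded in a build-log line, or an empty string if
// the line does not match. The pattern is an ASSUMPTION based on the sample
// message "... no tactics to implement operation: ... layers=[ONNX Layer: <LayerName>]";
// adjust it to the messages your builder actually emits.
std::string extractFailedLayer(const std::string& logLine)
{
    static const std::regex pattern(
        R"(no tactics to implement operation.*layers=\[ONNX Layer: ([^\]]+)\])");
    std::smatch match;
    if (std::regex_search(logLine, match, pattern))
    {
        return match[1].str(); // first capture group: the layer name
    }
    return std::string();
}
```

Calling this helper on each captured error line yields the names to look up in the TensorRT Safety Delta Document; lines that do not match are ignored.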
Removed trtexec Flag --restricted#
Warning
The --restricted flag has been removed in TensorRT 11.x. Using it will cause trtexec to exit with an error.
The --restricted trtexec flag, which previously enabled pre-build safety scope validation during engine builds, has been removed. Safety restrictions can no longer be applied during standard engine building. Remove --restricted from any build scripts or CI pipelines that reference it.
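To locate leftover uses before they break a pipeline, a small lint step can scan script text for the flag. The sketch below is illustrative only: findRestrictedFlag is a hypothetical helper, and it assumes the flag appears as a standalone whitespace-delimited token on the command line.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based line numbers in `scriptText` on which --restricted
// appears as a standalone whitespace-delimited token.
std::vector<int> findRestrictedFlag(const std::string& scriptText)
{
    std::vector<int> hits;
    std::istringstream lines(scriptText);
    std::string line;
    int lineNumber = 0;
    while (std::getline(lines, line))
    {
        ++lineNumber;
        std::istringstream tokens(line);
        std::string token;
        while (tokens >> token)
        {
            if (token == "--restricted")
            {
                hits.push_back(lineNumber); // record each line once
                break;
            }
        }
    }
    return hits;
}
```

A CI job could read each build script into a string, call the helper, and fail with the reported line numbers if the result is non-empty.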
Kernel Checker Tool Improvements#
Important
This section is only applicable when using the TensorRT 11.x safety runtime, which is only available on NVIDIA DriveOS 7.x.
The TensorRT kernel checker tool adds new checks to validate kernels in MLIR form, in addition to the existing checks that validate kernels in CUDA C++ form. Refer to the NVIDIA Deep Learning Inference SEooC 2.2 Safety Tools Manual V0.1 for more information.
Recommendation to Use Safety Proxy for Early Bring-Up#
For safety workflows, begin developing with the TensorRT safety proxy runtime on x86 for early bring-up rather than with the TensorRT standard runtime. The standard and safety runtime APIs differ significantly, as described in the Migrating Safety Runtime Code from TensorRT 8.x to 10.x section.
Deprecated trtexec_safe Flags and Replacements#
The following trtexec_safe flags have been deprecated in 11.x but are still accepted. Each entry shows the deprecated flag and its replacement.
--useCudaGraph: Enabled by default; flag accepted but has no effect. A deprecation warning is issued. Use --noCudaGraph to disable CUDA graph usage.
--separateProfileRun: Always enabled; flag accepted but has no effect. A deprecation warning is issued. This flag will be removed in a future release.
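Because both deprecated flags are accepted but have no effect, a wrapper that assembles a trtexec_safe command line can drop them to silence the deprecation warnings. The following is a minimal sketch under that assumption; dropDeprecatedSafeFlags is a hypothetical helper, not something shipped with TensorRT, and it only knows the two flags listed above.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// Returns a copy of `args` without the trtexec_safe flags that 11.x accepts
// but ignores. All other arguments, including --noCudaGraph, pass through.
std::vector<std::string> dropDeprecatedSafeFlags(const std::vector<std::string>& args)
{
    static const std::array<const char*, 2> deprecated = {
        "--useCudaGraph", "--separateProfileRun"};

    std::vector<std::string> kept;
    for (const std::string& arg : args)
    {
        const bool isDeprecated = std::any_of(
            deprecated.begin(), deprecated.end(),
            [&arg](const char* flag) { return arg == flag; });
        if (!isDeprecated)
        {
            kept.push_back(arg);
        }
    }
    return kept;
}
```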
Adapting to Auxiliary CUDA Stream Support#
Important
This section is only applicable when using the TensorRT 11.x safety runtime, which is only available on NVIDIA DriveOS 7.x.
The TensorRT 11.x safety runtime now supports auxiliary CUDA streams, lifting the TensorRT 10.x restriction that forced every safety engine to execute on a single stream. By default, the builder may produce engines that require one or more auxiliary streams; when it does, the application is responsible for allocating, registering, and destroying those streams at runtime.
Warning
Existing TensorRT 10.x safety application code that loads a serialized engine and calls executeAsync() without first registering auxiliary streams may fail at runtime if the engine was built with maxAuxStreams > 0.
There are two migration paths:
Preserve 10.x single-stream behavior by setting maxAuxStreams to 0 at build time. No runtime code changes are required.
Adopt multi-stream execution by querying the engine’s auxiliary-stream count and registering streams before the first executeAsync() call.
Note
The standard (non-safety) runtime documents the equivalent APIs in the Within-Inference Multi-Streaming section. The key safety-specific differences are:
The application must allocate and register auxiliary streams; the safety runtime does not auto-create them.
The runtime APIs are on nvinfer2::safe::ITRTGraph (not IExecutionContext) and live in NvInferSafeRuntime.h.
Preserving 10.x Single-Stream Behavior#
When maxAuxStreams is omitted at build time, the builder may select a non-zero value via internal heuristics. To guarantee single-stream behavior across releases, set the flag explicitly to 0 at build time:
auto config = builder->createBuilderConfig();
config->setMaxAuxStreams(0); // produces a single-stream engine
// ... rest of build configuration ...
The same flag is available in trtexec:
trtexec --onnx=model.onnx --maxAuxStreams=0 ...
Adopting Auxiliary Streams#
If your workflow allows multi-stream engines, the aux-stream-specific steps to add to your runtime code are:
Query the number of auxiliary streams the engine expects (after creating the safety graph from the serialized engine).
Allocate that many CUDA streams with the cudaStreamNonBlocking flag.
Register the streams via setAuxStreams() before the first call to executeAsync(). This is an INIT-phase operation (before any inference launches).
Destroy the streams after the final sync() call.
If getNbAuxStreams() reports 0, the engine is single-stream and no auxiliary-stream handling is required; you can skip the allocation and registration steps.
The relevant API lives in namespace nvinfer2::safe in NvInferSafeRuntime.h:
// 1. Create the safety graph from the serialized engine.
nvinfer2::safe::ITRTGraph* graph = nullptr;
nvinfer2::safe::createTRTGraph(graph, planData, planSize, recorder, true);

// 2. Query the number of auxiliary streams this engine requires.
int32_t nbAuxStreams = 0;
graph->getNbAuxStreams(nbAuxStreams);

// 3. Allocate that many non-blocking streams. The application owns
//    the lifetime of these streams.
std::vector<cudaStream_t> auxStreams(nbAuxStreams);
for (int32_t i = 0; i < nbAuxStreams; ++i)
{
    cudaStreamCreateWithFlags(&auxStreams[i], cudaStreamNonBlocking);
}

// 4. Register the streams BEFORE the first executeAsync call.
//    The count must exactly match getNbAuxStreams().
graph->setAuxStreams(auxStreams.data(), nbAuxStreams);

// 5. Create the main inference stream.
cudaStream_t mainStream;
cudaStreamCreateWithFlags(&mainStream, cudaStreamNonBlocking);

// 6. Run inference.
graph->executeAsync(mainStream);

// 7. Wait for the full inference to complete.
graph->sync();

// 8. Cleanup: must occur AFTER the final sync().
for (auto s : auxStreams)
{
    cudaStreamDestroy(s);
}
cudaStreamDestroy(mainStream);
nvinfer2::safe::destroyTRTGraph(graph);
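Since the application owns the auxiliary streams, one way to keep creation and destruction paired is a small scope-bound owner. The sketch below illustrates that ownership pattern only and is not a TensorRT or CUDA API: AuxStreamSet is a hypothetical class, the Stream template parameter stands in for cudaStream_t so the code compiles without the CUDA toolkit, and the create and destroy callables are assumed not to throw.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Owns a fixed number of stream handles and releases them on scope exit.
// `Stream` is a stand-in for cudaStream_t; `create` and `destroy` are
// hypothetical hooks supplied by the caller and assumed not to throw.
template <typename Stream>
class AuxStreamSet
{
public:
    AuxStreamSet(std::size_t count,
                 std::function<Stream()> create,
                 std::function<void(Stream)> destroy)
        : mDestroy(std::move(destroy))
    {
        mStreams.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
        {
            mStreams.push_back(create());
        }
    }

    ~AuxStreamSet()
    {
        for (Stream s : mStreams)
        {
            mDestroy(s);
        }
    }

    // Non-copyable: two owners would destroy the same handles twice.
    AuxStreamSet(const AuxStreamSet&) = delete;
    AuxStreamSet& operator=(const AuxStreamSet&) = delete;

    Stream* data() { return mStreams.data(); }
    std::size_t size() const { return mStreams.size(); }

private:
    std::vector<Stream> mStreams;
    std::function<void(Stream)> mDestroy;
};
```

In a real application, create would wrap cudaStreamCreateWithFlags() with cudaStreamNonBlocking and destroy would wrap cudaStreamDestroy(), with data() and size() passed to setAuxStreams(). The set must go out of scope only after the final sync() call, matching the cleanup order above.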