Migrating Safety Runtime Code from TensorRT 10.x to 11.x#

This page describes how to update safety runtime code when migrating from TensorRT 10.x to 11.x.

The most significant change is the removal of frontend safety scope validation. In 10.x, the builder performed a static pre-build check against a “Minimal Safety Scope” allowlist before engine building began. In 11.x, this check is removed and the build-time compiler is the sole authority for safety scope enforcement. As a result, safety build errors now appear as tactic failures with layer-level traceback rather than pre-build scope rejections.

This shift affects builder flag usage, the behavior of isNetworkSupported(), error handling patterns, and trtexec command-line options. The kernel checker tool also gains MLIR validation in this release. Each section below pairs 10.x and 11.x C++ snippets where applicable.

Important

This section is only applicable when using the TensorRT 11.x safety runtime, which is only available on NVIDIA DriveOS 7.x.

Adapting to Build-Time Safety Scope Validation#

In TensorRT 10.x, setting BuilderFlag::kSAFETY_SCOPE triggered a pre-build check against a static “Minimal Safety Scope” allowlist. TensorRT 11.x removes this check and enforces the safety scope entirely during engine building.

Update your code in three areas: builder configuration, build failure handling, and isNetworkSupported() usage.

`BuilderFlag::kSAFETY_SCOPE` is Deprecated#

BuilderFlag::kSAFETY_SCOPE is retained for API compatibility but has no effect in TensorRT 11.x. Setting it no longer triggers pre-build validation. You can leave existing calls in place or remove them; either way, the flag is silently ignored.

Before (TensorRT 10.x)#

config->setFlag(BuilderFlag::kSAFETY_SCOPE);
// Triggers static per-layer supportsSafety() checks before engine building.
// Returns an error at network definition time for out-of-scope layers.
auto serialized = builder->buildSerializedNetwork(*network, *config);

After (TensorRT 11.x)#

// kSAFETY_SCOPE is now a no-op. Remove it or leave it - it has no effect.
// Safety validation occurs exclusively at build time.
auto serialized = builder->buildSerializedNetwork(*network, *config);

Summary of Changes#

BuilderFlag::kSAFETY_SCOPE is deprecated and ignored.
Safety build failures now surface as build-time tactic errors (for example, No tactic available) with layer-level traceback, rather than pre-build scope errors.

`isNetworkSupported()` No Longer Validates Safety Scope#

In TensorRT 10.x, IBuilder::isNetworkSupported() performed static pre-build safety scope checks when kSAFETY_SCOPE was set. In 11.x, the function is simplified: it checks only for architectural constraint violations (for example, hybrid DLA/GPU mode conflicts and kSAFETY_SCOPE flag combinations). Because safety scope is no longer evaluated here, a true return value does not guarantee a successful build.

Before (TensorRT 10.x)#

// isNetworkSupported() checked static safety scope rules
// and could be used as a fast pre-build safety gate.
if (!builder->isNetworkSupported(*network, *config)) {
    // Network was outside the static Minimal Safety Scope.
    return false;
}
auto serialized = builder->buildSerializedNetwork(*network, *config);

After (TensorRT 11.x)#

// isNetworkSupported() checks only architectural constraints (flag compatibility,
// hybrid DLA/GPU mode). It does NOT validate safety scope.
// Use buildSerializedNetwork() for a definitive safety determination.
if (!builder->isNetworkSupported(*network, *config)) {
    // Network has an invalid configuration (e.g., incompatible builder flags).
    return false;
}
auto serialized = builder->buildSerializedNetwork(*network, *config);
if (!serialized) {
    // Build failed; build-time safety checks rejected the network.
    // Inspect error log for layer-level traceback.
    return false;
}

Summary of Changes#

isNetworkSupported() no longer performs per-layer safety scope checks.
If you relied on isNetworkSupported() as a definitive safety gate, switch to buildSerializedNetwork() for a complete build-time determination.
Tensor volume limit and boolean tensor checks have been removed from isNetworkSupported(); these are now validated at build time.

Interpreting Build-Time Safety Build Failures#

Because all safety scope enforcement now occurs during engine building, an out-of-scope network fails during the build itself rather than being rejected beforehand. When a network is outside the certified scope, the builder reports a No tactic available error that traces back to the originating layer.

Review your error handling code to process these build-time messages:

auto serialized = builder->buildSerializedNetwork(*network, *config);
if (!serialized) {
    // Examine the error log for messages such as:
    //   "Autotuner: no tactics to implement operation: ... layers=[ONNX Layer: <LayerName>]"
    // Use this layer name to identify the unsupported operation and consult the
    // TensorRT Safety Delta Document for known limitations relative to standard scope.
}
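As a minimal sketch of processing these messages, the helper below pulls the layer name out of a captured log line. It assumes the `ONNX Layer: <LayerName>` format shown in the example message above and that layer names contain no `]` or `,`. The name `extractOnnxLayerName` is hypothetical and not part of the TensorRT API, and the wording of real builder logs may vary between releases.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Marker text taken from the example message in this section. Treat it as an
// assumption: actual log wording may differ between TensorRT releases.
constexpr std::string_view kLayerMarker = "ONNX Layer: ";

// Returns the layer name that follows "ONNX Layer: " in a log line, or
// std::nullopt if the marker is absent or the name is empty. The name is
// read up to the next ']' or ',' (or to the end of the line).
inline std::optional<std::string> extractOnnxLayerName(std::string_view logLine)
{
    auto const markerPos = logLine.find(kLayerMarker);
    if (markerPos == std::string_view::npos)
    {
        return std::nullopt;
    }

    auto const nameBegin = markerPos + kLayerMarker.size();
    auto const nameEnd = logLine.find_first_of("],", nameBegin);
    auto const name = (nameEnd == std::string_view::npos)
        ? logLine.substr(nameBegin)
        : logLine.substr(nameBegin, nameEnd - nameBegin);

    if (name.empty())
    {
        return std::nullopt;
    }
    return std::string(name);
}
```

An application logger could call this helper on each error-severity message and collect the returned names, so that a failed build reports the offending layers directly.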

Removed `trtexec` Flag `--restricted`#

Warning

The --restricted flag has been removed in TensorRT 11.x. Using it will cause trtexec to exit with an error.

The --restricted trtexec flag, which previously enabled pre-build safety scope validation during engine builds, has been removed. Safety restrictions can no longer be applied during standard engine building. Remove --restricted from any build scripts or CI pipelines that reference it.

Kernel Checker Tool Improvements#

Important

This section is only applicable when using the TensorRT 11.x safety runtime, which is only available on NVIDIA DriveOS 7.x.

The TensorRT kernel checker tool adds new checks to validate kernels in MLIR form, in addition to the existing checks that validate kernels in CUDA C++ form. Refer to the NVIDIA Deep Learning Inference SEooC 2.2 Safety Tools Manual V0.1 for more information.

Recommendation to Use Safety Proxy for Early Bring-Up#

For safety workflows, begin developing with the TensorRT safety proxy runtime on x86 for early bring-up rather than with the TensorRT standard runtime. The standard and safety runtime APIs differ significantly, as described in the Migrating Safety Runtime Code from TensorRT 8.x to 10.x section.

Deprecated `trtexec_safe` Flags and Replacements#

The following trtexec_safe flags have been deprecated in 11.x but are still accepted. Each entry shows the deprecated flag, its current behavior, and its replacement where one exists.

--useCudaGraph: Enabled by default; flag accepted but has no effect. A deprecation warning is issued. Use --noCudaGraph to disable CUDA graph usage.
--separateProfileRun: Always enabled; flag accepted but has no effect. A deprecation warning is issued. This flag will be removed in a future release.
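To find affected invocations in build scripts or CI pipelines, a small classifier such as the following sketch may help. It encodes only the flag changes stated on this page (`--restricted` removed from trtexec, `--useCudaGraph` and `--separateProfileRun` deprecated in trtexec_safe). The `Tool` and `FlagStatus` enums and the `classifyFlag` function are hypothetical names introduced for illustration.

```cpp
#include <string_view>

enum class Tool { kTrtexec, kTrtexecSafe };
enum class FlagStatus { kOk, kDeprecatedNoOp, kRemoved };

// Classifies one command-line argument according to the flag changes
// described on this page. Only the flags named here are recognized;
// every other argument is reported as kOk without further validation.
inline FlagStatus classifyFlag(Tool tool, std::string_view arg)
{
    // Compare only the flag name, ignoring any "=value" suffix.
    std::string_view const name = arg.substr(0, arg.find('='));

    if (tool == Tool::kTrtexec && name == "--restricted")
    {
        return FlagStatus::kRemoved; // trtexec 11.x exits with an error
    }
    if (tool == Tool::kTrtexecSafe
        && (name == "--useCudaGraph" || name == "--separateProfileRun"))
    {
        return FlagStatus::kDeprecatedNoOp; // accepted, warning issued
    }
    return FlagStatus::kOk;
}
```

A migration script could run each argument of a recorded command line through `classifyFlag` and fail on `kRemoved` while only warning on `kDeprecatedNoOp`.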

Adapting to Auxiliary CUDA Stream Support#

Important

This section is only applicable when using the TensorRT 11.x safety runtime, which is only available on NVIDIA DriveOS 7.x.

The TensorRT 11.x safety runtime now supports auxiliary CUDA streams, lifting the TensorRT 10.x restriction that forced every safety engine to execute on a single stream. By default, the builder may produce engines that require one or more auxiliary streams; when it does, the application is responsible for allocating, registering, and destroying those streams at runtime.

Warning

Existing TensorRT 10.x safety application code that loads a serialized engine and calls executeAsync() without first registering auxiliary streams may fail at runtime if the engine was built with maxAuxStreams > 0.

There are two migration paths:

Preserve 10.x single-stream behavior by setting maxAuxStreams to 0 at build time. No runtime code changes are required.
Adopt multi-stream execution by querying the engine’s auxiliary-stream count and registering streams before the first executeAsync() call.

Note

The standard (non-safety) runtime documents the equivalent APIs in the Within-Inference Multi-Streaming section. The key safety-specific differences are:

The application must allocate and register auxiliary streams; the safety runtime does not auto-create them.
The runtime APIs are on nvinfer2::safe::ITRTGraph (not IExecutionContext) and live in NvInferSafeRuntime.h.

Preserving 10.x Single-Stream Behavior#

When maxAuxStreams is not set at build time, the builder may select a non-zero value via internal heuristics. To guarantee single-stream behavior across releases, set it explicitly to 0 in the builder configuration:

auto config = builder->createBuilderConfig();
config->setMaxAuxStreams(0);  // produces a single-stream engine
// ... rest of build configuration ...

The same flag is available in trtexec:

trtexec --onnx=model.onnx --maxAuxStreams=0 ...

Adopting Auxiliary Streams#

If your workflow allows multi-stream engines, the aux-stream-specific steps to add to your runtime code are:

Query the number of auxiliary streams the engine expects (after creating the safety graph from the serialized engine).
Allocate that many CUDA streams with the cudaStreamNonBlocking flag.
Register the streams via setAuxStreams() before the first call to executeAsync(). This is an INIT-phase operation (before any inference launches).
Destroy the streams after the final sync() call.

If getNbAuxStreams() returns 0, the engine is single-stream and no auxiliary-stream handling is required — you can skip the allocation and registration steps.

The relevant API lives in namespace nvinfer2::safe in NvInferSafeRuntime.h:

// 1. Create the safety graph from the serialized engine.
nvinfer2::safe::ITRTGraph* graph = nullptr;
nvinfer2::safe::createTRTGraph(graph, planData, planSize, recorder, true);

// 2. Query the number of auxiliary streams this engine requires.
int32_t nbAuxStreams = 0;
graph->getNbAuxStreams(nbAuxStreams);

// 3. Allocate that many non-blocking streams. The application owns
//    the lifetime of these streams.
std::vector<cudaStream_t> auxStreams(nbAuxStreams);
for (int32_t i = 0; i < nbAuxStreams; ++i)
{
   cudaStreamCreateWithFlags(&auxStreams[i], cudaStreamNonBlocking);
}

// 4. Register the streams BEFORE the first executeAsync call.
//    The count must exactly match getNbAuxStreams().
graph->setAuxStreams(auxStreams.data(), nbAuxStreams);

// 5. Create the main inference stream.
cudaStream_t mainStream;
cudaStreamCreateWithFlags(&mainStream, cudaStreamNonBlocking);

// 6. Run inference.
graph->executeAsync(mainStream);

// 7. Wait for the full inference to complete.
graph->sync();

// 8. Cleanup — must occur AFTER the final sync().
for (auto s : auxStreams)
{
   cudaStreamDestroy(s);
}
cudaStreamDestroy(mainStream);
nvinfer2::safe::destroyTRTGraph(graph);
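The listing above omits error handling; once early returns are added, allocated streams can leak. One possible way to tie stream lifetime to a scope is a small RAII holder, sketched here under assumptions: `ScopedStreams` is a hypothetical helper, not part of TensorRT or CUDA, and it is templated on the handle type so that it can be exercised without a GPU. It relies on `std::function` and `std::vector`, which may not be acceptable under every safety coding standard.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Owns up to `count` stream handles. Each handle is produced by `create`
// and released by `destroy` when the holder goes out of scope.
template <typename Handle>
class ScopedStreams
{
public:
    ScopedStreams(std::size_t count,
                  std::function<bool(Handle&)> create,
                  std::function<void(Handle)> destroy)
        : mDestroy(std::move(destroy))
    {
        mHandles.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
        {
            Handle h{};
            if (!create(h))
            {
                // Stop at the first failure. Handles created so far are
                // still released by the destructor.
                mOk = false;
                break;
            }
            mHandles.push_back(h);
        }
    }

    ~ScopedStreams()
    {
        for (Handle h : mHandles)
        {
            mDestroy(h);
        }
    }

    // Non-copyable: each handle must be destroyed exactly once.
    ScopedStreams(ScopedStreams const&) = delete;
    ScopedStreams& operator=(ScopedStreams const&) = delete;

    bool ok() const { return mOk; }
    Handle* data() { return mHandles.data(); }
    std::size_t size() const { return mHandles.size(); }

private:
    std::vector<Handle> mHandles;
    std::function<void(Handle)> mDestroy;
    bool mOk{true};
};
```

In a CUDA application, `Handle` would be `cudaStream_t`, `create` would wrap `cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking)` and return whether it succeeded, and `destroy` would wrap `cudaStreamDestroy(s)`. The holder's scope should end only after the final sync() call, matching step 8 above.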
