Working with DLA

Important

DLA is not supported in TensorRT 11.0 or 11.1. The guidance in this section describes DLA behavior for earlier and future TensorRT releases. DLA support will be reintroduced in a later minor version update.

NVIDIA DLA (Deep Learning Accelerator) is a fixed-function accelerator engine for deep learning operations. It is designed to provide full hardware acceleration for convolutional neural networks. DLA supports various layers, such as convolution, deconvolution, fully connected, activation, pooling, and batch normalization. Refer to the DLA Supported Layers and Restrictions section for more information about which TensorRT layers DLA supports.

DLA is useful for offloading CNN processing from the GPU and is significantly more power-efficient than the GPU for these workloads. In addition, it can provide an independent execution pipeline in cases where redundancy is important, such as mission-critical or safety applications.

For more information, refer to the DLA Developer page and the DLA tutorial Getting Started with the Deep Learning Accelerator on NVIDIA Jetson Orin.

When building a model for DLA, the TensorRT builder parses the network and calls the DLA compiler to compile the network into a DLA loadable. Refer to Using trtexec to learn how to build and run networks on DLA.
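As a rough illustration of the build step described above, the following Python sketch assembles a trtexec command line that targets a DLA core. It assumes a TensorRT release in which DLA is supported and in which trtexec accepts the `--onnx`, `--useDLACore`, `--allowGPUFallback`, `--fp16`/`--int8`, and `--saveEngine` options; confirm these against `trtexec --help` for your installation. The helper name `dla_trtexec_command` and the file names are hypothetical and are used only for this example.

```python
import shlex


def dla_trtexec_command(onnx_path, engine_path, dla_core=0,
                        allow_gpu_fallback=True, precision="fp16"):
    """Return a trtexec argument list for a DLA build (hypothetical helper).

    Assumption: the flags below match the trtexec shipped with a
    DLA-capable TensorRT release; check `trtexec --help` before use.
    """
    # DLA executes layers in FP16 or INT8, so reject other precisions early.
    if precision not in ("fp16", "int8"):
        raise ValueError("precision must be 'fp16' or 'int8' for DLA")
    if dla_core < 0:
        raise ValueError("dla_core must be a non-negative core index")

    args = [
        "trtexec",
        f"--onnx={onnx_path}",          # network to parse
        f"--useDLACore={dla_core}",     # DLA core to build for
        f"--{precision}",               # precision used on DLA
        f"--saveEngine={engine_path}",  # where to write the built engine
    ]
    if allow_gpu_fallback:
        # Let layers that DLA cannot run fall back to the GPU.
        args.append("--allowGPUFallback")
    return args


if __name__ == "__main__":
    # Placeholder file names; the printed string can be pasted into a shell.
    print(shlex.join(dla_trtexec_command("model.onnx", "model_dla.engine")))
```

The function returns an argument list rather than a single string so that it can be passed directly to `subprocess.run` without shell quoting concerns; `shlex.join` is used only to display it.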

DLA Software Stack: Build and Runtime Phases

In this guide

Building and Launching the Loadable — build and launch DLA loadables with trtexec, the TensorRT API, and cuDLA
DLA Supported Layers and Restrictions — supported layers, formats, and hardware limits
GPU Fallback Mode — GPU fallback mode and I/O format requirements
DLA Standalone Mode — generate standalone DLA loadables outside TensorRT
Customizing DLA Memory Pools — customize DLA SRAM/DRAM pools and structured sparsity