DLA Standalone Mode

If you need to run inference outside of TensorRT, you can use EngineCapability::kDLA_STANDALONE to generate a DLA loadable instead of a TensorRT engine. This loadable can then be used with the cuDLA API.

Building A DLA Loadable

  1. Set the default device type to DLA and the engine capability to DLA standalone mode.

    builderConfig->setDefaultDeviceType(DeviceType::kDLA);
    builderConfig->setEngineCapability(EngineCapability::kDLA_STANDALONE);
    
    builder_config.default_device_type = trt.DeviceType.DLA
    builder_config.engine_capability = trt.EngineCapability.DLA_STANDALONE
    
  2. Specify the desired precision (FP16, INT8, or mixed) on the network rather than through builder flags. In TensorRT 11.0 and later, create the network with NetworkDefinitionCreationFlag::kSTRONGLY_TYPED and let the tensor types in the network (or in a pre-quantized ONNX model) determine precision. The per-precision BuilderFlag values such as kFP16, kINT8, kBF16, and kFP8 have been removed. Refer to Migrating from TensorRT 10.x to 11.x for end-to-end DLA precision examples.

  3. DLA standalone mode does not allow reformatting, so BuilderFlag::kDIRECT_IO must be set.

    builderConfig->setFlag(BuilderFlag::kDIRECT_IO);
    
    builder_config.set_flag(trt.BuilderFlag.DIRECT_IO)
    
  4. Restrict the allowed formats of the I/O tensors to one or more DLA-supported formats.

  5. Build the network as usual. The serialized result is a DLA loadable rather than a TensorRT engine.
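
The five steps above can be combined into a single build script. The following is a minimal Python sketch, not an official sample. It makes several assumptions:

* The installed TensorRT Python package exposes the APIs named in the steps, including `NetworkDefinitionCreationFlag.STRONGLY_TYPED` and `BuilderFlag.DIRECT_IO`.
* The ONNX model's I/O tensors already carry FP16 types.
* CHW16 is the chosen DLA-supported I/O format, matching `fp16:chw16` in the trtexec example.

The helper names `build_dla_loadable` and `allowed_formats_mask`, the file paths, and the default DLA core 0 are hypothetical. TensorRT is imported inside the build function so that the bitmask helper can be used and tested on hosts without TensorRT.

```python
import operator
from functools import reduce


def allowed_formats_mask(*formats):
    """Hypothetical helper: build the bitmask assigned to ITensor.allowed_formats.

    Each format sets the bit at position int(format); passing several
    formats allows the builder to pick any of them.
    """
    return reduce(operator.or_, (1 << int(f) for f in formats), 0)


def build_dla_loadable(onnx_path, loadable_path, dla_core=0):
    """Hypothetical helper: build a DLA loadable from an ONNX model (sketch)."""
    import tensorrt as trt  # lazy import: the helper above works without TensorRT

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # Step 2: precision comes from the tensor types of a strongly typed network.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    # Step 1: target DLA and request a standalone loadable.
    config.default_device_type = trt.DeviceType.DLA
    config.DLA_core = dla_core  # assumption: the DLA core is chosen at build time
    config.engine_capability = trt.EngineCapability.DLA_STANDALONE
    # Step 3: reformatting is not allowed in DLA standalone mode.
    config.set_flag(trt.BuilderFlag.DIRECT_IO)

    # Step 4: restrict every network input and output to CHW16.
    chw16 = allowed_formats_mask(trt.TensorFormat.CHW16)
    for i in range(network.num_inputs):
        network.get_input(i).allowed_formats = chw16
    for i in range(network.num_outputs):
        network.get_output(i).allowed_formats = chw16

    # Step 5: build as usual; the serialized bytes are the DLA loadable.
    loadable = builder.build_serialized_network(network, config)
    if loadable is None:
        raise RuntimeError("DLA loadable build failed")
    with open(loadable_path, "wb") as f:
        f.write(loadable)
```

For example, `build_dla_loadable("model.onnx", "model_loadable.bin")` would write the loadable next to the model. Keeping the format bitmask in a separate function makes it easy to allow several layouts, for example `allowed_formats_mask(trt.TensorFormat.CHW16, trt.TensorFormat.DLA_LINEAR)`, if the model's I/O permits it.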

Using trtexec To Generate A DLA Loadable

The trtexec tool can generate a DLA loadable instead of a TensorRT engine. Specifying both --useDLACore and --safe sets the engine capability to EngineCapability::kDLA_STANDALONE. The --inputIOFormats and --outputIOFormats parameters restrict the I/O data types and memory layouts. The --saveEngine parameter specifies the file to which the DLA loadable is saved.

For example, to generate an FP16 DLA loadable for an ONNX model using trtexec, run:

./trtexec --onnx=model.onnx --saveEngine=model_loadable.bin --useDLACore=0 --fp16 --inputIOFormats=fp16:chw16 --outputIOFormats=fp16:chw16 --skipInference --safe
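
When the loadable is produced as part of a larger Python build script, the same trtexec invocation can be assembled programmatically. The sketch below reproduces the flags of the FP16 example exactly. The function name `trtexec_dla_loadable_cmd` and its parameters are hypothetical. It assumes a trtexec binary at `./trtexec` and does not check whether the installed trtexec version still accepts `--fp16`.

```python
import shlex


def trtexec_dla_loadable_cmd(onnx_path, loadable_path, dla_core=0,
                             trtexec="./trtexec"):
    """Hypothetical helper: argument list for the FP16 DLA loadable example."""
    io_format = "fp16:chw16"  # same I/O data type and layout as the example
    return [
        trtexec,
        f"--onnx={onnx_path}",
        f"--saveEngine={loadable_path}",   # where the DLA loadable is written
        f"--useDLACore={dla_core}",        # with --safe: kDLA_STANDALONE
        "--fp16",
        f"--inputIOFormats={io_format}",
        f"--outputIOFormats={io_format}",
        "--skipInference",
        "--safe",
    ]


if __name__ == "__main__":
    cmd = trtexec_dla_loadable_cmd("model.onnx", "model_loadable.bin")
    # Print a shell-safe command line; on a machine with trtexec, the list
    # can instead be passed to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string avoids shell quoting problems when paths contain spaces.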