DLA Standalone Mode
If you need to run inference outside of TensorRT, you can use EngineCapability::kDLA_STANDALONE to generate a DLA loadable instead of a TensorRT engine. This loadable can then be used with the cuDLA API.
Building A DLA Loadable
Set the default device type to DLA and the engine capability to DLA standalone mode.
C++:
builderConfig->setDefaultDeviceType(DeviceType::kDLA);
builderConfig->setEngineCapability(EngineCapability::kDLA_STANDALONE);
Python:
builder_config.default_device_type = trt.DeviceType.DLA
builder_config.engine_capability = trt.EngineCapability.DLA_STANDALONE
Specify the desired precision (FP16, INT8, or mixed) on the network rather than via builder flags. In TensorRT 11.0 and later, create the network with NetworkDefinitionCreationFlag::kSTRONGLY_TYPED and let the tensor types in the network (or the pre-quantized ONNX import) drive precision; the per-precision BuilderFlag values such as kFP16, kINT8, kBF16, and kFP8 have been removed. Refer to Migrating from TensorRT 10.x to 11.x for end-to-end DLA precision examples.
DLA standalone mode disallows reformatting; therefore, BuilderFlag::kDIRECT_IO needs to be set.
C++:
builderConfig->setFlag(BuilderFlag::kDIRECT_IO);
Python:
builder_config.set_flag(trt.BuilderFlag.DIRECT_IO)
Set the allowed formats for I/O tensors to one or more of those that are DLA-supported.
Build as normal.
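The I/O format step above can be sketched in Python as follows. This is a minimal illustration, not the documented procedure: the helper names format_mask and restrict_io_formats are hypothetical, and the sketch assumes the network object exposes num_inputs, get_input, num_outputs, and get_output, and that each tensor accepts a bitmask through its allowed_formats attribute, where each format contributes the bit 1 << int(format).

```python
def format_mask(formats):
    """Combine tensor formats into one bitmask (hypothetical helper).

    Each format contributes the bit ``1 << int(fmt)``, so any value that
    converts to int (such as an enum member) can be passed in.
    """
    mask = 0
    for fmt in formats:
        mask |= 1 << int(fmt)
    return mask


def restrict_io_formats(network, formats):
    """Apply the same allowed-format mask to every network input and output.

    Assumption: ``network`` behaves like a TensorRT network definition,
    exposing num_inputs/get_input and num_outputs/get_output, and each
    returned tensor has a writable ``allowed_formats`` attribute.
    """
    mask = format_mask(formats)
    for i in range(network.num_inputs):
        network.get_input(i).allowed_formats = mask
    for i in range(network.num_outputs):
        network.get_output(i).allowed_formats = mask
    return mask
```

With a real network, a call might look like restrict_io_formats(network, [trt.TensorFormat.CHW16]), made before building. Which formats DLA actually accepts for a given data type should be checked against the DLA-supported format list rather than inferred from this sketch.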
Using trtexec To Generate A DLA Loadable
The trtexec tool can generate a DLA loadable instead of a TensorRT engine. Specifying both the --useDLACore and --safe parameters sets the builder capability to EngineCapability::kDLA_STANDALONE. Specifying --inputIOFormats and --outputIOFormats restricts the I/O data type and memory layout. To save the DLA loadable to a file, specify the --saveEngine parameter.
For example, to generate an FP16 DLA loadable for an ONNX model using trtexec, run:
./trtexec --onnx=model.onnx --saveEngine=model_loadable.bin --useDLACore=0 --fp16 --inputIOFormats=fp16:chw16 --outputIOFormats=fp16:chw16 --skipInference --safe
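When the command above has to be produced for several models or DLA cores, it can help to assemble the argument list programmatically. The sketch below is illustrative only: dla_loadable_command is a hypothetical helper, its parameter names and defaults are assumptions, and it uses only the trtexec options that appear in the example command.

```python
import shlex


def dla_loadable_command(onnx_path, loadable_path, dla_core=0,
                         io_format="fp16:chw16", trtexec="./trtexec"):
    """Return the trtexec argument list for an FP16 DLA loadable.

    Hypothetical helper: it mirrors the example command and applies the
    same I/O format string to both inputs and outputs.
    """
    return [
        trtexec,
        f"--onnx={onnx_path}",
        f"--saveEngine={loadable_path}",
        f"--useDLACore={dla_core}",
        "--fp16",
        f"--inputIOFormats={io_format}",
        f"--outputIOFormats={io_format}",
        "--skipInference",
        "--safe",
    ]


if __name__ == "__main__":
    cmd = dla_loadable_command("model.onnx", "model_loadable.bin")
    # Print the command as a single shell-quoted line for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) on a machine where trtexec is available, which avoids shell quoting issues with unusual file names.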