Exporting the model decouples the training process from inference and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. The exported model may be used universally across training and deployment hardware.

The exported model format is referred to as .etlt. Like the .tlt model format, .etlt is an encrypted model format, and it uses the same key as the .tlt model that it is exported from. This key is required when deploying this model.

TensorRT engines can be generated in INT8 mode to improve performance, but this requires a calibration cache at engine creation time. The calibration cache is generated from a calibration tensor file when export is run with the --data_type flag set to int8. Pre-generating the calibration information and caching it removes the need to calibrate the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensor file, since the cache is a much smaller file and can be moved with the exported model. Using the calibration cache also speeds up engine creation, since building the cache can take several minutes depending on the size of the tensor file and the model itself.

The export tool can generate the INT8 calibration cache by ingesting training data using one of these options:

Option 1: Use the training data loader to load the training images for INT8 calibration. This option is now the recommended approach, as it supports multiple image directories by leveraging the training dataset loader. It also ensures two important properties of the data during calibration: (1) data pre-processing in the INT8 calibration step is the same as in the training process, and (2) the data batches are sampled randomly across the entire training dataset, thereby improving the accuracy of the INT8 model.

Option 2: Point the tool to a directory of images that you want to use to calibrate the model. For this option, make sure to create a sub-sampled directory of random images that best represent your training dataset.
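One way to build the sub-sampled directory for Option 2 is sketched below. This is a minimal illustration, not part of the TAO tooling: the function name subsample_images and the file extensions are assumptions, and it relies on GNU find, shuf, xargs, and cp being available.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a TAO command): copy a random sample of N images
# from SRC (searched recursively) into DST for use as a calibration set.
# Assumes GNU coreutils/findutils and that image file names are unique
# across SRC; files with the same name would overwrite each other in DST.
subsample_images() {
  local src="$1" dst="$2" n="$3"
  mkdir -p "$dst"
  # NUL-delimited pipeline so paths with spaces or newlines are handled safely.
  find "$src" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) -print0 \
    | shuf -z -n "$n" \
    | xargs -0 -r cp -t "$dst"
}

# Example usage (paths are placeholders):
# subsample_images /data/train/images /data/calibration_images 500
```

A uniform random sample like this is only a starting point. If the training set is strongly imbalanced, it may be worth checking that the sampled directory still covers the classes and conditions that matter.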

The calibration.bin file is only required if you need to run inference at INT8 precision. For FP16/FP32-based inference, the export step is much simpler: all you need to do is provide a .tlt model from the training/retraining step to be converted into .etlt format.

Here’s an example of the command line arguments of the tao model mask_rcnn export command:

tao model mask_rcnn export [-h] -m <path to the .tlt model file generated by tao model train>
                           -k <key>
                           --experiment_spec <path to experiment spec file>
                           [-o <path to output file>]
                           [--gen_ds_config <Flag to generate ds config and label file>]
                           [--gpu_index <gpu_index>]
                           [--log_file <log_file_path>]

-m, --model : The path to the .tlt model file to be exported using export.

-k, --key : The key used to save the .tlt model file.

-e, --experiment_spec : The path to the spec file.

-o, --output_file : The path to save the exported model to. The default path is ./<input_file>.etlt.

--gen_ds_config : A Boolean flag indicating whether to generate the template DeepStream-related configuration (“nvinfer_config.txt”) as well as a label file (“labels.txt”) in the same directory as the output_file. Note that the generated config file is NOT a complete configuration file: you need to update the sample config files in DeepStream with the generated parameters.

--gpu_index : The index of the (discrete) GPU for exporting the model if the machine has multiple GPUs installed. Note that export can only run on a single GPU.

--log_file : The path to the log file. The default path is stdout.

-h, --help : Show this help message and exit.
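As an illustration of how the arguments above fit together, the following sketch assembles a plain (non-INT8) export command and prints it for review rather than running it. All paths and the key value are hypothetical placeholders, and only arguments listed above are used.

```shell
#!/usr/bin/env bash
# Sketch: assemble a non-INT8 export command from the documented arguments.
# Every path and the key below are hypothetical placeholders; replace them
# with the values from your own training run.
MODEL_PATH="/workspace/experiments/model.tlt"    # .tlt file produced by training
SPEC_PATH="/workspace/specs/maskrcnn_spec.txt"   # experiment spec file
OUTPUT_PATH="/workspace/export/model.etlt"       # where the .etlt is written
KEY="${KEY:-your_encryption_key}"                # same key used to save the .tlt

cmd=(tao model mask_rcnn export
     -m "$MODEL_PATH"
     -k "$KEY"
     --experiment_spec "$SPEC_PATH"
     -o "$OUTPUT_PATH")

# Dry run: print the shell-quoted command instead of executing it.
printf '%q ' "${cmd[@]}"; printf '\n'

# To run the export for real, uncomment the next line:
# "${cmd[@]}"
```

Keeping the command in an array avoids quoting problems when paths or the key contain spaces, and the dry-run print makes it easy to check the arguments before launching the export.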

Note: MaskRCNN does not support quantization-aware training (QAT).

Here’s a sample command to export a MaskRCNN model in INT8 mode: