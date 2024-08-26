The UNet model application in the TAO Toolkit includes an export sub-task that exports and prepares a trained UNet model for deployment to DeepStream. The export sub-task optionally generates the calibration cache for TensorRT INT8 engine calibration.

Exporting the model decouples the training process from deployment and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment; an engine is interchangeably referred to as the .trt or .engine file. The exported TAO model, by contrast, can be used universally across training and deployment hardware; it is referred to as the .etlt file, or encrypted TAO file. During model export, the TAO model is encrypted with a private key. This key is required when you deploy the model for inference.

TensorRT engines can be generated in INT8 mode to run with lower precision, and thus improve performance. This process requires a cache file that contains scale factors for the tensors to help combat quantization errors, which may arise due to low-precision arithmetic. The calibration cache is generated from a calibration tensorfile when export is run with the --data_type flag set to int8. Pre-generating the calibration information and caching it removes the need to calibrate the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensorfile, since the cache is a much smaller file and can be moved with the exported model. Using the calibration cache also speeds up engine creation, because building the cache can take several minutes depending on the size of the tensorfile and the model itself.
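To make the role of a scale factor concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization. It assumes a simple max-abs rule for picking the scale; TensorRT's calibrators choose the dynamic range with their own algorithms, so this illustrates the idea and is not the toolkit's implementation. The function names and sample values are hypothetical.

```python
def int8_scale(values):
    """Per-tensor scale that maps the largest magnitude to 127 (max-abs rule, an assumption)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round each value to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 steps back to approximate floating-point values."""
    return [q * scale for q in q_values]


# Hypothetical activation values standing in for one tensor seen during calibration.
activations = [0.02, -1.5, 0.75, 3.2, -0.4]
scale = int8_scale(activations)
q = quantize(activations, scale)
restored = dequantize(q, scale)

# The rounding error per value is at most half a quantization step.
max_error = max(abs(a - r) for a, r in zip(activations, restored))
```

A scale chosen from representative data keeps this rounding error small for the values the model actually produces, which is why calibration images should resemble the training set.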

The export tool can generate an INT8 calibration cache by ingesting training data. To do so, create a sub-sampled directory of random images that best represents your training dataset, and point the tool to that directory for calibrating the model.
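One way to build such a sub-sampled directory is sketched below. The helper name, the fixed seed, and the case-insensitive extension match are assumptions made for illustration; the extension list follows the image types the export tool accepts for calibration (.jpg, .jpeg, .png).

```python
import random
import shutil
from pathlib import Path

# Image extensions accepted for calibration; matching case-insensitively is an assumption.
VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def subsample_images(src_dir, dst_dir, num_images, seed=0):
    """Copy a random subset of images from src_dir into dst_dir and return the new paths."""
    src = Path(src_dir)
    dst = Path(dst_dir)
    candidates = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() in VALID_EXTENSIONS
    )
    if len(candidates) < num_images:
        raise ValueError(
            f"requested {num_images} images but only {len(candidates)} found in {src}"
        )
    dst.mkdir(parents=True, exist_ok=True)
    # A seeded generator makes the selection reproducible between runs.
    chosen = random.Random(seed).sample(candidates, num_images)
    copied = []
    for path in chosen:
        target = dst / path.name
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

The resulting directory can then be passed to the export tool as the calibration image directory.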

The calibration cache (calibration.bin) is only required if you need to run inference at INT8 precision. For FP16/FP32-based inference, the export step is much simpler: all that is required is a model from the train step, which export converts into an encrypted TAO model.

Here’s an example of the command line arguments for the export command:

tao model unet export [-h]
    -m </path/to the .tlt model file generated by tao model train>
    -k <key>
    -e </path/to/experiment/spec_file>
    [-o </path/to/output/file>]
    [-s <strict_type_constraints>]
    [--cal_data_file </path/to/tensor/file>]
    [--cal_image_dir </path/to/the/directory/images/to/calibrate/the/model>]
    [--cal_cache_file </path/to/output/calibration/file>]
    [--data_type <Data type for the TensorRT backend during export>]
    [--batches <Number of batches to calibrate over>]
    [--max_batch_size <maximum trt batch size>]
    [--max_workspace_size <maximum workspace size>]
    [--batch_size <batch size to TensorRT engine>]
    [--engine_file </path/to/the/TensorRT/engine_file>]
    [--gen_ds_config <Flag to generate ds config and label file>]
    [--verbose <Verbosity of the logger>]

Required Arguments

-m, --model : The path to the .tlt model file to be exported.

-k, --key : The key used to save the .tlt model file.

-e, --experiment_spec : The path to the spec file.

Optional Arguments

-o, --output_file : The path to save the exported model to. The default path is ./<input_file>.etlt.

--data_type : The engine data type; in INT8 mode, a calibration cache is also generated. The options are fp32, fp16, and int8. The default value is fp32. If using int8, the INT8 calibration arguments below are required.

--gen_ds_config : A Boolean flag indicating whether to generate the template DeepStream configuration (“nvinfer_config.txt”) as well as a label file (“labels.txt”) in the same directory as the output_file. Note that the generated config file is NOT a complete configuration file; you must update the sample config files in DeepStream with the generated parameters.

-s, --strict_type_constraints : A Boolean flag to indicate whether or not to apply the TensorRT strict_type_constraints when building the TensorRT engine. Note that strict type constraints are applied only in INT8 mode.

--cal_data_file : The path of the output tensorfile generated from the images in --cal_image_dir.

--cal_image_dir : The directory of images to use for calibration.

Note: If a valid path is provided to the --cal_data_file argument on the command line, the export tool produces an intermediate tensorfile, for re-use, from random batches of images in the --cal_image_dir directory. This tensorfile is used for calibration. If --cal_image_dir is not provided, random input tensors are used for calibration. The number of batches in the generated tensorfile is taken from the --batches parameter, and the batch size is taken from the --batch_size parameter. Ensure that the directory given in --cal_image_dir contains at least batch_size * batches images. The valid image extensions are “.jpg”, “.jpeg”, and “.png”. In this case, the input_dimensions of the calibration tensors are derived from the input layer of the .tlt model.
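The image-count requirement in the note can be checked before running export. The sketch below is a hypothetical pre-flight helper, not part of the TAO Toolkit; its defaults mirror the documented defaults for --batch_size (8) and --batches (10), and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed in the note above; case-insensitive matching is an assumption.
VALID_EXTENSIONS = (".jpg", ".jpeg", ".png")


def required_calibration_images(batch_size=8, batches=10):
    """Minimum number of images the calibration directory must hold."""
    return batch_size * batches


def count_calibration_images(image_dir):
    """Count files in image_dir whose extension is one of the valid image types."""
    return sum(
        1 for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in VALID_EXTENSIONS
    )


def has_enough_images(image_dir, batch_size=8, batches=10):
    """True when image_dir holds at least batch_size * batches valid images."""
    needed = required_calibration_images(batch_size, batches)
    return count_calibration_images(image_dir) >= needed
```

With the default values, the directory needs at least 8 * 10 = 80 images.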





--cal_cache_file : The path to save the calibration cache file. The default value is ./cal.bin.

--batches : The number of batches to use for calibration and inference testing. The default value is 10.

--batch_size : The batch size to use for calibration. The default value is 8.

--max_batch_size : The maximum batch size of the TensorRT engine. The default value is 1.

--min_batch_size : The minimum batch size of the TensorRT engine. The default value is 1.

--opt_batch_size : The optimum batch size of the TensorRT engine. The default value is 1.

--max_workspace_size : The maximum workspace size of the TensorRT engine. The default value is 1073741824 (1<<30, i.e. 1 GiB when counted in bytes).

--experiment_spec : The experiment_spec for training/inference/evaluation.

--engine_file : The path to the serialized TensorRT engine file. Note that this file is hardware specific and cannot be generalized across GPUs. The engine file allows you to quickly test your model accuracy using TensorRT on the host. Since a TensorRT engine file is hardware specific, you cannot use an engine file for deployment unless the deployment GPU is identical to the training GPU.

--force_ptq : A Boolean flag to force post-training quantization on the exported .etlt model.

Note: When exporting a model that was trained with QAT enabled, the tensor scale factors for calibrating the activations are peeled out of the model and serialized to a TensorRT-readable cache file defined by the cal_cache_file argument. However, the current version of QAT doesn’t natively support DLA int8 deployment on Jetson. To deploy this model on Jetson with DLA int8, use the --force_ptq flag to use TensorRT post-training quantization to generate the calibration cache file.





Here’s a sample command using the --cal_image_dir option for a UNet model.

tao model unet export -m $USER_EXPERIMENT_DIR/unet/model.tlt \
    -o $USER_EXPERIMENT_DIR/unet/model.int8.etlt \
    -e $SPECS_DIR/unet_train_spec.txt \
    --key $KEY \
    --cal_image_dir $USER_EXPERIMENT_DIR/data/isbi/images/val \
    --data_type int8 \
    --batch_size 8 \
    --batches 10 \
    --cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt \
    --cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
    --engine_file $USER_EXPERIMENT_DIR/export/int8.isbi.engine
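When export is driven from a script or notebook, a command like the one above can be assembled as an argument list. The following is a minimal sketch: the builder function, file paths, and key value are hypothetical, and it only emits flags that the caller passes in, so the caller is responsible for using the flags documented above. Nothing is executed here.

```python
import shlex

VALID_DATA_TYPES = ("fp32", "fp16", "int8")


def build_unet_export_command(model, key, spec, output_file=None,
                              data_type="fp32", **options):
    """Return the export command as an argument list (hypothetical helper)."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(
            f"data_type must be one of {VALID_DATA_TYPES}, got {data_type!r}"
        )
    cmd = ["tao", "model", "unet", "export",
           "-m", str(model), "-k", str(key), "-e", str(spec)]
    if output_file is not None:
        cmd += ["-o", str(output_file)]
    cmd += ["--data_type", data_type]
    # Remaining keyword options map one-to-one onto long flags, for example
    # cal_image_dir="x" becomes "--cal_image_dir x". True emits a bare flag;
    # None and False are skipped.
    for name, value in options.items():
        if value is True:
            cmd.append(f"--{name}")
        elif value is not None and value is not False:
            cmd += [f"--{name}", str(value)]
    return cmd


# Placeholder paths and key, for illustration only.
int8_cmd = build_unet_export_command(
    "model.tlt", "my_key", "unet_train_spec.txt",
    output_file="model.int8.etlt", data_type="int8",
    cal_image_dir="data/images/val", batch_size=8, batches=10,
    cal_cache_file="export/cal.bin",
)
print(shlex.join(int8_cmd))
```

The list form can be passed to `subprocess.run(int8_cmd, check=True)` in an environment where the `tao` launcher is installed; `shlex.join` requires Python 3.8 or later.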



