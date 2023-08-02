TAO Toolkit provides a utility for exporting a trained model to an encrypted ONNX format or a TensorRT deployable engine format.

The export sub-task optionally generates the calibration cache for TensorRT INT8 engine calibration.

Exporting the model decouples the training process from deployment and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment; an engine may also be referred to as a .trt or .engine file. The exported TAO model, by contrast, can be used universally across training and deployment hardware; this is the .etlt file, or encrypted TAO file. During export, the TAO model is encrypted with an encryption key, and the same key is required when you deploy the model for inference.

TensorRT engines can be generated in INT8 mode to run at lower precision, which improves performance. This requires a calibration cache file containing scale factors for the tensors, which help limit the quantization errors that can arise from low-precision arithmetic. The calibration cache is generated from a calibration tensorfile when export is run with the --data_type flag set to int8. Generating and caching the calibration information in advance removes the need to calibrate the model on the inference machine. The calibration cache is also much smaller than the calibration tensorfile, so it is far more convenient to move alongside the exported model. Reusing the cache speeds up engine creation as well, because generating it can take several minutes depending on the size of the tensorfile and the model itself.
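As a simplified illustration of what a per-tensor scale factor means, assume symmetric linear quantization in which calibration chooses a clipping threshold a_max for each tensor. This is an assumption for illustration only; this page does not specify the exact calibration algorithm, rounding, or clamping behavior TensorRT uses. The worked numbers show the two error sources: rounding error (at most s/2) and clipping error for values beyond a_max.

```latex
\begin{aligned}
s &= \frac{a_{\max}}{127}, \qquad
q = \operatorname{clamp}\!\left(\operatorname{round}\!\left(\tfrac{x}{s}\right),\,-127,\,127\right), \qquad
\hat{x} = s\,q \\[4pt]
a_{\max} &= 6.35 \;\Rightarrow\; s = 0.05 \\
x = 1.23 &\;\Rightarrow\; \tfrac{x}{s} = 24.6,\; q = 25,\; \hat{x} = 1.25,\; |x - \hat{x}| = 0.02 \le \tfrac{s}{2} \\
x = 8.0 &\;\Rightarrow\; \tfrac{x}{s} = 160,\; q = 127,\; \hat{x} = 6.35 \quad \text{(clipped, error } 1.65\text{)}
\end{aligned}
```

A larger a_max reduces clipping but makes each quantization step s coarser, which is one reason calibration data that resembles the real test distribution tends to give INT8 results closer to FP32.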

The export tool can generate an INT8 calibration cache by ingesting a sampled subset of the training data. Create a directory containing a random sub-sample of images that best represents your test dataset; we recommend using at least 10-20% of the training data. The more data provided during calibration, the closer INT8 inference results are to FP32 results. A helper script is provided with the sample notebook to select a subset of the given training data.
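The notebook's helper script is not reproduced here. As a stand-in, the following is a minimal bash sketch of the same idea: it copies a random percentage of images into a calibration directory. The function name sample_calibration_images is hypothetical (not a TAO utility), and the sketch assumes the images sit directly in a single source directory and use the .jpg, .jpeg, or .png extensions accepted by the export tool. It relies on GNU coreutils (shuf -z, cp -t).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TAO Toolkit): copy a random PERCENT of the
# images in SRC_DIR into DST_DIR for INT8 calibration.
# Assumes a flat directory of .jpg/.jpeg/.png files and GNU coreutils.
set -euo pipefail

# Usage: sample_calibration_images SRC_DIR DST_DIR PERCENT
sample_calibration_images() {
  local src="$1" dst="$2" percent="$3" f
  mkdir -p "$dst"

  # Collect candidate images, handling spaces in names via NUL separators.
  local -a images=()
  while IFS= read -r -d '' f; do
    images+=("$f")
  done < <(find "$src" -maxdepth 1 -type f \
             \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' \) -print0)

  local total=${#images[@]}
  # Round up so a small, non-empty dataset still yields at least one image.
  local count=$(( (total * percent + 99) / 100 ))

  # Shuffle the list and copy the first $count files.
  if (( count > 0 )); then
    printf '%s\0' "${images[@]}" | shuf -z -n "$count" | xargs -0 cp -t "$dst"
  fi
  echo "copied $count of $total images to $dst"
}
```

For example, sample_calibration_images /workspace/data/train /workspace/fpenet/calibration 10 (hypothetical paths) copies 10% of the training images. Pointing the first argument at a test image directory is one way to mix in test data, as suggested below.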

Depending on the evaluation results of the INT8 model, you may need to adjust the number of sampled images, or the kind of images selected, so that they better represent the test dataset. You can also use a portion of the test data for calibration to improve the results.

The calibration cache (calibration.bin) is required only if you need to run inference at INT8 precision. For FP16/FP32 inference, the export step is much simpler: provide the model produced by the train step to export, which converts it into an encrypted TAO model.

tao fpenet export -m <Trained TAO Model Path> -k <Encode Key> -o <Output file .etlt>

-m : The path to the trained model to be exported

-k : The encryption key for model loading

-o : The path to the output .etlt file (if omitted, .etlt is appended to the model path)

-t : The target opset value for ONNX conversion. The default value is 10.

--cal_data_file : The path to the calibration data file (.tensorfile)

--cal_image_dir : The path to a directory with calibration image samples

--cal_cache_file : The path to the calibration cache file (.bin)

--data_type : The data type for the TensorRT export. The options are fp32 and int8.

--batches : The number of batches to calibrate over. The default value is 1.

--max_batch_size : The maximum batch size for the TensorRT engine builder. The default value is 1.

--max_workspace_size : The maximum workspace size to be set for the TensorRT engine builder.

--batch_size : The number of images per batch. The default value is 1.

--engine_file : The path to the exported TensorRT engine. If specified, an engine file is generated at this path.

--input_dims : The input dimensions in channels-first (CHW) or channels-last (HWC) format, as comma-separated integers. The default value is 1,80,80.

--backend : The model type to export to.
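The arguments above, together with --experiment_spec from the calibration arguments described below, can be combined into a single INT8 export. The following bash sketch only assembles flags documented on this page; every path and the KEY value are hypothetical placeholders. It prints the command by default so it can be reviewed first, and runs tao only when DRY_RUN=0 is set. --max_workspace_size is passed explicitly at 2 * (1 << 30), the default stated later in this section.

```shell
#!/usr/bin/env bash
# Sketch of an INT8 FPENet export built from the documented flags.
# All paths and the KEY default are hypothetical placeholders.
set -euo pipefail

KEY=${KEY:-changeme}                    # placeholder: use your own encryption key
WORK=/workspace/fpenet                  # hypothetical working directory

args=(
  -m "$WORK/model.tlt"                  # hypothetical trained model path
  -k "$KEY"
  -o "$WORK/model.etlt"
  --data_type int8
  --experiment_spec "$WORK/specs/experiment_spec.yaml"  # hypothetical spec path
  --cal_image_dir "$WORK/calibration"   # sampled calibration images
  --cal_data_file "$WORK/calibration.tensorfile"
  --cal_cache_file "$WORK/cal.bin"
  --batches 10                          # needs >= batches * batch_size images
  --batch_size 1
  --max_batch_size 1
  --max_workspace_size "$(( 2 * (1 << 30) ))"  # 2147483648
  --engine_file "$WORK/model.int8.engine"      # hardware-specific test engine
)

# Print the command by default; set DRY_RUN=0 to actually run the export.
if [[ "${DRY_RUN:-1}" == 1 ]]; then
  printf '%s ' tao fpenet export "${args[@]}"; echo
else
  tao fpenet export "${args[@]}"
fi
```

With --batches 10 and --batch_size 1, the calibration directory must contain at least 10 images.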

--cal_image_dir : The directory of images that are preprocessed and used for calibration.

--cal_data_file : The tensorfile generated from the images in cal_image_dir for calibrating the engine. If this file already exists, it is used directly to calibrate the engine. The INT8 tensorfile is a binary file that contains the preprocessed training samples.

Note: The --cal_image_dir parameter applies the necessary preprocessing to generate a tensorfile at the path given in the --cal_data_file parameter, which is in turn used for calibration. The number of batches in the tensorfile is taken from the --batches parameter, and the batch size from the --batch_size parameter. Ensure that the directory given in --cal_image_dir contains at least batch_size * batches images. The valid image extensions are .jpg, .jpeg, and .png.

--cal_cache_file : The path to save the calibration cache file to. The default value is ./cal.bin. If this file already exists, the calibration step is skipped.

--batches : The number of batches to use for calibration and inference testing. The default value is 10.

--batch_size : The batch size to use for calibration. The default value is 1.

--max_batch_size : The maximum batch size of the TensorRT engine. The default value is 1.

--max_workspace_size : The maximum workspace size of the TensorRT engine. The default value is 2 * (1 << 30), that is, 2 GiB.

--experiment_spec : The experiment spec file used for training. Export reads from it the parameters needed to preprocess the calibration data.

--engine_file : The path to the serialized TensorRT engine file. This file is hardware specific and cannot be generalized across GPUs, so you cannot use it for deployment unless the deployment GPU is identical to the training GPU. Use this argument to quickly test your model accuracy with TensorRT on the host.
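Two behaviors described in this section are easy to trip over: the calibration directory must hold at least batch_size * batches images, and an existing cache file or tensorfile is reused rather than regenerated. The bash sketch below defines two hypothetical helpers (check_calibration_dir and reset_calibration, not TAO utilities) for checking the first and forcing a fresh calibration after you change the sampled images. It assumes the images sit directly inside the calibration directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of TAO Toolkit) for preparing an INT8
# calibration run. Assumes images sit directly inside the calibration directory.
set -euo pipefail

# Fail unless DIR holds at least batch_size * batches images with an
# accepted extension (.jpg, .jpeg, .png).
# Usage: check_calibration_dir DIR BATCH_SIZE BATCHES
check_calibration_dir() {
  local dir="$1" batch_size="$2" batches="$3"
  local required=$(( batch_size * batches ))
  local found
  found=$(find "$dir" -maxdepth 1 -type f \
            \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' \) | wc -l)
  if (( found < required )); then
    echo "error: $dir has $found images, need at least $required" >&2
    return 1
  fi
  echo "ok: $found images (need $required)"
}

# Delete an existing calibration cache and tensorfile so that the next export
# recalibrates from the current images instead of silently reusing old results.
# Usage: reset_calibration CACHE_FILE TENSOR_FILE
reset_calibration() {
  local cache_file="$1" tensor_file="$2" f
  for f in "$cache_file" "$tensor_file"; do
    if [[ -e "$f" ]]; then
      rm -- "$f"
      echo "removed stale $f"
    fi
  done
}
```

For example, check_calibration_dir /workspace/fpenet/calibration 1 10 (hypothetical path) requires at least 10 images, and reset_calibration ./cal.bin /workspace/fpenet/calibration.tensorfile clears the default cache and a hypothetical tensorfile before re-exporting.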

The pretrained model for FPENet provided through NGC is available by default with DeepStream 6.0.

For more details, refer to DeepStream TAO Integration for FPENet.