The BodyPoseNet model application in the TAO Toolkit includes an export sub-task to export and prepare a trained BodyPoseNet model for verification and deployment. The export sub-task optionally generates the calibration cache for TensorRT INT8 engine calibration.

Exporting the model decouples the training process from deployment and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. An engine file may be referred to interchangeably as a .trt or .engine file. The exported TAO model, by contrast, can be used across training and deployment hardware. It is referred to as the .etlt file, or encrypted TAO file. During model export, the TAO model is encrypted with a private key, which is required when you deploy the model for inference.

The network input resolution of the model is one of the major factors that determine the accuracy of bottom-up approaches. Bottom-up methods have to process the whole image at once, which leaves a smaller resolution per person. Hence, a higher input resolution yields better accuracy, especially on small- and medium-scale persons (relative to the image scale). Note that a higher input resolution also increases the runtime of the CNN. The accuracy/runtime tradeoff should therefore be decided based on the accuracy and runtime requirements of the target use case.

Height of the desired network

You will need to choose a resolution that works best for the target use case and the compute or latency constraints. If your application involves pose estimation for one or more persons close to the camera, such that the scale of the person is relatively large, you can use a smaller network input resolution. If you are targeting persons with smaller relative scales, as in crowded scenes, you might want a higher network input resolution. For instance, if your application has a person with a height of about 25% of the image, the final resized height would be as follows:

56px for network height of 224

72px for network height of 288

80px for network height of 320

Of these, the network with a height of 320 gives the person the highest resolution and would therefore be the most accurate.
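The arithmetic above can be reproduced with a short Python sketch. The function name and the 25% figure are only illustrative; substitute the person-to-image height ratio you expect in your own data.

```python
def resized_person_height(network_height: int, person_fraction: float) -> float:
    """Approximate person height in pixels once the image is resized to the network height."""
    return network_height * person_fraction


# A person occupying about 25% of the image height, as in the example above.
for network_height in (224, 288, 320):
    pixels = resized_person_height(network_height, 0.25)
    print(f"network height {network_height}: person is about {pixels:.0f}px tall")
```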

Width of the desired network

Once you fix the height of the network, choose the width based on the aspect ratio of the input data you expect at deployment time. Alternatively, pick the multiple of 32 or 64 that comes closest to that aspect ratio.
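As a rough sketch of this rule, the helper below snaps the ideal width (height times aspect ratio) to the nearest multiple of 32 or 64. The function name and the 4:3 aspect ratio are assumptions made for illustration and are not part of the TAO Toolkit.

```python
import math


def pick_network_width(network_height: int, aspect_ratio: float, multiple: int = 32) -> int:
    """Return the multiple of `multiple` nearest to network_height * aspect_ratio.

    aspect_ratio is width divided by height of the deployment images.
    """
    ideal_width = network_height * aspect_ratio
    # Round half up explicitly, because Python's round() rounds halves to even.
    steps = max(1, math.floor(ideal_width / multiple + 0.5))
    return steps * multiple


# Hypothetical 4:3 input data, snapped to multiples of 64.
for height in (224, 288, 320):
    print(height, pick_network_width(height, 4 / 3, multiple=64))
```

With a 4:3 aspect ratio and multiples of 64, this happens to give widths of 320, 384, and 448 for heights of 224, 288, and 320.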

Illustration of accuracy/runtime variation for different resolutions

Note that these are approximate runtimes/accuracies for the default architecture and train_spec. Any changes to the architecture or parameters will yield different results. The figures are primarily meant to give a better sense of which resolution would suit your needs. The runtimes provided are for the CNN.

Input Resolution    Precision    Runtime (GeForce RTX 2080)    Runtime (Jetson AGX)
320x448             FP16         3.13ms                        18.8ms
288x384             FP16         2.58ms                        12.8ms
224x320             FP16         2.27ms                        10.1ms
320x448             INT8         1.80ms                        8.90ms
288x384             INT8         1.56ms                        6.38ms
224x320             INT8         1.33ms                        5.07ms

You can expect to see a 7-10% mAP increase in the area=medium category when going from 224x320 to 288x384, and an additional 7-10% mAP when you go to 320x448. The accuracy for area=large remains almost the same across these resolutions, so you can stay with a lower resolution if large persons are all you need to handle. As per COCO keypoint evaluation, the medium category covers persons occupying an area between 32^2 and 96^2 pixels. Anything above that is categorized as large.

Note The height and width should each be a multiple of 8, and preferably a multiple of 16, 32, or 64.
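To make the accuracy/runtime tradeoff concrete, the sketch below encodes the runtime figures listed in this section and returns the largest resolution that fits a latency budget. The function name and the budget values are hypothetical, and the numbers only hold for the default architecture and train_spec.

```python
# Approximate CNN runtimes in milliseconds, copied from the figures in this section.
RUNTIMES_MS = {
    (320, 448, "FP16"): {"GeForce RTX 2080": 3.13, "Jetson AGX": 18.8},
    (288, 384, "FP16"): {"GeForce RTX 2080": 2.58, "Jetson AGX": 12.8},
    (224, 320, "FP16"): {"GeForce RTX 2080": 2.27, "Jetson AGX": 10.1},
    (320, 448, "INT8"): {"GeForce RTX 2080": 1.80, "Jetson AGX": 8.90},
    (288, 384, "INT8"): {"GeForce RTX 2080": 1.56, "Jetson AGX": 6.38},
    (224, 320, "INT8"): {"GeForce RTX 2080": 1.33, "Jetson AGX": 5.07},
}


def pick_resolution(device: str, precision: str, budget_ms: float):
    """Return the largest (height, width) whose runtime fits the budget, or None."""
    candidates = [
        (height, width)
        for (height, width, prec), per_device in RUNTIMES_MS.items()
        if prec == precision and per_device[device] <= budget_ms
    ]
    return max(candidates, default=None)


# Hypothetical budget of 15 ms for the CNN on a Jetson AGX.
print(pick_resolution("Jetson AGX", "FP16", 15.0))
print(pick_resolution("Jetson AGX", "INT8", 15.0))
```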





TensorRT engines can be generated in INT8 mode to run with lower precision, and thus improve performance. This process requires a cache file that contains scale factors for the tensors to help combat quantization errors, which may arise due to low-precision arithmetic. The calibration cache is generated using a calibration tensorfile when export is run with the --data_type flag set to int8. Pre-generating the calibration information and caching it removes the need to calibrate the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensorfile, since the cache is a much smaller file and can be moved with the exported model. Using a pre-generated calibration cache also speeds up engine creation, because building the cache can take several minutes depending on the size of the tensorfile and the model itself.
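The role of a scale factor can be illustrated with a simplified symmetric quantization sketch. This is not the TensorRT calibrator. It only shows how a per-tensor scale maps floating-point values to 8-bit levels and back, and where the quantization error comes from. The dynamic range of 2.0 is a made-up value standing in for what calibration would estimate.

```python
def quantize(value: float, scale: float) -> int:
    """Map a float to a signed 8-bit level using a per-tensor scale factor."""
    level = round(value / scale)
    # Values outside the calibrated range saturate at the ends of the INT8 range.
    return max(-127, min(127, level))


def dequantize(level: int, scale: float) -> float:
    """Map an 8-bit level back to an approximate float value."""
    return level * scale


# Hypothetical dynamic range: assume calibration found activations within [-2.0, 2.0].
scale = 2.0 / 127

for value in (0.013, 0.5, 1.999, 5.0):
    restored = dequantize(quantize(value, scale), scale)
    print(f"{value:>6} -> {restored:.4f} (error {abs(value - restored):.4f})")
```

Values inside the calibrated range are recovered to within half a quantization step, while values outside it (5.0 here) are clipped. This is why a calibration set that does not represent the deployment data can degrade INT8 accuracy.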

The export tool can generate an INT8 calibration cache by ingesting a sampled subset of training data. You need to create a sub-sampled directory of random images that best represent your test dataset. We recommend using at least 10-20% of the training data. The more data provided during calibration, the closer INT8 inference results are to FP32 inference results. A helper script is provided with the sample notebook to select the subset from the given training data based on several criteria, such as the minimum number of persons in the image and the minimum number of keypoints per person.
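For a quick start without the notebook helper, a purely random subset can be drawn with a few lines of Python. This sketch is not the helper script mentioned above and applies none of its criteria (persons per image, keypoints per person). The function name, the default fraction, and the seed are assumptions.

```python
import random
import shutil
from pathlib import Path

# Image extensions accepted for calibration images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def sample_calibration_images(train_dir, cal_dir, fraction=0.1, seed=42):
    """Copy a random fraction of the images in train_dir into cal_dir.

    Returns the names of the copied files.
    """
    images = sorted(
        path for path in Path(train_dir).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
    count = max(1, int(len(images) * fraction)) if images else 0
    # A fixed seed keeps the subset reproducible between runs.
    chosen = random.Random(seed).sample(images, count)
    Path(cal_dir).mkdir(parents=True, exist_ok=True)
    for image in chosen:
        shutil.copy2(image, Path(cal_dir) / image.name)
    return [image.name for image in chosen]
```

The resulting directory can then be passed to export through --cal_image_dir.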

Based on the evaluation results of the INT8 model, you might need to adjust the number of sampled images or the kind of images selected to better represent the test dataset. You can also use a portion of the test data for calibration to improve the results.

The calibration.bin file is only required if you need to run inference at INT8 precision. For FP16/FP32-based inference, the export step is much simpler. All that is required is to pass a model from the train step to export, which converts it into an encrypted TAO model.

The following are command line arguments for the export command:

    tao model bpnet export [-h] -m <path to the .tlt model file generated by tao train>
                                -k <key>
                                [-o <path to output file>]
                                [--cal_data_file <path to tensor file>]
                                [--cal_image_dir <path to the directory images to calibrate the model>]
                                [--cal_cache_file <path to output calibration file>]
                                [--data_type <Data type for the TensorRT backend during export>]
                                [--batches <Number of batches to calibrate over>]
                                [--max_batch_size <maximum trt batch size>]
                                [--max_workspace_size <maximum workspace size>]
                                [--batch_size <batch size to TensorRT engine>]
                                [--experiment_spec <path to experiment spec file>]
                                [--engine_file <path to the TensorRT engine file>]
                                [--verbose Verbosity of the logger]
                                [--input_dims Input dimensions to use for network]
                                [--backend Intermediate model type to export to]
                                [--force_ptq Flag to force PTQ]

Required Arguments

-m, --model : The path to the .tlt model file to be exported using export

-k, --key : The key used to save the .tlt model file

-t, --backend : The backend type used to convert to .etlt model file.

Note Currently, only tfonnx is supported as the backend. Do not use onnx or uff.





Optional Arguments

-o, --output_file : The path to save the exported model to. The default path is <input_file>.etlt.

-e, --experiment_spec : The experiment_spec used for training.

--data_type : The desired engine data type. The options are fp32, fp16, and int8. The default value is fp32. A calibration cache is generated in int8 mode. If using int8 mode, the following INT8 arguments are required.

-s, --strict_type_constraints : A Boolean flag to indicate whether or not to apply the TensorRT strict_type_constraints when building the TensorRT engine. Note this is only for applying the strict type of int8 mode.

--cal_image_dir : The directory of images that is preprocessed and used for calibration.

--cal_data_file : The tensorfile generated using images in cal_image_dir for calibrating the engine. If this already exists, it is directly used to calibrate the engine. The INT8 tensorfile is a binary file that contains the preprocessed training samples.

Note The --cal_image_dir parameter applies the necessary preprocessing to generate a tensorfile at the path mentioned in the --cal_data_file parameter, which is in turn used for calibration. The number of generated batches in the tensorfile is obtained from the value set to the --batches parameter, and the batch_size is obtained from the value set to the --batch_size parameter. Ensure that the directory mentioned in --cal_image_dir has at least batch_size * batches number of images in it. The valid image extensions are .jpg , .jpeg , and .png .





--cal_cache_file : The path to save the calibration cache file to. The default value is ./cal.bin. If this file already exists, the calibration step is skipped.

--batches : The number of batches to use for calibration and inference testing. The default value is 10.

--batch_size : The batch size to use for calibration. The default value is 1.

--max_batch_size : The maximum batch size of the TensorRT engine. The default value is 1.

--max_workspace_size : The maximum workspace size of the TensorRT engine. The default value is 2 * (1 << 30), that is, 2 GiB expressed in bytes.

--experiment_spec : The experiment_spec used for training. This argument is used to obtain the parameters to preprocess the data used for calibration.

--engine_file : The path to the serialized TensorRT engine file. Note that this file is hardware specific and cannot be generalized across GPUs. Use this argument to quickly test your model accuracy using TensorRT on the host. As the TensorRT engine file is hardware specific, you cannot use this engine file for deployment unless the deployment GPU is identical to the training GPU.

--force_ptq : A Boolean flag to force post-training quantization on the exported .etlt model.
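Before launching an INT8 export, it can help to confirm that the calibration directory satisfies the batch_size * batches requirement from the note on --cal_image_dir. The sketch below is a standalone check, not part of the TAO CLI. It assumes a flat directory and a case-insensitive match on the listed extensions, neither of which is confirmed by this page.

```python
from pathlib import Path

# Valid image extensions listed for --cal_image_dir.
VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_calibration_images(cal_image_dir) -> int:
    """Count image files directly inside cal_image_dir (no recursion, by assumption)."""
    return sum(
        1 for path in Path(cal_image_dir).iterdir()
        if path.is_file() and path.suffix.lower() in VALID_EXTENSIONS
    )


def has_enough_images(cal_image_dir, batch_size: int, batches: int) -> bool:
    """Check that the directory holds at least batch_size * batches images."""
    return count_calibration_images(cal_image_dir) >= batch_size * batches
```

For example, has_enough_images(cal_dir, batch_size=1, batches=5000) should return True before running an export with --batch_size 1 --batches 5000.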

The following is a sample command to export a BodyPoseNet model in INT8 mode. This command shows usage of the --cal_image_dir option for a BodyPoseNet model calibration.

    # Export `.etlt` model, calibrate model, and convert to TensorRT engine (INT8).
    tao model bpnet export -m /workspace/tao-experiments/bpnet/models/exp_m1_retrain/bpnet_model.tlt \
                           -o /workspace/tao-experiments/bpnet/models/exp_m1_final/bpnet_model.etlt \
                           -k $KEY \
                           -d $IN_HEIGHT,$IN_WIDTH,$IN_CHANNELS \
                           -e $SPECS_DIR/bpnet_retrain_m1_coco.txt \
                           -t tfonnx \
                           --data_type int8 \
                           --cal_image_dir /workspace/tao-experiments/bpnet/data/train2017/ \
                           --cal_cache_file /workspace/tao-experiments/bpnet/models/exp_m1_final/calibration.$IN_HEIGHT.$IN_WIDTH.bin \
                           --cal_data_file /workspace/tao-experiments/bpnet/models/exp_m1_final/coco.$IN_HEIGHT.$IN_WIDTH.tensorfile \
                           --batch_size 1 \
                           --batches 5000 \
                           --max_batch_size 1 \
                           --data_format channels_last \
                           --engine_file /workspace/tao-experiments/bpnet/models/exp_m1_final/bpnet_model.$IN_HEIGHT.$IN_WIDTH.int8.engine

The following is a sample command to export a BodyPoseNet model in FP16 mode:

    # Export `.etlt` model and convert to TensorRT engine (FP16).
    tao model bpnet export -m /workspace/tao-experiments/bpnet/models/exp_m1_retrain/bpnet_model.tlt \
                           -o /workspace/tao-experiments/bpnet/models/exp_m1_final/bpnet_model.etlt \
                           -k $KEY \
                           -d $IN_HEIGHT,$IN_WIDTH,$IN_CHANNELS \
                           -e $SPECS_DIR/bpnet_retrain_m1_coco.txt \
                           -t tfonnx \
                           --data_type fp16 \
                           --batch_size 1 \
                           --max_batch_size 1 \
                           --data_format channels_last \
                           --engine_file /workspace/tao-experiments/bpnet/models/exp_m1_final/bpnet_model.$IN_HEIGHT.$IN_WIDTH.fp16.engine
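If you script several exports (for example, one per resolution or precision), the command line can be assembled programmatically. The sketch below only builds and prints the argument list and does not invoke the tool. The function name and the placeholder key are hypothetical, and the flags are the ones used in the sample commands above.

```python
import shlex


def build_export_command(model, output, key, input_dims, spec,
                         data_type="fp32", engine_file=None, extra_args=()):
    """Assemble a `tao model bpnet export` argument list from the flags shown above."""
    cmd = [
        "tao", "model", "bpnet", "export",
        "-m", model,
        "-o", output,
        "-k", key,
        "-d", input_dims,
        "-e", spec,
        "-t", "tfonnx",  # the only backend listed as supported
        "--data_type", data_type,
    ]
    if engine_file:
        cmd += ["--engine_file", engine_file]
    cmd += list(extra_args)
    return cmd


models_dir = "/workspace/tao-experiments/bpnet/models"
cmd = build_export_command(
    model=f"{models_dir}/exp_m1_retrain/bpnet_model.tlt",
    output=f"{models_dir}/exp_m1_final/bpnet_model.etlt",
    key="<key>",  # placeholder; use the key the .tlt model was saved with
    input_dims="288,384,3",
    spec="bpnet_retrain_m1_coco.txt",
    data_type="fp16",
    engine_file=f"{models_dir}/exp_m1_final/bpnet_model.288.384.fp16.engine",
    extra_args=["--batch_size", "1", "--max_batch_size", "1"],
)
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run once the arguments look right.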





Evaluating the exported TRT .engine is similar to evaluating the .tlt model.

Follow the instructions as described in the Create an Inference Specification File section to create the infer_spec file. Note that the adjust_network_input mode in keep_aspect_ratio_mode is not supported for the exported TRT model, so pad_image_input (Strict mode) should be used instead. Follow the instructions in the Evaluate the Model section to evaluate the TRT model.

You can run evaluation of the .tlt model in strict mode as well to compare with the accuracies of the INT8/FP16/FP32 models for any drop in accuracy. The FP16/FP32 models should have little or no drop in accuracy when compared to the .tlt model in this step. The INT8 models would have similar accuracies (or comparable within a 2-3% mAP range) to the .tlt model.
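A comparison like this is easy to automate once the mAP values are available. The sketch below assumes the 2-3% range refers to absolute mAP points, which this page does not state explicitly. The numbers are made-up placeholders, not measured results.

```python
def map_drop(reference_map: float, candidate_map: float) -> float:
    """Drop in mAP points of the candidate model relative to the reference model."""
    return reference_map - candidate_map


def within_tolerance(reference_map: float, candidate_map: float, tolerance: float = 3.0) -> bool:
    """True if the candidate loses no more than `tolerance` mAP points."""
    return map_drop(reference_map, candidate_map) <= tolerance


# Hypothetical strict-mode results for the .tlt model and the exported engines.
tlt_map = 56.0
for name, value in (("FP16", 55.9), ("INT8", 54.2)):
    status = "ok" if within_tolerance(tlt_map, value) else "investigate calibration"
    print(f"{name}: drop of {map_drop(tlt_map, value):.1f} points, {status}")
```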

If the accuracy of the INT8 model seems to degrade significantly compared to the corresponding FP16 version, it could be caused by the following:

There wasn’t enough data in the calibration tensorfile used to calibrate the model.

The training data is not entirely representative of your test images, and the calibration may be incorrect. Therefore, you may either regenerate the calibration tensorfile with more batches of the training data and recalibrate the model, or add a portion of data from the test set.

Note This evaluation is mainly used as a sanity check for the exported TRT (INT8/FP16) models. It is done in strict mode and hence doesn't reflect the true accuracy of the model, as the input aspect ratio can differ considerably from the aspect ratio of the images in the test set. A dataset like COCO may contain images with various resolutions. Here, you keep a strict input resolution and pad the image to retain its aspect ratio, so the accuracy might vary based on the aspect ratio and network resolution you choose.
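The effect of strict mode can be sketched as a resize-then-pad computation. This is a generic letterbox calculation written for illustration and is not the TAO implementation of pad_image_input. In particular, it only reports the total padding per axis and makes no claim about where the padding is placed.

```python
def padded_resize_dims(img_h: int, img_w: int, net_h: int, net_w: int):
    """Return (new_h, new_w, pad_h, pad_w) for an aspect-preserving resize plus padding.

    The image is scaled by a single factor so it fits inside net_h x net_w,
    and the remaining pixels along each axis are filled with padding.
    """
    scale = min(net_h / img_h, net_w / img_w)
    new_h = min(net_h, int(round(img_h * scale)))
    new_w = min(net_w, int(round(img_w * scale)))
    return new_h, new_w, net_h - new_h, net_w - new_w


# A 640x480 (width x height) image fed to a 320x448 (height x width) network.
print(padded_resize_dims(480, 640, 320, 448))
# A square 500x500 image fed to the same network needs far more padding.
print(padded_resize_dims(500, 500, 320, 448))
```

The more the image aspect ratio differs from the network aspect ratio, the more of the input is padding, which leaves fewer pixels per person and can lower the measured accuracy.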





Once the INT8/FP16/FP32 model is verified, you need to re-export the model so it can run on inference platforms like DeepStream. Follow the same guidelines as in the Exporting the Model section, but add the --sdk_compatible_model flag to the export command. This flag adds a few non-trainable post-processing layers to the model to make it compatible with the inference pipelines. You should reuse the calibration tensorfile (--cal_data_file) generated in the previous step for consistency, but you will need to regenerate the cal_cache_file and the .etlt model.

The following is a sample command to export a BodyPoseNet model in INT8 mode (similar to previous section), which can be deployed in the inference pipelines.

    tao model bpnet export -m /workspace/tao-experiments/bpnet/models/exp_m1_retrain/bpnet_model.tlt \
                           -o /workspace/tao-experiments/bpnet/models/exp_m1_final/bpnet_model.deploy.etlt \
                           -k $KEY \
                           -d $IN_HEIGHT,$IN_WIDTH,$IN_CHANNELS \
                           -e $SPECS_DIR/bpnet_retrain_m1_coco.txt \
                           -t tfonnx \
                           --data_type int8 \
                           --cal_image_dir /workspace/tao-experiments/bpnet/data/train2017/ \
                           --cal_cache_file /workspace/tao-experiments/bpnet/models/exp_m1_final/calibration.$IN_HEIGHT.$IN_WIDTH.deploy.bin \
                           --cal_data_file /workspace/tao-experiments/bpnet/models/exp_m1_final/coco.$IN_HEIGHT.$IN_WIDTH.tensorfile \
                           --batch_size 1 \
                           --batches 5000 \
                           --max_batch_size 1 \
                           --data_format channels_last \
                           --engine_file /workspace/tao-experiments/bpnet/models/exp_m1_final/bpnet_model.$IN_HEIGHT.$IN_WIDTH.int8.deploy.engine \
                           --sdk_compatible_model

Note The above exported model will not work with the bpnet inference / evaluate tools. This is for deployment only. For inference and evaluation, use the TRT model exported without --sdk_compatible_model .



