NVIDIA TAO Toolkit v4.0
NVIDIA TAO Release tlt.40

Mask RCNN with TAO Deploy

The Mask RCNN .etlt file generated by tao export is passed to tao-deploy to generate an optimized TensorRT engine. For more information about training Mask RCNN, refer to the Mask RCNN training documentation.

The same spec file used for the tao mask_rcnn export command can be used here.

Use the following command to run Mask RCNN engine generation:

tao-deploy mask_rcnn gen_trt_engine [-h] [-v] -m MODEL_PATH -k KEY -e EXPERIMENT_SPEC [--data_type {fp32,fp16,int8}] [--engine_file ENGINE_FILE] [--cal_image_dir CAL_IMAGE_DIR] [--cal_data_file CAL_DATA_FILE] [--cal_cache_file CAL_CACHE_FILE] [--cal_json_file CAL_JSON_FILE] [--max_batch_size MAX_BATCH_SIZE] [--min_batch_size MIN_BATCH_SIZE] [--opt_batch_size OPT_BATCH_SIZE] [--batch_size BATCH_SIZE] [--batches BATCHES] [--max_workspace_size MAX_WORKSPACE_SIZE] [-s STRICT_TYPE_CONSTRAINTS] [--force_ptq FORCE_PTQ] [--gpu_index GPU_INDEX] [--log_file LOG_FILE]

Required Arguments

  • -m, --model_path: The .etlt model to be converted.

  • -e, --experiment_spec: The experiment spec file to set up the TensorRT engine generation. This should be the same as the export specification file.

  • -k, --key: A user-specific encoding key to load a .etlt model.

Optional Arguments

  • -h, --help: Show this help message and exit.

  • --data_type: The desired engine data type. The options are fp32, fp16, and int8. The default value is fp32. In INT8 mode, a calibration cache is generated, and the INT8 arguments listed below are required.

  • --engine_file: Path to the serialized TensorRT engine file. This file is hardware specific and cannot be generalized across GPUs, so you cannot use it for deployment unless the deployment GPU is identical to the GPU on which the engine was generated.

  • -s, --strict_type_constraints: A Boolean flag indicating whether to apply the TensorRT strict type constraints when building the TensorRT engine.

  • --gpu_index: The index of (discrete) GPUs used for exporting the model. You can specify the index of the GPU to run export if the machine has multiple GPUs installed. Note that gen_trt_engine can only run on a single GPU.

  • --log_file: The path to the log file. The default path is “stdout”.

INT8 Engine Generation Required Arguments

  • --cal_data_file: Tensorfile generated for calibrating the engine. This can also be an output file if used with --cal_image_dir.

  • --cal_image_dir: Directory of images to use for calibration.

Note

The tool searches the directory specified by --cal_image_dir for images and applies the necessary preprocessing to generate a tensorfile at the path given by --cal_data_file, which is in turn used for calibration. The number of batches in the generated tensorfile is set by --batches, and the batch size is set by --batch_size. Make sure that the directory given by --cal_image_dir contains at least batch_size * batches images. The valid image extensions are .jpg, .jpeg, and .png. In this case, the input dimensions of the calibration tensors are derived from the input layer of the .etlt model.
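The image-count requirement in the note can be checked before starting engine generation. The following is a minimal Python sketch, not part of tao-deploy: the helper names are hypothetical, and it assumes that only files directly inside the directory count and that extensions are matched case-insensitively, neither of which is stated in the note.

```python
from pathlib import Path

# Extensions listed in the note; case-insensitive matching is an assumption.
VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_calibration_images(cal_image_dir):
    """Count files directly inside cal_image_dir with a valid image extension."""
    return sum(
        1
        for path in Path(cal_image_dir).iterdir()
        if path.is_file() and path.suffix.lower() in VALID_EXTENSIONS
    )


def has_enough_calibration_images(cal_image_dir, batch_size, batches):
    """Return True if the directory holds at least batch_size * batches images."""
    return count_calibration_images(cal_image_dir) >= batch_size * batches
```

For example, with --batch_size 8 and --batches 10 the directory would need at least 80 images for the check to pass.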


INT8 Engine Generation Optional Arguments

  • --cal_cache_file: The path to save the calibration cache file to. The default value is ./cal.bin.

  • --cal_json_file: The path to the JSON file containing tensor scales for QAT models. This argument is required when generating an engine for a QAT model.

  • --batches: Number of batches to use for calibration. The default value is 10.

  • --batch_size: Batch size to use for calibration. The default value is 1.

  • --max_batch_size: Maximum batch size of TensorRT engine. The default value is 1.

  • --min_batch_size: Minimum batch size of TensorRT engine. The default value is 1.

  • --opt_batch_size: Optimal batch size of TensorRT engine. The default value is 1.

  • --max_workspace_size: Maximum workspace size of the TensorRT engine, in GB. The default value is 2 GB.

  • --force_ptq: A boolean flag to force post training quantization on the exported etlt model.

Note

When generating a TensorRT engine for a model trained with QAT enabled, the tensor scale factors defined by the cal_cache_file argument are required. However, the current version of QAT doesn’t natively support DLA INT8 deployment on Jetson. To deploy such a model on a Jetson with DLA INT8, use the --force_ptq flag so that TensorRT post-training quantization generates the calibration cache file.


Sample Usage

Here’s an example of using the gen_trt_engine command to generate an INT8 TensorRT engine:

tao-deploy mask_rcnn gen_trt_engine -m /workspace/mrcnn.etlt \
                                    -e /workspace/default_spec.txt \
                                    -k $KEY \
                                    --cal_image_dir /workspace/raw-data/val2017 \
                                    --data_type int8 \
                                    --batch_size 8 \
                                    --batches 10 \
                                    --cal_cache_file /export/cal.bin \
                                    --cal_data_file /export/cal.tensorfile \
                                    --engine_file /export/int8.engine
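When engine generation is driven from a script, the same command can be assembled as an argument list. This is a hedged Python sketch that uses only flags documented on this page. The function name and its parameters are hypothetical. Requiring both --cal_image_dir and --cal_data_file in INT8 mode follows the required-arguments list above and may be stricter than the tool itself.

```python
import shlex


def build_gen_trt_engine_command(model_path, spec_path, key, engine_file,
                                 data_type="fp32", cal_image_dir=None,
                                 cal_data_file=None, cal_cache_file=None,
                                 batch_size=None, batches=None):
    """Assemble a tao-deploy mask_rcnn gen_trt_engine argument list."""
    if data_type not in ("fp32", "fp16", "int8"):
        raise ValueError(f"unsupported data type: {data_type}")
    # Assumption: treat both calibration inputs as mandatory for INT8.
    if data_type == "int8" and not (cal_image_dir and cal_data_file):
        raise ValueError("INT8 mode needs --cal_image_dir and --cal_data_file")

    cmd = ["tao-deploy", "mask_rcnn", "gen_trt_engine",
           "-m", model_path, "-e", spec_path, "-k", key,
           "--data_type", data_type, "--engine_file", engine_file]
    optional = {
        "--cal_image_dir": cal_image_dir,
        "--cal_data_file": cal_data_file,
        "--cal_cache_file": cal_cache_file,
        "--batch_size": batch_size,
        "--batches": batches,
    }
    # Append only the optional flags that were actually provided.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def format_command(cmd):
    """Render the argument list as a shell-quoted string for logging."""
    return shlex.join(cmd)
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where tao-deploy is installed. Passing a list rather than a single string avoids shell quoting problems with the key or with paths.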


The batch size used for evaluation is the same as the --max_batch_size used during engine generation. The label file is derived from data_config.val_json_file in the spec file. Use the same spec file as for TAO evaluation. Sample spec file:

data_config{
    image_size: "(832, 1344)"
    augment_input_data: True
    eval_samples: 500
    training_file_pattern: "/workspace/tao-experiments/data/train*.tfrecord"
    validation_file_pattern: "/workspace/tao-experiments/data/val*.tfrecord"
    val_json_file: "/workspace/tao-experiments/data/raw-data/annotations/instances_val2017.json"

    # dataset specific parameters
    num_classes: 91
    skip_crowd_during_training: True
}
maskrcnn_config {
    nlayers: 50
    arch: "resnet"
    freeze_bn: True
    freeze_blocks: "[0,1]"
    gt_mask_size: 112

    # Region Proposal Network
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_min_size: 0.

    # Proposal layer.
    batch_size_per_im: 512
    fg_fraction: 0.25
    fg_thresh: 0.5
    bg_thresh_hi: 0.5
    bg_thresh_lo: 0.

    # Faster-RCNN heads.
    fast_rcnn_mlp_head_dim: 1024
    bbox_reg_weights: "(10., 10., 5., 5.)"

    # Mask-RCNN heads.
    include_mask: True
    mrcnn_resolution: 28

    # training
    train_rpn_pre_nms_topn: 2000
    train_rpn_post_nms_topn: 1000
    train_rpn_nms_threshold: 0.7

    # evaluation
    test_detections_per_image: 100
    test_nms: 0.5
    test_rpn_pre_nms_topn: 1000
    test_rpn_post_nms_topn: 1000
    test_rpn_nms_thresh: 0.7

    # model architecture
    min_level: 2
    max_level: 6
    num_scales: 1
    aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
    anchor_scale: 8

    # localization loss
    rpn_box_loss_weight: 1.0
    fast_rcnn_box_loss_weight: 1.0
    mrcnn_weight_loss_mask: 1.0
}
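Because evaluation takes its label file from val_json_file in this spec, it can help to read that value back before running. The sketch below is a regex-based Python illustration, not a real protobuf text-format parser. The helper names are hypothetical, and it assumes each field of interest appears once as a quoted string and is not repeated inside a comment.

```python
import ast
import re


def read_spec_string(spec_text, field):
    """Return the quoted value of the first `field: "..."` entry, or None."""
    # The lookbehind keeps `field` from matching the tail of a longer name.
    pattern = rf'(?<!\w){re.escape(field)}\s*:\s*"([^"]*)"'
    match = re.search(pattern, spec_text)
    return match.group(1) if match else None


def read_image_size(spec_text):
    """Parse the image_size string, e.g. "(832, 1344)", into a tuple of ints."""
    raw = read_spec_string(spec_text, "image_size")
    return tuple(ast.literal_eval(raw)) if raw is not None else None
```

A caller could, for instance, pass the result of read_spec_string(text, "val_json_file") to os.path.exists to catch a wrong annotation path early.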

Use the following command to run Mask RCNN engine evaluation:

tao-deploy mask_rcnn evaluate [-h] -e EXPERIMENT_SPEC -m MODEL_PATH [-i IMAGE_DIR] [-r RESULTS_DIR] [--gpu_index GPU_INDEX] [--log_file LOG_FILE]

Required Arguments

  • -e, --experiment_spec: The experiment spec file for evaluation. This should be the same as the tao evaluate specification file.

  • -m, --model_path: The engine file to run evaluation.

  • -i, --image_dir: The directory where test images are located.

  • -r, --results_dir: The directory where evaluation results will be stored.

Sample Usage

Here’s an example of using the evaluate command to run evaluation with the TensorRT engine:

tao-deploy mask_rcnn evaluate -m /export/int8.engine \
                              -e /workspace/default_spec.txt \
                              -i /workspace/raw-data/val2017 \
                              -r /workspace/tao-experiments/evaluate


Use the following command to run Mask RCNN engine inference:

tao-deploy mask_rcnn inference [-h] -e EXPERIMENT_SPEC -m MODEL_PATH [-i IMAGE_DIR] [-b BATCH_SIZE] [-r RESULTS_DIR] [--gpu_index GPU_INDEX] [--log_file LOG_FILE]

Required Arguments

  • -e, --experiment_spec: The experiment spec file for inference. This should be the same as the tao inference specification file.

  • -m, --model_path: The engine file to run inference with.

  • -i, --image_dir: The directory where test images are located.

  • -r, --results_dir: The directory where inference results will be stored.

  • -b, --batch_size: The batch size used for inference. This value cannot be larger than the --max_batch_size used during engine generation. If not specified, --max_batch_size is used instead.
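The batch-size rule for -b can be written out as a small check. This is an illustrative Python sketch of the rule as described above, with a hypothetical function name. It is not taken from the tao-deploy source.

```python
def resolve_inference_batch_size(max_batch_size, batch_size=None):
    """Return the batch size to use, given the engine's max batch size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    # No -b given: fall back to the engine's --max_batch_size.
    if batch_size is None:
        return max_batch_size
    # -b given: it must not exceed the engine's --max_batch_size.
    if batch_size < 1 or batch_size > max_batch_size:
        raise ValueError(
            f"batch_size must be between 1 and {max_batch_size}, got {batch_size}"
        )
    return batch_size
```

With an engine built using --max_batch_size 8, omitting -b resolves to 8, and -b 4 resolves to 4, while -b 16 is rejected.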

Sample Usage

Unless -b is specified, the batch size used for inference is the same as the --max_batch_size used during engine generation. Here’s an example of using the inference command to run inference with the TensorRT engine:

tao-deploy mask_rcnn inference -m /export/int8.engine \
                               -e /workspace/default_spec.txt \
                               -i /workspace/raw-data/val2017 \
                               -r /workspace/tao-experiments/inference

The visualizations are stored under $RESULTS_DIR/images_annotated, and COCO-format predictions are stored under $RESULTS_DIR/labels.

© Copyright 2022, NVIDIA. Last updated on Mar 23, 2023.