MaskRCNN
--------

.. _maskrcnn:

MaskRCNN supports the following tasks:

* train
* evaluate
* inference
* export

These tasks may be invoked from the TLT launcher using the following convention on the command line:

.. code::

    tlt mask_rcnn <sub_task> <args_per_subtask>

where :code:`args_per_subtask` are the command-line arguments required for a given subtask. Each of these
subtasks is explained in detail below.

Creating a Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _creating_a_configuration_file_maskrcnn:

Below is a sample MaskRCNN spec file. It has three major components: the top-level experiment configs,
:code:`data_config`, and :code:`maskrcnn_config`, which are explained in detail below. The spec file is a
protobuf text (prototxt) message, and each of its fields can be either a basic data type or a nested
message. The top-level structure of the spec file is summarized in the table below.

Here's a sample of the MaskRCNN spec file:

.. code::

    seed: 123
    use_amp: False
    warmup_steps: 0
    checkpoint: "/workspace/tlt-experiments/maskrcnn/pretrained_resnet50/tlt_instance_segmentation_vresnet50/resnet50.hdf5"
    learning_rate_steps: "[60000, 80000, 100000]"
    learning_rate_decay_levels: "[0.1, 0.02, 0.002]"
    total_steps: 120000
    train_batch_size: 2
    eval_batch_size: 4
    num_steps_per_eval: 10000
    momentum: 0.9
    l2_weight_decay: 0.0001
    warmup_learning_rate: 0.0001
    init_learning_rate: 0.02

    data_config{
        image_size: "(832, 1344)"
        augment_input_data: True
        eval_samples: 500
        training_file_pattern: "/workspace/tlt-experiments/data/train*.tfrecord"
        validation_file_pattern: "/workspace/tlt-experiments/data/val*.tfrecord"
        val_json_file: "/workspace/tlt-experiments/data/annotations/instances_val2017.json"

        # dataset specific parameters
        num_classes: 91
        skip_crowd_during_training: True
    }

    maskrcnn_config {
        nlayers: 50
        arch: "resnet"
        freeze_bn: True
        freeze_blocks: "[0,1]"
        gt_mask_size: 112

        # Region Proposal Network
        rpn_positive_overlap: 0.7
        rpn_negative_overlap: 0.3
        rpn_batch_size_per_im: 256
        rpn_fg_fraction: 0.5
        rpn_min_size: 0.

        # Proposal layer.
        batch_size_per_im: 512
        fg_fraction: 0.25
        fg_thresh: 0.5
        bg_thresh_hi: 0.5
        bg_thresh_lo: 0.

        # Faster-RCNN heads.
        fast_rcnn_mlp_head_dim: 1024
        bbox_reg_weights: "(10., 10., 5., 5.)"

        # Mask-RCNN heads.
        include_mask: True
        mrcnn_resolution: 28

        # training
        train_rpn_pre_nms_topn: 2000
        train_rpn_post_nms_topn: 1000
        train_rpn_nms_threshold: 0.7

        # evaluation
        test_detections_per_image: 100
        test_nms: 0.5
        test_rpn_pre_nms_topn: 1000
        test_rpn_post_nms_topn: 1000
        test_rpn_nms_thresh: 0.7

        # model architecture
        min_level: 2
        max_level: 6
        num_scales: 1
        aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
        anchor_scale: 8

        # localization loss
        rpn_box_loss_weight: 1.0
        fast_rcnn_box_loss_weight: 1.0
        mrcnn_weight_loss_mask: 1.0
    }

+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| **Field**                  | **Description**                                                  | **Data Type and Constraints** | **Recommended/Typical Value** |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| seed                       | The random seed for the experiment                               | Unsigned int                  | 123                           |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| warmup_steps               | The steps taken for the learning rate to ramp up to the          | Unsigned int                  | --                            |
|                            | init_learning_rate                                               |                               |                               |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| warmup_learning_rate       | The initial learning rate during the warmup phase                | float                         | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| learning_rate_steps        | A list of steps at which the learning rate decays by the factor  | string                        | --                            |
|                            | specified in learning_rate_decay_levels                          |                               |                               |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| learning_rate_decay_levels | A list of decay factors. The length should match the length of   | string                        | --                            |
|                            | learning_rate_steps.                                             |                               |                               |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| total_steps                | The total number of training iterations                          | Unsigned int                  | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| train_batch_size           | The batch size during training                                   | Unsigned int                  | 4                             |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| eval_batch_size            | The batch size during validation or evaluation                   | Unsigned int                  | 8                             |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| num_steps_per_eval         | Save a checkpoint and run evaluation every N steps               | Unsigned int                  | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| momentum                   | The momentum of the SGD optimizer                                | float                         | 0.9                           |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| l2_weight_decay            | The L2 weight decay                                              | float                         | 0.0001                        |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| use_amp                    | Specifies whether to use Automatic Mixed Precision training      | boolean                       | False                         |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| checkpoint                 | The path to a pretrained model                                   | string                        | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| maskrcnn_config            | The architecture of the model                                    | message                       | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| data_config                | The input data configuration                                     | message                       | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| skip_checkpoint_variables  | If specified, the weights of the layers with matching regular    | string                        | --                            |
|                            | expressions will not be loaded. This is especially helpful for   |                               |                               |
|                            | transfer learning.                                               |                               |                               |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+

.. Note:: When using :code:`skip_checkpoint_variables`, you can first find the model structure in the
   training log (part of the MaskRCNN+ResNet50 model structure is shown below). If, for example, you want
   to retrain all prediction heads, you can set :code:`skip_checkpoint_variables` to “head”. TLT uses the
   Python :code:`re` library to check whether “head” matches any layer name, i.e. whether
   :code:`re.search($skip_checkpoint_variables, $layer_name)` returns a match.

.. code::

    [MaskRCNN] INFO : ================ TRAINABLE VARIABLES ==================
    [MaskRCNN] INFO : [#0001] conv1/kernel:0 => (7, 7, 3, 64)
    [MaskRCNN] INFO : [#0002] bn_conv1/gamma:0 => (64,)
    [MaskRCNN] INFO : [#0003] bn_conv1/beta:0 => (64,)
    [MaskRCNN] INFO : [#0004] block_1a_conv_1/kernel:0 => (1, 1, 64, 64)
    [MaskRCNN] INFO : [#0005] block_1a_bn_1/gamma:0 => (64,)
    [MaskRCNN] INFO : [#0006] block_1a_bn_1/beta:0 => (64,)
    [MaskRCNN] INFO : [#0007] block_1a_conv_2/kernel:0 => (3, 3, 64, 64)
    [MaskRCNN] INFO : [#0008] block_1a_bn_2/gamma:0 => (64,)
    [MaskRCNN] INFO : [#0009] block_1a_bn_2/beta:0 => (64,)
    [MaskRCNN] INFO : [#0010] block_1a_conv_3/kernel:0 => (1, 1, 64, 256)
    [MaskRCNN] INFO : [#0011] block_1a_bn_3/gamma:0 => (256,)
    [MaskRCNN] INFO : [#0012] block_1a_bn_3/beta:0 => (256,)
    [MaskRCNN] INFO : [#0110] block_3d_bn_3/gamma:0 => (1024,)
    [MaskRCNN] INFO : [#0111] block_3d_bn_3/beta:0 => (1024,)
    [MaskRCNN] INFO : [#0112] block_3e_conv_1/kernel:0 => (1, 1, 1024,
    [MaskRCNN] INFO : [#0144] block_4b_bn_1/beta:0 => (512,)
    … … … … ...
    [MaskRCNN] INFO : [#0174] fpn/post_hoc_d5/kernel:0 => (3, 3, 256, 256)
    [MaskRCNN] INFO : [#0175] fpn/post_hoc_d5/bias:0 => (256,)
    [MaskRCNN] INFO : [#0176] rpn_head/rpn/kernel:0 => (3, 3, 256, 256)
    [MaskRCNN] INFO : [#0177] rpn_head/rpn/bias:0 => (256,)
    [MaskRCNN] INFO : [#0178] rpn_head/rpn-class/kernel:0 => (1, 1, 256, 3)
    [MaskRCNN] INFO : [#0179] rpn_head/rpn-class/bias:0 => (3,)
    [MaskRCNN] INFO : [#0180] rpn_head/rpn-box/kernel:0 => (1, 1, 256, 12)
    [MaskRCNN] INFO : [#0181] rpn_head/rpn-box/bias:0 => (12,)
    [MaskRCNN] INFO : [#0182] box_head/fc6/kernel:0 => (12544, 1024)
    [MaskRCNN] INFO : [#0183] box_head/fc6/bias:0 => (1024,)
    [MaskRCNN] INFO : [#0184] box_head/fc7/kernel:0 => (1024, 1024)
    [MaskRCNN] INFO : [#0185] box_head/fc7/bias:0 => (1024,)
    [MaskRCNN] INFO : [#0186] box_head/class-predict/kernel:0 => (1024, 91)
    [MaskRCNN] INFO : [#0187] box_head/class-predict/bias:0 => (91,)
    [MaskRCNN] INFO : [#0188] box_head/box-predict/kernel:0 => (1024, 364)
    [MaskRCNN] INFO : [#0189] box_head/box-predict/bias:0 => (364,)
    [MaskRCNN] INFO : [#0190] mask_head/mask-conv-l0/kernel:0 => (3, 3, 256, 256)
    [MaskRCNN] INFO : [#0191] mask_head/mask-conv-l0/bias:0 => (256,)
    [MaskRCNN] INFO : [#0192] mask_head/mask-conv-l1/kernel:0 => (3, 3, 256, 256)
    [MaskRCNN] INFO : [#0193] mask_head/mask-conv-l1/bias:0 => (256,)
    [MaskRCNN] INFO : [#0194] mask_head/mask-conv-l2/kernel:0 => (3, 3, 256, 256)
    [MaskRCNN] INFO : [#0195] mask_head/mask-conv-l2/bias:0 => (256,)
    [MaskRCNN] INFO : [#0196] mask_head/mask-conv-l3/kernel:0 => (3, 3, 256, 256)
    [MaskRCNN] INFO : [#0197] mask_head/mask-conv-l3/bias:0 => (256,)
    [MaskRCNN] INFO : [#0198] mask_head/conv5-mask/kernel:0 => (2, 2, 256, 256)
    [MaskRCNN] INFO : [#0199] mask_head/conv5-mask/bias:0 => (256,)
    [MaskRCNN] INFO : [#0200] mask_head/mask_fcn_logits/kernel:0 => (1, 1, 256, 91)
    [MaskRCNN] INFO : [#0201] mask_head/mask_fcn_logits/bias:0 => (91,)
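As a concrete illustration of the matching rule described in the note above, the following minimal Python
sketch (illustrative only, not part of the TLT code base) shows which of the layer names from the log
excerpt a :code:`skip_checkpoint_variables` value of :code:`head` would exclude from checkpoint loading:

.. code::

    import re

    # A few layer names copied from the training log above.
    layer_names = [
        "conv1/kernel:0",
        "block_1a_conv_1/kernel:0",
        "rpn_head/rpn-class/kernel:0",
        "box_head/class-predict/kernel:0",
        "mask_head/mask_fcn_logits/kernel:0",
    ]

    skip_checkpoint_variables = "head"   # value set in the spec file

    # A layer is skipped when the regular expression matches its name,
    # i.e. re.search(skip_checkpoint_variables, layer_name) returns a match.
    for name in layer_names:
        if re.search(skip_checkpoint_variables, name):
            print(f"{name}: skipped (not loaded from the checkpoint)")
        else:
            print(f"{name}: loaded from the checkpoint")

In this example, only :code:`conv1/kernel:0` and :code:`block_1a_conv_1/kernel:0` would be loaded from the
checkpoint; all RPN, box-head, and mask-head variables would be skipped.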
MaskRCNN Config
***************

The MaskRCNN configuration (:code:`maskrcnn_config`) defines the model structure. This model is used for
training, evaluation, and inference. A detailed description is included in the table below. Currently,
MaskRCNN only supports ResNet10/18/34/50/101 as its backbone.

+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| **Field**                 | **Description**                                                  | **Data Type and Constraints** | **Recommended/Typical Value**      |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| nlayers                   | The number of layers in the ResNet architecture                  | Unsigned int                  | 50                                 |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| arch                      | The backbone feature extractor name                              | string                        | resnet                             |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| freeze_bn                 | Whether to freeze all BatchNorm layers in the backbone           | boolean                       | False                              |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| freeze_blocks             | A list of conv blocks in the backbone to freeze                  | string                        | --                                 |
|                           |                                                                  |                               |                                    |
|                           | ResNet: For the ResNet series, the block IDs valid for           |                               |                                    |
|                           | freezing are any subset of [0, 1, 2, 3] (inclusive)              |                               |                                    |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| gt_mask_size              | The ground-truth mask size                                       | Unsigned int                  | 112                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| rpn_positive_overlap      | The lower-bound threshold to assign positive labels for anchors  | float                         | 0.7                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| rpn_negative_overlap      | The upper-bound threshold to assign negative labels for anchors  | float                         | 0.3                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| rpn_batch_size_per_im     | The number of sampled anchors per image in the RPN               | Unsigned int                  | 256                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| rpn_fg_fraction           | The desired fraction of positive anchors in a batch              | float                         | 0.5                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| rpn_min_size              | The minimum proposal height and width                            | float                         | 0                                  |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| batch_size_per_im         | The RoI minibatch size per image                                 | Unsigned int                  | 512                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| fg_fraction               | The target fraction of the RoI minibatch that is labeled as      | float                         | 0.25                               |
|                           | foreground                                                       |                               |                                    |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| fast_rcnn_mlp_head_dim    | The Fast-RCNN classification head dimension                      | Unsigned int                  | 1024                               |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| bbox_reg_weights          | The bounding-box regression weights                              | string                        | “(10, 10, 5, 5)”                   |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| include_mask              | Specifies whether to include a mask head                         | boolean                       | True                               |
|                           |                                                                  |                               |                                    |
|                           |                                                                  |                               | (currently only True is supported) |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| mrcnn_resolution          | The mask-head resolution                                         | Unsigned int                  | 28                                 |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| train_rpn_pre_nms_topn    | The number of top-scoring RPN proposals to keep before applying  | Unsigned int                  | 2000                               |
|                           | NMS (per FPN level) during training                              |                               |                                    |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| train_rpn_post_nms_topn   | The number of top-scoring RPN proposals to keep after applying   | Unsigned int                  | 1000                               |
|                           | NMS (total number produced) during training                      |                               |                                    |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| train_rpn_nms_threshold   | The NMS IOU threshold in RPN during training                     | float                         | 0.7                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| test_detections_per_image | The number of bounding box candidates after NMS                  | Unsigned int                  | 100                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| test_nms                  | The NMS IOU threshold during test                                | float                         | 0.5                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| test_rpn_pre_nms_topn     | The number of top-scoring RPN proposals to keep before applying  | Unsigned int                  | 1000                               |
|                           | NMS (per FPN level) during test                                  |                               |                                    |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| test_rpn_post_nms_topn    | The number of top-scoring RPN proposals to keep after applying   | Unsigned int                  | 1000                               |
|                           | NMS (total number produced) during test                          |                               |                                    |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| test_rpn_nms_threshold    | The NMS IOU threshold in RPN during test                         | float                         | 0.7                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| min_level                 | The minimum level of the output feature pyramid                  | Unsigned int                  | 2                                  |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| max_level                 | The maximum level of the output feature pyramid                  | Unsigned int                  | 6                                  |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| num_scales                | The number of anchor octave scales on each pyramid level (e.g.   | Unsigned int                  | 1                                  |
|                           | if set to 3, the anchor scales are [2^0, 2^(1/3), 2^(2/3)])      |                               |                                    |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| aspect_ratios             | A list of tuples representing the aspect ratios of anchors on    | string                        | "[(1.0, 1.0),                      |
|                           | each pyramid level                                               |                               | (1.4, 0.7),                        |
|                           |                                                                  |                               | (0.7, 1.4)]"                       |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| anchor_scale              | The scale of the base-anchor size to the feature-pyramid stride  | Unsigned int                  | 8                                  |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| rpn_box_loss_weight       | The weight for adjusting the RPN box loss in the total loss      | float                         | 1.0                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| fast_rcnn_box_loss_weight | The weight for adjusting the FastRCNN box regression loss in     | float                         | 1.0                                |
|                           | the total loss                                                   |                               |                                    |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+
| mrcnn_weight_loss_mask    | The weight for adjusting the mask loss in the total loss         | float                         | 1.0                                |
+---------------------------+------------------------------------------------------------------+-------------------------------+------------------------------------+

.. Note:: The :code:`min_level`, :code:`max_level`, :code:`num_scales`, :code:`aspect_ratios`, and
   :code:`anchor_scale` fields determine anchor generation for MaskRCNN. :code:`anchor_scale` is the base
   anchor scale, while :code:`min_level` and :code:`max_level` set the range of scales across the different
   feature maps. For example, the actual anchor scale for the feature map at :code:`min_level` is
   :code:`anchor_scale * 2^min_level`, and the actual anchor scale for the feature map at :code:`max_level`
   is :code:`anchor_scale * 2^max_level`. Anchors with the listed :code:`aspect_ratios` are then generated
   at each of these scales.
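To make the arithmetic in the note concrete, the short Python sketch below (illustrative only, not TLT
code) prints the anchor sizes implied by the values used in the sample spec file above. The exact
anchor-generation code may differ in details such as anchor placement and offsets:

.. code::

    # Illustrative only: the anchor sizes implied by the note above,
    # using the values from the sample spec file.
    min_level, max_level = 2, 6
    num_scales = 1
    anchor_scale = 8
    aspect_ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]

    for level in range(min_level, max_level + 1):
        base = anchor_scale * 2 ** level                 # anchor_scale * 2^level
        for octave in range(num_scales):
            # Octave scales, e.g. 2^0, 2^(1/3), 2^(2/3) when num_scales is 3.
            scale = base * 2 ** (octave / num_scales)
            for r1, r2 in aspect_ratios:
                print(f"level {level}: anchor of roughly {scale * r1:.0f} x {scale * r2:.0f}")

For the sample values above (:code:`num_scales: 1`, :code:`anchor_scale: 8`), each level produces three
anchors (one per aspect ratio) with base sizes 32, 64, 128, 256, and 512 for levels 2 through 6.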
Data Config
***********

The data configuration (:code:`data_config`) specifies the input data source and format. This is used for
training, evaluation, and inference. A detailed description is summarized in the table below.

+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| **Field**                  | **Description**                                                  | **Data Type and Constraints** | **Recommended/Typical Value** |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| image_size                 | The image dimension as a tuple within quote marks. “(height,     | string                        | “(832, 1344)”                 |
|                            | width)” indicates the dimension of the resized and padded input. |                               |                               |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| augment_input_data         | Specifies whether to augment the data                            | boolean                       | True                          |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| eval_samples               | The number of samples for evaluation                             | Unsigned int                  | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| training_file_pattern      | The TFRecord path for training                                   | string                        | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| validation_file_pattern    | The TFRecord path for validation                                 | string                        | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| val_json_file              | The annotation file path for validation                          | string                        | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| num_classes                | The number of classes                                            | Unsigned int                  | --                            |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+
| skip_crowd_during_training | Specifies whether to skip crowd annotations during training      | boolean                       | True                          |
+----------------------------+------------------------------------------------------------------+-------------------------------+-------------------------------+

Training the Model
^^^^^^^^^^^^^^^^^^

Train the MaskRCNN model using this command:

.. code::

    tlt mask_rcnn train [-h] -e <experiment_spec_file>
                             -d <output_dir>
                             -k <key>
                             [--gpus <num_gpus>]
                             [--gpu_index <gpu_index>]
                             [--log_file <log_file>]

Required Arguments
******************

* :code:`-d, --model_dir`: The path to the folder where the experiment output is written.
* :code:`-k, --key`: The encryption key to decrypt the model.
* :code:`-e, --experiment_spec_file`: The experiment specification file to set up the training experiment.

Optional Arguments
******************

* :code:`--gpus num_gpus`: The number of GPUs to use and processes to launch for training. The default value is 1.
* :code:`--gpu_index`: The index of the (discrete) GPU to run the job on if the machine has multiple GPUs installed.
* :code:`--log_file`: The path to the log file. The default path is :code:`stdout`.
* :code:`-h, --help`: Show this help message and exit.

Sample Usage
************

Here's an example of using the :code:`train` command on a MaskRCNN model:

.. code::

    tlt mask_rcnn train --gpus 2 -e /path/to/spec.txt -d /path/to/result -k $KEY

Evaluating the Model
^^^^^^^^^^^^^^^^^^^^

To run evaluation for a MaskRCNN model, use this command:

.. code::

    tlt mask_rcnn evaluate [-h] -e <experiment_spec_file>
                                -m <model_file>
                                -k <key>
                                [--gpu_index <gpu_index>]
                                [--log_file <log_file>]

Required Arguments
******************

* :code:`-e, --experiment_spec_file`: The experiment spec file to set up the evaluation experiment. This should be the same as the training spec file.
* :code:`-m, --model`: The path to the model file to use for evaluation.
* :code:`-k, --key`: The key to load the model. This argument is not required if :code:`-m` is followed by a TensorRT engine.

Optional Arguments
******************

* :code:`--gpu_index`: The index of the (discrete) GPU to run evaluation on if the machine has multiple GPUs installed. Note that evaluation can only run on a single GPU.
* :code:`--log_file`: The path to the log file. The default path is :code:`stdout`.
* :code:`-h, --help`: Show this help message and exit.

Running Inference on the Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :code:`inference` tool for MaskRCNN networks can be used to visualize bboxes or generate frame-by-frame
COCO-format labels on a directory of images. Here's an example of using this tool:

.. code::

    tlt mask_rcnn inference [-h] -i <input_images_directory>
                                 -o <output_directory>
                                 -e <experiment_spec_file>
                                 -m <model_file>
                                 -k <key>
                                 [-l