FasterRCNN
==========

Preparing the Input Data Structure
----------------------------------

The dataset structure of FasterRCNN is identical to that of :ref:`DetectNet_v2 <detectnet_v2>`. The
only difference is the command line used to generate the TFRecords from KITTI text labels. To
generate TFRecords for FasterRCNN training, use this command:

.. code::

  tlt faster_rcnn dataset_convert [-h] -d <dataset_spec>
                                       -o <output_tfrecords_file>
                                       [--gpu_index <gpu_index>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-d, --dataset_spec`: Path to the dataset spec file.
* :code:`-o, --output_filename`: Path to the output TFRecords file.

Optional Arguments
^^^^^^^^^^^^^^^^^^
* :code:`--gpu_index`: The index of the GPU to run this command on. Specify this option when
  the machine has multiple GPUs installed. Note that this command can only run on a single
  GPU.
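
As an illustration, the command line above can also be assembled programmatically. The
following is a minimal sketch, not part of TLT: the helper name
:code:`build_dataset_convert_cmd` and the two paths are hypothetical, and only the flags
documented above are used.

.. code:: python

  import shlex


  def build_dataset_convert_cmd(dataset_spec, output_filename, gpu_index=None):
      """Build the argument list for `tlt faster_rcnn dataset_convert` (hypothetical helper)."""
      cmd = ["tlt", "faster_rcnn", "dataset_convert",
             "-d", dataset_spec,
             "-o", output_filename]
      # --gpu_index is optional; add it only when the caller picks a GPU.
      if gpu_index is not None:
          cmd += ["--gpu_index", str(gpu_index)]
      return cmd


  # Hypothetical paths, for illustration only.
  cmd = build_dataset_convert_cmd(
      "/workspace/tlt-experiments/specs/dataset_convert_spec.txt",
      "/workspace/tlt-experiments/tfrecords/kitti_trainval/kitti_trainval",
      gpu_index=0,
  )
  print(shlex.join(cmd))

The resulting list can be passed to :code:`subprocess.run(cmd, check=True)` on a machine where
the :code:`tlt` launcher is installed.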

Creating an experiment spec file - Specification file for FasterRCNN
--------------------------------------------------------------------

The experiment specification file (spec file for short) defines all the parameters required in
the entire workflow of a FasterRCNN model, from training to export. The spec file is a protobuf
text (prototxt) message, and each of its fields can be either a basic data type or a nested proto
message. The top-level structure of the spec file is summarized in the table below. The spec file
has 9 components: :code:`random_seed`, :code:`verbose`, :code:`enc_key`, :code:`dataset_config`,
:code:`augmentation_config`, :code:`model_config`, :code:`training_config`,
:code:`inference_config`, and :code:`evaluation_config`.

Here's a sample of the FasterRCNN spec file:

.. code::

  random_seed: 42
  enc_key: 'nvidia_tlt'
  verbose: True
  model_config {
    input_image_config {
      image_type: RGB
      image_channel_order: 'bgr'
      size_height_width {
        height: 384
        width: 1248
      }
      image_channel_mean {
        key: 'b'
        value: 103.939
      }
      image_channel_mean {
        key: 'g'
        value: 116.779
      }
      image_channel_mean {
        key: 'r'
        value: 123.68
      }
      image_scaling_factor: 1.0
      max_objects_num_per_image: 100
    }
    arch: "resnet:18"
    anchor_box_config {
      scale: 64.0
      scale: 128.0
      scale: 256.0
      ratio: 1.0
      ratio: 0.5
      ratio: 2.0
    }
    freeze_bn: True
    freeze_blocks: 0
    freeze_blocks: 1
    roi_mini_batch: 256
    rpn_stride: 16
    use_bias: False
    roi_pooling_config {
      pool_size: 7
      pool_size_2x: False
    }
    all_projections: True
    use_pooling: False
  }
  dataset_config {
    data_sources: {
      tfrecords_path: "/workspace/tlt-experiments/tfrecords/kitti_trainval/kitti_trainval*"
      image_directory_path: "/workspace/tlt-experiments/data/training"
    }
    image_extension: 'png'
    target_class_mapping {
      key: 'car'
      value: 'car'
    }
    target_class_mapping {
      key: 'van'
      value: 'car'
    }
    target_class_mapping {
      key: 'pedestrian'
      value: 'person'
    }
    target_class_mapping {
      key: 'person_sitting'
      value: 'person'
    }
    target_class_mapping {
      key: 'cyclist'
      value: 'cyclist'
    }
    validation_fold: 0
  }
  augmentation_config {
    preprocessing {
      output_image_width: 1248
      output_image_height: 384
      output_image_channel: 3
      min_bbox_width: 1.0
      min_bbox_height: 1.0
    }
    spatial_augmentation {
      hflip_probability: 0.5
      vflip_probability: 0.0
      zoom_min: 1.0
      zoom_max: 1.0
      translate_max_x: 0
      translate_max_y: 0
    }
    color_augmentation {
      hue_rotation_max: 0.0
      saturation_shift_max: 0.0
      contrast_scale_max: 0.0
      contrast_center: 0.5
    }
  }
  training_config {
    enable_augmentation: True
    enable_qat: False
    batch_size_per_gpu: 8
    num_epochs: 12
    retrain_pruned_model: "/workspace/tlt-experiments/data/faster_rcnn/model_1_pruned.tlt"
    output_model: "/workspace/tlt-experiments/data/faster_rcnn/frcnn_kitti_resnet18_retrain.tlt"
    rpn_min_overlap: 0.3
    rpn_max_overlap: 0.7
    classifier_min_overlap: 0.0
    classifier_max_overlap: 0.5
    gt_as_roi: False
    std_scaling: 1.0
    classifier_regr_std {
      key: 'x'
      value: 10.0
    }
    classifier_regr_std {
      key: 'y'
      value: 10.0
    }
    classifier_regr_std {
      key: 'w'
      value: 5.0
    }
    classifier_regr_std {
      key: 'h'
      value: 5.0
    }

    rpn_mini_batch: 256
    rpn_pre_nms_top_N: 12000
    rpn_nms_max_boxes: 2000
    rpn_nms_overlap_threshold: 0.7

    regularizer {
      type: L2
      weight: 1e-4
    }

    optimizer {
      sgd {
        lr: 0.02
        momentum: 0.9
        decay: 0.0
        nesterov: False
      }
    }

    learning_rate {
      soft_start {
        base_lr: 0.02
        start_lr: 0.002
        soft_start: 0.1
        annealing_points: 0.8
        annealing_points: 0.9
        annealing_divider: 10.0
      }
    }

    lambda_rpn_regr: 1.0
    lambda_rpn_class: 1.0
    lambda_cls_regr: 1.0
    lambda_cls_class: 1.0
  }
  inference_config {
    images_dir: '/workspace/tlt-experiments/data/testing/image_2'
    model: '/workspace/tlt-experiments/data/faster_rcnn/frcnn_kitti_resnet18_retrain.epoch12.tlt'
    batch_size: 1
    detection_image_output_dir: '/workspace/tlt-experiments/data/faster_rcnn/inference_results_imgs_retrain'
    labels_dump_dir: '/workspace/tlt-experiments/data/faster_rcnn/inference_dump_labels_retrain'
    rpn_pre_nms_top_N: 6000
    rpn_nms_max_boxes: 300
    rpn_nms_overlap_threshold: 0.7
    object_confidence_thres: 0.0001
    bbox_visualize_threshold: 0.6
    classifier_nms_max_boxes: 100
    classifier_nms_overlap_threshold: 0.3
    #trt_inference {
    #trt_engine: '/workspace/tlt-experiments/data/faster_rcnn/trt.int8.engine'
    #}
  }
  evaluation_config {
    model: '/workspace/tlt-experiments/data/faster_rcnn/frcnn_kitti_resnet18_retrain.epoch12.tlt'
    batch_size: 1
    validation_period_during_training: 1
    rpn_pre_nms_top_N: 6000
    rpn_nms_max_boxes: 300
    rpn_nms_overlap_threshold: 0.7
    classifier_nms_max_boxes: 100
    classifier_nms_overlap_threshold: 0.3
    object_confidence_thres: 0.0001
    use_voc07_11point_metric: False
    #trt_evaluation {
    #trt_engine: '/workspace/tlt-experiments/data/faster_rcnn/trt.int8.engine'
    #}
    gt_matching_iou_threshold: 0.5
  }

+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| **Parameter**               | **Description**                                                                    | **Data Type and Constraints** | **Default/Suggested Value**   |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`random_seed`         | The random seed for the experiment.                                                | Unsigned int                  | :code:`42`                    |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`enc_key`             | The encoding and decoding key for the TLT models, can be overridden by the command | Str, should not be empty      | --                            |
|                             | line arguments of :code:`tlt faster_rcnn train`, :code:`tlt faster_rcnn evaluate`  |                               |                               |
|                             | and :code:`tlt faster_rcnn inference`.                                             |                               |                               |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`verbose`             | Controls the logging level during the experiments. Will print more logs if True.   | Boolean (True or False)       | :code:`False`                 |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`dataset_config`      | The configurations of the dataset, this is the same as :code:`dataset_config`      | proto message                 | --                            |
|                             | in DetectNet_v2.                                                                   |                               |                               |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`augmentation_config` | The configuration of the data augmentation, same as DetectNet_v2.                  | proto message                 | --                            |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`model_config`        | The configuration of the model architecture.                                       | proto message                 | --                            |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`training_config`     | The configurations for doing training with the model.                              | proto message                 | --                            |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`inference_config`    | The configuration for doing inference with the model.                              | proto message                 | --                            |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
| :code:`evaluation_config`   | The configuration for doing evaluation with the model.                             | proto message                 | --                            |
+-----------------------------+------------------------------------------------------------------------------------+-------------------------------+-------------------------------+
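
To make the top-level layout concrete, the following is a minimal sketch that lists the field
names found at the top level of a spec file and reports which of the nine components are
missing. It is a naive line-based scan written for this illustration, not the parser TLT uses:
the function name :code:`top_level_fields` is hypothetical, and the scan assumes one field per
line and no :code:`#`, :code:`{` or :code:`}` characters inside quoted strings.

.. code:: python

  EXPECTED_TOP_LEVEL = {
      "random_seed", "verbose", "enc_key", "dataset_config",
      "augmentation_config", "model_config", "training_config",
      "inference_config", "evaluation_config",
  }


  def top_level_fields(spec_text):
      """Return the set of field names found at brace depth 0 of a prototxt string."""
      fields = set()
      depth = 0
      for raw_line in spec_text.splitlines():
          line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
          if not line:
              continue
          if depth == 0:
              # The field name ends at the first ':', '{' or whitespace.
              fields.add(line.replace("{", " ").replace(":", " ").split()[0])
          depth += line.count("{") - line.count("}")
      return fields


  # A deliberately incomplete spec fragment, for illustration only.
  sample = """
  random_seed: 42
  enc_key: 'nvidia_tlt'
  verbose: True
  model_config {
    arch: "resnet:18"
    roi_pooling_config {
      pool_size: 7
    }
  }
  """
  missing = EXPECTED_TOP_LEVEL - top_level_fields(sample)
  print(sorted(missing))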

Dataset
^^^^^^^
The :code:`dataset_config` defines the dataset of a FasterRCNN experiment, including both the
training dataset and the validation dataset. The definition of the FasterRCNN dataset is identical
to that of DetectNet_v2. Check the DetectNet_v2 :code:`dataset_config` documentation for the
details of this parameter.

Data augmentation
^^^^^^^^^^^^^^^^^
The :code:`augmentation_config` defines the data augmentation during the training of a FasterRCNN
model. The definition of FasterRCNN data augmentation is identical to that of DetectNet_v2.
Check the DetectNet_v2 :code:`augmentation_config` documentation for the details of this parameter.

Model architecture
^^^^^^^^^^^^^^^^^^
The :code:`model_config` defines the FasterRCNN model architecture. With this parameter, we can
choose the backbone of the FasterRCNN model, whether to enable the BatchNormalization layers,
whether to freeze the BatchNormalization layers during training, and whether to freeze some
blocks in the model during training. This lets us define a specialized FasterRCNN model
architecture from the general FasterRCNN application, according to the use case. A detailed
description of this parameter is given in the table below.


+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| **Parameter**                 | **Description**                                                                                                                    | **Data Type and Constraints**                                                                | **Default/Suggested Value**                                                   |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`input_image_config`    | Defines the input image format, including the image channel number, channel order, width and height,                               | proto message                                                                                | --                                                                            |
|                               | and the preprocessing (subtracting the per-channel mean and dividing by a scaling factor) applied to it before feeding             |                                                                                              |                                                                               |
|                               | it into the model. See below for details.                                                                                          |                                                                                              |                                                                               |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`arch`                  | The feature extractor (backbone) for the FasterRCNN model. FasterRCNN supports 14 backbones.                                       | str type. The architecture can be ResNet, VGG, GoogLeNet, MobileNet, DarkNet or EfficientNet.| --                                                                            |
|                               |                                                                                                                                    | Each architecture can have different layers or versions. Details are listed below.           |                                                                               |
|                               |                                                                                                                                    |                                                                                              |                                                                               |
|                               |                                                                                                                                    | ResNet series: resnet:10, resnet:18, resnet:34, resnet:50, resnet:101                        |                                                                               |
|                               |                                                                                                                                    |                                                                                              |                                                                               |
|                               |                                                                                                                                    | VGG series: vgg:16, vgg:19                                                                   |                                                                               |
|                               |                                                                                                                                    |                                                                                              |                                                                               |
|                               |                                                                                                                                    | GoogLeNet: googlenet                                                                         |                                                                               |
|                               |                                                                                                                                    |                                                                                              |                                                                               |
|                               |                                                                                                                                    | MobileNet series: mobilenet_v1, mobilenet_v2                                                 |                                                                               |
|                               |                                                                                                                                    |                                                                                              |                                                                               |
|                               |                                                                                                                                    | DarkNet: darknet:19, darknet:53                                                              |                                                                               |
|                               |                                                                                                                                    |                                                                                              |                                                                               |
|                               |                                                                                                                                    | EfficientNet: efficientnet:b0, efficientnet:b1                                               |                                                                               |
|                               |                                                                                                                                    |                                                                                              |                                                                               |
|                               |                                                                                                                                    | Here a notational convention can be used, i.e., for models that can have different numbers   |                                                                               |
|                               |                                                                                                                                    | of layers, use a colon followed by the layer number as the suffix of the model name.         |                                                                               |
|                               |                                                                                                                                    | E.g., resnet:<layer_number>                                                                  |                                                                               |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`anchor_box_config`     | Configurations of the anchor boxes.                                                                                                | proto message.                                                                               | --                                                                            |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`roi_mini_batch`        | The batch size of ROIs for training the RCNN.                                                                                      | int.                                                                                         | :code:`256`                                                                   |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`rpn_stride`            | Cumulative stride from model input to RPN. This value is fixed (16) in the current implementation.                                 | int.                                                                                         | :code:`16`                                                                    |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`freeze_bn`             | A flag to freeze all the BatchNormalization layers in the model. Freezing a BatchNormalization layer means freezing its moving mean| Boolean.                                                                                     | :code:`False`                                                                 |
|                               | and moving variance while its gamma and beta parameters are still trainable. This is usually used in FasterRCNN training with a    |                                                                                              |                                                                               |
|                               | small batch size so the moving means and moving variances are initialized from the pretrained model and fixed during training.     |                                                                                              |                                                                               |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`dropout_rate`          | The dropout rate is applicable to the Dropout layers in the model (if there are any).                                              | float. In the interval (0, 1).                                                               | :code:`0.0`                                                                   |
|                               | Currently only VGG 16/19 and EfficientNet have Dropout layers.                                                                     |                                                                                              |                                                                               |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`drop_connect_rate`     | The drop_connect rate for EfficientNet.                                                                                            | float. In the interval (0, 1).                                                               | :code:`0.0`                                                                   |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`freeze_blocks`         | The list of block IDs to freeze during training. Sometimes we want to freeze some blocks in the model after loading the pretrained | list of ints.                                                                                | leave it unspecified.                                                         |
|                               | models for some reason (save GPU memory, make training process more stable, etc.).                                                 | For ResNet, the valid block IDs for freezing is any subset of {0, 1, 2, 3}(inclusive).       |                                                                               |
|                               |                                                                                                                                    | For VGG, the valid block IDs for freezing is any subset of {1, 2, 3, 4, 5}(inclusive).       |                                                                               |
|                               |                                                                                                                                    | For GoogLeNet, the valid block IDs for freezing is any subset of {0, 1, 2, 3, 4, 5, 6, 7}    |                                                                               |
|                               |                                                                                                                                    | (inclusive). For MobileNet V1, the valid block IDs is any subset of {0, 1, 2, 3, 4, 5, 6, 7, |                                                                               |
|                               |                                                                                                                                    | 8, 9, 10, 11}(inclusive). For MobileNet V2, the valid block IDs is any subset of {0, 1, 2, 3,|                                                                               |
|                               |                                                                                                                                    | 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}(inclusive). For DarkNet, the valid blocks IDs is any subset|                                                                               |
|                               |                                                                                                                                    | of {0, 1, 2, 3, 4, 5}(inclusive). For EfficientNet, the valid block IDs is any subset of {   |                                                                               |
|                               |                                                                                                                                    | 0, 1, 2, 3, 4, 5, 6, 7}(inclusive).                                                          |                                                                               |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`use_bias`              | A flag to use bias for convolutional layers in the model. If the model has BatchNormalization layers, we usually set it to False.  | Boolean.                                                                                     | :code:`False`                                                                 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`roi_pooling_config`    | The configuration for the ROIPooling (CropAndResize) layer in the model.                                                           | proto message.                                                                               | --                                                                            |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`all_projections`       | A flag to replace all the shortcut layers with projection layers in the model. Only valid for ResNet and MobileNet V2.             | Boolean.                                                                                     | :code:`False`                                                                 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`use_pooling`           | A flag to use pooling layers in the model or not. This parameter is valid only for VGG and ResNet. If set to True, pooling layers  | Boolean.                                                                                     | :code:`False`                                                                 |
|                               | will be used in the model (produces the same model structures as in papers). Otherwise, strided convolutional layers will be used  |                                                                                              |                                                                               |
|                               | and pooling layers will be omitted.                                                                                                |                                                                                              |                                                                               |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+
| :code:`activation`            | Defines the activation function used in the model. Only valid for EfficientNet. For INT8 deployment, EfficientNet with relu        | proto message.                                                                               | --                                                                            |
|                               | activation will produce much better accuracy (mAP) than the original swish activation.                                             |                                                                                              |                                                                               |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+

Each of the above proto message parameters will be described in detail below.

Input image configurations
**************************

The :code:`input_image_config` message defines the image format that the FasterRCNN model accepts.
We can customize the input image size, the per-channel mean values, and the scaling factor used for
image preprocessing. We can also specify the image type (RGB or grayscale) of the training/validation
dataset and, for RGB images, the channel order. Each parameter is described in the table
below.

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`image_type`                | The type of the images in the dataset.             | enum type, either :code:`RGB` or :code:`GRAY_SCALE` | :code:`RGB`                    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`size_min`                  | Specify the input image's smaller side size,       | proto message with only one :code:`min` parameter   | --                             |
|                                   | exclusive with :code:`size_height_width`.          | to specify the smaller side size in pixels.         |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`size_height_width`         | Specify the input image's height and width,        | proto message with two parameters: :code:`height`   | --                             |
|                                   | exclusive with :code:`size_min`.                   | and :code:`width` to specify a fixed image size.    |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`image_channel_order`       | The image channel order.                           | str type. Can be :code:`rgb` or :code:`bgr` for RGB | --                             |
|                                   |                                                    | images. :code:`l` for grayscale images.             |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`image_channel_mean`        | Per-channel mean values for the input images.      | proto dict that maps each channel to its mean       | --                             |
|                                   |                                                    | values.                                             |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`image_scaling_factor`      | The image scaling factor to scale the images.      | float.                                              | :code:`1.0`                    |
|                                   | Each pixel value will be divided by this number.   |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`max_objects_num_per_image` | The maximum number of objects of an image          | int.                                                | :code:`100`                    |
|                                   | in the dataset.                                    |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+

.. Note:: The maximum number of objects in an image depends on the dataset. It is
   important to set the parameter :code:`max_objects_num_per_image` to be no less than this number.
   Otherwise, training will fail.
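
As an illustration, the parameters above can be combined into an :code:`input_image_config`
block like the sketch below. The image size and the BGR channel mean values are assumptions chosen
for illustration and should be adapted to the dataset. The :code:`key`/:code:`value` entries follow
the usual protobuf text form for a map field.

.. code::

  input_image_config {
    image_type: RGB
    image_channel_order: 'bgr'
    # Fixed input size (assumed values). To constrain only the smaller
    # side, use size_min { min: ... } instead; the two are exclusive.
    size_height_width {
      height: 384
      width: 1248
    }
    # One map entry per channel (assumed mean values).
    image_channel_mean {
      key: 'b'
      value: 103.939
    }
    image_channel_mean {
      key: 'g'
      value: 116.779
    }
    image_channel_mean {
      key: 'r'
      value: 123.68
    }
    image_scaling_factor: 1.0
    max_objects_num_per_image: 100
  }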

Anchor boxes
************

The parameter :code:`anchor_box_config` defines the anchor box sizes and aspect ratios in the
FasterRCNN model. There are two sub-parameters for it: :code:`scale` and :code:`ratio`. Each of
them is a list of floats as below.

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`scale`                     | Anchor box scales (sizes) in pixels.               | list of floats.                                     | --                             |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`ratio`                     | Aspect ratios of the anchor boxes.                 | list of floats.                                     | --                             |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
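
In protobuf text format, a list is written by repeating the field once per element. The sketch
below shows an :code:`anchor_box_config` with three scales and three aspect ratios. The specific
values are illustrative assumptions, not recommended defaults.

.. code::

  anchor_box_config {
    # Anchor sizes in pixels (assumed values).
    scale: 64.0
    scale: 128.0
    scale: 256.0
    # Aspect ratios (assumed values).
    ratio: 1.0
    ratio: 0.5
    ratio: 2.0
  }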

ROIPooling (CropAndResize)
**************************

The :code:`roi_pooling_config` parameter defines the parameters of the ROIPooling (CropAndResize)
layer in the model. They are described in the table below.

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`pool_size`                 | The output spatial size (height and width) of the  | int.                                                | :code:`7`                      |
|                                   | pooled ROIs. Only square ROIs are supported, so    |                                                     |                                |
|                                   | this parameter is for both height and width.       |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`pool_size_2x`              | A flag to double the pooled ROIs' size. If this is | Boolean.                                            | --                             |
|                                   | set to True, CropAndResize will produce ROIs of    |                                                     |                                |
|                                   | size 2*pool_size and in RCNN it will be downsampled|                                                     |                                |
|                                   | 2x to get back to pool_size.                       |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
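
A minimal sketch of this block, using the suggested :code:`pool_size` of 7. The value chosen for
:code:`pool_size_2x` here is only an assumption for illustration.

.. code::

  roi_pooling_config {
    pool_size: 7
    # With pool_size_2x: True, CropAndResize would output 14x14 ROIs
    # that are downsampled 2x in the RCNN back to 7x7.
    pool_size_2x: False
  }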


Activation function
*******************

The parameter :code:`activation` defines the type and parameter for the activation function in
a FasterRCNN model. This parameter is only valid for EfficientNet.

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`activation_type`           | Type of the activation function. Only :code:`relu` | str.                                                | --                             |
|                                   | and :code:`swish` are supported.                   |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
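
For example, an EfficientNet model intended for INT8 deployment could select the relu activation
as sketched below. This assumes the string value is written in quotes, like other str fields in
the spec file.

.. code::

  activation {
    activation_type: 'relu'
  }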


Training configurations
^^^^^^^^^^^^^^^^^^^^^^^

The proto message :code:`training_config` defines all the necessary parameters required for
a FasterRCNN training experiment. Each parameter is described in the table below.

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`enable_augmentation`       | A flag to enable data augmentation in training.    | Boolean.                                            | :code:`True`                               |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`pretrained_weights`        | The path to the pretrained weights for initializing| str.                                                | --                                         |
|                                   | the FasterRCNN model.                              |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`retrain_pruned_model`      | The path to the pruned model that we are going to  | str.                                                | --                                         |
|                                   | retrain.                                           |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`resume_from_model`         | The path to the model from which we are going to   | str.                                                | --                                         |
|                                   | resume an interrupted training.                    |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`rpn_min_overlap`           | The lower IoU threshold used to match anchor boxes | float. In the interval (0, 1).                      | :code:`0.3`                                |
|                                   | to groundtruth boxes. If the IoU of an anchor box  |                                                     |                                            |
|                                   | and every groundtruth box is below this threshold, |                                                     |                                            |
|                                   | then this anchor box will be regarded as a         |                                                     |                                            |
|                                   | negative anchor box.                               |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`rpn_max_overlap`           | The higher IoU threshold used to match anchor boxes| float. In the interval (0, 1).                      | :code:`0.7`                                |
|                                   | to groundtruth boxes. If the IoU of an anchor box  |                                                     |                                            |
|                                   | and some groundtruth box is above this threshold,  |                                                     |                                            |
|                                   | then this anchor box will be regarded as a         |                                                     |                                            |
|                                   | positive anchor box.                               |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`classifier_min_overlap`    | The lower IoU threshold used to generate the       | float. In the interval (0, 1).                      | :code:`0.0`                                |
|                                   | proposal target. If the IoU of a ROI and a         |                                                     |                                            |
|                                   | groundtruth box is above this number and below     |                                                     |                                            |
|                                   | classifier_max_overlap, then this ROI is regarded  |                                                     |                                            |
|                                   | as a negative ROI (background) during training.    |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`classifier_max_overlap`    | The higher IoU threshold used to generate the      | float. In the interval (0, 1).                      | :code:`0.0`                                |
|                                   | proposal target. If the IoU of a ROI and a         |                                                     |                                            |
|                                   | groundtruth box is above this number, then this    |                                                     |                                            |
|                                   | ROI is regarded as a positive ROI during training. |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`gt_as_roi`                 | A flag to include groundtruth boxes in the positive| Boolean.                                            | :code:`False`                              |
|                                   | ROIs for training the RCNN.                        |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`std_scaling`               | A scaling factor (multiplier) for RPN regression   | float.                                              | :code:`1.0`                                |
|                                   | loss.                                              |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`classifier_regr_std`       | Scaling factors (denominators) for the RCNN        | proto dict.                                         | :code:`{'x': 10, 'y': 10, 'w': 5, 'h': 5}` |
|                                   | regression loss. A map from 'x', 'y', 'w', 'h' to  |                                                     |                                            |
|                                   | its corresponding scaling factor, respectively.    |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`batch_size_per_gpu`        | Training batch size per GPU.                       | int.                                                | --                                         |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`num_epochs`                | Number of epochs for the training.                 | int.                                                | :code:`20`                                 |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`output_model`              | The path to save the checkpoint tlt models during  | str.                                                | --                                         |
|                                   | training.                                          |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`checkpoint_interval`       | The period in epochs that we will save the         | int.                                                | :code:`1`                                  |
|                                   | checkpoint. Setting this number to be greater than |                                                     |                                            |
|                                   | num_epochs will essentially disable checkpointing. |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`rpn_pre_nms_top_N`         | The number of boxes (ROIs) to be retained before   | int.                                                | --                                         |
|                                   | the NMS in Proposal layer.                         |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`rpn_nms_max_boxes`         | The maximum number of boxes (ROIs) to be retained  | int.                                                | --                                         |
|                                   | after the NMS in Proposal layer.                   |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`rpn_nms_overlap_threshold` | The IoU threshold for NMS in Proposal layer.       | float. In the interval (0, 1).                      | :code:`0.7`                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`regularizer`               | The configuration for regularizer.                 | proto message.                                      | --                                         |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`optimizer`                 | The configuration for optimizer.                   | proto message.                                      | --                                         |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`learning_rate`             | The configuration for learning rate scheduler.     | proto message.                                      | --                                         |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`lambda_rpn_regr`           | Weighting factor for RPN regression loss.          | float.                                              | :code:`1.0`                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`lambda_rpn_class`          | Weighting factor for RPN classification loss.      | float.                                              | :code:`1.0`                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`lambda_cls_regr`           | Weighting factor for RCNN regression loss.         | float.                                              | :code:`1.0`                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`lambda_cls_class`          | Weighting factor for RCNN classification loss.     | float.                                              | :code:`1.0`                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`enable_qat`                | A flag to enable QAT (quantization-aware training).| Boolean.                                            | :code:`False`                              |
|                                   | FasterRCNN does not support loading a non-QAT      |                                                     |                                            |
|                                   | pruned model and retraining with QAT enabled.      |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+
| :code:`model_parallelism`         | List of fractions for model parallelism. Each      | repeated float                                      | --                                         |
|                                   | number is a fraction that represents the percentage|                                                     |                                            |
|                                   | of model layers to be placed on a GPU. For example |                                                     |                                            |
|                                   | two repeated :code:`model_parallelism: 0.5`        |                                                     |                                            |
|                                   | indicates the training will use 2 GPUs and each GPU|                                                     |                                            |
|                                   | will have half of the model layers on it.          |                                                     |                                            |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------------------+

The :code:`regularizer`, :code:`optimizer` and :code:`learning_rate` proto messages are described
further below.
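
To show how the scalar parameters in the table fit together, here is a sketch of a
:code:`training_config` block. Values that the table lists as defaults are used as given. The
file paths are hypothetical placeholders, and the lines marked as assumed (batch size,
:code:`classifier_max_overlap`, proposal counts) are illustrative choices only. The nested
:code:`regularizer`, :code:`optimizer` and :code:`learning_rate` messages are omitted here.

.. code::

  training_config {
    enable_augmentation: True
    enable_qat: False
    batch_size_per_gpu: 8            # assumed value
    num_epochs: 20
    checkpoint_interval: 1
    # Hypothetical paths; replace with real locations.
    pretrained_weights: '/path/to/pretrained_weights'
    output_model: '/path/to/output/frcnn_model.tlt'
    rpn_min_overlap: 0.3
    rpn_max_overlap: 0.7
    classifier_min_overlap: 0.0
    classifier_max_overlap: 0.5      # assumed value
    gt_as_roi: False
    std_scaling: 1.0
    # Map field: one entry per key.
    classifier_regr_std {
      key: 'x'
      value: 10.0
    }
    classifier_regr_std {
      key: 'y'
      value: 10.0
    }
    classifier_regr_std {
      key: 'w'
      value: 5.0
    }
    classifier_regr_std {
      key: 'h'
      value: 5.0
    }
    rpn_pre_nms_top_N: 12000         # assumed value
    rpn_nms_max_boxes: 2000          # assumed value
    rpn_nms_overlap_threshold: 0.7
    lambda_rpn_regr: 1.0
    lambda_rpn_class: 1.0
    lambda_cls_regr: 1.0
    lambda_cls_class: 1.0
  }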

Regularizer
***********

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`type`                      | The type of the regularizer.                       | enum type. :code:`L1`, :code:`L2` or :code:`NO_REG` | --                             |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`weight`                    | The penalty of the regularizer.                    | float.                                              | --                             |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
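
A sketch of a :code:`regularizer` block nested inside :code:`training_config`. The L2 type and
the weight value are assumptions for illustration; the appropriate penalty depends on the model
and dataset.

.. code::

  regularizer {
    type: L2
    weight: 1e-4   # assumed value
  }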

Optimizer
*********

Three types of optimizers are supported by FasterRCNN: Adam, SGD and RMSProp. Only one of them
should be specified in the spec file. Whichever one is chosen, it is wrapped in an :code:`optimizer`
proto. For example:

.. code::

  optimizer {
    adam {
      lr: 0.00001
      beta_1: 0.9
      beta_2: 0.999
      decay: 0.0
    }
  }

The Adam optimizer parameters are summarized in the table below.

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`lr`                        | Learning rate. This is actually overridden by      | float.                                              | :code:`0.00001`                |
|                                   | the learning rate scheduler, so it has no effect.  |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`beta_1`                    | Momentum for the means of the model parameters.    | float.                                              | :code:`0.9`                    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`beta_2`                    | Momentum for the variances of the model parameters.| float.                                              | :code:`0.999`                  |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`decay`                     | Decay factor for the learning rate. Not useful, as | float.                                              | :code:`0.0`                    |
|                                   | it is overridden by the learning rate scheduler.   |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+

The SGD optimizer parameters are summarized in the table below.

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`lr`                        | Learning rate. Not useful as the learning rate is  | float.                                              | :code:`0.00001`                |
|                                   | overridden by the learning rate scheduler.         |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`momentum`                  | Momentum of SGD.                                   | float.                                              | :code:`0.0`                    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`decay`                     | Decay factor of the learning rate. Not useful as   | float.                                              | :code:`0.0`                    |
|                                   | it is overridden by the learning rate scheduler.   |                                                     |                                |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`nesterov`                  | A flag to enable Nesterov momentum for SGD.        | Boolean.                                            | :code:`False`                  |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+

The RMSProp optimizer parameters are summarized in the table below.

+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**             | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`lr`                | Learning rate. Not useful as the learning rate is  | float.                                              | :code:`0.00001`                |
|                           | overridden by the learning rate scheduler.         |                                                     |                                |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+

Learning rate scheduler
***********************

The :code:`learning_rate` parameter defines the learning rate scheduler used in FasterRCNN training.
FasterRCNN supports two types of learning rate schedulers: :code:`soft_start` and
:code:`step`. Whichever one is chosen, it is wrapped in a :code:`learning_rate` proto message.
For example:

.. code::

  learning_rate {
    step {
      base_lr: 0.00001
      gamma: 1.0
      step_size: 30
    }
  }

The parameters of the :code:`soft_start` scheduler are described in the table below.

+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**             | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`base_lr`           | Maximum learning rate during the training.         | float.                                              | --                             |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`start_lr`          | The initial learning rate at the start of the      | float. Smaller than :code:`base_lr`.                | --                             |
|                           | training.                                          |                                                     |                                |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`soft_start`        | The duration (in percentage of total epochs) of the| float. In the interval (0, 1).                      | --                             |
|                           | soft start phase of the learning rate curve.       |                                                     |                                |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`annealing_points`  | List of time points at which to decrease the       | list of floats.                                     | --                             |
|                           | learning rate. Also in percentage.                 |                                                     |                                |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`annealing_divider` | divider to decrease the learning rate at each of   | float.                                              | --                             |
|                           | annealing_points.                                  |                                                     |                                |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+

The parameters of the :code:`step` scheduler are described in the table below.

+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**             | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`base_lr`           | base learning rate at the start of training.       | float.                                              | --                             |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`gamma`             | multiplier to decrease learning rate.              | float.                                              | --                             |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`step_size`         | the step size (in percentage of total epochs) at   | float.                                              | --                             |
|                           | which the learning rate is multiplied by gamma.    |                                                     |                                |
+---------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+

.. Note:: The learning rate is automatically scaled with the number of GPUs used during training; that is, the effective learning rate is :code:`learning_rate * n_gpu`.
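
The two schedulers described above can be sketched in plain Python. This is an illustrative
sketch rather than the TLT implementation: the function names are hypothetical, the linear
ramp during the soft start phase is an assumption, and :code:`progress` is assumed to be
expressed in the same unit as :code:`soft_start`, :code:`annealing_points`, and :code:`step_size`.

.. code:: python

  def soft_start_lr(progress, base_lr, start_lr, soft_start,
                    annealing_points, annealing_divider):
      """Learning rate at `progress`, a fraction of total training in [0, 1]."""
      if progress < soft_start:
          # Assumption: ramp linearly from start_lr up to base_lr.
          return start_lr + (base_lr - start_lr) * progress / soft_start
      lr = base_lr
      # Divide once for every annealing point that has already been passed.
      for point in annealing_points:
          if progress >= point:
              lr /= annealing_divider
      return lr


  def step_lr(progress, base_lr, gamma, step_size):
      """Multiply base_lr by gamma once per completed step_size of progress."""
      return base_lr * gamma ** int(progress // step_size)


  def effective_lr(lr, n_gpu):
      """Scale the scheduled rate by the number of GPUs, as in the note above."""
      return lr * n_gpu

For instance, :code:`step_lr(65, 0.01, 0.1, 30)` applies :code:`gamma` twice, because two full
steps of size 30 have elapsed at a progress of 65.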

Inference configurations
^^^^^^^^^^^^^^^^^^^^^^^^

The parameter :code:`inference_config` defines all the parameters required for running inference
against a FasterRCNN model.

+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                            | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`images_dir`                       | The path to the directory of images to run         | str.                                                | --                             |
|                                          | inference on.                                      |                                                     |                                |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`model`                            | Path to the :code:`.tlt` model to run inference.   | str.                                                | --                             |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`batch_size`                       | Batch size for running inference.                  | int.                                                | :code:`1`                      |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`rpn_pre_nms_top_N`                | The number of boxes (ROIs) to be retained before   | int.                                                | --                             |
|                                          | the NMS in Proposal layer in inference.            |                                                     |                                |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`rpn_nms_max_boxes`                | The maximum number of boxes (ROIs) to be retained  | int.                                                | --                             |
|                                          | after the NMS in Proposal layer in inference.      |                                                     |                                |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`rpn_nms_overlap_threshold`        | The IoU threshold for NMS in Proposal layer.       | float. In the interval (0, 1).                      | :code:`0.7`                    |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`object_confidence_thres`          | Object confidence score threshold in NMS. All the  | float. In the interval (0, 1).                      | :code:`0.0001`                 |
|                                          | objects whose confidence is lower than this number |                                                     |                                |
|                                          | will be filtered out in NMS.                       |                                                     |                                |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`classifier_nms_max_boxes`         | The maximum number of boxes to retain in RCNN NMS. | int.                                                | :code:`100`                    |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`classifier_nms_overlap_threshold` | RCNN NMS IoU threshold.                            | float. In the interval (0, 1).                      | :code:`0.3`                    |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`detection_image_output_dir`       | Output directory for detection images.             | str.                                                | --                             |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`bbox_caption_on`                  | A flag to display the class name and confidence for| Boolean.                                            | :code:`False`                  |
|                                          | each detected object in an image.                  |                                                     |                                |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`labels_dump_dir`                  | Output directory to save the labels of the detected| str.                                                | --                             |
|                                          | objects.                                           |                                                     |                                |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`trt_inference`                    | The configurations for TensorRT based inference.   | proto message.                                      | --                             |
|                                          | If this parameter is set, inference will use       |                                                     |                                |
|                                          | TensorRT engine instead of :code:`.tlt` model.     |                                                     |                                |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`nms_score_bits`                   | The number of bits to represent the score values in| int. In the interval [1, 10].                       | :code:`0`                      |
|                                          | NMS plugin in TensorRT OSS. The valid range is     |                                                     |                                |
|                                          | integers in [1, 10]. Setting it to any other values|                                                     |                                |
|                                          | will make it fall back to ordinary NMS. Currently  |                                                     |                                |
|                                          | this optimized NMS plugin is only available in FP16|                                                     |                                |
|                                          | but it should also be selected by INT8 data type as|                                                     |                                |
|                                          | there is no INT8 NMS in TensorRT OSS and hence this|                                                     |                                |
|                                          | fastest implementation in FP16 will be selected.   |                                                     |                                |
|                                          | If falling back to ordinary NMS, the actual data   |                                                     |                                |
|                                          | type when building the engine will decide the exact|                                                     |                                |
|                                          | precision (FP16 or FP32) to run at.                |                                                     |                                |
+------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
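
To illustrate how :code:`object_confidence_thres`, :code:`classifier_nms_overlap_threshold`, and
:code:`classifier_nms_max_boxes` interact, here is a minimal sketch of greedy NMS in plain Python.
It is not the TLT implementation: the function names and the :code:`(x1, y1, x2, y2)` box format
are assumptions, and the sketch is class-agnostic for brevity.

.. code:: python

  def iou(a, b):
      """IoU of two boxes given as (x1, y1, x2, y2)."""
      ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
      ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
      inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
      union = ((a[2] - a[0]) * (a[3] - a[1])
               + (b[2] - b[0]) * (b[3] - b[1]) - inter)
      return inter / union if union > 0 else 0.0


  def nms(boxes, scores, confidence_thres=0.0001,
          overlap_threshold=0.3, max_boxes=100):
      """Return the indices of the boxes kept, highest score first."""
      # Drop boxes whose confidence is lower than the threshold.
      candidates = [i for i, s in enumerate(scores) if s >= confidence_thres]
      candidates.sort(key=lambda i: scores[i], reverse=True)
      keep = []
      for i in candidates:
          if len(keep) >= max_boxes:
              break
          # Keep a box only if it does not overlap a kept box too much.
          if all(iou(boxes[i], boxes[j]) <= overlap_threshold for j in keep):
              keep.append(i)
      return keep

Lowering the overlap threshold suppresses more near-duplicate boxes, while raising the confidence
threshold discards more low-scoring detections before suppression starts.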


TensorRT based inference
************************

The :code:`trt_inference` parameter defines all the parameters for TensorRT-based inference.
When it is specified, inference uses the TensorRT engine instead of the :code:`.tlt` model.
The TensorRT engine is assumed to be generated by the :code:`tlt-converter` tool.
All the parameters are summarized in the table below.

+----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                    | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`trt_engine`               | Path to the TensorRT engine file to load.          | str.                                                | --                             |
|                                  | Exclusive with etlt_model below.                   |                                                     |                                |
+----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+

Evaluation configurations
^^^^^^^^^^^^^^^^^^^^^^^^^

The parameter :code:`evaluation_config` defines all the required parameters for running evaluation
against a FasterRCNN model. This parameter is very similar to :code:`inference_config`.

+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                             | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`model`                             | Path to the :code:`.tlt` model to run evaluation.  | str.                                                | --                             |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`batch_size`                        | Batch size for running evaluation.                 | int.                                                | :code:`1`                      |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`rpn_pre_nms_top_N`                 | The number of boxes (ROIs) to be retained before   | int.                                                | --                             |
|                                           | the NMS in Proposal layer in evaluation.           |                                                     |                                |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`rpn_nms_max_boxes`                 | The maximum number of boxes (ROIs) to be retained  | int.                                                | --                             |
|                                           | after the NMS in Proposal layer in evaluation.     |                                                     |                                |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`rpn_nms_overlap_threshold`         | The IoU threshold for NMS in Proposal layer.       | float. In the interval (0, 1).                      | :code:`0.7`                    |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`object_confidence_thres`           | Object confidence score threshold in NMS. All the  | float. In the interval (0, 1).                      | :code:`0.0001`                 |
|                                           | objects whose confidence is lower than this number |                                                     |                                |
|                                           | will be filtered out in NMS.                       |                                                     |                                |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`classifier_nms_max_boxes`          | The maximum number of boxes to retain in RCNN NMS. | int.                                                | :code:`100`                    |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`classifier_nms_overlap_threshold`  | RCNN NMS IoU threshold.                            | float. In the interval (0, 1).                      | :code:`0.3`                    |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`use_voc07_11point_metric`          | A flag to use PASCAL VOC 2007 11-point AP metric.  | Boolean.                                            | --                             |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`validation_period_during_training` | The period (in epochs) for doing validation during | int.                                                | :code:`1`                      |
|                                           | training.                                          |                                                     |                                |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`trt_evaluation`                    | The configurations for TensorRT based evaluation.  | proto message.                                      | --                             |
|                                           | If this parameter is set, evaluation will use      |                                                     |                                |
|                                           | TensorRT engine instead of :code:`.tlt` model.     |                                                     |                                |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`gt_matching_iou_threshold`         | IoU threshold to match detected boxes with         | float.                                              | :code:`0.5`                    |
|                                           | groundtruth boxes. Exclusive with                  |                                                     |                                |
|                                           | gt_matching_iou_threshold_range below.             |                                                     |                                |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`gt_matching_iou_threshold_range`   | Range of IoU thresholds for computing AP at        | proto message.                                      | --                             |
|                                           | multiple IoU thresholds and computing COCO mAP.    |                                                     |                                |
|                                           | Exclusive with gt_matching_iou_threshold above.    |                                                     |                                |
+-------------------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
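
The :code:`use_voc07_11point_metric` flag refers to the PASCAL VOC 2007 11-point AP, which
averages the best precision achievable at 11 evenly spaced recall levels (0.0, 0.1, ..., 1.0).
A minimal sketch of that computation follows. The function name and the input format (parallel
lists of recall and precision values) are assumptions made for illustration and are not part of TLT.

.. code:: python

  def voc07_11point_ap(recalls, precisions):
      """11-point AP from parallel lists of recall and precision values."""
      total = 0.0
      for i in range(11):
          level = i / 10.0
          # Best precision among the points whose recall reaches this level.
          reachable = [p for r, p in zip(recalls, precisions) if r >= level]
          total += max(reachable) if reachable else 0.0
      return total / 11.0

Recall levels that the detector never reaches contribute a precision of zero, so a model with low
maximum recall is penalized even if its precision is high.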

TensorRT based evaluation
*************************

In the table above, the definition of :code:`trt_evaluation` is the same as that of the
:code:`trt_inference` parameter described earlier. The :code:`gt_matching_iou_threshold_range`
parameter is described in the table below.

+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| **Parameter**                     | **Description**                                    | **Data Type and Constraints**                       | **Default/Suggested Value**    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`start`                     | Start point of the IoU list (inclusive).           | float. In the interval (0, 1).                      | :code:`0.5`                    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`step`                      | Step size of the IoU list.                         | float. In the interval (0, 1).                      | :code:`0.05`                   |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
| :code:`end`                       | End point of the IoU list (exclusive).             | float. In the interval (0, 1].                      | :code:`1.0`                    |
+-----------------------------------+----------------------------------------------------+-----------------------------------------------------+--------------------------------+
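
The way :code:`start`, :code:`step`, and :code:`end` expand into a list of IoU thresholds can be
sketched as follows. The function name is hypothetical, and the rounding and tolerance are
assumptions added only to keep floating-point noise out of the result.

.. code:: python

  def iou_thresholds(start=0.5, step=0.05, end=1.0):
      """Thresholds from start (inclusive) to end (exclusive), spaced by step."""
      if step <= 0:
          raise ValueError("step must be positive")
      thresholds = []
      n = 0
      # The small tolerance keeps `end` exclusive despite rounding error.
      while start + n * step < end - 1e-9:
          thresholds.append(round(start + n * step, 6))
          n += 1
      return thresholds

With the default values this yields 10 thresholds, from 0.5 to 0.95.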


.. _training_the_model:

Training the model
------------------

To run training of a FasterRCNN model, use this command:

.. code::

    tlt faster_rcnn train [-h] -e <experiment_spec>
                               [-k <enc_key>]
                               [--gpus <num_gpus>]
                               [--num_processes <number_of_processes>]
                               [--gpu_index <gpu_index>]
                               [--use_amp]
                               [--log_file <log_file_path>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-e, --experiment_spec_file`: The experiment specification file used to set up the
  training experiment.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-h, --help`: Show this help message and exit.
* :code:`-k, --enc_key`: The TLT encoding key. If specified, it overrides the one in the spec file.
* :code:`--gpus`: The number of GPUs to use for training in a multi-GPU
  scenario (default: 1).
* :code:`--num_processes, -np`: The number of processes to spawn for training. It defaults to
  -1, which means equal to :code:`--gpus` (the data parallelism use case). For model
  parallelism, set this argument explicitly to 1 or more, depending on the actual
  scenario. Setting :code:`--gpus` larger than 1 and :code:`--num_processes` to 1
  corresponds to the model parallelism use case, while setting both :code:`--gpus` and
  :code:`--num_processes` larger than 1 enables both model parallelism and
  data parallelism. For example, :code:`--gpus=4` and :code:`--num_processes=2` means that
  2 Horovod processes are spawned and each of them occupies 2 GPUs for model parallelism.
* :code:`--gpu_index`: The indices of the GPUs used to run the training. Use this argument to
  select specific GPUs when the machine has multiple GPUs installed.
* :code:`--use_amp`: A flag to enable automatic mixed precision (AMP) training.
* :code:`--log_file`: Path to the log file. Defaults to stdout.
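
The relationship between :code:`--gpus` and :code:`--num_processes` can be sketched in a few
lines of Python. This only illustrates the arithmetic described above and is not part of TLT.
The helper name :code:`plan_processes` and the requirement that the GPUs divide evenly among
the processes are assumptions made for this example.

.. code:: python

    def plan_processes(gpus=1, num_processes=-1):
        """Return (number of processes, GPUs per process) for a training launch."""
        # -1 is the documented default: one process per GPU (data parallelism).
        if num_processes == -1:
            num_processes = gpus
        if gpus < 1 or num_processes < 1:
            raise ValueError("gpus and num_processes must be positive")
        # Assumption: the GPUs are shared evenly among the processes.
        if gpus % num_processes != 0:
            raise ValueError("gpus must be a multiple of num_processes")
        return num_processes, gpus // num_processes


    # The example from the text: 4 GPUs and 2 processes, so 2 GPUs per process.
    print(plan_processes(gpus=4, num_processes=2))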

Input Requirement
^^^^^^^^^^^^^^^^^

* **Input size**: C * W * H (where C = 1 or 3, W >= 128, H >= 128)
* **Image format**: JPG, JPEG, PNG
* **Label format**: KITTI detection

Sample Usage
^^^^^^^^^^^^

Here's an example of using the FasterRCNN training command:

.. code::

   tlt faster_rcnn train --gpu_index 0 -e <experiment_spec>

Using a Pretrained Model
^^^^^^^^^^^^^^^^^^^^^^^^

Using a pretrained model (weights) file for the initial training of FasterRCNN usually helps
achieve better accuracy. NVIDIA recommends using the pretrained weights provided in NVIDIA GPU
Cloud (NGC). FasterRCNN loads the pretrained weights by name. That is, layer by layer, if TLT
finds a layer in the pretrained weights file whose name and weights (bias) shape match a layer
in the TLT model, it loads that layer's weights (and bias, if any) into the model. If a layer
in the TLT model has no matching layer in the pretrained weights, TLT skips that layer and
uses random initialization for it instead. Likewise, if TLT finds a layer with a matching name
in the pretrained weights but the shape of the pretrained weights (or bias, if any) does not
match the shape of the weights (bias) of the corresponding layer in the TLT model, it also
skips that layer.

Layers that have no weights (bias) have nothing to load and are therefore skipped.
In total, there are three possible statuses that describe how the pretrained weights were
loaded for a layer:

* :code:`"Yes"` means a layer has weights (bias) and is loaded from the pretrained weights file
  successfully for initialization.
* :code:`"No"` means a layer has weights (bias) but, due to a mismatched weights (bias) shape (or
  possibly another reason), the weights (bias) cannot be loaded and the layer uses random
  initialization instead.
* :code:`"None"` means a layer has no weights (bias) at all and will not load any weights.

The FasterRCNN training log contains a table that shows the pretrained weights loading status
for each layer in the model.
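
The by-name matching rules can be summarized with a small Python sketch. It is an illustration
only, not the TLT implementation: the function :code:`loading_status`, the layer names, and the
representation of a layer as a list of weight shapes are all hypothetical.

.. code:: python

    def loading_status(model_layers, pretrained_layers):
        """Map each model layer name to "Yes", "No", or "None".

        Both arguments map a layer name to a list of weight (bias) shapes.
        A layer without weights has an empty list.
        """
        status = {}
        for name, shapes in model_layers.items():
            if not shapes:
                # Nothing to load for layers without weights.
                status[name] = "None"
            elif pretrained_layers.get(name) == shapes:
                # Same name and same shapes: the weights can be loaded.
                status[name] = "Yes"
            else:
                # Missing layer or mismatched shape: random initialization.
                status[name] = "No"
        return status


    model = {
        "conv1": [(7, 7, 3, 64), (64,)],
        "relu1": [],
        "dense_class": [(512, 4)],
        "new_block": [(3, 3, 64, 64)],
    }
    pretrained = {
        "conv1": [(7, 7, 3, 64), (64,)],
        "dense_class": [(512, 21)],
    }
    for name, state in loading_status(model, pretrained).items():
        print(name, state)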

To use a pretrained model in FasterRCNN training, set the :code:`pretrained_weights` path to point
to a pretrained :code:`.tlt` model (generated with the same encryption key as the FasterRCNN training),
a Keras :code:`.hdf5` model or a Keras :code:`.h5` weights.

.. Note:: At the start of the training, FasterRCNN prints the pretrained model loading status (per layer).
   If the trained model has a poor mAP, check this log to confirm that the pretrained model
   was loaded properly.

.. Note:: FasterRCNN does not support loading a non-QAT pruned model and retraining it with QAT
   enabled. To make the retrained model a QAT model, it is required to do the initial training with
   QAT enabled too.

Re-training a pruned model
^^^^^^^^^^^^^^^^^^^^^^^^^^

A FasterRCNN model can be retrained one or more times. The typical use case is retraining a pruned
model. To retrain an existing FasterRCNN model, set the :code:`retrain_pruned_model` path to point to
that model.

Resuming an interrupted training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A training job can sometimes be interrupted (for example, by a system crash). In these cases,
there is no need to redo the training from the start. The interrupted training can be resumed
from the last checkpoint (a :code:`.tlt` model saved during training). To do so, set the
:code:`resume_from_model` path in the spec file to point to the last checkpoint and re-run the
training command.

Input shape: static and dynamic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

FasterRCNN training supports both static and dynamic input shapes. A static input shape
means the input's width and height are constant numbers, such as 960 x 544. This is the most
common case in practice. To enable a static input shape, specify it in both
:code:`input_image_config` and :code:`augmentation_config`: use :code:`size_height_width`
in :code:`input_image_config` to set the input height and width, and set
:code:`output_image_height` and :code:`output_image_width` in :code:`augmentation_config` to the
same two numbers.

With a static input shape, the images can either be resized offline to the target resolution or be
resized automatically during training. Setting :code:`enable_auto_resize` in :code:`augmentation_config`
to :code:`True` enables automatic resizing during training. Automatic resizing removes the
need to manually resize the images each time the model is trained at a different resolution.
However, since the resizing happens during training, it can increase the training time.
Users should weigh this tradeoff between offline resizing and automatic (online) resizing.

A dynamic input shape means the input's height and width are not constant but can
change from image to image during training. This kind of input shape was originally proposed in
the literature (for example, in the FasterRCNN paper): the image is resized, keeping its aspect
ratio, such that the smaller side of the resulting image equals a given number. Besides the limit
on the smaller side, there is also a limit on the larger side. If resizing this way makes the
larger side of the resulting image exceed its limit, the image is instead resized, again keeping
its aspect ratio, such that the larger side equals that limit. In that case, the smaller side
will also be no larger than its own limit.
To enable a dynamic input shape in FasterRCNN,
specify :code:`size_min` in :code:`input_image_config` and specify :code:`output_image_min` and
:code:`output_image_max` in :code:`augmentation_config`. :code:`size_min` and :code:`output_image_min`
set the limit on the smaller side's size, while :code:`output_image_max` sets the limit on
the larger side's size.
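
As a rough illustration of the resize rule, the following Python sketch computes the output
size for a given image. It is not the TLT implementation. The function name
:code:`dynamic_resize`, the parameter name :code:`size_max` (standing in for
:code:`output_image_max`), and the rounding to whole pixels are assumptions.

.. code:: python

    def dynamic_resize(height, width, size_min, size_max):
        """Return the (height, width) after an aspect-ratio-preserving resize."""
        # First, try to bring the smaller side to size_min.
        scale = size_min / min(height, width)
        # If the larger side would exceed size_max, fit the larger side instead.
        if max(height, width) * scale > size_max:
            scale = size_max / max(height, width)
        return round(height * scale), round(width * scale)


    print(dynamic_resize(480, 640, size_min=600, size_max=1000))
    print(dynamic_resize(300, 1200, size_min=600, size_max=1000))

In the first call the smaller side reaches 600 and the larger side stays within the limit. In
the second call the larger side would exceed 1000, so the larger side is fitted instead and
the smaller side ends up below 600.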

Note that there are some limitations regarding the dynamic shape of FasterRCNN.

* TLT FasterRCNN training/evaluation/inference can only work with batch size 1.
* TLT FasterRCNN export and DeepStream (TensorRT) inference/evaluation do not support dynamic shape for now.

Model parallelism
^^^^^^^^^^^^^^^^^

FasterRCNN supports model parallelism. Model parallelism is a technique that splits the entire model
across multiple GPUs so that each GPU holds a part of the model. A model is split by layers. For example,
if a model has 100 layers, layers 0-49 can be placed on GPU 0 and layers 50-99 on GPU 1.
Model parallelism is useful when the model is huge and cannot fit into a single GPU even with
batch size 1. It is also useful for increasing the batch size that is seen
by BatchNormalization layers, which can potentially improve the accuracy. This feature is enabled
by setting :code:`model_parallelism` in :code:`training_config`. For example,

.. code::

    model_parallelism: 0.3
    model_parallelism: 0.7

will enable 2-GPU model parallelism where the first GPU holds 30% of the model layers and the
second GPU holds 70% of the model layers. The percentages can be adjusted by
trial and error so that all GPUs consume almost the same amount of GPU memory, which allows
the largest batch size for model-parallel training.
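
To make the fractions concrete, the sketch below assigns layer indices to GPUs from a list of
fractions. How TLT actually maps fractions to layers is not described here, so the function
:code:`split_layers` and its rounding of the boundaries are assumptions for illustration only.

.. code:: python

    def split_layers(num_layers, fractions):
        """Return one range of layer indices per GPU."""
        if abs(sum(fractions) - 1.0) > 1e-6:
            raise ValueError("fractions must sum to 1")
        ranges = []
        start = 0
        cumulative = 0.0
        for i, fraction in enumerate(fractions):
            cumulative += fraction
            if i == len(fractions) - 1:
                # The last GPU takes every remaining layer.
                end = num_layers
            else:
                # Round the cumulative boundary to a whole layer index.
                end = round(cumulative * num_layers)
            ranges.append(range(start, end))
            start = end
        return ranges


    # 100 layers split 30% / 70% across two GPUs.
    for gpu, layers in enumerate(split_layers(100, [0.3, 0.7])):
        print(f"GPU {gpu}: layers {layers.start}-{layers.stop - 1}")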

Model parallelism can be enabled jointly with data parallelism. For example, with the 2-GPU model
parallelism enabled above, 4 Horovod processes can be launched at the same time. In this case,
there are 4 Horovod processes for data parallelism, and each process has the model split across 2 GPUs.

Evaluating the model
--------------------

To run evaluation for a faster_rcnn model, use this command:

.. code::

    tlt faster_rcnn evaluate [-h] -e <experiment_spec>
                                  [-k <enc_key>]
                                  [--gpu_index <gpu_index>]
                                  [--log_file <log_file_path>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-e, --experiment_spec_file`: Experiment spec file to set up the evaluation experiment.
  This should be the same as a training spec file.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-h, --help`: show this help message and exit.
* :code:`-k, --enc_key`: The encoding key, can override the one in the spec file.
* :code:`--gpu_index`: The GPU index used to run the evaluation. We can specify
  the GPU index used to run evaluation when the machine has multiple GPUs installed.
  Note that evaluation can only run on a single GPU.
* :code:`--log_file`: Path to the log file. Defaults to stdout.

Evaluation Metrics
^^^^^^^^^^^^^^^^^^

The PASCAL VOC 2007 vs 2012 metrics
***********************************

For FasterRCNN, the evaluation produces 4 metrics for the evaluated model:
AP (average precision), precision, recall, and RPN_recall for each class in the evaluation dataset.
Finally, it also prints the mAP (mean average precision) as a single metric number. Two
modes are supported for computing the AP: the PASCAL VOC 2007 and 2012 metrics. The mode
is configured with the :code:`evaluation_config.use_voc_11_point_metric` parameter in the spec file.
If this parameter is set to True, the AP calculation uses the VOC 2007 method; otherwise it
uses the VOC 2012 method.
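
The difference between the two methods can be illustrated with a short Python sketch. It
follows the commonly used definitions (11-point interpolation for VOC 2007, area under the
interpolated precision-recall curve for VOC 2012) and is not taken from the TLT source. The
function name :code:`voc_ap` and the sample precision-recall points are made up for this example.

.. code:: python

    def voc_ap(recall, precision, use_voc_11_point_metric):
        """Compute AP from precision-recall points sorted by increasing recall."""
        if use_voc_11_point_metric:
            # VOC 2007: average the best precision at recall >= 0.0, 0.1, ..., 1.0.
            ap = 0.0
            for i in range(11):
                threshold = i / 10
                best = max(
                    (p for r, p in zip(recall, precision) if r >= threshold),
                    default=0.0,
                )
                ap += best / 11
            return ap
        # VOC 2012: area under the monotonically decreasing precision envelope.
        mrec = [0.0] + list(recall) + [1.0]
        mpre = [0.0] + list(precision) + [0.0]
        for i in range(len(mpre) - 2, -1, -1):
            mpre[i] = max(mpre[i], mpre[i + 1])
        return sum(
            (mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1)
        )


    recall = [0.5, 1.0]
    precision = [1.0, 0.5]
    print("VOC 2007 AP:", voc_ap(recall, precision, True))
    print("VOC 2012 AP:", voc_ap(recall, precision, False))

For the same precision-recall points the two methods generally give slightly different AP
values, so results should only be compared when they use the same setting.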

Setting IoU value/range for computing AP/mAP
********************************************

Different IoU thresholds can be used for matching the detected objects to ground truth objects.
An IoU of 0.5 is used in the PASCAL VOC metrics, while in MS COCO a list of IoUs is used to compute the AP.
For example, in MS COCO, the mAP@[0.5:0.05:0.95] is the AP averaged over 10 different IoUs, starting at 0.5
and ending at 0.95, with a step size of 0.05. TLT FasterRCNN supports evaluating AP at a list of IoUs
and computing the mAP across the range of IoUs. Specifically, setting :code:`gt_matching_iou_threshold`
in :code:`evaluation_config` produces the AP/mAP at a single IoU; setting :code:`gt_matching_iou_threshold_range`
to a list (range) of IoUs produces the AP at each of these IoU values and the mAP. To compute the PASCAL VOC
mAP, set the former to 0.5. To compute the COCO mAP, set the latter to
:code:`start: 0.5`, :code:`step: 0.05` and :code:`end: 1.0` (the end point is exclusive).
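
Because the end point is exclusive, a range with :code:`end: 1.0` stops at 0.95. The Python
sketch below expands a range into its list of thresholds. The function name
:code:`iou_thresholds` and the small floating-point tolerance are assumptions for this example,
not TLT code.

.. code:: python

    def iou_thresholds(start, step, end):
        """List the IoU thresholds from start up to, but not including, end."""
        values = []
        i = 0
        # The tolerance keeps floating-point error from adding a value at end.
        while start + i * step < end - 1e-9:
            values.append(round(start + i * step, 6))
            i += 1
        return values


    thresholds = iou_thresholds(start=0.5, step=0.05, end=1.0)
    print(len(thresholds), thresholds)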

The RPN_recall metric indicates the recall capability of the RPN of the
FasterRCNN model. A higher RPN_recall means the RPN is better at detecting objects as
foreground (it says nothing about which class an object belongs to, since that is
delegated to the RCNN). The RPN_recall metric is mainly used for debugging accuracy issues
of a FasterRCNN model.

Two Modes for Evaluation
^^^^^^^^^^^^^^^^^^^^^^^^

The evaluation for FasterRCNN has two modes. It can run with either
TLT backend or TensorRT backend. This behavior is also controlled via the spec file. The
:code:`evaluation_config` in the spec file can have an optional :code:`trt_evaluation` sub-field
that specifies which backend the evaluation will run with.

By default (if the :code:`trt_evaluation` sub-field is not present in :code:`evaluation_config`),
evaluation uses TLT as the backend. If the :code:`trt_evaluation` sub-field
is present, evaluation can run with the TensorRT backend. In that case,
the model used for inference is the TensorRT engine file from export or :code:`tlt-converter`.

To use a TensorRT engine file for TensorRT backend based evaluation, the
:code:`trt_evaluation` sub-field should look like this:

.. code::

        trt_evaluation {
          trt_engine: '/workspace/tlt-experiments/data/faster_rcnn/trt.int8.engine'
        }

If the TensorRT inference data type is not INT8, the :code:`calibration_cache` sub-field that
provides the path to the INT8 calibration cache is not required. In INT8 case, the calibration
cache should be generated via the :code:`tlt faster_rcnn export` command line in INT8 mode.
See also the documentation of FasterRCNN spec file for the details of the :code:`trt_evaluation`
message structure.

Running inference on the model
------------------------------

The inference tool for FasterRCNN networks can be used to visualize bboxes or generate
frame by frame KITTI format labels on a directory of images. You can execute this tool from the
command line as shown here:

.. code::

    tlt faster_rcnn inference [-h] -e <experiment_spec>
                                   [-k <enc_key>]
                                   [--gpu_index <gpu_index>]
                                   [--log_file <log_file_path>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-e, --experiment_spec_file`: Path to the experiment specification file for FasterRCNN
  training.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-h, --help`: Print help log and exit.
* :code:`-k, --enc_key`: The encoding key, can override the one in the spec file.
* :code:`--gpu_index`: The GPU index to run inference on. We can specify the GPU index
  used to run inference if the machine has multiple GPUs installed. Note that inference
  can only run on a single GPU.
* :code:`--log_file`: Path to the log file. Defaults to stdout.

Two Modes for Inference
^^^^^^^^^^^^^^^^^^^^^^^

Inference for FasterRCNN has two modes. It can run with either the TLT
backend or the TensorRT backend. This behavior is also controlled via the spec file. The
:code:`inference_config` in the spec file can have an optional :code:`trt_inference` sub-field that
specifies which backend the inference will run with. By default (if the :code:`trt_inference`
sub-field is not present in :code:`inference_config`), inference uses TLT as the backend.
If the :code:`trt_inference` sub-field is present, inference can run with the
TensorRT backend. In that case, the model used for inference is the TensorRT engine file from export or
:code:`tlt-converter`.

To use a TensorRT engine file for TensorRT backend based inference, the
:code:`trt_inference` sub-field should look like this:

.. code::

    trt_inference {
      trt_engine: '/workspace/tlt-experiments/data/faster_rcnn/trt.int8.engine'
    }

If the TensorRT inference data type is not INT8, the :code:`calibration_cache` sub-field that
provides the path to the INT8 calibration cache is not required. In INT8 case, the calibration
cache should be generated via the :code:`tlt faster_rcnn export` command line in INT8 mode.
See also the documentation of FasterRCNN spec file for the details of the :code:`trt_inference`
message structure.

Pruning the model
-----------------

Pruning removes parameters from the model to reduce the model size without compromising the
integrity of the model itself. Pruning is done with the :code:`tlt faster_rcnn prune` command.

The :code:`tlt faster_rcnn prune` command has the following syntax:

.. code::

        tlt faster_rcnn prune [-h] -m <model>
                                   -o <output_file>
                                   -k <key>
                                   [-n <normalizer>]
                                   [-eq <equalization_criterion>]
                                   [-pg <pruning_granularity>]
                                   [-pth <pruning_threshold>]
                                   [-nf <min_num_filters>]
                                   [-el <excluded_list>]
                                   [--gpu_index <gpu_index>]
                                   [--log_file <log_file_path>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-m, --model`: Path to a pretrained :code:`.tlt` model to be pruned.
* :code:`-o, --output_file`: Path to save the pruned :code:`.tlt` model.
* :code:`-k, --key`: Key to load a :code:`.tlt` model.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-h, --help`: Show this help message and exit.
* :code:`-n, --normalizer`: ``max`` to normalize by dividing each norm by the maximum norm within
  a layer; ``L2`` to normalize by dividing by the L2 norm of the vector comprising all kernel norms.
  (default: ``max``)
* :code:`-eq, --equalization_criterion`: Criteria to equalize the stats of inputs to an
  element-wise op layer or depth-wise convolutional layer. This parameter is useful for
  ResNets and MobileNets. Options are :code:`arithmetic_mean`, :code:`geometric_mean`,
  :code:`union`, and :code:`intersection`. (default: :code:`union`)
* :code:`-pg, --pruning_granularity`: Number of filters to remove at a time (default: 8)
* :code:`-pth`: Threshold to compare normalized norm against (default: 0.1)

  .. Note:: NVIDIA recommends changing the threshold to keep the number of parameters in the
     model to within 10-20% of the original unpruned model.

* :code:`-nf, --min_num_filters`: Minimum number of filters to keep per layer (default: 16)
* :code:`-el, --excluded_layers`: List of layers to exclude from pruning. Example: :code:`-el item1 item2` (default: [])
* :code:`--gpu_index`: The GPU index to run pruning on. We can specify the GPU index
  used to run pruning if the machine has multiple GPUs installed. Note that pruning
  can only run on a single GPU.
* :code:`--log_file`: Path to the log file. Defaults to stdout.

After pruning, the model needs to be retrained. See :ref:`Re-training the Pruned Model
<re-training_the_pruned_model>` for more details.
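
To give an intuition for how the threshold, granularity, and minimum filter count interact,
here is a simplified Python sketch for a single layer with the ``max`` normalizer. It is a
loose illustration under stated assumptions, not the pruning algorithm used by TLT: the
function :code:`select_filters`, the rounding up to a multiple of the granularity, and the
sample norms are all hypothetical.

.. code:: python

    def select_filters(norms, threshold=0.1, granularity=8, min_num_filters=16):
        """Return the indices of the filters to keep in one layer."""
        # "max" normalizer: divide each norm by the largest norm in the layer.
        max_norm = max(norms)
        normalized = [n / max_norm for n in norms]
        # Count the filters whose normalized norm exceeds the threshold.
        keep = sum(1 for n in normalized if n > threshold)
        # Assumption: the kept count is rounded up to a multiple of the granularity.
        keep = -(-keep // granularity) * granularity
        # Never go below the minimum, and never above what the layer has.
        keep = min(max(keep, min_num_filters), len(norms))
        # Keep the filters with the largest normalized norms.
        order = sorted(range(len(norms)), key=lambda i: normalized[i], reverse=True)
        return sorted(order[:keep])


    # A layer with 32 filters: 10 strong filters and 22 weak ones.
    norms = [1.0] * 10 + [0.01] * 22
    kept = select_filters(norms)
    print(len(kept), "of", len(norms), "filters kept")

A higher threshold removes more filters, which is why the example command in the next section
uses a value well above the default.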

Using the Prune Command
^^^^^^^^^^^^^^^^^^^^^^^

Here's an example of using the :code:`tlt faster_rcnn prune` command:

.. code::

        tlt faster_rcnn prune -m /workspace/output/weights/resnet_003.tlt \
                              -o /workspace/output/weights/resnet_003_pruned.tlt \
                              -eq union \
                              -pth 0.7 \
                              -k nvidia_tlt

Retraining the pruned model
---------------------------

.. _re-training_the_pruned_model:

Once the model has been pruned, there might be a slight decrease in accuracy. This happens
because some previously useful weights may have been removed. In order to regain the accuracy,
NVIDIA recommends that you retrain this pruned model over the same dataset. To do this, use
the :code:`tlt faster_rcnn train` command as documented in :ref:`Training the model <training_the_model>` with
an updated spec file that points to the newly pruned model as the pretrained model file.

Users are advised to turn off the regularizer (set the regularizer type to :code:`NO_REG`) or use a
smaller weight decay in the spec file to recover the accuracy when retraining a pruned model.
All the other parameters may be retained in the spec file from the previous training.

For FasterRCNN, it is important to set the :code:`retrain_pruned_model` path to point to the pruned
model.

Exporting the model
-------------------

.. _exporting_the_model:

Exporting the model decouples the training process from inference and allows conversion to
TensorRT engines outside the TLT environment. TensorRT engines are specific to each hardware
configuration and should be generated for each unique inference environment.
The exported model may be used universally across training and deployment hardware.
The exported model format is referred to as :code:`.etlt`. Like :code:`.tlt`, the :code:`.etlt` model
format is an encrypted model format, and it uses the same key as the :code:`.tlt` model that it is
exported from. This key is required when deploying this model.

FasterRCNN export can optionally generate a (partial) DeepStream configuration file and label file. See below.

INT8 Mode Overview
^^^^^^^^^^^^^^^^^^

TensorRT engines can be generated in INT8 mode to improve performance, but they require a calibration
cache at engine creation time. The calibration cache is generated using a calibration tensor
file if export is run with the :code:`--data_type` flag set to :code:`int8`.
Pre-generating the calibration information and caching it removes the need for calibrating the
model on the inference device. Using the calibration cache also speeds up engine creation, because
building the cache can take several minutes depending on the size of the calibration data and the model
itself.

The export tool can generate INT8 calibration cache by ingesting training data using either of
these options:

* **Option 1**: Using the training data loader to load the training images for INT8 calibration.
  This option is now the recommended approach to support multiple image directories by leveraging
  the training dataset loader. This also ensures two important aspects of data during calibration:
  
  * Data pre-processing in the INT8 calibration step is the same as in the training process.
  
  * The data batches are sampled randomly across the entire training dataset, thereby improving
    the accuracy of the INT8 model.

* **Option 2**: Pointing the tool to a directory of images that you want to use to calibrate 
  the model. For this option, make sure to create a sub-sampled directory of random images that
  best represent your training dataset.

FP16/FP32 Model
^^^^^^^^^^^^^^^

The :code:`calibration.bin` is only required if you need to run inference at INT8 precision. For
FP16/FP32 based inference, the export step is much simpler. All that is required is to provide
a :code:`.tlt` model from the training/retraining step to be converted into an :code:`.etlt`.


Exporting the Model
^^^^^^^^^^^^^^^^^^^

The :code:`tlt faster_rcnn export` command has the following syntax:

.. code::

    tlt faster_rcnn export [-h] -m <path to the .tlt model file generated by training>
                                -k <key>
                                --experiment_spec <path to experiment spec file>
                                [-o <path to output file>]
                                [--cal_data_file <path to tensor file>]
                                [--cal_image_dir <path to the directory of images to calibrate the model>]
                                [--cal_cache_file <path to output calibration file>]
                                [--data_type <data type for the TensorRT backend during export>]
                                [--batches <number of batches to calibrate over>]
                                [--max_batch_size <maximum trt batch size>]
                                [--max_workspace_size <maximum workspace size>]
                                [--batch_size <batch size to TensorRT engine>]
                                [--engine_file <path to the TensorRT engine file>]
                                [--gen_ds_config]
                                [--verbose]
                                [--strict_type_constraints]
                                [--force_ptq]
                                [--gpu_index <gpu_index>]
                                [--log_file <log_file_path>]

Required Arguments
******************

* :code:`-m, --model`: Path to the :code:`.tlt` model file to be exported.
* :code:`-k, --key`: Key used to save the :code:`.tlt` model file.
* :code:`-e, --experiment_spec`: Path to the spec file.

Optional Arguments
******************

* :code:`-o, --output_file`: Path to save the exported model to. The default is :code:`./<input_file>.etlt`.
* :code:`--data_type`: Desired engine data type. A calibration cache is generated in INT8 mode. The
  options are {:code:`fp32`, :code:`fp16`, :code:`int8`}. The default value is :code:`fp32`.
  If using INT8, the INT8 arguments described below are required.
* :code:`--gen_ds_config`: A Boolean flag indicating whether to generate the
  partial DeepStream related configuration ("nvinfer_config.txt") as well as a label file ("labels.txt")
  in the same directory as the :code:`output_file`. Note that the config file is NOT a complete
  configuration file and requires the user to update the sample config files in DeepStream with the
  parameters generated.
* :code:`-s, --strict_type_constraints`: A Boolean flag to indicate whether or not to apply the
  TensorRT strict type constraints when building the TensorRT engine.
* :code:`--gpu_index`: The index of (discrete) GPUs used for exporting the model. We can specify the GPU index
  to run export if the machine has multiple GPUs installed. Note that export can only run on a
  single GPU.
* :code:`--log_file`: Path to the log file. Defaults to stdout.

INT8 Export Mode Required Arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* :code:`--cal_data_file`: The output tensorfile if used with :code:`--cal_image_dir`.
* :code:`--cal_image_dir`: Directory of images to use for calibration.

.. Note:: The tool scans the directory given by the :code:`--cal_image_dir` parameter for images
          and applies the necessary preprocessing
          to generate a tensorfile at the path given by the :code:`--cal_data_file`
          parameter, which is in turn used for calibration. The number of batches in the
          generated tensorfile is set by the :code:`--batches` parameter,
          and the :code:`batch_size` is set by the :code:`--batch_size`
          parameter. Be sure that the directory given in :code:`--cal_image_dir` contains at least
          :code:`batch_size * batches` images. The valid image extensions are .jpg,
          .jpeg, and .png. In this case, the :code:`input_dimensions` of the calibration tensors
          are derived from the input layer of the :code:`.tlt` model.
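
Before running an INT8 export with :code:`--cal_image_dir`, it can help to check that the
directory holds enough images. The following Python sketch is a standalone helper, not part
of TLT. The function names are hypothetical, and matching extensions case-insensitively is an
assumption. The defaults mirror the documented defaults of :code:`--batch_size` and
:code:`--batches`.

.. code:: python

    from pathlib import Path

    # Image extensions accepted for calibration, as listed in the note above.
    VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}


    def count_calibration_images(image_dir):
        """Count the files in image_dir that have a valid image extension."""
        return sum(
            1
            for path in Path(image_dir).iterdir()
            if path.is_file() and path.suffix.lower() in VALID_EXTENSIONS
        )


    def has_enough_images(image_dir, batch_size=8, batches=10):
        """Check that image_dir holds at least batch_size * batches images."""
        return count_calibration_images(image_dir) >= batch_size * batches

With the default values, :code:`has_enough_images` returns :code:`True` only if the directory
contains at least 80 images.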

INT8 Export Optional Arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* :code:`--cal_cache_file`: Path to save the calibration cache file. The default value is 
  :code:`./cal.bin`.
* :code:`--batches`: Number of batches to use for calibration and inference testing. The default
  value is :code:`10`.
* :code:`--batch_size`: Batch size to use for calibration. The default value is :code:`8`.
* :code:`--max_batch_size`: Maximum batch size of TensorRT engine. The default value is :code:`16`.
* :code:`--max_workspace_size`: Maximum workspace size of TensorRT engine. The default value is:
  :code:`1073741824(1<<30)`.
* :code:`--engine_file`: Path to the serialized TensorRT engine file. Note that this file is
  hardware specific, and cannot be generalized across GPUs. Useful to quickly test your model
  accuracy using TensorRT on the host. As TensorRT engine file is hardware specific, you cannot
  use this engine file for deployment unless the deployment GPU is identical to training GPU.
* :code:`--force_ptq`: A boolean flag to force post training quantization on the exported etlt
  model.

.. Note:: When exporting a model trained with QAT enabled, the tensor scale factors to calibrate
   the activations are peeled out of the model and serialized to a TensorRT readable cache file
   defined by the :code:`cal_cache_file` argument. However, do note that the current version of
   QAT doesn’t natively support DLA INT8 deployment in the Jetson device. In order to deploy
   this model on a Jetson with DLA INT8, use the :code:`--force_ptq` flag to use
   TensorRT post training quantization to generate the calibration cache file.

Exporting a Model
^^^^^^^^^^^^^^^^^

Here's a sample command using the data loader for loading calibration data to calibrate a 
FasterRCNN model using option 1.

.. code::

  tlt faster_rcnn export --gpu_index 0 \
                         -m $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain.epoch12.tlt \
                         -o $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain_int8.etlt \
                         -e $SPECS_DIR/default_spec_resnet18_retrain_spec.txt \
                         -k nvidia_tlt \
                         --data_type int8 \
                         --batch_size 8 \
                         --batches 10 \
                         --cal_cache_file $USER_EXPERIMENT_DIR/data/faster_rcnn/cal.bin

Deploying to DeepStream
-----------------------

.. _deploying_to_deepstream_fasterrcnn:

.. include:: ../excerpts/deploying_to_deepstream.rst

.. _here: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html

TensorRT Open Source Software (OSS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _tensorrt_oss:

TensorRT OSS build is required for FasterRCNN models. This is required because several TensorRT
plugins that are required by these models are only available in TensorRT open source repo and not
in the general TensorRT release. Specifically, for FasterRCNN, we need the :code:`cropAndResizePlugin` and
:code:`proposalPlugin`.

If the deployment platform is x86 with NVIDIA GPU, follow instructions for x86. If your
deployment is on NVIDIA Jetson platform, follow instructions for Jetson.

TensorRT OSS on x86
*******************

.. include:: ../excerpts/tensorrt_oss_on_x86.rst

TensorRT OSS on Jetson (ARM64)
******************************

.. include:: ../excerpts/tensorrt_oss_on_jetson_arm64.rst

Generating an Engine Using tlt-converter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _generating_an_engine_using_tlt-converter:

.. include:: ../excerpts/generating_an_engine_using_tlt-converter.rst

Instructions for x86
********************

1. Copy :code:`/opt/nvidia/tools/tlt-converter` to the target machine.
2. Install `TensorRT`_ for the respective target machine.
3. For FasterRCNN, we need to build `TensorRT Open source software`_ on the machine.
   Instructions to build TensorRT OSS on x86 can be found in :ref:`TensorRT OSS on x86<tensorrt_oss_on_x86>`
   section above or in this `GitHub repo`_.
4. Run :code:`tlt-converter` using the sample command below and generate the engine.

.. _TensorRT: https://developer.nvidia.com/tensorrt
.. _TensorRT Open source software: https://github.com/NVIDIA/TensorRT
.. _GitHub repo: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps

Instructions for Jetson
***********************

For the Jetson platform, the :code:`tlt-converter` is available to download in the `dev zone`_.
Once the :code:`tlt-converter` is downloaded, follow the instructions below to generate a
TensorRT engine.

.. _dev zone: https://developer.nvidia.com/tlt-converter-trt71

1. Unzip :code:`tlt-converter-trt7.1.zip` on the target machine.
2. Install the OpenSSL development package using the command:

   .. code::

      sudo apt-get install libssl-dev

3. Export the following environment variables:

   .. code::

      export TRT_LIB_PATH="/usr/lib/aarch64-linux-gnu"
      export TRT_INC_PATH="/usr/include/aarch64-linux-gnu"

4. For Jetson devices, TensorRT comes pre-installed with `Jetpack`_. If you are using an older
   JetPack, upgrade to the latest one that :code:`tlt-converter` can support.
5. For FasterRCNN, instructions to build TensorRT OSS on Jetson can
   be found in the :ref:`TensorRT OSS on Jetson (ARM64) <tensorrt_oss_on_jetson_arm64>` section above or
   in this `GitHub repo`_.
6. Run :code:`tlt-converter` using the sample command below and generate the engine.

.. Note:: Make sure to follow the output node names as mentioned in
          :ref:`Exporting the Model<exporting_the_model>`.

.. _Jetpack: https://developer.nvidia.com/embedded/jetpack

Using the tlt-converter
***********************

.. code::

    tlt-converter [-h] -k <encryption_key> 
                       -d <input_dimensions>
                       -o <comma separated output nodes>
                       [-c <path to calibration cache file>]
                       [-e <path to output engine>]
                       [-b <calibration batch size>] 
                       [-m <maximum batch size of the TRT engine>]
                       [-t <engine datatype>]
                       [-w <maximum workspace size of the TRT Engine>] 
                       [-i <input dimension ordering>]
                       [-p <optimization_profiles>]
                       [-s]
                       [-u <DLA_core>]
                       input_file

Required Arguments
~~~~~~~~~~~~~~~~~~

* :code:`input_file`: Path to the :code:`.etlt` model exported using :code:`export`.
* :code:`-k`: The key used to encode the :code:`.tlt` model during training.
* :code:`-d`: Comma-separated list of input dimensions that should match the dimensions used for
  :code:`export`. Unlike :code:`export`, this cannot be inferred from calibration data. This
  parameter is not required for new models introduced in TLT 3.0 (e.g., LPRNet, UNet, and GazeNet).
* :code:`-o`: Comma-separated list of output blob names that should match the output configuration
  used for :code:`export`. This parameter is not required for new models introduced in TLT
  3.0 (e.g., LPRNet, UNet, and GazeNet). For FasterRCNN, set this argument to :code:`NMS`.

Optional Arguments
~~~~~~~~~~~~~~~~~~

* :code:`-e`: Path to save the engine to. (default: :code:`./saved.engine`)
* :code:`-t`: Desired engine data type. A calibration cache is generated in INT8 mode. The default
  value is :code:`fp32`. The options are {:code:`fp32`, :code:`fp16`, :code:`int8`}.
* :code:`-w`: Maximum workspace size for the TensorRT engine. The default value is
  :code:`1073741824` (:code:`1<<30`).
* :code:`-i`: Input dimension ordering. All other TLT commands use NCHW. The default value is
  :code:`nchw`. The options are {:code:`nchw`, :code:`nhwc`, :code:`nc`}. For FasterRCNN, this
  argument can be omitted (it defaults to :code:`nchw`).
* :code:`-p`: Optimization profiles for :code:`.etlt` models with dynamic shape. A comma-separated
  list of optimization profile shapes in the format :code:`<input_name>,<min_shape>,<opt_shape>,<max_shape>`,
  where each shape has the format :code:`<n>x<c>x<h>x<w>`. It can be specified multiple times if the
  model has multiple input tensors. This is only useful for new models introduced in TLT 3.0 and
  is not required for models that already existed in TLT 2.0.
* :code:`-s`: A Boolean flag that applies TensorRT strict type constraints when building the
  TensorRT engine.
* :code:`-u`: The DLA core index to use when building the TensorRT engine on Jetson devices.
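
The :code:`-p` format described above can be assembled programmatically. The following is a
minimal Python sketch, not part of TLT: the helper names, the input tensor name
:code:`input_1`, and the shapes are hypothetical and chosen only for illustration. Because
:code:`-p` applies only to dynamic-shape models introduced in TLT 3.0, it is not needed for
FasterRCNN.

.. code:: python

    def format_shape(n, c, h, w):
        """Format one shape as <n>x<c>x<h>x<w>."""
        return "x".join(str(dim) for dim in (n, c, h, w))


    def format_profile(input_name, min_shape, opt_shape, max_shape):
        """Build one -p value: <input_name>,<min_shape>,<opt_shape>,<max_shape>."""
        shapes = (min_shape, opt_shape, max_shape)
        return ",".join([input_name] + [format_shape(*shape) for shape in shapes])


    # Hypothetical tensor name and shapes, for illustration only.
    profile = format_profile(
        "input_1",
        min_shape=(1, 3, 544, 960),
        opt_shape=(8, 3, 544, 960),
        max_shape=(16, 3, 544, 960),
    )
    print(profile)  # input_1,1x3x544x960,8x3x544x960,16x3x544x960

For a model with several input tensors, one such string would be built per tensor and each one
passed with its own :code:`-p`.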

INT8 Mode Arguments
~~~~~~~~~~~~~~~~~~~

* :code:`-c`: Path to calibration cache file, only used in INT8 mode. The default value is
  :code:`./cal.bin`.
* :code:`-b`: Batch size used during the export step for INT8 calibration cache generation.
  (default: :code:`8`).
* :code:`-m`: Maximum batch size for the TensorRT engine (default: :code:`16`). If you run into
  out-of-memory issues, decrease the batch size accordingly. This parameter is not required for
  :code:`.etlt` models generated with dynamic shape, which is only possible for new models
  introduced in TLT 3.0.
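
When the conversion is scripted, the arguments above can be collected into an argument list
before invoking the tool. The sketch below is one possible way to do that in Python, under
stated assumptions: :code:`build_converter_command` is a hypothetical helper, not a TLT API, and
it only uses the flags documented in this section. It builds the command without running it.

.. code:: python

    import shlex


    def build_converter_command(model_path, key, input_dims, output_nodes,
                                engine_path=None, data_type=None,
                                max_batch_size=None, workspace_size=None):
        """Return the tlt-converter argument list for the given settings."""
        cmd = [
            "tlt-converter",
            "-k", key,
            "-d", ",".join(str(dim) for dim in input_dims),
            "-o", ",".join(output_nodes),
        ]
        # Optional flags are appended only when a value is supplied, so the
        # tool's own defaults apply otherwise.
        optional = [
            ("-e", engine_path),
            ("-t", data_type),
            ("-m", max_batch_size),
            ("-w", workspace_size),
        ]
        for flag, value in optional:
            if value is not None:
                cmd += [flag, str(value)]
        cmd.append(model_path)  # input_file is the positional argument
        return cmd


    cmd = build_converter_command(
        "/workspace/tlt-experiments/faster_rcnn/resnet18_pruned.epoch45.etlt",
        key="nvidia_tlt",
        input_dims=(3, 544, 960),
        output_nodes=["NMS"],
        data_type="fp16",
        workspace_size=1 << 30,  # same value as the documented default
    )
    print(shlex.join(cmd))

The resulting list can be passed to :code:`subprocess.run(cmd, check=True)` on a machine where
:code:`tlt-converter` is installed.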

Sample Output Log
~~~~~~~~~~~~~~~~~

Here is a sample log from running :code:`tlt-converter` on a FasterRCNN model.

.. code::

    tlt-converter -d 3,544,960 \
                  -k nvidia_tlt \
                  -o NMS \
                  /workspace/tlt-experiments/faster_rcnn/resnet18_pruned.epoch45.etlt
    ..
    [INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
    [INFO] Detected 1 inputs and 2 output network tensors.

.. _integrating_the_model_to_deepstream:

Integrating the model to DeepStream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are 2 options to integrate models from TLT with DeepStream:

* **Option 1**: Integrate the model (:code:`.etlt`) with the encryption key directly in the
  DeepStream app. The model file is generated by :code:`export`.
* **Option 2**: Generate a device-specific optimized TensorRT engine using :code:`tlt-converter`.
  The TensorRT engine file can also be ingested by DeepStream.

For FasterRCNN, you need to build the TensorRT open source plugins and a custom bounding
box parser. The instructions are provided in the TensorRT OSS section above, and the
required code can be found in this `GitHub repo`_.

.. _GitHub repo: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps

In order to integrate the models with DeepStream, you need the following:

1. Download_ and install DeepStream SDK. The installation instructions for DeepStream are provided
   in the `DeepStream Development Guide`_.
2. An exported :code:`.etlt` model file and optional calibration cache for INT8 precision.
3. `TensorRT OSS Plugins`_.
4. A :code:`labels.txt` file containing the labels for classes in the order in which the network
   produces outputs.
5. A sample :code:`config_infer_*.txt` file to configure the nvinfer element in DeepStream.
   The nvinfer element handles everything related to TensorRT optimization and engine creation
   in DeepStream.

.. _Download: https://developer.nvidia.com/deepstream-download
.. _DeepStream Development Guide: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html
.. _TensorRT OSS Plugins: https://github.com/NVIDIA/TensorRT/tree/21.03

DeepStream SDK ships with an end-to-end reference application which is fully configurable. Users
can configure input sources, inference model, and output sinks. The app requires a primary object
detection model, followed by an optional secondary classification model. The reference
application is installed as :code:`deepstream-app`. The graphic below shows the architecture of the
reference application.

.. image:: ../../content/arch_ref_appl.png

There are typically two or more configuration files that are used with this app. In the install
directory, the config files are located in :code:`samples/configs/deepstream-app` or
:code:`samples/configs/tlt_pretrained_models`. The main config file configures all the high-level
parameters in the pipeline above. It sets the input source and resolution, number of
inferences, tracker, and output sinks. The other supporting config files are for each individual
inference engine. The inference-specific config files are used to specify models, inference
resolution, batch size, number of classes, and other customization. The main config file calls
all the supporting config files. Here are some config files in
:code:`samples/configs/deepstream-app` for your reference.

* :code:`source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt`: Main config file

* :code:`config_infer_primary.txt`: Supporting config file for primary detector in the pipeline
  above

* :code:`config_infer_secondary_*.txt`: Supporting config file for secondary classifier in the
  pipeline above

The :code:`deepstream-app` works only with the main config file. This file will most likely
remain the same for all models and can be used directly from the DeepStream SDK with little to no
change. Users only have to modify or create :code:`config_infer_primary.txt` and
:code:`config_infer_secondary_*.txt`.

Integrating a FasterRCNN Model
******************************

To run a FasterRCNN model in DeepStream, you need a label file and a DeepStream configuration
file. In addition, you need to compile the TensorRT Open source software and FasterRCNN
bounding box parser for DeepStream.

A DeepStream sample with documentation on how to run inference using the trained FasterRCNN
models from TLT is provided on GitHub here_.

Prerequisite for FasterRCNN Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. FasterRCNN requires the cropAndResizePlugin_ and the proposalPlugin_. These plugins are
   available in the TensorRT open source repo. Detailed instructions to build
   TensorRT OSS can be found in `TensorRT Open Source Software (OSS)`_.
2. FasterRCNN requires custom bounding box parsers that are not built into the DeepStream
   SDK. The source code to build custom bounding box parsers for FasterRCNN is available
   here_. The following instructions can be used to build the bounding box parser:

.. _cropAndResizePlugin: https://github.com/NVIDIA/TensorRT/tree/21.03/plugin/cropAndResizePlugin
.. _proposalPlugin: https://github.com/NVIDIA/TensorRT/tree/21.03/plugin/proposalPlugin
.. _TensorRT Open Source Software (OSS): https://github.com/NVIDIA/TensorRT

**Step 1**: Install git-lfs_ (git >= 1.8.2)

.. _git-lfs: https://github.com/git-lfs/git-lfs/wiki/Installation

.. code::

    curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
    sudo apt-get install git-lfs
    git lfs install

**Step 2**: Download Source Code with SSH or HTTPS

.. code::

    git clone -b release/tlt3.0 https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps

**Step 3**: Build

.. code::

    # or Path for DS installation
    export CUDA_VER=10.2         # CUDA version, e.g. 10.2
    make

This generates :code:`libnvds_infercustomparser_tlt.so` in the directory :code:`post_processor`.

Label File
~~~~~~~~~~

The label file is a text file containing the names of the classes that the FasterRCNN model is
trained to detect. The order in which the classes are listed here must match the order in which
the model predicts the output. This order is derived from the class names in the
:code:`target_class_mapping` field of the FasterRCNN experiment specification file: during
training, TLT FasterRCNN converts all the class names to lower case and sorts them in
alphabetical order. For example, if the :code:`target_class_mapping` in the spec file is:

.. code::

    target_class_mapping {
      key: "car"
      value: "car"
    }
    target_class_mapping {
      key: "person"
      value: "person"
    }
    target_class_mapping {
      key: "bicycle"
      value: "bicycle"
    }

The actual class name list is :code:`bicycle`, :code:`car`, :code:`person`. The corresponding
:code:`label_file_frcnn.txt` file is shown below; a :code:`background` class is always appended
at the end:

.. code::

    bicycle
    car
    person
    background

.. Note:: If :code:`--gen_ds_config` is provided during TLT export of a FasterRCNN model, then a
   label file named :code:`labels.txt` will be generated automatically. This file can be used
   directly in DeepStream inference, so you do not need to build it by hand from the details above.
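
The ordering rule described in this section can be reproduced with a few lines of Python. This
is a minimal sketch rather than the code TLT runs internally: the function name
:code:`frcnn_label_lines` is hypothetical, and it assumes the class list is the set of mapped
class names (the :code:`value` entries of :code:`target_class_mapping`).

.. code:: python

    def frcnn_label_lines(target_class_mapping):
        """Return label file lines: lower-cased, sorted, then 'background'."""
        # Assumption: classes are the distinct mapped names (the values).
        names = sorted({value.lower() for value in target_class_mapping.values()})
        return names + ["background"]


    # Same mapping as the spec file example above, in the same order.
    mapping = {"car": "car", "person": "person", "bicycle": "bicycle"}
    labels = frcnn_label_lines(mapping)

    print("\n".join(labels))  # bicycle, car, person, background (one per line)
    print(len(labels))        # 4, the class count including background

The length of this list is the class count including :code:`background`, which is the number
expected by :code:`num-detected-classes` in the DeepStream configuration file.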


DeepStream Configuration File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The detection model is typically used as a primary inference engine. It can also be used as a
secondary inference engine. To run this model in the sample :code:`deepstream-app`, you must
modify the existing :code:`config_infer_primary.txt` file to point to this model as well as
the custom parser.

.. image:: ../../content/dstream_deploy_options3.png

**Option 1**: Integrate the model (:code:`.etlt`) directly in the DeepStream app.

For this option, users will need to add the following parameters in the configuration file.
The :code:`int8-calib-file` is only required for INT8 precision.

.. code::

    tlt-encoded-model=<TLT exported .etlt>
    tlt-model-key=<Model export key>
    int8-calib-file=<Calibration cache file>

The :code:`tlt-encoded-model` parameter points to the exported model (:code:`.etlt`) from TLT. The
:code:`tlt-model-key` is the encryption key used during model export.

**Option 2**: Integrate TensorRT engine file with DeepStream app.

**Step 1**: Generate TensorRT engine using tlt-converter. See the :ref:`Generating an engine using
tlt-converter <generating_an_engine_using_tlt-converter>` section above for detailed instructions.

**Step 2**: Once the engine file is generated successfully, modify the following parameters to
use this engine with DeepStream.

.. code::

    model-engine-file=<PATH to generated TensorRT engine>
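
The difference between the two options comes down to which model-related keys are set. The
Python sketch below illustrates that choice under stated assumptions: :code:`model_source_params`
is a hypothetical helper, the file paths are placeholders, and only the keys named in this
section are used.

.. code:: python

    def model_source_params(etlt_path=None, model_key=None,
                            engine_path=None, calib_cache=None):
        """Return the model-related nvinfer keys for Option 1 or Option 2."""
        if engine_path is not None:
            # Option 2: a TensorRT engine generated by tlt-converter.
            return {"model-engine-file": engine_path}
        if etlt_path is None or model_key is None:
            raise ValueError("Option 1 needs both the .etlt path and the key")
        # Option 1: the exported .etlt model plus its encryption key.
        params = {"tlt-encoded-model": etlt_path, "tlt-model-key": model_key}
        if calib_cache is not None:
            # Only required for INT8 precision.
            params["int8-calib-file"] = calib_cache
        return params


    # Placeholder paths, for illustration only.
    option_1 = model_source_params(etlt_path="frcnn.etlt", model_key="nvidia_tlt")
    option_2 = model_source_params(engine_path="frcnn.engine")
    print(option_1)
    print(option_2)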

All other parameters are common between the two approaches. To use the custom bounding box parser
instead of the default parsers in DeepStream, modify the following parameters in the
:code:`[property]` section of the primary inference configuration file:

.. code::

    parse-bbox-func-name=NvDsInferParseCustomNMSTLT
    custom-lib-path=<PATH to libnvds_infercustomparser_tlt.so>

Add the label file generated above using:

.. code::

    labelfile-path=<Classification labels>

For all the options, see the configuration file below. To learn what each parameter
is used for, refer to the `DeepStream Development Guide`_.

Here's a sample config file, :code:`config_infer_primary.txt`:

.. code::

    [property]
    gpu-id=0
    net-scale-factor=1.0
    offsets=<image mean values as in the training spec file> # e.g.: 103.939;116.779;123.68
    model-color-format=1
    labelfile-path=<Path to frcnn_labels.txt>
    tlt-encoded-model=<Path to FasterRCNN model>
    tlt-model-key=<Key to decrypt the model>
    infer-dims=<c;h;w> # e.g., 3;544;960 Where c = number of channels, h = height of the model input, w = width of model input
    uff-input-order=0
    uff-input-blob-name=<input_blob_name> # e.g.: input_image
    batch-size=<batch size> # e.g.: 1
    ## 0=FP32, 1=INT8, 2=FP16 mode
    network-mode=0
    num-detected-classes=<number of classes to detect (including background)> # e.g.: 5
    interval=0
    gie-unique-id=1
    is-classifier=0
    #network-type=0
    output-blob-names=<output_blob_names> # e.g.: NMS
    parse-bbox-func-name=NvDsInferParseCustomNMSTLT
    custom-lib-path=<PATH to libnvds_infercustomparser_tlt.so>

    [class-attrs-all]
    pre-cluster-threshold=0.6
    roi-top-offset=0
    roi-bottom-offset=0
    detected-min-w=0
    detected-min-h=0
    detected-max-w=0
    detected-max-h=0

.. Note:: If :code:`--gen_ds_config` is provided during TLT export of a FasterRCNN model, then a
   config file named :code:`nvinfer_config.txt` will be generated automatically. This file is an
   incomplete config file for DeepStream inference; you should copy the available fields from
   this partial config file into your own complete config file.
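
When the config file is written by a script, the :code:`[property]` section can be rendered
with Python's standard :code:`configparser` module. The sketch below is an illustration under
stated assumptions, not a complete configuration: :code:`render_property_section` is a
hypothetical helper, only a few keys from the sample above are included, and the class count of
4 assumes the three-class label file example plus :code:`background`. It also shows that
:code:`infer-dims` uses semicolons where :code:`tlt-converter -d` uses commas.

.. code:: python

    import configparser
    import io


    def render_property_section(properties):
        """Render a [property] section in key=value form."""
        parser = configparser.ConfigParser(interpolation=None)
        parser["property"] = {key: str(value) for key, value in properties.items()}
        buffer = io.StringIO()
        # Write key=value without spaces, matching the sample layout above.
        parser.write(buffer, space_around_delimiters=False)
        return buffer.getvalue()


    input_dims = (3, 544, 960)  # same dimensions as "-d 3,544,960"
    properties = {
        "infer-dims": ";".join(str(dim) for dim in input_dims),
        "num-detected-classes": 4,  # assumed: 3 classes plus background
        "output-blob-names": "NMS",
        "parse-bbox-func-name": "NvDsInferParseCustomNMSTLT",
    }
    text = render_property_section(properties)
    print(text)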