DetectNet_v2 ============ .. _detectnet_v2: `DetectNet_v2` is an NVIDIA-developed object-detection model that is included in the Transfer Learning Toolkit (TLT). `DetectNet_v2` supports the following tasks: * dataset_convert * train * evaluate * inference * prune * calibration_tensorfile * export These tasks can be invoked from the TLT launcher using the following convention on the command-line: .. code:: tlt detectnet_v2 <sub_task> <args_per_subtask> where :code:`<sub_task>` is one of the tasks listed above and :code:`args_per_subtask` are the command-line arguments required for a given subtask. Each subtask is explained in detail in the following sections. NVIDIA recommends following the workflow in the diagram below to generate a trained and optimized `DetectNet_v2` model. .. image:: ../../content/tlt_workflow_detectnet_v2.png
Data Input for Object Detection ------------------------------- The object detection apps in TLT expect data in KITTI format for training and evaluation. See the :ref:`Data Annotation Format ` page for more information about the KITTI data format.
Pre-processing the Dataset -------------------------- .. _conversion_to_tfrecords_detectnet_v2: The `DetectNet_v2` app requires the KITTI-formatted data to be converted to TFRecords for optimized iteration across the data batches. This can be done using the :code:`dataset_convert` subtask under `DetectNet_v2`. The :code:`dataset_convert` tool requires a configuration file as input. Details of the configuration file and examples are included in the following sections.
Configuration File for Dataset Converter ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The :code:`dataset_convert` tool provides several configurable parameters. The parameters are encapsulated in a spec file to convert data from the KITTI format to the TFRecords format which the `DetectNet_v2` trainer can ingest. This is a prototxt format file with two global parameters: * :code:`kitti_config`: A nested prototxt configuration with multiple input parameters * :code:`image_directory_path`: The path to the dataset root. The :code:`image_dir_name` is appended to this path to get the input images and must be the same path specified in the experiment spec file. Here are descriptions of the configurable parameters for the :code:`kitti_config` field: +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | **Parameter** | **Datatype** | **Default** | **Description** | **Supported Values** | +================================+==============+===================================+====================================================================================================================================================+================================+ | `root_directory_path` | string | -- | The path to the dataset root directory | -- | +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | `image_dir_name` | string | -- | The relative path to the directory containing images from the path in `root_directory_path`.
| -- | +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | `label_dir_name` | string | -- | The relative path to the directory containing labels from the path in `root_directory_path`. | -- | +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | `partition_mode` | string | -- | The method employed when partitioning the data to multiple folds. Two methods are supported: | | | | | | | | | | | | * Random partitioning: The data is divided in to 2 folds, `train` and `val`. This mode requires that the `val_split` parameter be set. | * random | | | | | | | | | | | * Sequence-wise partitioning: The data is divided into `n` partitions (defined by the `num_partitions` parameter) | * sequence | | | | | based on the number of sequences available. | | | | | | | | +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | `num_partitions` | int | `2` (if partition_mode is random) | The number of partitions to use to split the data (`N` folds). This field is ignored when the partition model is set to random, as by default only | | | | | | two partitions are generated: `val` and `train`. In sequence mode, the data is split into n-folds. The number of partitions is ideally fewer | `n=2` for random partition | | | | | than the total number of sequences in the `kitti_sequence_to_frames` file. | n< number of sequences in the | | | | | | `kitti_sequence_to_frames_file`| +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | `image_extension` | str | `.png` | The extension of the images in the `image_dir_name` parameter. | | | | | | | .png | | | | | | .jpg | | | | | | .jpeg | +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | `val_split` | float | `20` | The percentage of data to be separated for validation. This only works under “random” partition mode. This partition is available in fold `0` of | 0-100 | | | | | the TFrecords generated. Set the validation fold to `0` in the `dataset_config`. | | +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | `kitti_sequence_to_frames_file`| str | | The name of the KITTI sequence to frame mapping file. This file must be present within the dataset root as mentioned in the `root_directory_path`. 
| | +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+ | `num_shards` | int | `10` | The number of shards per fold. | 1-20 | +--------------------------------+--------------+-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------+
The sample configuration file shown below converts the Pascal VOC dataset with 80% training data and 20% validation data. This assumes that the data has been converted to KITTI format and is available for ingestion in the root directory path. .. code:: kitti_config { root_directory_path: "/workspace/tlt-experiments/data/VOCtrainval_11-May-2012/VOCdevkit/VOC2012" image_dir_name: "JPEGImages_kitti/test" label_dir_name: "Annotations_kitti/test" image_extension: ".jpg" partition_mode: "random" num_partitions: 2 val_split: 20 num_shards: 10 } image_directory_path: "/workspace/tlt-experiments/data/VOCtrainval_11-May-2012/VOCdevkit/VOC2012"
Sample Usage of the Dataset Converter Tool ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. _sample_usage_of_the_dataset_converter_tool_detectnet_v2: While KITTI is the accepted dataset format for object detection, the `DetectNet_v2` trainer requires this data to be converted to TFRecord files for ingestion. The :code:`dataset_convert` tool is described below: .. code:: tlt detectnet_v2 dataset_convert [-h] -d DATASET_EXPORT_SPEC -o OUTPUT_FILENAME [-f VALIDATION_FOLD] You can use the following optional arguments: * :code:`-h, --help`: Show this help message and exit * :code:`-d, --dataset-export-spec`: The path to the detection dataset spec containing the config for exporting :code:`.tfrecord` files * :code:`-o output_filename`: The output filename * :code:`-f, --validation-fold`: The validation fold in 0-based indexing. This is required when modifying the training set, but otherwise optional. The following example shows how to use the command with the dataset: .. code:: tlt detectnet_v2 dataset_convert [-h] -d <dataset_export_spec> -o <output_filename> The following is the output log from executing :code:`tlt detectnet_v2 dataset_convert`: .. code:: Using TensorFlow backend. 2019-07-16 01:30:59,073 - iva.detectnet_v2.dataio.build_converter - INFO - Instantiating a kitti converter 2019-07-16 01:30:59,243 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Num images in Train: 10786 Val: 2696 2019-07-16 01:30:59,243 - iva.detectnet_v2.dataio.kitti_converter_lib - INFO - Validation data in partition 0. Hence, while choosing the validation set during training choose validation_fold 0. 2019-07-16 01:30:59,251 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 0 /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:265: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default. 2019-07-16 01:31:01,226 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 0, shard 1 . . sheep: 242 bottle: 205 .. boat: 171 car: 418 2019-07-16 01:31:20,772 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 0 ..
2019-07-16 01:32:40,338 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Writing partition 1, shard 9 2019-07-16 01:32:49,063 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Wrote the following numbers of objects: sheep: 695 .. car: 1770 2019-07-16 01:32:49,064 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Cumulative object statistics 2019-07-16 01:32:49,064 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Wrote the following numbers of objects: sheep: 937 .. car: 2188 2019-07-16 01:32:49,064 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Class map. Label in GT: Label in tfrecords file sheep: sheep .. boat: boat For the dataset_config in the experiment_spec, please use labels in the tfrecords file, while writing the classmap. 2019-07-16 01:32:49,064 - iva.detectnet_v2.dataio.dataset_converter_lib - INFO - Tfrecords generation complete. .. Note:: The :code:`dataset_convert` tool converts the class names in the KITTI-formatted data files to lowercase characters. Therefore, when configuring a training experiment, ensure that lowercase class names are used in the `dataset_config` section under target class mapping. Using incorrect class names in the `dataset_config` section can cause invalid training experiments with 0 mAP. .. Note:: When using the :code:`dataset_convert` tool to create separate TFRecords for evaluation, which may be defined under :code:`dataset_config` using the parameter :code:`validation_data_source`, we recommend setting the :code:`partition_mode` to random with 2 partitions and an arbitrary :code:`val_split` (1-100). The dataloader takes care of traversing through all the folds and generating the mAP accordingly. Creating a Configuration File ----------------------------- .. _creating_a_configuration_file_detectnet_v2: To perform training, evaluation, and inference for `DetectNet_v2`, you need to configure several components, each with their own parameters. The :code:`train` and :code:`evaluate` tasks for a `DetectNet_v2` experiment share the same configuration file. The :code:`inference` task uses a separate configuration file. The specification file for `DetectNet_v2` training configures these components of the training pipe: * Model * BBox ground truth generation * Post processing module * Cost function configuration * Trainer * Augmentation model * Evaluator * Dataloader Model Config ^^^^^^^^^^^^ .. _model_config_detectnet_v2: The core object-detection model can be configured using the :code:`model_config` option in the spec file. The following is a sample model config to instantiate a ResNet-18 model with pretrained weights and freeze blocks 0 and 1 with all shortcuts set to projection layers. .. code:: # Sample model config for to instantiate a resnet18 model with pretrained weights and freeze blocks 0, 1 # with all shortcuts having projection layers. 
model_config { arch: "resnet" pretrained_model_file: "<path_to_pretrained_model_file>" freeze_blocks: 0 freeze_blocks: 1 all_projections: True num_layers: 18 use_pooling: False use_batch_norm: True dropout_rate: 0.0 objective_set: { cov {} bbox { scale: 35.0 offset: 0.5 } } }
The following table describes the :code:`model_config` parameters: +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | **Parameter** | **Datatype** | **Default** | **Description** | **Supported Values** | +=======================+==================+=============+=================================================================================================================================================+========================================================================================================================================================+ | `all_projections` | bool | `False` | For templates with shortcut connections, this parameter defines whether or not all shortcuts should be instantiated with 1x1 | `True` or `False` (only to be used in ResNet templates) | | | | | projection layers, irrespective of whether there is a change in stride across the input and output. | | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `arch` | string | `resnet` | The architecture of the backbone feature extractor to be used for training. | - `resnet` | | | | | | - `vgg` | | | | | | - `mobilenet_v1` | | | | | | - `mobilenet_v2` | | | | | | - `googlenet` | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `num_layers` | int | `18` | The depth of the feature extractor for scalable templates. | - `resnet`: 10, 18, 34, 50, 101 | | | | | | - `vgg`: 16, 19 | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | pretrained model file | string | -- | This parameter defines the path to a pretrained TLT model file. If the :code:`load_graph` flag is set to :code:`false`, it is assumed that only | Unix path | | | | | the weights of the pretrained model file are to be used. In this case, TLT train constructs the feature extractor graph in the | | | | | | experiment and loads the weights from the pretrained model file that has matching layer names. Thus, transfer learning across different | | | | | | resolutions and domains is supported.
For layers that may be absent in the pretrained model, the tool initializes them with | | | | | | random weights and skips the import for that layer. | | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `use_pooling` | Boolean | `False` | Choose between using strided convolutions or MaxPooling while downsampling. When `True`, MaxPooling is used to downsample; however, | `True` or `False` | | | | | for the object-detection network, NVIDIA recommends setting this to `False` and using strided convolutions. | | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `use_batch_norm` | Boolean | `False` | A flag to determine whether to use Batch Normalization layers or not. | `True` or `False` | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `objective_set` | Proto Dictionary | -- | The objectives for training the network. For object-detection networks, set it to learn `cov` and `bbox`. These | `cov {} bbox { scale: 35.0 offset: 0.5 }` | | | | | parameters should not be altered for the current training pipeline. | | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `dropout_rate` | Float | `0.0` | Probability for drop out, | `0.0-0.1` | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `load_graph` | Boolean | `False` | A flag to determine whether or not to load the graph from the pretrained model file, or just the weights. For a pruned model, | `True` or `False` | | | | | set this parameter to `True`. Pruning modifies the original graph, so the pruned model graph and the weights need to be imported. 
| | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `freeze_blocks` | float | -- | This parameter defines which blocks may be frozen from the instantiated feature extractor template, and is different for different | * **ResNet series**: For the ResNet series, the block ID's valid for freezing is any subset of [0, 1, 2, 3, 4](inclusive). | | | (repeated) | | feature extractor templates. | * **VGG series**: For the VGG series, the block ID's valid for freezing is any subset of [1, 2, 3, 4, 5](inclusive). | | | | | | * **MobileNet V1**: For the MobileNet V1, the block ID's valid for freezing is any subset of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11](inclusive). | | | | | | * **MobileNet V2**: For the MobileNet V2, the block ID's valid for freezing is any subset of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13](inclusive).| | | | | | * **GoogLeNet**: For the GoogLeNet, the block ID's valid for freezing is any subset of [0, 1, 2, 3, 4, 5, 6, 7](inclusive). | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ | `freeze_bn` | Boolean | `False` | A flag to determine whether to freeze the Batch Normalization layers in the model during training. | `True` or `False` | | | | | | | +-----------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+ BBox Ground Truth Generator ^^^^^^^^^^^^^^^^^^^^^^^^^^^ `DetectNet_v2` generates 2 tensors, `cov` and `bbox`. The image is divided into 16x16 grid cells. The `cov` tensor (short for "coverage" tensor) defines the number of grid cells that are covered by an object. The `bbox` tensor defines the normalized image coordinates of the object top left (x1, y1) and bottom right (x2, y2) with respect to the grid cell. For best results, you can assume the coverage area to be an ellipse within the `bbox` label with the maximum confidence assigned to the cells in the center and reducing coverage outwards. Each class has its own coverage and `bbox` tensor, thus the shape of the tensors are as follows: * cov: Batch_size, Num_classes, image_height/16, image_width/16 * bbox: Batch_size, Num_classes * 4, image_height/16, image_width/16 (where 4 is the number of coordinates per cell) Here is a sample rasterizer config for a 3 class detector: .. 
code:: # Sample rasterizer configs to instantiate a 3 class bbox rasterizer bbox_rasterizer_config { target_class_config { key: "car" value: { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 0.4 cov_radius_y: 0.4 bbox_min_radius: 1.0 } } target_class_config { key: "cyclist" value: { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 0.4 cov_radius_y: 0.4 bbox_min_radius: 1.0 } } target_class_config { key: "pedestrian" value: { cov_center_x: 0.5 cov_center_y: 0.5 cov_radius_x: 0.4 cov_radius_y: 0.4 bbox_min_radius: 1.0 } } deadzone_radius: 0.67 } The `bbox_rasterizer` has the following parameters that are configurable: +----------------------+------------------+-------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------+ | **Parameter** | **Datatype** | **Default** | **Description** | **Supported Values** | +======================+==================+=============+==========================================================================================================================+===============================+ | `deadzone radius` | float | `0.67` | The area to be considered dormant (or area of no bbox) around the ellipse of an object. This is particularly useful | `0-1.0` | | | | | in cases of overlapping objects so that foreground objects and background objects are not confused. | | +----------------------+------------------+-------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------+ | `target_class_config`| proto dictionary | | This is a nested configuration field that defines the coverage region for an object of a given class. For each class, | * `cov_center_x: 0.0 - 1.0` | | | | | this field is repeated. The following are configurable parameters for the `target_class_config`: | * `cov_center_y: 0.0 - 1.0` | | | | | | * `cov_radius_x: 0.0 - 1.0` | | | | | | * `cov_radius_y: 0.0 - 1.0` | | | | | | * `bbox_min_radius: 0.0 - 1.0`| | | | | * `cov_center_x (float)`: x-coordinate of the center of the object | | | | | | * `cov_center_y (float)`: y-coordinate of the center of the object | | | | | | * `cov_radius_x (float)`: x-radius of the coverage ellipse | | | | | | * `cov_radius_y (float)`: y-radius of the coverage ellipse | | | | | | * `bbox_min_radius (float)`: The minimum radius of the coverage region to be drawn for boxes | | +----------------------+------------------+-------------+--------------------------------------------------------------------------------------------------------------------------+-------------------------------+ Post-Processor ^^^^^^^^^^^^^^ .. _postprocessor: The post-processor module generates renderable bounding boxes from the raw detection output. The process includes the following: * Filtering out valid detections by thresholding objects using the confidence value in the coverage tensor. * Clustering the raw filtered predictions using DBSCAN to produce the final rendered bounding boxes. * Filtering out weaker clusters based on the final confidence threshold derived from the candidate boxes that get grouped into a cluster. Here is an example of the definition of the post-processor for a 3-class network learning for **car**, **cyclist**, and **pedestrian**: .. 
code:: postprocessing_config { target_class_config { key: "car" value: { clustering_config { coverage_threshold: 0.005 dbscan_eps: 0.15 dbscan_min_samples: 0.05 minimum_bounding_box_height: 20 } } } target_class_config { key: "cyclist" value: { clustering_config { coverage_threshold: 0.005 dbscan_eps: 0.15 dbscan_min_samples: 0.05 minimum_bounding_box_height: 20 } } } target_class_config { key: "pedestrian" value: { clustering_config { coverage_threshold: 0.005 dbscan_eps: 0.15 dbscan_min_samples: 0.05 minimum_bounding_box_height: 20 } } } } This section defines parameters that configure the post-processor. For each class that you can train for, the :code:`postprocessing_config` has a :code:`target_class_config` element that defines the clustering parameters for this class. The parameters for each target class include the following: +---------------+--------------------------+-------------+----------------------------------------------------------------------------------------+----------------------------------------------------+ | **Parameter** | **Datatype** | **Default** | **Description** | **Supported Values** | +===============+==========================+=============+========================================================================================+====================================================+ | `key` | string | -- | The name of the class for which the post processor module is being configured | The network object class name, which is mentioned | | | | | | in the `cost_function_config`. | +---------------+--------------------------+-------------+----------------------------------------------------------------------------------------+----------------------------------------------------+ | `value` | clustering _config proto | -- | The nested clustering-config proto parameter that configures the postprocessor module. | Encapsulated object with parameters defined below. | | | | | The parameters for this module are defined in the next table. | | +---------------+--------------------------+-------------+----------------------------------------------------------------------------------------+----------------------------------------------------+ The :code:`clustering_config` element configures the clustering block for this class. Here are the parameters for this element: +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | **Parameter** | **Datatype** | **Default** | **Description** | **Supported Values** | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | `coverage_threshold` | float | -- | The minimum threshold of the coverage tensor output to be considered a valid candidate box for | `0.0 - 1.0` | | | | | clustering. The four coordinates from the bbox tensor at the corresponding indices are passed for | | | | | | clustering. | | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | `dbscan_eps` | float | -- | The maximum distance between two samples for one to be considered in the neighborhood of the other. | `0.0 - 1.0` | | | | | This is not a maximum bound on the distances of points within a cluster. 
The greater the `dbscan_eps` value,| | | | | | the more boxes are grouped together. | | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | `dbscan_min_samples` | float | -- | The total weight in a neighborhood for a point to be considered as a core point. This includes the | `0.0 - 1.0` | | | | | point itself. | | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | `minimum_bounding_box_height`| int | -- | The minimum height in pixels to consider as a valid detection post clustering. | 0 - input image height | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | `clustering_algorithm` | enum | `DBSCAN` | Defines the post-processing algorithm used to cluster raw detections into the final `bbox` renders. | `DBSCAN`, `NMS`, `HYBRID`| | | | | When using `HYBRID` mode, ensure both `DBSCAN` and `NMS` configuration parameters are defined. | | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | `dbscan_confidence_threshold`| float | `0.1` | The confidence threshold used to filter out the clustered bounding box output from DBSCAN. | `> 0.0` | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | `nms_iou_threshold` | float | `0.2` | The Intersection Over Union (IOU) threshold to filter out redundant boxes from raw detections | `(0.0 - 1.0)` | | | | | to form final clustered outputs. | | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+ | `nms_confidence_threshold` | float | `0.` | The confidence threshold to filter out clustered bounding boxes from NMS. | `0.0 - 1.0` | +------------------------------+--------------+-------------+-------------------------------------------------------------------------------------------------------------+--------------------------+
In TLT 3.0, `DetectNet_v2` supports three methods for clustering the network's raw detections into final rendered bounding boxes: * `DBSCAN`: Density-Based Spatial Clustering of Applications with Noise * `NMS`: Non-Maximum Suppression * `HYBRID`: DBSCAN + NMS Under `HYBRID` clustering, `DetectNet_v2` post-processing first passes the raw network outputs to the `DBSCAN` clustering and uses the candidate boxes per cluster from `DBSCAN` as input to `NMS`. The `NMS` clustering generates the final rendered boxes. .. Note:: For `HYBRID` clustering, ensure both `DBSCAN` and `NMS` related parameters are defined in the post-processing config.
Cost Function ^^^^^^^^^^^^^ This section describes how to configure the cost function to include the classes that you are training for. For each class you want to train, add a new entry for the target classes to the spec file. We recommend not changing the parameters within the spec file for best performance with these classes.
The other parameters here should remain unchanged. .. code:: cost_function_config { target_classes { name: "car" class_weight: 1.0 coverage_foreground_weight: 0.05 objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 } objectives { name: "bbox" initial_weight: 10.0 weight_target: 10.0 } } target_classes { name: "cyclist" class_weight: 1.0 coverage_foreground_weight: 0.05 objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 } objectives { name: "bbox" initial_weight: 10.0 weight_target: 1.0 } } target_classes { name: "pedestrian" class_weight: 1.0 coverage_foreground_weight: 0.05 objectives { name: "cov" initial_weight: 1.0 weight_target: 1.0 } objectives { name: "bbox" initial_weight: 10.0 weight_target: 10.0 } } enable_autoweighting: True max_objective_weight: 0.9999 min_objective_weight: 0.0001 } Trainer ^^^^^^^ .. _trainer_detectnet_v2: The following is a sample `training_config` block to configure a `DetectNet_v2` trainer: .. code:: training_config { batch_size_per_gpu: 16 num_epochs: 80 learning_rate { soft_start_annealing_schedule { min_learning_rate: 5e-6 max_learning_rate: 5e-4 soft_start: 0.1 annealing: 0.7 } } regularizer { type: L1 weight: 3e-9 } optimizer { adam { epsilon: 1e-08 beta1: 0.9 beta2: 0.999 } } cost_scaling { enabled: False initial_exponent: 20.0 increment: 0.005 decrement: 1.0 } } The following table describes the parameters used to configure the trainer: +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | **Parameter** | **Datatype** | **Default** | **Description** | **Supported Values** | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | `batch_size_per_gpu`| int | `32` | The number of images per batch per GPU. | >1 | | | | | | | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | `num_epochs` | int | `120` | The total number of epochs to run the experiment. | | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | `enable_qat` | bool | `False` | Enables model training using Quantization Aware Training (QAT). For | `True` or `False` | | | | | more information about QAT, see the :ref:`Quantization Aware Training ` section. | | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | learning rate | learning rate scheduler proto | soft_start | Configures the learning rate schedule for the trainer. 
Currently, | annealing: 0.0-1.0 and greater than soft_start Soft_start: 0.0 - 1.0 | | | | _annealing | `DetectNet_v2` only supports the `soft_start` annealing learning rate schedule, which may be | | | | | _schedule | configured using the following parameters: | A sample lr plot for a `soft_start` of 0.3 and annealing of 0.1 is shown | | | | | | in the figure below. | | | | | | | | | | | | | | | | | * `soft_start` (float): The time to ramp up the learning rate from minimum learning rate to maximum learning rate. | | | | | | * `annealing` (float): The time to cool down the learning rate from maximum learning rate to minimum learning rate. | | | | | | * `minimum_learning_rate` (float): The minimum learning rate in the learning rate schedule. | | | | | | * `maximum_learning_rate` (float): The maximum learning rate in the learning rate schedule. | | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | regularizer | regularizer proto config | | The type and the weight of the regularizer to be used during training. There are two parameters: | The supported values for type are: | | | | | | | | | | | | | | | | | | | | | | | | * NO_REG | | | | | * `type`: The type of the regularizer being used. | * L1 | | | | | * `weight`: The floating point weight of the regularizer. | * L2 | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | optimizer | optimizer proto config | | The optimizer to use for training and the parameters to configure it: | | | | | | | | | | | | | | | | | | | | | | | | * `epsilon` (float): A very small number to prevent any division by zero in the implementation. | | | | | | * `beta1` (float) | | | | | | * `beta2` (float) | | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | `cost_scaling` | costscaling | | Enables cost scaling during training. Leave this parameter untouched currently for the `DetectNet_v2` training pipe. | cost_scaling { enabled: False initial_exponent: 20.0 increment: 0.005 decrement: 1.0 } | | | _config | | | | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ | checkpoint interval | float | `0/10` | The interval (in epochs) at which :code:`train` saves intermediate models. | 0 to num_epochs | +---------------------+-------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------+ `DetectNet_v2` currently supports the `soft_start` annealing learning rate schedule. 
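As a rough standalone illustration (not the toolkit's internal implementation), the schedule can be sketched in Python as a function of training progress; the log-linear shape of the ramp-up and annealing assumed below is for illustration only:

.. code::

    # Standalone sketch of a soft-start annealing schedule (illustration only,
    # not TLT source code). Progress runs from 0.0 to 1.0 over training, and the
    # default values below match the example discussed under the figure.
    def soft_start_annealing_lr(progress, min_lr=5e-6, max_lr=5e-4,
                                soft_start=0.3, annealing=0.7):
        if progress < soft_start:
            # Ramp up from min_lr to max_lr (log-linear ramp assumed).
            t = progress / soft_start
        elif progress < annealing:
            # Hold at max_lr between the soft_start and annealing points.
            t = 1.0
        else:
            # Anneal back down towards min_lr by the end of training.
            t = (1.0 - progress) / (1.0 - annealing)
        return min_lr * (max_lr / min_lr) ** t

    print(soft_start_annealing_lr(0.15))  # ramp-up region
    print(soft_start_annealing_lr(0.50))  # plateau at max_lr (5e-4)
    print(soft_start_annealing_lr(0.95))  # annealing region

The exact curve used by the trainer may differ; the sketch is only meant to convey how `soft_start` and `annealing` partition the training progress.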
The learning rate when plotted as a function of the training progress (0.0, 1.0) results in the following curve: .. image:: ../../content/learning_rate.png In this experiment, the `soft_start` was set to 0.3 and annealing to 0.7, with the minimum learning rate as 5e-6 and maximum learning rate, or `base_lr`, as 5e-4. .. Note:: We suggest using an L1 regularizer when training a network before pruning, as L1 regularization makes pruning the network weights easier. After pruning, when retraining the networks, we recommend turning regularization off by setting the regularization type to :code:`NO_REG`.
Augmentation Module ^^^^^^^^^^^^^^^^^^^ .. _augmentation_module_detectnet_v2: The augmentation module provides some basic pre-processing and augmentation when training. Here is a sample :code:`augmentation_config` element: .. code:: # Sample augmentation config augmentation_config { preprocessing { output_image_width: 960 output_image_height: 544 output_image_channel: 3 min_bbox_width: 1.0 min_bbox_height: 1.0 } spatial_augmentation { hflip_probability: 0.5 vflip_probability: 0.0 zoom_min: 1.0 zoom_max: 1.0 translate_max_x: 8.0 translate_max_y: 8.0 } color_augmentation { color_shift_stddev: 0.0 hue_rotation_max: 25.0 saturation_shift_max: 0.2 contrast_scale_max: 0.1 contrast_center: 0.5 } } .. Note:: If the output image height and output image width of the preprocessing block do not match the dimensions of the input image, the dataloader either pads with zeros or crops to fit to the output resolution. It does not resize the input images and labels to fit.
The :code:`augmentation_config` contains three elements: :code:`preprocessing`: This nested field configures the input image and ground truth label pre-processing module. It sets the shape of the input tensor to the network. The ground truth labels are pre-processed to meet the dimensions of the input image tensors. +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | output | int | -- | The width of the augmentation output. This is the same as the width of the network | >480 | | _image | | | input and must be a multiple of 16. | | | _width | | | | | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | output | int | -- | The height of the augmentation output. This is the same as the height of the network | >272 | | _image | | | input and must be a multiple of 16. | | | _height | | | | | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | output | int | 1, 3 | The channel depth of the augmentation output. This is the same as the channel depth of | 1,3 | | _image | | | the network input. Currently, 1-channel input is not recommended for datasets with JPG | | | _channel | | | images. For PNG images, both 3-channel RGB and 1-channel monochrome images are | | | | | | supported.
| | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | output | int | -- | The smaller side of the augmentation output. This is the same as the smaller side of | >272 | | _image_min | | | the network input. This is used in the case of input with dynamic shape in FasterRCNN | | | | | | where we specify the smaller side size and the data loader will resize the image such | | | | | | that the smaller side is this number and keep aspect ratio. | | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | output | int | -- | The larger side of the augmentation output. This is the same as the larger side of the | >272 | | _image_max | | | network input. This is used in the case of input with dynamic shape in FasterRCNN | | | | | | where if the smaller side and keep aspect ratio results in the other side exceeding | | | | | | this limit, it will resize such that the larger side is exactly this number and keep | | | | | | aspect ratio so the smaller side does not exceed :code:`input_image_min`. | | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | enable_auto | bool | False | A flag to enable automatic resize during training. When it is set to True, offline | -- | | _resize | | | resize before the training is no longer required. Enabling this will potentially | | | | | | increase the training time. | | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | min_bbox | float | | The minimum height of the object labels to be considered for training. | 0 - output_image_height | | _height | | | | | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | min_bbox | float | | The minimum width of the object labels to be considered for training. | 0 - output_image_width | | _width | | | | | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | crop_right | int | | The right boundary of the crop to be extracted from the original image. | 0 - input image width | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | crop_left | int | | The left boundary of the crop to be extracted from the original image. | 0 - input image width | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | crop_top | int | | The top boundary of the crop to be extracted from the original image. | 0 - input image height | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | crop_bottom | int | | The bottom boundary of the crop to be extracted from the original image. 
| 0 - input image height | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | scale_height | float | | The floating point factor to scale the height of the cropped images. | > 0.0 | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ | scale_width | float | | The floating point factor to scale the width of the cropped images. | > 0.0 | +---------------+--------------+-----------------------------+----------------------------------------------------------------------------------------+-------------------------+ :code:`spatial_augmentation`: This module supports basic spatial augmentation such as flip, zoom, and translate, which may be configured. +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ | hflip_probability | float | 0.5 | The probability to flip an input image horizontally. | 0.0-1.0 | +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ | vflip_probability | float | 0.0 | The probability to flip an input image vertically. | 0.0-1.0 | +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ | zoom_min | float | 1.0 | The minimum zoom scale of the input image. | > 0.0 | +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ | zoom_max | float | 1.0 | The maximum zoom scale of the input image. | > 0.0 | +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ | translate_max_x | int | 8.0 | The maximum translation to be added across the x axis. | 0.0 - output_image_width | +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ | translate_max_y | int | 8.0 | The maximum translation to be added across the y axis. | 0.0 - output_image_height | +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ | rotate_rad_max | float | 0.69 | The angle of rotation to be applied to the images and the training | > 0.0 (modulo 2*pi | | | | | labels. The range is defined between [-rotate_rad_max, rotate_rad_max].| | +-------------------+--------------+-----------------------------+------------------------------------------------------------------------+---------------------------+ :code:`color_augmentation`: This module configures the color space transformations, such as color shift, hue_rotation, saturation shift, and contrast adjustment. 
+----------------------+--------------+-----------------------------+---------------------------------------------------------------------------------------------+----------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +----------------------+--------------+-----------------------------+---------------------------------------------------------------------------------------------+----------------------+ | color_shift_stddev | float | 0.0 | The standard deviation value for the color shift. | 0.0-1.0 | +----------------------+--------------+-----------------------------+---------------------------------------------------------------------------------------------+----------------------+ | hue_rotation_max | float | 25.0 | The maximum rotation angle for the hue rotation matrix. | 0.0-360.0 | +----------------------+--------------+-----------------------------+---------------------------------------------------------------------------------------------+----------------------+ | saturation_shift_max | float | 0.2 | The maximum shift that changes the saturation. A value of 1.0 means no change in saturation | 0.0 - 1.0 | | | | | shift. | | +----------------------+--------------+-----------------------------+---------------------------------------------------------------------------------------------+----------------------+ | contrast_scale_max | float | 0.1 | The slope of the contrast as rotated around the provided center. A value of 0.0 leaves | 0.0 - 1.0 | | | | | the contrast unchanged. | | +----------------------+--------------+-----------------------------+---------------------------------------------------------------------------------------------+----------------------+ | contrast_center | float | 0.5 | The center around which the contrast is rotated. Ideally, this is set to half of the | 0.5 | | | | | maximum pixel value. Since our input images are scaled between 0 and 1.0, you can set this | | | | | | value to 0.5. | | +----------------------+--------------+-----------------------------+---------------------------------------------------------------------------------------------+----------------------+
The dataloader online augmentation pipeline applies spatial and color-space augmentation transformations in the following order: 1. The dataloader first performs the pre-processing operations on the input data (image and labels) read from the tfrecords files. Here the images and labels are cropped and scaled based on the parameters mentioned in the :code:`preprocessing` config. The boundaries for generating the cropped image and labels from the original image are defined by the :code:`crop_left`, :code:`crop_right`, :code:`crop_top` and :code:`crop_bottom` parameters. This cropped data is then scaled by the scale factors defined by :code:`scale_height` and :code:`scale_width`. The transformation matrices for these operations are computed globally and do not change per image. 2. The net tensors generated from the pre-processing blocks are then passed through a pipeline of random augmentations in spatial and color domains. The spatial augmentations are applied to both images and label coordinates, while the color augmentations are applied only to images. To apply color augmentations, the :code:`output_image_channel` parameter must be set to 3. For monochrome tensors, color augmentations are not applied.
The spatial and color transformation matrices are computed per image, based on a uniform distribution along the maximum and minimum ranges defined by the :code:`spatial_augmentation` and :code:`color_augmentation` config parameters. 3. Once the spatial and color augmented net input tensors are generated, the output is then padded with zeros or clipped along the right and bottom edge of the image to fit the output dimensions defined in the :code:`preprocessing` config. Configuring the Evaluator ^^^^^^^^^^^^^^^^^^^^^^^^^ The evaluator in the detection training pipeline can be configured using the :code:`evaluation_config` parameters. The following is an example :code:`evaluation_config` element: .. code:: # Sample evaluation config to run evaluation in integrate mode for the given 3 class model, # at every 10th epoch starting from the epoch 1. evaluation_config { average_precision_mode: INTEGRATE validation_period_during_training: 10 first_validation_epoch: 1 minimum_detection_ground_truth_overlap { key: "car" value: 0.7 } minimum_detection_ground_truth_overlap { key: "person" value: 0.5 } minimum_detection_ground_truth_overlap { key: "bicycle" value: 0.5 } evaluation_box_config { key: "car" value { minimum_height: 4 maximum_height: 9999 minimum_width: 4 maximum_width: 9999 } } evaluation_box_config { key: "person" value { minimum_height: 4 maximum_height: 9999 minimum_width: 4 maximum_width: 9999 } } evaluation_box_config { key: "bicycle" value { minimum_height: 4 maximum_height: 9999 minimum_width: 4 maximum_width: 9999 } } } The following tables describe the parameters used to configure evaluation: +-----------------------+------------------+-----------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +-----------------------+------------------+-----------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------+ | average_precision | | Sample | The mode in which the average precision for each class is calculated. | | | | _mode | | | | | | | | | | * SAMPLE: This is the ap calculation mode using 11 evenly spaced recall | | | | | | points as used in the Pascal VOC challenge 2007. | | | | | | | | | | | | * INTEGRATE: This is the ap calculation mode as used in the 2011 challenge | +-----------------------+------------------+-----------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------+ | validation_period | int | 10 | The interval at which evaluation is run during training. The evaluation is | 1 - total number of epochs | | _during_training | | | run at this interval starting from the value of the first validation epoch | | | | | | parameter as specified below. | | +-----------------------+------------------+-----------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------+ | first_validation | int | 30 | The first epoch to start running validation. 
Ideally it is preferred to wait | 1 - total number of epochs | | _epoch | | | for at least 20-30% of the total number of epochs before starting evaluation, | | | | | | since the predictions in the initial epochs would be fairly inaccurate. Too | | | | | | many candidate boxes may be sent to clustering and this can cause the | | | | | | evaluation to slow down. | | +-----------------------+------------------+-----------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------+ | minimum_detection | proto dictionary | | Minimum IOU between ground truth and predicted box after clustering to call a | | | _ground_truth_overlap | | | valid detection. This parameter is a repeatable dictionary and a separate one | | | | | | must be defined for every class. The members include: | | | | | | | | | | | | | | | | | | | | | | | | | * key (string): class name | | | | | | * value (float): intersection over union value | | +-----------------------+------------------+-----------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------+ | evaluation_box_config | proto dictionary | | This nested configuration field configures the min and max box dimensions to be | | | | | | considered as a valid ground truth and prediction for AP calculation. | | +-----------------------+------------------+-----------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------+ The :code:`evaluation_box_config` field has these configurable inputs. +----------------+--------------+-----------------------------+------------------------------------------------------------------------+-------------------------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +----------------+--------------+-----------------------------+------------------------------------------------------------------------+-------------------------------------+ | minimum_height | float | 10 | Minimum height in pixels for a valid ground truth and prediction bbox. | 0. - model image height | +----------------+--------------+-----------------------------+------------------------------------------------------------------------+-------------------------------------+ | minimum_width | float | 10 | Minimum width in pixels for a valid ground truth and prediction bbox. | 0. - model image width | +----------------+--------------+-----------------------------+------------------------------------------------------------------------+-------------------------------------+ | maximum_height | float | 9999 | Maximum height in pixels for a valid ground truth and prediction bbox. | minimum_height - model image height | +----------------+--------------+-----------------------------+------------------------------------------------------------------------+-------------------------------------+ | maximum_width | float | 9999 | Maximum width in pixels for a valid ground truth and prediction bbox. | minimum _width - model image width | +----------------+--------------+-----------------------------+------------------------------------------------------------------------+-------------------------------------+ Dataloader ^^^^^^^^^^ .. 
_dataloader_detectnet_v2: The dataloader defines the path to the data you want to train on and the class mapping for classes in the dataset that the network is to be trained for. The following is an example :code:`dataset_config` element: .. code:: dataset_config { data_sources: { tfrecords_path: "" image_directory_path: "" } image_extension: "jpg" target_class_mapping { key: "car" value: "car" } target_class_mapping { key: "automobile" value: "car" } target_class_mapping { key: "heavy_truck" value: "car" } target_class_mapping { key: "person" value: "pedestrian" } target_class_mapping { key: "rider" value: "cyclist" } validation_fold: 0 } In this example the tfrecords is assumed to be multi-fold, and the fold number to validate on is defined. However, evaluation doesn’t necessarily have to be run on a split of the training set. Many ML engineers choose to evaluate the model on a well chosen evaluation dataset that is exclusive of the training dataset. If you prefer to run evaluation on a different validation dataset as opposed to a split from the training dataset, then convert this dataset into tfrecords by using the :code:`dataset-convert` tool as mentioned :ref:`here ` and use the :code:`validation_data_source` field in the :code:`dataset_config` to define this dataset to the evaluator . In this case, do not forget to remove the :code:`validation_fold` field from the spec. When generating the TFRecords for evaluation by using the :code:`validation_data_source` field, please review the notes :ref:`here `. .. _here: https://docs.nvidia.com .. code:: validation_data_source: { tfrecords_path: " /tfrecords validation pattern>" image_directory_path: " " } The parameters in :code:`dataset_config` are defined as follows: * :code:`data_sources`: Captures the path to tfrecords to train on. This field contains 2 parameters: * :code:`tfrecords_path`: Path to the individual tfrecords files. This path follows the UNIX style pathname pattern extension, so a common pathname pattern that captures all the tfrecords files in that directory can be used. * :code:`image_directory_path`: Path to the training data root from which the tfrecords was generated. * :code:`image_extension`: Extension of the images to be used. * :code:`target_class_mapping`: This parameter maps the class names in the tfrecords to the target class to be trained in the network. An element is defined for every source class to target class mapping. This field was included with the intention of grouping similar class objects under one umbrella. For example: car, van, heavy_truck etc may be grouped under automobile. The “key” field is the value of the class name in the tfrecords file and the “value” field corresponds to the value that the network is expected to learn. * :code:`validation_fold`: In case of an n fold tfrecords, you define the index of the fold to use for validation. For *sequencewise* validation choose the validation fold in the range [0, N-1]. For *random split* partitioning, force the validation fold index to 0 as the tfrecord is just 2-fold. .. Note:: The class names key in the target_class_mapping must be identical to the one shown in the dataset converter log, so that the correct classes are picked up for training. Specification File for Inference ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This spec file configures the :code:`infer` tool of detectnet to generate valid bbox predictions. The inference tool consists of 2 blocks, namely the inferencer and the bbox handler. 
The inferencer instantiates the model object and the preprocessing pipeline. The bbox handler handles the post-processing, rendering of bounding boxes, and serialization to KITTI format output labels. Inferencer ********** The inferencer instantiates a model object that generates the raw predictions from the trained model. The model may be defined to run inference in the TLT backend or the TensorRT backend. A sample :code:`inferencer_config` element for the inferencer spec is defined here: .. code:: inferencer_config{ # defining target class names for the experiment. # Note: This must be mentioned in the order of the network's classes. target_classes: "car" target_classes: "cyclist" target_classes: "pedestrian" # Inference dimensions. image_width: 1248 image_height: 384 # Must match what the model was trained for. image_channels: 3 batch_size: 16 gpu_index: 0 # model handler config tensorrt_config{ parser: ETLT etlt_model: "/path/to/model.etlt" backend_data_type: INT8 save_engine: true trt_engine: "/path/to/trt/engine/file" calibrator_config{ calibration_cache: "/path/to/calibration/cache" n_batches: 10 batch_size: 16 } } } The :code:`inferencer_config` parameters are explained in the table below. +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | target_classes | String (repeated) | None | The names of the target classes the model should output. For a multi-class model this parameter | For example, for the 3-class KITTI model | | | | | is repeated N times. The number of entries must be equal to the number of classes in the model, | it will be: | | | | | and the order must be the same as the order of the classes in the costfunction_config of the training config file. | | | | | | | * car | | | | | | * cyclist | | | | | | * pedestrian | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | batch_size | int | 1 | The number of images per batch of inference. | Max number of images that can be fit in 1 GPU | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | image_height | int | 384 | The height of the image in pixels at which the model will be inferred. | >16 | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | image_width | int | 1248 | The width of the image in pixels at which the model will be inferred. | >16 | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | image_channels | int | 3 | The number of channels per image.
| 1,3 | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | gpu_index | int | 0 | The index of the GPU to run inference on. This is useful only in TLT inference. For tensorRT | | | | | | inference, by default, the GPU of choice in ‘0’. | | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | tensorrt_config | TensorRTConfig | None | Proto config to instantiate a TensorRT object. | | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ | tlt_config | TLTConfig | None | Proto config to instantiate a TLT model object. | | +-----------------+-------------------+-----------------------------+-------------------------------------------------------------------------------------------------+-----------------------------------------------+ As mentioned earlier, the :code:`infer` tool is capable of running inference using the native TLT backend and the TensorRT backend. They can be configured by using the tensorrt_config proto element or the tlt_config proto element respectively. You may use only one of the two in a single spec file. The definitions of the two model objects are: +---------------------+------------------------+-----------------------------+-----------------------------------------------------------------------------------------+------------------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +---------------------+------------------------+-----------------------------+-----------------------------------------------------------------------------------------+------------------------------+ | parser | enum | ETLT | The tensorrt parser to be invoked. Only ETLT parser is supported. | ETLT | +---------------------+------------------------+-----------------------------+-----------------------------------------------------------------------------------------+------------------------------+ | etlt_model | string | None | Path to the exported etlt model file. | Any existing etlt file path. | +---------------------+------------------------+-----------------------------+-----------------------------------------------------------------------------------------+------------------------------+ | backend_data _type | enum | FP32 | The data type of the backend TensorRT inference engine. For int8 mode, be | FP32 | | | | | | FP16 | | | | | sure to mention the calibration_cache. | INT8 | +---------------------+------------------------+-----------------------------+-----------------------------------------------------------------------------------------+------------------------------+ | save_engine | bool | False | Flag to save a TensorRT engine from the input etlt file. This will save initialization | True, False | | | | | time if inference needs to be run on the same etlt file and there are no changes | | | | | | needed to be made to the inferencer object. 
| | +---------------------+------------------------+-----------------------------+-----------------------------------------------------------------------------------------+------------------------------+ | trt_engine | string | None | Path to the TensorRT engine file. This acts as an I/O parameter. If the path defined here | UNIX path string | | | | | is not an engine file, then the :code:`infer` tool creates a new TensorRT engine from | | | | | | the etlt file. If there exists an engine already, the tool re-instantiates the | | | | | | inferencer from the engine defined here. | | +---------------------+------------------------+-----------------------------+-----------------------------------------------------------------------------------------+------------------------------+ | calibration _config | CalibratorConfig Proto | None | This is a required parameter when running in the int8 inference mode. This proto object | | | | | | contains parameters used to define a calibrator object. Namely: | | | | | | calibration_cache: path to the calibration cache file generated using :code:`export`. | | +---------------------+------------------------+-----------------------------+-----------------------------------------------------------------------------------------+------------------------------+ TLT_Config ********** +---------------+--------------+-----------------------------+----------------------------------+----------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +---------------+--------------+-----------------------------+----------------------------------+----------------------+ | model | string | None | The path to the .tlt model file. | | +---------------+--------------+-----------------------------+----------------------------------+----------------------+ .. Note:: Since detectnet is a fully convolutional neural net, the model can be inferred at a different inference resolution than the resolution at which it was trained. The input dims of the network will be overridden to run inference at this resolution, if they are different from the training resolution. There may be some regression in accuracy when running inference at a different resolution since the convolutional kernels don’t see the object features at this shape. Bbox Handler ************ The bbox handler takes care of post-processing the raw outputs from the inferencer. It performs the following steps: 1. Thresholding the raw outputs to define grid cells where the detections may be present per class. 2. Reconstructing the image space coordinates from the raw coordinates of the inferencer. 3. Clustering the raw thresholded predictions. 4. Filtering the clustered predictions per class. 5. Rendering the final bounding boxes on the image in its input dimensions and serializing them to KITTI format metadata. A sample :code:`bbox_handler_config` element is defined below. ..
code:: bbox_handler_config{ kitti_dump: true disable_overlay: false overlay_linewidth: 2 classwise_bbox_handler_config{ key:"car" value: { confidence_model: "aggregate_cov" output_map: "car" bbox_color{ R: 0 G: 255 B: 0 } clustering_config{ coverage_threshold: 0.005 dbscan_eps: 0.3 dbscan_min_samples: 0.05 dbscan_confidence_threshold: 0.9 minimum_bounding_box_height: 4 } } } classwise_bbox_handler_config{ key:"default" value: { confidence_model: "aggregate_cov" bbox_color{ R: 255 G: 0 B: 0 } clustering_config{ coverage_threshold: 0.005 dbscan_eps: 0.3 dbscan_min_samples: 0.05 dbscan_confidence_threshold: 0.9 minimum_bounding_box_height: 4 } } } } The parameters to configure the bbox handler are defined below. +--------------------------------+------------------------------------+-----------------------------+------------------------------------------------------------------------------------------+----------------------+ | **Parameter** | **Datatype** | **Default/Suggested value** | **Description** | **Supported Values** | +--------------------------------+------------------------------------+-----------------------------+------------------------------------------------------------------------------------------+----------------------+ | kitti_dump | bool | false | Flag to enable saving the final output predictions per image in KITTI format. | true, false | +--------------------------------+------------------------------------+-----------------------------+------------------------------------------------------------------------------------------+----------------------+ | disable_overlay | bool | true | Flag to disable bbox rendering per image. | true, false | +--------------------------------+------------------------------------+-----------------------------+------------------------------------------------------------------------------------------+----------------------+ | overlay _linewidth | int | 1 | Thickness in pixels of the bbox boundaries. | >1 | +--------------------------------+------------------------------------+-----------------------------+------------------------------------------------------------------------------------------+----------------------+ | classwise_bbox _handler_config | ClasswiseCluster Config (repeated) | None | This is a repeated class-wise dictionary of post-processing parameters. DetectNet_v2 | | | | | | uses dbscan clustering to group raw bboxes to final predictions. For models with several | | | | | | output classes, it may be cumbersome to define a separate dictionary for each class. In | | | | | | such a situation, a default class may be used for all classes in the network. | | +--------------------------------+------------------------------------+-----------------------------+------------------------------------------------------------------------------------------+----------------------+ The :code:`classwise_bbox_handler_config` is a Proto object containing several parameters to configure the clustering algorithm as well as the bbox renderer. 
+-----------------------+------------------------+-------------------------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------+ | **Parameter** | **Datatype** | **Default / Suggested value** | **Description** | **Supported Values** | +-----------------------+------------------------+-------------------------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------+ | confidence _model | string | aggregate_cov | Algorithm to compute the final confidence of the clustered bboxes. In the aggregate_cov mode, | aggregate_cov, mean_cov | | | | | the final confidence of a detection is the sum of the confidences of all the candidate bboxes | | | | | | in a cluster. In mean_cov mode, the final confidence is the mean confidence of all the bboxes | | | | | | in the cluster. | | +-----------------------+------------------------+-------------------------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------+ | bbox_color | BBoxColor Proto Object | None | RGB channel wise color intensity per box. | R: 0 - 255 | | | | | | | | | | | | G: 0 - 255 | | | | | | | | | | | | B: 0 - 255 | +-----------------------+------------------------+-------------------------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------+ | clustering_config | ClusteringConfig | None | Proto object to configure the DBSCAN, NMS or HYBRID clustering algorithm. It leverages the same | | | | | | parameters as defined in the :code:`postprocessing_config` of the training config. Please refer | | | | | | to :ref:`here ` for more explanation about the parameters. | | +-----------------------+------------------------+-------------------------------+-------------------------------------------------------------------------------------------------------+--------------------------------------------------------+ Training the Model ------------------ .. _training_the_model_detectnet_v2: After creating TFRecords ingestible by the TLT training (as outlined in :ref:`Preprocessing the Dataset `) and setting up a :ref:`spec file `, you are now ready to start training an object detection network. The following outlines the DetectNet_v2 training command: .. code:: tlt detectnet_v2 train [-h] -k <key> -r <result directory> -e <spec_file> [-n <name_string_for_the_model>] [--gpus <num GPUs>] [--gpu_index <gpu_index>] [--use_amp] [--log_file <log_file_path>] Required Arguments ^^^^^^^^^^^^^^^^^^ * :code:`-r, --results_dir`: The path to a folder where experiment outputs should be written. * :code:`-k, --key`: A user-specific encoding key to save or load a :code:`.tlt` model. * :code:`-e, --experiment_spec_file`: The path to the spec file. The path may be absolute or relative to the working directory. By default, the spec from :code:`spec_loader.py` is used. Optional Arguments ^^^^^^^^^^^^^^^^^^ * :code:`-n, --model_name`: The name of the final step model saved. If not provided, defaults to the model. * :code:`--gpus`: The number of GPUs to use and processes to launch for training. The default value is 1. * :code:`--gpu_index`: The indices of the GPUs to use for training. The GPUs are referenced as per the indices mentioned in the :code:`./deviceQuery` CUDA samples.
* :code:`--use_amp`: When defined, this flag enables Automatic Mixed Precision mode. * :code:`--log_file`: The path to the log file. Defaults to :code:`stdout`. * :code:`-h, --help`: Show this help message and exit. Input Requirement ^^^^^^^^^^^^^^^^^ * **Input size**: C * W * H (where C = 1 or 3, W >= 480, H >= 272 and W, H are multiples of 16) * **Image format**: JPG, JPEG, PNG * **Label format**: KITTI detection .. Note:: The :code:`train` tool does not support training on images of multiple resolutions. However, the dataloader does support resizing images to the input resolution defined in the specification file. This can be enabled by setting the :code:`enable_auto_resize` parameter to :code:`true` in the :code:`augmentation_config` module of the spec file. Sample Usage ^^^^^^^^^^^^ Here is an example of a command for training with two GPUs: .. code:: tlt detectnet_v2 train -e <experiment_spec_file> -r <results directory> -k <key> -n <name_string_for_the_model> --gpus 2 .. Note:: The :code:`train` tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly. .. Note:: DetectNet_v2 now supports resuming training from intermediate checkpoints. When a previously running training experiment is stopped prematurely, one may restart the training from the last checkpoint by simply re-running the detectnet_v2 training command with the same command line arguments as before. The trainer for detectnet_v2 finds the last saved checkpoint in the results directory and resumes the training from there. The interval at which the checkpoints are saved is defined by the `checkpoint_interval` parameter under the “training_config” for detectnet_v2. Evaluating the Model -------------------- .. _evaluating_the_model_detectnet_v2: Execute :code:`evaluate` on a DetectNet_v2 model. .. code:: tlt detectnet_v2 evaluate [-h] -e <experiment_spec_file> -m <model_file> -k <key> [--use_training_set] [--gpu_index] Required Arguments ^^^^^^^^^^^^^^^^^^ * :code:`-e, --experiment_spec_file`: The experiment spec file to set up the evaluation experiment. This should be the same as the training spec file. * :code:`-m, --model`: The path to the model file to use for evaluation. This could be a :code:`.tlt` model file or a TensorRT engine generated using the :code:`export` tool. * :code:`-k, --key`: The encryption key to decrypt the model. This argument is only required with a :code:`.tlt` model file. Optional Arguments ^^^^^^^^^^^^^^^^^^ * :code:`-h, --help`: Show this help message and exit. * :code:`-f, --framework`: The framework to use when running evaluation (choices: “tlt”, “tensorrt”). By default the framework is set to TensorRT. * :code:`--use_training_set`: Set this flag to run evaluation on the training dataset. * :code:`--gpu_index`: The index of the GPU to run evaluation on. If you have followed the example in :ref:`Training a Detection Model `, you may now evaluate the model using the following command: .. code:: tlt detectnet_v2 evaluate -e <experiment_spec_file> -m <model_file> -k <key> .. Note:: This command runs evaluation on the same validation set that was used during training. Use these steps to evaluate on a test set with ground truth labeled: 1. Create tfrecords for this test set by following the steps listed in the data input section. 2. Update the dataloader configuration part of the training spec file to include the newly generated tfrecords. For more information on the dataset config, refer to :ref:`Creating an Experiment Spec File`.
You may create the tfrecords with any partition mode (sequence/random). The evaluate tool iterates through all the folds in the tfrecords patterns mentioned in the :code:`validation_data_source`. .. code:: dataset_config { data_sources: { tfrecords_path: "<path to train tfrecords root>/<train tfrecords pattern>" image_directory_path: "<path to training data root>" } image_extension: "jpg" target_class_mapping { key: "car" value: "car" } target_class_mapping { key: "automobile" value: "car" } .. .. .. target_class_mapping { key: "person" value: "pedestrian" } target_class_mapping { key: "rider" value: "cyclist" } validation_data_source: { tfrecords_path: "<path to test tfrecords root>/<test tfrecords pattern>" image_directory_path: "<path to test data root>" } } The rest of the experiment spec file should remain the same as the training spec file. Using Inference on the Model ---------------------------- .. _using_inference_on_the_model_detectnet_v2: The :code:`infer` task for detectnet_v2 may be used to visualize bboxes and/or generate frame-by-frame KITTI format labels on a single image or directory of images. An example of the command for this task is shown below: .. code:: tlt detectnet_v2 inference [-h] -e <inference_spec> -i <inference_input> -o <inference_output> -k <model_key> Required Parameters ^^^^^^^^^^^^^^^^^^^ * :code:`-e, --inference_spec`: The path to an inference spec file. * :code:`-i, --inference_input`: The directory of input images or a single image for inference. * :code:`-o, --inference_output`: The directory to the output images and labels. The annotated images are in :code:`inference_output/images_annotated` and labels are in :code:`inference_output/labels`. * :code:`-k, --enc_key`: The key to load the model. The tool automatically generates bbox rendered images in :code:`output_path/images_annotated`. To get the bbox labels in KITTI format, configure the :code:`bbox_handler_config` spec file using the :code:`kitti_dump` parameter as mentioned `here`_. This will generate the output in :code:`output_path/labels`. .. _here: https://docs.nvidia.com Pruning the Model ----------------- .. _pruning_the_model_detectnet_V2: Pruning removes parameters from the model to reduce the model size, without compromising the integrity of the model itself, using the :code:`prune` command. The :code:`prune` task includes these parameters: .. code:: tlt detectnet_v2 prune [-h] -pm <pretrained_model> -o <output_file> -k <key> [-n <normalizer>] [-eq <equalization_criterion>] [-pg <pruning_granularity>] [-pth <pruning threshold>] [-nf <min_num_filters>] [-el [<excluded_list>]] Required Arguments ^^^^^^^^^^^^^^^^^^ * :code:`-pm, --pretrained_model`: The path to the pretrained model. * :code:`-o, --output_file`: The path to the output checkpoints. * :code:`-k, --key`: The key to load a .tlt model. Optional Arguments ^^^^^^^^^^^^^^^^^^ * :code:`-h, --help`: Show this help message and exit. * :code:`-n, --normalizer`: Specify ``max`` to normalize by dividing each norm by the maximum norm within a layer; specify ``L2`` to normalize by dividing by the L2 norm of the vector comprising all kernel norms. The default value is ``max``. * :code:`-eq, --equalization_criterion`: Criteria to equalize the stats of inputs to an element-wise op layer or depth-wise convolutional layer. This parameter is useful for resnets and mobilenets. The options are :code:`arithmetic_mean`, :code:`geometric_mean`, :code:`union`, and :code:`intersection` (default: :code:`union`). * :code:`-pg, --pruning_granularity`: The number of filters to remove at a time (default: 8) * :code:`-pth`: The threshold to compare the normalized norm against (default: 0.1) .. Note:: NVIDIA recommends changing the threshold to keep the number of parameters in the model to within 10-20% of the original unpruned model.
* :code:`-nf, --min_num_filters`: The minimum number of filters to keep per layer (default:16) * :code:`-el, --excluded_layers`: A list of excluded_layers (e.g. :code:`-i item1 item2`) (default: []) After pruning, the model needs to be retrained. See :ref:`Re-training the Pruned Model ` for more details. Using the Prune Command ^^^^^^^^^^^^^^^^^^^^^^^ .. _pruning_a_detectnet_v2_model: Here's an example of using the :code:`prune` task: .. code:: tlt detectnet_v2 prune -m /workspace/output/weights/resnet_003.tlt -o /workspace/output/weights/resnet_003_pruned.tlt -eq union -pth 0.7 -k $KEY Re-training the Pruned Model ---------------------------- .. _re-training_the_pruned_model_detectnet_v2: Once the model has been pruned, there might be a slight decrease in accuracy because some previously useful weights may have been removed. To regain the accuracy, we recommend that you retrain this pruned model over the same dataset using the :code:`train` task, as documented in the :ref:`Training the model ` section, with an updated spec file that points to the newly pruned model as the pretrained model file. You should turn off the regularizer in the :code:`training_config` for detectnet to recover the accuracy when retraining a pruned model. You may do this by setting the regularizer type to :code:`NO_REG` as mentioned :ref:`here`. All other parameters may be retained in the spec file from the previous training. To load the pretrained model, set the :code:`load_graph` flag under :code:`model_config` to :code:`true`. Exporting the Model ------------------- .. _exporting_the_model_detectnet_v2: The DetectNet_V2 model application in the Transfer Learning Toolkit includes an :code:`export` sub-task to export and prepare a trained DetectNet_v2 model for :ref:`Deploying to DeepStream `. The :code:`export` sub-task optionally generates the calibration cache for TensorRT INT8 engine calibration. Exporting the model decouples the training process from deployment and allows conversion to TensorRT engines outside the TLT environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. This may be interchangeably referred to as a :code:`.trt` or :code:`.engine` file. The same exported TLT model may be used universally across training and deployment hardware. This is referred to as the :code:`.etlt` file, or encrypted TLT file. During model export, the TLT model is encrypted with a private key, which is required when you deploy this model for inference. INT8 Mode Overview ^^^^^^^^^^^^^^^^^^ TensorRT engines can be generated in INT8 mode to run with lower precision, and thus improve performance. This process requires a cache file that contains scale factors for the tensors to help combat quantization errors, which may arise due to low-precision arithmetic. The calibration cache is generated using a calibration tensorfile when :code:`export` is run with the :code:`--data_type` flag set to :code:`int8`. Pre-generating the calibration information and caching it removes the need for calibrating the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensorfile since it is a much smaller file and can be moved with the exported model. Using the calibration cache also speeds up engine creation as building the cache can take several minutes to generate depending on the size of the Tensorfile and the model itself. 
The export tool can generate an INT8 calibration cache by ingesting training data using one of these options: * **Option 1**: Providing a calibration tensorfile generated using the :code:`calibration_tensorfile` task defined in DetectNet_v2. This command uses the data generators in the training pipeline to produce a drop of preprocessed batches of input images from the training dataset. Using this gives users the opportunity to maintain a record of the exact batches of the training data used to generate the calibration scale factors in the calibration cache file. However, this is a two-step process for generating an int8 cache file. * **Option 2**: Pointing the tool to a directory of images that you want to use to calibrate the model. For this option, you will need to create a sub-sampled directory of random images that best represent your training dataset. * **Option 3**: Using the training data loader directly to load the training images for INT8 calibration. This option is now the recommended approach as it helps to generate multiple random samples. This also ensures two important aspects of the data during calibration: * Data pre-processing in the INT8 calibration step is the same as in the training process. * The data batches are sampled randomly across the entire training dataset, thereby improving the accuracy of the int8 model. * Calibration occurs as a one-step process with the data batches being generated on the fly. NVIDIA plans to eventually deprecate Option 1 and only support Options 2 and 3. .. image:: ../../content/tlt_int8_calibration.png FP16/FP32 Model ^^^^^^^^^^^^^^^ The :code:`calibration.bin` is only required if you need to run inference at INT8 precision. For FP16/FP32 based inference, the export step is much simpler. All that is required is to provide a model from the :code:`train` step to :code:`export` to convert it into an encrypted TLT model. .. image:: ../../content/fp16_fp32_export.png Generating an INT8 tensorfile Using the calibration_tensorfile Command ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The INT8 tensorfile is a binary file that contains the preprocessed training samples, which may be used to calibrate the model. In this release, TLT only supports calibration tensorfile generation for SSD, DSSD, DetectNet_v2, and classification models. The sample usage for the :code:`calibration_tensorfile` command to generate a calibration tensorfile is defined below: .. code:: tlt detectnet_v2 calibration_tensorfile [-h] -e -o -m [--use_validation_set] Required Arguments ****************** * :code:`-e, --experiment_spec_file`: The path to the experiment spec file (only required for SSD and FasterRCNN). * :code:`-o, --output_path`: The path to the output tensorfile that will be created. * :code:`-m, --max_batches`: The number of batches of input data to be serialized. Optional Argument ***************** * :code:`--use_validation_set`: A flag specifying whether to use the validation dataset instead of the training set. The following is a sample command to invoke the :code:`calibration_tensorfile` command for a classification model: .. code:: tlt detectnet_v2 calibration_tensorfile -e $SPECS_DIR/classification_retrain_spec.cfg -m 10 -o $USER_EXPERIMENT_DIR/export/calibration.tensor Exporting the DetectNet_v2 Model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following are command line arguments of the :code:`export` command: .. 
code:: tlt detectnet_v2 export [-h] -m -k [-o ] [--cal_data_file ] [--cal_image_dir ] [--data_type ] [--batches ] [--max_batch_size ] [--max_workspace_size ] [--experiment_spec ] [--engine_file ] [--verbose Verbosity of the logger] [--force_ptq Flag to force PTQ] [--gen_ds_config Generate DeepStream config] Required Arguments ****************** * :code:`-m, --model`: The path to the :code:`.tlt` model file to be exported using :code:`export`. * :code:`-k, --key`: The key used to save the :code:`.tlt` model file. * :code:`-e, --experiment_spec`: The path to the spec file. This argument is required for faster_rcnn, ssd, dssd, yolo, and retinanet. Optional Arguments ****************** * :code:`-o, --output_file`: The path to save the exported model to. The default path is :code:`./.etlt`. * :code:`--data_type`: The desired engine data type. The options are :code:`fp32`, :code:`fp16`, and :code:`int8`. A calibration cache will be generated in :code:`int8` mode. The default value is :code:`fp32`. If using :code:`int8` mode, the following INT8 arguments are required. * :code:`-s, --strict_type_constraints`: A Boolean flag to indicate whether or not to apply the TensorRT :code:`strict_type_constraints` when building the TensorRT engine. Note this is only for applying the strict type of :code:`int8` mode. * :code:`--gen_ds_config`: A Boolean flag indicating whether to generate the template DeepStream related configuration ("nvinfer_config.txt") as well as a label file ("labels.txt") in the same directory as the :code:`output_file`. Note that the config file is NOT a complete configuration file and requires the user to update the sample config files in DeepStream with the parameters generated. INT8 Export Mode Required Arguments ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * :code:`--cal_data_file`: The tensorfile generated from :code:`calibration_tensorfile` for calibrating the engine. This can also be an output file if used with :code:`--cal_image_dir`. * :code:`--cal_image_dir`: The directory of images to use for calibration. .. Note:: The :code:`--cal_image_dir` parameter applies the necessary preprocessing to generate a tensorfile at the path mentioned in the :code:`--cal_data_file` parameter, which is in turn used for calibration. The number of generated batches in the tensorfile is obtained from the value set to the :code:`--batches` parameter, and the :code:`batch_size` is obtained from the value set to the :code:`--batch_size` parameter. Ensure that the directory mentioned in :code:`--cal_image_dir` has at least :code:`batch_size * batches` number of images in it. The valid image extensions are :code:`.jpg`, :code:`.jpeg`, and :code:`.png`. In this case, the :code:`input_dimensions` of the calibration tensors are derived from the input layer of the :code:`.tlt` model. INT8 Export Optional Arguments ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * :code:`--cal_cache_file`: The path to save the calibration cache file to. The default value is :code:`./cal.bin`. * :code:`--batches`: The number of batches to use for calibration and inference testing. The default value is 10. * :code:`--batch_size`: The batch size to use for calibration. The default value is 8. * :code:`--max_batch_size`: The maximum batch size of the TensorRT engine. The default value is 16. * :code:`--max_workspace_size`: The maximum workspace size of the TensorRT engine. The default value is 1073741824 = 1<<30. * :code:`--experiment_spec`: The experiment_spec for training/inference/evaluation. 
This is used to generate the graphsurgeon config script for FasterRCNN from the experiment_spec (which is only useful for FasterRCNN). Use this argument when DetectNet_v2 and FasterRCNN also set up the dataloader-based calibrator to leverage the training dataloader to calibrate the model. * :code:`--engine_file`: The path to the serialized TensorRT engine file. Note that this file is hardware specific and cannot be generalized across GPUs. Use this argument to quickly test your model accuracy using TensorRT on the host. As the TensorRT engine file is hardware specific, you cannot use this engine file for deployment unless the deployment GPU is identical to the training GPU. * :code:`--force_ptq`: A Boolean flag to force post-training quantization on the exported :code:`.etlt` model. .. Note:: When exporting a model that was trained with QAT enabled, the tensor scale factors to calibrate the activations are peeled out of the model and serialized to a TensorRT-readable cache file defined by the :code:`cal_cache_file` argument. However, the current version of QAT doesn’t natively support DLA int8 deployment on Jetson. To deploy this model on Jetson with DLA :code:`int8`, use the :code:`--force_ptq` flag to use TensorRT post-training quantization to generate the calibration cache file. Sample usage for the export sub-task ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following is a sample command to export a DetectNet_v2 model in INT8 mode. This command shows option 1: using the :code:`--cal_data_file` option with the :code:`calibration.tensor` generated using the :code:`calibration_tensorfile` sub-task. .. code:: tlt detectnet_v2 export -e $USER_EXPERIMENT_DIR/experiment_dir_retrain/experiment_spec.txt -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt -k $KEY --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor --data_type int8 --batches 10 --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet_18.engine The following is an example log of a successful export: .. code:: Using TensorFlow backend. 2018-11-02 18:59:43,347 [INFO] iva.common.tlt-export: Loading model from resnet10_kitti_multiclass_v1.tlt 2018-11-02 18:59:47,572 [INFO] tensorflow: Restoring parameters from /tmp/tmp8crUBp.ckpt INFO:tensorflow:Froze 82 variables. 2018-11-02 18:59:47,701 [INFO] tensorflow: Froze 82 variables. Converted 82 variables to const ops. 2018-11-02 18:59:48,123 [INFO] iva.common.tlt-export: Converted model was saved into resnet10_kitti_multiclass_v1.etlt 2018-11-02 18:59:48,123 [INFO] iva.common.tlt-export: Input node: input_1 2018-11-02 18:59:48,124 [INFO] iva.common.tlt-export: Output node(s): ['output_bbox/BiasAdd', 'output_cov/Sigmoid'] The following is a sample command using the :code:`--cal_image_dir` option for a DetectNet_v2 model using option 2. .. 
code:: tlt detectnet_v2 export -m $USER_EXPERIMENT_DIR/detectnet_v2/model.tlt -o $USER_EXPERIMENT_DIR/detectnet_v2/model.int8.etlt -e $SPECS_DIR/detectnet_v2_kitti_retrain_spec.txt --key $KEY --cal_image_dir $USER_EXPERIMENT_DIR/data/KITTI/val/image_2 --data_type int8 --batch_size 8 --batches 10 --cal_data_file $USER_EXPERIMENT_DIR/data/detectnet_v2/cal.tensorfile --cal_cache_file $USER_EXPERIMENT_DIR/data/detectnet_v2/cal.bin --engine_file $USER_EXPERIMENT_DIR/data/detectnet_v2/detection.trt Generating a Template DeepStream Config File ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. _generate_a_template_deepstream_config: TLT supports serializing a template config file for the nvinfer element of deepstream to consume this model. This config file contains the network specific pre-processing parameters and network graph parameters for parsing the :code:`etlt` model file. It also generates a label file that contains the names of the classes that the model was trained for in the order in which the outputs are generated. To generate the deepstream config, simply run the :code:`export` command using the :code:`--gen_ds_config` option. The following example shows how to generate the DeepStream config: .. code:: tlt detectnet_v2 export -m $USER_EXPERIMENT_DIR/detectnet_v2/model.tlt -o $USER_EXPERIMENT_DIR/detectnet_v2/model.int8.etlt -e $SPECS_DIR/detectnet_v2_kitti_retrain_spec.txt --key $KEY --cal_image_dir $USER_EXPERIMENT_DIR/data/KITTI/val/image_2 --data_type int8 --batch_size 8 --batches 10 --cal_data_file $USER_EXPERIMENT_DIR/data/detectnet_v2/cal.tensorfile --cal_cache_file $USER_EXPERIMENT_DIR/data/detectnet_v2/cal.bin --engine_file $USER_EXPERIMENT_DIR/data/detectnet_v2/detection.trt --gen_ds_config The template DeepStream config is generated in the same directory as the output model file as :code:`nvinfer_config.txt`, while the labels are serialized in :code:`labels.txt` file. Sample output of the :code:`nvinfer_config.txt` and :code:`labels.txt` are as follows: * Sample :code:`nvinfer_config.txt` .. code:: text net-scale-factor=0.00392156862745098 offsets=0;0;0 infer-dims=3;544;960 tlt-model-key=tlt_encode network-type=0 num-detected-classes=3 uff-input-order=0 output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd uff-input-blob-name=input_1 model-color-format=0 * Sample :code:`labels.txt` .. code:: text person bag face .. Note:: The :code:`nvinfer_config.txt` file that is generated by :code:`export` is **NOT** a complete :code:`config_infer_*.txt` file that can be replaced into the DeepStream config file. You need to find and replace the parameters defined in this file, with the parameters in the default :code:`config_infer_*.txt` file. Deploying to Deepstream ----------------------- .. _deploying_to_deepstream_detectnet_v2: This section elaborates how to deploy a trained DetectNet_v2 model to the DeepStream SDK for inference, and the two options to consume an exported :code:`.etlt` model file. For information about the DeepStream SDK itself, please refer to this :ref:`section `. Generating an Engine Using tlt-converter ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. _generating_an_engine_using_tlt-converter_detectnet_v2: .. include:: ../excerpts/generating_an_engine_using_tlt-converter.rst Instructions for x86 ******************** .. include:: ../excerpts/instructions_for_x86.rst Instructions for Jetson *********************** .. include:: ../excerpts/instructions_for_jetson.rst Using the tlt-converter *********************** .. _using tlt-converter with DetectNet_v2: .. 
code:: tlt-converter [-h] -k <encryption_key> -d <input_dimensions> -o <comma separated output nodes> [-c <path to calibration cache file>] [-e <path to output engine>] [-b <calibration batch size>] [-m <maximum batch size of the TRT engine>] [-t <engine datatype>] [-w <maximum workspace size of the TRT engine>] [-i <input dimension ordering>] [-p <optimization_profiles>] [-s] [-u <DLA_core>] input_file Required Arguments ~~~~~~~~~~~~~~~~~~ * :code:`input_file`: The path to the :code:`.etlt` model exported using :code:`export`. * :code:`-k`: The key used to encode the :code:`.tlt` model when doing the training. * :code:`-d`: A comma-separated list of input dimensions that should match the dimensions used for :code:`export`. Unlike :code:`export`, this cannot be inferred from calibration data. * :code:`-o`: A comma-separated list of output blob names that should match the output configuration used for :code:`export`. For DetectNet_v2, set this argument to :code:`output_cov/Sigmoid,output_bbox/BiasAdd`. Optional Arguments ~~~~~~~~~~~~~~~~~~ * :code:`-e`: The path to save the engine to. The default path is :code:`./saved.engine`. * :code:`-t`: The desired engine data type. This option generates a calibration cache if in INT8 mode. The default value is :code:`fp32`. The options are :code:`fp32`, :code:`fp16`, :code:`int8`. * :code:`-w`: The maximum workspace size for the TensorRT engine. The default value is :code:`1073741824` (1<<30). * :code:`-i`: The input dimension ordering. The default value is :code:`nchw`. The options are :code:`nchw`, :code:`nhwc`, :code:`nc`. For detectnet_v2, we can omit this argument. * :code:`-p`: The optimization profiles for :code:`.etlt` models with dynamic shape. This argument takes a comma-separated list of optimization profile shapes in the format :code:`<input_name>,<min_shape>,<opt_shape>,<max_shape>`, where each shape has the format :code:`<n>x<c>x<h>x<w>`. This can be specified multiple times if there are multiple input tensors for the model. This argument is only useful for new models introduced in TLT 3.0. * :code:`-s`: A Boolean value specifying whether to apply TensorRT strict type constraints when building the TensorRT engine. * :code:`-u`: Specifies the DLA core index when building the TensorRT engine on Jetson devices. INT8 Mode Arguments ~~~~~~~~~~~~~~~~~~~ * :code:`-c`: The path to the calibration cache file for INT8 mode. The default path is :code:`./cal.bin`. * :code:`-b`: The batch size used during the export step for INT8 calibration cache generation (default: :code:`8`). * :code:`-m`: The maximum batch size for the TensorRT engine. The default value is :code:`16`. If you encounter out-of-memory issues, decrease the batch size accordingly. Sample Output Log ~~~~~~~~~~~~~~~~~ The following is a sample log for converting a DetectNet_v2 model: .. code:: tlt-converter -d 3,544,960 -k nvidia_tlt -o output_cov/Sigmoid,output_bbox/BiasAdd /workspace/tlt-experiments/detectnet_v2/resnet18_pruned.etlt .. [INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output. [INFO] Detected 1 inputs and 2 output network tensors. .. Note:: To use the default :code:`tlt-converter` available in the Transfer Learning Toolkit package, append :code:`tlt` to the sample usage of the :code:`tlt_converter` as mentioned :ref:`here `. Once the model and/or TensorRT engine file has been generated, two extra files are required: 1. Label file 2. DS configuration file Label File ^^^^^^^^^^ The label file is a text file containing the names of the classes that the DetectNet_v2 model is trained to detect. The order in which the classes are listed here must match the order in which the model predicts the output. The export subtask in DetectNet_v2 generates this file when run with the :code:`--gen_ds_config` flag enabled.
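For reference, for the three-class model used in the examples on this page (target classes :code:`car`, :code:`cyclist`, and :code:`pedestrian`, in the order defined in the :code:`inferencer_config`), an illustrative label file would look like the following. This is only a sketch; always verify the class order against the :code:`labels.txt` that :code:`export` generated for your own model.

.. code:: text

  car
  cyclist
  pedestrian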
DeepStream Configuration File ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The detection model is typically used as a primary inference engine. It can also be used as a secondary inference engine. To run this model in the sample :code:`deepstream-app`, you must modify the existing :code:`config_infer_primary.txt` file to point to this model. .. image:: ../../content/dstream_deploy_options2.png **Option 1**: Integrate the model (:code:`.etlt`) directly in the DeepStream app. For this option, you will need to add the following parameters in the configuration file. The :code:`int8-calib-file` is only required for INT8 precision. .. code:: tlt-encoded-model=<path to .etlt model> tlt-model-key=<model export key> int8-calib-file=<path to calibration cache> The :code:`tlt-encoded-model` parameter points to the exported model (:code:`.etlt`) from TLT. The :code:`tlt-model-key` is the encryption key used during model export. **Option 2**: Integrate the TensorRT engine file with the DeepStream app. Step 1: Generate the TensorRT engine using :code:`tlt-converter`. Detailed instructions are provided in the :ref:`Generating an engine using tlt-converter ` section above. Step 2: Once the engine file is generated successfully, modify the following parameters to use this engine with DeepStream. .. code:: model-engine-file=<path to TensorRT engine> All other parameters are common between the two approaches. Update the :code:`labelfile-path` parameter in the config file with the path to the :code:`labels.txt` that was generated at :ref:`export `. .. code:: labelfile-path=<path to labels.txt> For all options, see the configuration file below. To learn more about all the parameters, refer to the `DeepStream Development Guide`_ under the `Gst-nvinfer`_ section. .. _DeepStream Development Guide: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html .. _Gst-nvinfer: https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html .. code:: [property] gpu-id=0 # preprocessing parameters. net-scale-factor=0.0039215697906911373 model-color-format=0 # model paths. int8-calib-file= labelfile-path= tlt-encoded-model= tlt-model-key= infer-dims=c;h;w # where c = number of channels, h = height of the model input, w = width of model input uff-input-order=0 # 0 implies that the input blob is in chw order uff-input-blob-name=input_1 batch-size=4 ## 0=FP32, 1=INT8, 2=FP16 mode network-mode=0 num-detected-classes=3 interval=0 gie-unique-id=1 is-classifier=0 output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd #enable_dbscan=0 [class-attrs-all] threshold=0.2 group-threshold=1 ## Set eps=0.7 and minBoxes for enable-dbscan=1 eps=0.2 #minBoxes=3 roi-top-offset=0 roi-bottom-offset=0 detected-min-w=0 detected-min-h=0 detected-max-w=0 detected-max-h=0
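Once :code:`config_infer_primary.txt` has been updated with the parameters above, reference it from the top-level :code:`deepstream-app` configuration and launch the pipeline. The snippet below is a minimal sketch that assumes the standard DeepStream sample-application configuration layout; the file names are placeholders and should be replaced with your own paths.

.. code::

  # In the deepstream-app configuration file, point the primary GIE
  # at the nvinfer configuration shown above.
  [primary-gie]
  enable=1
  gie-unique-id=1
  config-file=config_infer_primary.txt

Then run the sample application with the updated configuration:

.. code::

  deepstream-app -c <path to deepstream-app config file>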