UNET
============

.. _unet:

UNet is a semantic segmentation model that supports the following tasks:

* train
* evaluate
* inference
* export

These tasks may be invoked from the TLT launcher on the command line using the following convention:

.. code::

   tlt unet <sub_task> <args_per_subtask>

where :code:`args_per_subtask` are the command-line arguments required for a given subtask. Each of these subtasks is explained in detail below.

Creating a Configuration File
-----------------------------

.. _creating_a_configuration_file_unet:

To perform training, evaluation, and inference for UNet, several components need to be configured, each with their own parameters. The :code:`train`, :code:`evaluate`, and :code:`inference` tasks for a UNet experiment share the same configuration file. The specification file for UNet training configures these components of the training pipeline:

* Model
* Trainer
* Dataset

Model Config
^^^^^^^^^^^^

.. _model_config_unet:

Specifications for the segmentation model can be configured using the :code:`model_config` option in the spec file. The following is a sample model config that instantiates a resnet18 model with blocks 0 and 1 frozen and all shortcuts set to projection layers:

.. code::

   # Sample model config to instantiate a resnet18 model with blocks 0 and 1 frozen
   # and all shortcuts having projection layers.
   model_config {
     num_layers: 18
     all_projections: true
     arch: "resnet"
     freeze_blocks: 0
     freeze_blocks: 1
     use_batch_norm: true
     training_precision {
       backend_floatx: FLOAT32
     }
     model_input_height: 320
     model_input_width: 320
     model_input_channels: 3
   }

The following table describes the :code:`model_config` parameters:

+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| **Parameter**         | **Datatype**     | **Default** | **Description**                                                                                      | **Supported Values**                                 |
+=======================+==================+=============+======================================================================================================+======================================================+
| all_projections       | bool             | False       | For templates with shortcut connections, this parameter defines whether or not all shortcuts should be instantiated with 1x1 projection layers, irrespective of whether there is a change in stride across the input and output. | True/False (only to be used in resnet templates) |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| arch                  | string           | resnet      | The architecture of the backbone feature extractor to be used for training.                         | resnet, vgg, vanilla_unet                            |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| num_layers            | int              | 18          | The depth of the feature extractor for scalable templates.                                          | * resnets: 10, 18, 34, 50, 101  * vgg: 16, 19        |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| use_pooling           | Boolean          | False       | A Boolean value that determines whether to use strided convolutions or MaxPooling while downsampling. When True, MaxPooling is used to downsample; however, for an object detection network, we recommend setting this to False and using strided convolutions. | False/True |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| use_batch_norm        | Boolean          | False       | A Boolean value that determines whether or not to use batch normalization layers.                   | True/False                                           |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| training_precision    | Proto Dictionary | --          | Contains a nested parameter that sets the precision of the back-end training framework.             | backend_floatx: FLOAT32                              |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| load_graph            | Boolean          | False       | A flag that determines whether to load the graph from the pretrained model file (with a False value, only the weights are loaded). For a pruned model, set this parameter to True: pruning modifies the original graph, so both the pruned model graph and the weights need to be imported. | True/False |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| freeze_blocks         | float (repeated) | --          | This parameter defines which blocks may be frozen from the instantiated feature extractor template, and is different for different feature extractor templates. | * **ResNet series**: the valid block IDs for freezing are any subset of [0, 1, 2, 3] (inclusive)  * **VGG series**: the valid block IDs for freezing are any subset of [1, 2, 3, 4, 5] (inclusive) |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| freeze_bn             | Boolean          | False       | Specifies whether to freeze the Batch Normalization layers in the model during training.            | True/False                                           |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| model_input_height    | int              | --          | The model input height dimension of the model, which should be a multiple of 16.                    | >100                                                 |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| model_input_width     | int              | --          | The model input width dimension of the model, which should be a multiple of 16.                     | >100                                                 |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
| model_input_channels  | int              | --          | The model input channels dimension of the model, which should be set to 3 for a Resnet/VGG backbone. It can be set to 1 or 3 for vanilla_unet based on the input image channel dimensions. If the input image channel is 1 and model_input_channels is set to 3 for vanilla_unet, the input grayscale image is converted to RGB. | 1/3 |
+-----------------------+------------------+-------------+----------------------------------------------------------------------------------------------------+------------------------------------------------------+
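For comparison, a minimal :code:`model_config` for the :code:`vanilla_unet` architecture might look like the following sketch; the input dimensions are illustrative and must be multiples of 16:

.. code::

   # Illustrative model config for the vanilla_unet architecture
   model_config {
     arch: "vanilla_unet"
     training_precision {
       backend_floatx: FLOAT32
     }
     model_input_height: 320
     model_input_width: 320
     model_input_channels: 1
   }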
.. Note:: The :code:`vanilla_unet` model was originally proposed in the paper `U-Net: Convolutional Networks for Biomedical Image Segmentation <https://arxiv.org/abs/1505.04597>`_. This model is recommended for the binary segmentation use case.

Training
^^^^^^^^

.. _training:

This section outlines how to configure the training parameters. The following is an example :code:`training_config` element:

.. code::

   training_config {
     batch_size: 2
     epochs: 3
     log_summary_steps: 10
     checkpoint_interval: 1
     loss: "cross_dice_sum"
     learning_rate: 0.0001
     regularizer {
       type: L2
       weight: 3.00000002618e-09
     }
     optimizer {
       adam {
         epsilon: 9.99999993923e-09
         beta1: 0.899999976158
         beta2: 0.999000012875
       }
     }
   }
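For a binary segmentation experiment, you could instead select the Dice loss (see the supported :code:`loss` values in the table below). The following sketch is identical to the example above except for the :code:`loss` value:

.. code::

   training_config {
     batch_size: 2
     epochs: 3
     log_summary_steps: 10
     checkpoint_interval: 1
     loss: "dice"
     learning_rate: 0.0001
     regularizer {
       type: L2
       weight: 3.00000002618e-09
     }
     optimizer {
       adam {
         epsilon: 9.99999993923e-09
         beta1: 0.899999976158
         beta2: 0.999000012875
       }
     }
   }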
The following table describes the parameters for :code:`training_config`:

+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+
| **Parameter**       | **Datatype**                   | **Default**   | **Description**                                                                                  | **Supported Values**                     |
+=====================+================================+===============+==================================================================================================+==========================================+
| batch_size          | int                            | 1             | The number of images per batch per GPU.                                                          | >= 1                                     |
+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+
| epochs              | int                            | None          | The number of epochs to train the model. One epoch represents one iteration of training through the entire dataset. | > 1                  |
+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+
| log_summary_steps   | int                            | 1             | The summary-steps interval at which training details are printed to stdout.                     | 1 - steps per epoch                      |
+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+
| checkpoint_interval | int                            | 1             | The interval (in epochs) at which the checkpoint is saved.                                       | 1 - total number of epochs               |
+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+
| loss                | string                         | cross_entropy | The loss to be used for segmentation. The supported losses per task are: * Binary segmentation: cross entropy, Dice loss, cross entropy + Dice loss (cross_dice_sum) * Multi-class segmentation: cross entropy loss | cross_entropy, cross_dice_sum, dice |
+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+
| learning_rate       | float                          | 0.0001        | The learning-rate initialization value.                                                          | 0 - 1                                    |
+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+
| regularizer         | regularizer proto config       | --            | This parameter configures the type and weight of the regularizer to be used during training. The two parameters are: * type: The type of the regularizer being used * weight: The floating-point weight of the regularizer | The supported values for type are: * L2 |
+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+
| optimizer           | optimizer proto config         | --            | This parameter defines which optimizer to use for training, and the parameters to configure it, namely: * epsilon (float): a very small number to prevent any division by zero in the implementation * beta1 (float) * beta2 (float) |  |
+---------------------+--------------------------------+---------------+--------------------------------------------------------------------------------------------------+------------------------------------------+

.. Note:: Dice loss is currently supported only for binary segmentation. Generic Dice loss for multi-class segmentation is not supported.

Dataset
^^^^^^^

.. _dataset_config_unet:

This section describes how to configure the :code:`dataset_config` component. The following is an example :code:`dataset_config` element:

.. code::

   dataset_config {
     dataset: "custom"
     augment: False
     input_image_type: "grayscale"
     train_images_path: "/workspace/tlt-experiments/data/unet/isbi/images/train"
     train_masks_path: "/workspace/tlt-experiments/data/unet/isbi/masks/train"
     val_images_path: "/workspace/tlt-experiments/data/unet/isbi/images/val"
     val_masks_path: "/workspace/tlt-experiments/data/unet/isbi/masks/val"
     test_images_path: "/workspace/tlt-experiments/data/unet/isbi/images/test"
     data_class_config {
       target_classes {
         name: "foreground"
         mapping_class: "foreground"
         label_id: 0
       }
       target_classes {
         name: "background"
         mapping_class: "background"
         label_id: 1
       }
     }
   }
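To illustrate the :code:`mapping_class` parameter described below, a hypothetical multi-class :code:`data_class_config` could map several raw labels onto a single training class; the class names and label IDs here are illustrative:

.. code::

   data_class_config {
     target_classes {
       name: "car"
       mapping_class: "vehicle"
       label_id: 0
     }
     target_classes {
       name: "truck"
       mapping_class: "vehicle"
       label_id: 1
     }
     target_classes {
       name: "background"
       mapping_class: "background"
       label_id: 2
     }
   }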
The following table describes the parameters used to configure :code:`dataset_config`:

+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| **Parameter**       | **Datatype**     | **Default** | **Description**                                                                                         | **Supported Values**         |
+=====================+==================+=============+=========================================================================================================+==============================+
| dataset             | string           | custom      | The input dataset type. Currently, only datasets custom to the user are supported; open-source datasets will be added in the future. | custom |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| augment             | bool             | False       | Specifies whether the input should be augmented online during training. The following augmentations are applied, each with a probability of 0.5: horizontal flip, vertical flip, random crop and resize, random brightness. | true / false |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| input_image_type    | string           | color       | The input image type, indicating whether the input image is grayscale or color (RGB).                  | color / grayscale            |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| train_images_path   | string           | None        | The input train images path.                                                                            | UNIX path string             |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| train_masks_path    | string           | None        | The input train masks path.                                                                             | UNIX path string             |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| val_images_path     | string           | None        | The input validation images path.                                                                       | UNIX path string             |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| val_masks_path      | string           | None        | The input validation masks path.                                                                        | UNIX path string             |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| test_images_path    | string           | None        | The input test images path.                                                                             | UNIX path string             |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+
| target_classes      | Proto Dictionary | --          | The repeated field for every training class. The following parameters are required for each target_classes config: * name (string): The name of the target class * mapping_class (string): The name of the class that the target class is mapped to; for example, "car" can be mapped to "vehicle". If the class needs to be trained as is, then name and mapping_class should be the same * label_id (int): The pixels that belong to this target class are assigned this label_id value in the mask image. |  |
+---------------------+------------------+-------------+-------------------------------------------------------------------------------------------------------+------------------------------+

.. Note:: The supported image extension formats for training images are ".png", ".jpg", ".jpeg", ".PNG", ".JPG", and ".JPEG".

Training the Model
------------------

.. _training_the_model_unet:

After preparing the input data as per :ref:`these instructions here ` and setting up a :ref:`spec file `, you are ready to start training a semantic segmentation network.

UNet training command:

.. code::

   tlt unet train [-h] -k <key>
                       -r <result_dir>
                       -e <spec_file>
                       [-m <pretrained_model_file>]
                       [-n <model_name>]
                       [--gpus <num_gpus>]
                       [--gpu_index <gpu_index>]
                       [--use_amp]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-r, --results_dir`: The path to a folder where experiment outputs should be written.
* :code:`-k, --key`: A user-specific encoding key to save or load a :code:`.tlt` model.
* :code:`-e, --experiment_spec_file`: The path to the spec file.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-m, --pretrained_model_file`: The path to a pre-trained model to initialize from. This parameter defaults to :code:`None`.
* :code:`-n, --model_name`: The name that the final checkpoint will be saved as in the weights directory. The default value is :code:`model.tlt`.
* :code:`--gpus`: The number of GPUs to use and processes to launch for training. The default value is 1.
* :code:`--gpu_index`: The indices of the GPUs to use for training. The GPU indices are described in the :code:`./deviceQuery` CUDA samples.
* :code:`--use_amp`: A flag that enables Automatic Mixed Precision mode.
* :code:`-h, --help`: Prints this help message.

Sample Usage
^^^^^^^^^^^^

Here is an example of a command for two-GPU training:

.. code::

   tlt unet train -e <spec_file> -r <result_dir> -k <key> -n <model_name> -m <pretrained_model_file> --gpus 2

.. Note:: UNet supports resuming training from intermediate checkpoints. If a previously running training experiment is stopped prematurely, you can restart the training from the last checkpoint by simply re-running the UNet training command with the same command-line arguments as before. The trainer for UNet finds the last saved checkpoint in the results directory and resumes the training from there. The interval at which checkpoints are saved is defined by the :code:`checkpoint_interval` parameter under :code:`training_config` for UNet. Do not use a pre-trained weights argument when resuming training.

Evaluating the Model
--------------------

.. _evaluating_the_model_unet:

Execute :code:`evaluate` on a UNet model as follows:

.. code::

   tlt unet evaluate [-h] -e <experiment_spec_file>
                          -m <model_path>
                          -o <output_dir>
                          -k <key>
                          [--gpu_index <gpu_index>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-e, --experiment_spec_file`: The experiment spec file for setting up the evaluation experiment. This should be the same as the training spec file.
* :code:`-m, --model_path`: The path to the model file to use for evaluation. This can be a :code:`.tlt` model file or a TensorRT engine generated using the :code:`export` tool.
* :code:`-o, --output_dir`: The output directory where the evaluation metrics are saved as a JSON file. TLT inference results are saved to :code:`output_dir/results_tlt.json`, and TensorRT inference results are saved to :code:`output_dir/results_trt.json`. The results JSON file contains the precision, recall, f1-score, and IOU for every class. It also provides the weighted average, macro average, and micro average for these metrics. For more information on the averaging methods, see the `classification report`_.
* :code:`-k, --key`: The encryption key to decrypt the model. This argument is required only with a :code:`.tlt` model file.

.. _classification report: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-h, --help`: Show this help message and exit.
* :code:`--gpu_index`: The index of the GPU to run evaluation on.

If you have followed the example in :ref:`Training a Unet Model `, you may now evaluate the model using the following command:

.. code::

   tlt unet evaluate -e <experiment_spec_file> -m <model_path> -o <output_dir> -k <key>

.. Note:: This command runs evaluation using the images and masks that are provided to :code:`val_images_path` and :code:`val_masks_path` in :code:`dataset_config`.

Using Inference on the Model
----------------------------

.. _using_inference_on_the_model_unet:

The :code:`inference` task for UNet may be used to visualize segmentation and generate frame-by-frame PNG-format labels on a directory of images. An example of the command for this task is shown below:

.. code::

   tlt unet inference [-h] -e <experiment_spec_file>
                           -m <model_path>
                           -o <output_dir>
                           -k <key>
                           [--gpu_index <gpu_index>]

Required Parameters
^^^^^^^^^^^^^^^^^^^

* :code:`-e, --experiment_spec_file`: The path to an inference spec file.
* :code:`-m, --model_path`: The path to the model file to use for inference. This can be a :code:`.tlt` model file or a TensorRT engine generated using the :code:`export` tool.
* :code:`-o, --output_dir`: The directory for the output annotated images and labels. The annotated images are placed in :code:`vis_overlay_tlt` and the labels in :code:`mask_labels_tlt`. If a TensorRT engine is used for inference, the annotated images are saved in :code:`vis_overlay_trt` and the predicted labels in :code:`mask_labels_trt`.
* :code:`-k, --enc_key`: The key to load the model.

The tool automatically generates segmentation overlaid images in :code:`output_dir/vis_overlay_tlt`, and the labels are generated in :code:`output_dir/mask_labels_tlt`. The annotated, segmented images and labels for TensorRT inference are saved in :code:`output_dir/vis_overlay_trt` and :code:`output_dir/mask_labels_trt`, respectively.
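Here is a sample :code:`inference` command using the same environment variables as the export example later in this document; the output directory name is illustrative:

.. code::

   tlt unet inference -e $SPECS_DIR/unet_train_spec.txt -m $USER_EXPERIMENT_DIR/unet/model.tlt -o $USER_EXPERIMENT_DIR/unet/inference_output -k $KEY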
Exporting the Model
-------------------

.. _exporting_the_model_unet:

The UNet model application in the Transfer Learning Toolkit includes an :code:`export` sub-task to export and prepare a trained UNet model for :ref:`Deploying to DeepStream `. The :code:`export` sub-task optionally generates the calibration cache for TensorRT INT8 engine calibration.

Exporting the model decouples the training process from deployment and allows conversion to TensorRT engines outside the TLT environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. An engine may be interchangeably referred to as the :code:`.trt` or :code:`.engine` file. The same exported TLT model, referred to as the :code:`.etlt` file (encrypted TLT file), may be used universally across training and deployment hardware. During model export, the TLT model is encrypted with a private key. This key is required when you deploy the model for inference.

INT8 Mode Overview
^^^^^^^^^^^^^^^^^^

TensorRT engines can be generated in INT8 mode to run with lower precision, and thus improve performance. This process requires a cache file that contains scale factors for the tensors to help combat quantization errors, which may arise due to low-precision arithmetic. The calibration cache is generated using a calibration tensorfile when :code:`export` is run with the :code:`--data_type` flag set to :code:`int8`. Pre-generating the calibration information and caching it removes the need for calibrating the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensorfile, since it is a much smaller file and can be moved with the exported model. Using the calibration cache also speeds up engine creation, as building the cache can take several minutes depending on the size of the tensorfile and the model itself.

The export tool can generate an INT8 calibration cache by ingesting training data. You will need to point the tool to a directory of images to use for calibrating the model. You will also need to create a sub-sampled directory of random images that best represent your training dataset.

.. image:: ../../content/tlt_int8_calibration.png

FP16/FP32 Model
^^^^^^^^^^^^^^^

The :code:`calibration.bin` file is only required if you need to run inference at INT8 precision. For FP16/FP32-based inference, the export step is much simpler: all that is required is to provide a model from the :code:`train` step to :code:`export` to convert it into an encrypted TLT model.

.. image:: ../../content/fp16_fp32_export.png

Exporting the UNet Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is an example of the command line arguments for the :code:`export` command:

.. code::

   tlt unet export [-h] -m <path to the .tlt model file>
                        -k <key>
                        -e <experiment_spec_file>
                        [-o <path to output file>]
                        [-s]
                        [--cal_data_file <path to calibration tensorfile>]
                        [--cal_image_dir <path to calibration image directory>]
                        [--data_type <data type for the TensorRT backend>]
                        [--batches <number of calibration batches>]
                        [--max_batch_size <maximum TensorRT engine batch size>]
                        [--max_workspace_size <maximum TensorRT workspace size>]
                        [--engine_file <path to the TensorRT engine file>]
                        [--verbose]

Required Arguments
******************

* :code:`-m, --model`: The path to the :code:`.tlt` model file to be exported using :code:`export`.
* :code:`-k, --key`: The key used to save the :code:`.tlt` model file.
* :code:`-e, --experiment_spec`: The path to the spec file.

Optional Arguments
******************

* :code:`-o, --output_file`: The path to save the exported model to. The default path is :code:`./<input_file>.etlt`.
* :code:`--data_type`: The engine data type for generating the calibration cache if in INT8 mode. The options are :code:`fp32`, :code:`fp16`, and :code:`int8`. The default value is :code:`fp32`. If :code:`int8` is used, the INT8 arguments described below are also required.
* :code:`-s, --strict_type_constraints`: A Boolean flag to indicate whether or not to apply the TensorRT :code:`strict_type_constraints` when building the TensorRT engine. Note that this only applies strict types for INT8 mode.

INT8 Export Mode Required Arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* :code:`--cal_data_file`: The output tensorfile used with :code:`--cal_image_dir`.
* :code:`--cal_image_dir`: The directory of images to use for calibration.
.. Note:: If a valid path is provided to the :code:`--cal_data_file` argument on the command line, the export tool produces an intermediate TensorFile from random batches of images in the :code:`--cal_image_dir` directory for re-use; this tensorfile is used for calibration. If :code:`--cal_image_dir` is not provided, random input tensors are used for calibration. The number of batches in the generated tensorfile is obtained from the value set for the :code:`--batches` parameter, and the :code:`batch_size` is obtained from the value set for the :code:`--batch_size` parameter. Ensure that the directory mentioned in :code:`--cal_image_dir` has at least :code:`batch_size * batches` images in it. The valid image extensions are ".jpg", ".jpeg", and ".png". In this case, the :code:`input_dimensions` of the calibration tensors are derived from the input layer of the :code:`.tlt` model.

INT8 Export Optional Arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* :code:`--cal_cache_file`: The path to save the calibration cache file. The default value is :code:`./cal.bin`.
* :code:`--batches`: The number of batches to use for calibration and inference testing. The default value is 10.
* :code:`--batch_size`: The batch size to use for calibration. The default value is 8.
* :code:`--max_batch_size`: The maximum batch size of the TensorRT engine. The default value is 16.
* :code:`--max_workspace_size`: The maximum workspace size of the TensorRT engine. The default value is 1073741824 (1<<30).
* :code:`--experiment_spec`: The :code:`experiment_spec` for training/inference/evaluation.
* :code:`--engine_file`: The path to the serialized TensorRT engine file. Note that this file is hardware specific and cannot be generalized across GPUs. The engine file allows you to quickly test your model accuracy using TensorRT on the host. Since a TensorRT engine file is hardware specific, you cannot use an engine file for deployment unless the deployment GPU is identical to the training GPU.

.. Note:: UNet does not support QAT.

Sample Usage for the Export Subtask
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here is a sample command using the :code:`--cal_image_dir` option for a UNet model:

.. code::

   tlt unet export -m $USER_EXPERIMENT_DIR/unet/model.tlt -o $USER_EXPERIMENT_DIR/unet/model.int8.etlt -e $SPECS_DIR/unet_train_spec.txt --key $KEY --cal_image_dir $USER_EXPERIMENT_DIR/data/isbi/images/val --data_type int8 --batch_size 8 --batches 10 --cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt --cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin --engine_file $USER_EXPERIMENT_DIR/export/int8.isbi.engine

Deploying to DeepStream
-----------------------

.. _deploying_to_deepstream_unet:

The deep learning and computer vision models that you've trained can be deployed on edge devices, such as a Jetson Xavier or Jetson Nano, a discrete GPU, or in the cloud with NVIDIA GPUs. TLT has been designed to integrate with the DeepStream SDK, so models trained with TLT will work out of the box with the `DeepStream SDK`_.

.. _DeepStream SDK: https://developer.nvidia.com/deepstream-sdk

The DeepStream SDK is a streaming analytics toolkit that accelerates building AI-based video analytics applications. This section describes how to deploy a TLT UNet model to the DeepStream SDK.

To deploy a UNet model trained by TLT to DeepStream, you must generate a device-specific, optimized TensorRT engine using :code:`tlt-converter`, which can then be ingested by DeepStream. Machine-specific optimizations are performed as part of the engine creation process, so a distinct engine should be generated for each environment and hardware configuration. If the TensorRT or CUDA libraries of the inference environment are updated (including minor version updates), or if a new model is generated, new engines need to be generated.
Running an engine that was generated with a different version of TensorRT and CUDA is not supported and will cause unknown behavior that affects inference speed, accuracy, and stability, or it may fail to run altogether.

See :ref:`Exporting the Model ` for more details on how to export a TLT model.

Generating an Engine Using tlt-converter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _generating_an_engine_using_tlt-converter_unet:

This section outlines the steps required to create a TensorRT engine file from the exported :code:`.etlt` model.

The :code:`tlt-converter` tool is provided with TLT to facilitate the deployment of TLT-trained models on TensorRT and/or DeepStream. For deployment platforms with an x86-based CPU and discrete GPUs, the :code:`tlt-converter` is distributed within the TLT Docker. Therefore, we suggest using the Docker to generate the engine. However, this requires you to adhere to the same minor version of TensorRT as distributed with the Docker. The TLT Docker includes TensorRT version 7.1. To use the engine with a different minor version of TensorRT, download the converter from the `Developer Website`_.

.. _Developer Website: https://developer.nvidia.com

Instructions for x86
********************

For an x86 platform with discrete GPUs, the default TLT package includes the :code:`tlt-converter` built for TensorRT 7.1 with CUDA 11.0 and CUDNN 8.0.3. For any other version of CUDA and TensorRT, visit the `Developer Website`_ for the download. Once the :code:`tlt-converter` is downloaded, follow the instructions below to generate a TensorRT engine.

1. Unzip :code:`tlt-converter-trt7.x.zip` on the target machine.

2. Install the OpenSSL package using the command:

   .. code::

      sudo apt-get install libssl-dev

3. Export the following environment variables:

   .. code::

      $ export TRT_LIB_PATH="/usr/lib/x86_64-linux-gnu"
      $ export TRT_INC_PATH="/usr/include/x86_64-linux-gnu"

4. Run the :code:`tlt-converter` using the sample command below and generate the engine.

Instructions for Jetson
***********************

For the Jetson platform, the :code:`tlt-converter` is available to download from the `dev zone`_. Once the :code:`tlt-converter` is downloaded, follow the instructions below to generate a TensorRT engine.

.. _dev zone: https://developer.nvidia.com/tlt-converter-trt71

1. Unzip :code:`tlt-converter-trt7.1.zip` on the target machine.

2. Install the OpenSSL package using the command:

   .. code::

      sudo apt-get install libssl-dev

3. Export the following environment variables:

   .. code::

      $ export TRT_LIB_PATH="/usr/lib/aarch64-linux-gnu"
      $ export TRT_INC_PATH="/usr/include/aarch64-linux-gnu"

4. For Jetson devices, TensorRT 7.1 comes pre-installed with `Jetpack`_. If you are using an older version of JetPack, upgrade to JetPack 4.4.

5. Run the :code:`tlt-converter` using the sample command below and generate the engine.

.. Note:: Make sure to follow the output node names as mentioned in :ref:`Exporting the Model`.

.. _Jetpack: https://developer.nvidia.com/embedded/jetpack

Using the tlt-converter
***********************

.. _using tlt-converter with UNet:

.. code::

   tlt-converter [-h] -k <encryption_key>
                      -p <optimization_profiles>
                      [-d <input_dimensions>]
                      [-o <comma separated output nodes>]
                      [-c <path to calibration cache file>]
                      [-e <path to output engine>]
                      [-b <calibration batch size>]
                      [-m <maximum batch size of the TRT engine>]
                      [-t <engine data type>]
                      [-w <maximum workspace size of the TRT engine>]
                      [-i <input dimension ordering>]
                      [-s]
                      [-u <DLA core>]
                      input_file

Required Arguments
~~~~~~~~~~~~~~~~~~

* :code:`input_file`: The path to the :code:`.etlt` model exported using :code:`export`.
* :code:`-p`: Optimization profiles for :code:`.etlt` models with dynamic shape.
  Use a comma-separated list of optimization profile shapes in the format :code:`<input_name>,<min_shape>,<opt_shape>,<max_shape>`, where each shape has the format :code:`<n>x<c>x<h>x<w>`. This argument can be specified multiple times if there are multiple input tensors for the model.
* :code:`-k`: The key used to encode the :code:`.tlt` model when doing the training.

Optional Arguments
~~~~~~~~~~~~~~~~~~

* :code:`-e`: The path to save the engine to. The default path is :code:`./saved.engine`. Use :code:`.engine` or :code:`.trt` as an extension for the engine path.
* :code:`-t`: The desired engine data type. This option generates a calibration cache if in INT8 mode. The default value is :code:`fp32`. The options are :code:`fp32`, :code:`fp16`, and :code:`int8`.
* :code:`-w`: The maximum workspace size for the TensorRT engine. The default value is :code:`1073741824 (1<<30)`.
* :code:`-i`: The input dimension ordering. The default value is :code:`nchw`. The options are :code:`nchw`, :code:`nhwc`, and :code:`nc`. This argument can be omitted for UNet.
* :code:`-s`: A Boolean value specifying whether to apply TensorRT strict type constraints when building the TensorRT engine.
* :code:`-u`: Specifies the DLA core index when building the TensorRT engine on Jetson devices.
* :code:`-d`: A comma-separated list of input dimensions that should match the dimensions used for :code:`export`.
* :code:`-o`: A comma-separated list of output blob names that should match the output configuration used for :code:`export`.

INT8 Mode Arguments
~~~~~~~~~~~~~~~~~~~

* :code:`-c`: The path to the calibration cache file for INT8 mode. The default path is :code:`./cal.bin`.
* :code:`-b`: The batch size used during the :code:`export` step for INT8 calibration cache generation (default: :code:`8`).
* :code:`-m`: The maximum batch size for the TensorRT engine. The default value is :code:`16`. If you encounter out-of-memory issues, decrease the batch size accordingly. This parameter is not required for :code:`.etlt` models generated with dynamic shape (which is only possible for new models introduced in TLT 3.0).

Sample Output Log
~~~~~~~~~~~~~~~~~

Here is a sample command and output log for converting a UNet model with :code:`tlt-converter`:

.. code::

   tlt-converter -k $KEY -c $USER_EXPERIMENT_DIR/export/isbi_cal.bin -e $USER_EXPERIMENT_DIR/export/trt.int8.tlt.isbi.engine -t int8 -p input_1,1x1x572x572,4x1x572x572,16x1x572x572 /workspace/tlt-experiments/faster_rcnn/resnet18_pruned.epoch45.etlt

.. code::

   [INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
   [INFO] Detected 1 inputs and 2 output network tensors.

.. Note:: To use the default :code:`tlt-converter` available in the Transfer Learning Toolkit package, append :code:`tlt` to the sample usage of :code:`tlt-converter` as mentioned :ref:`here `.

Once the model and/or TensorRT engine file has been generated, two additional files are required:

* Label file
* DS configuration file

Label File
^^^^^^^^^^

The label file is a text file containing the names of the classes that the UNet model is trained to segment. The order in which the classes are listed here must match the order in which the model predicts the output. This order is derived from the :code:`target_class_id_mapping.json` file that is saved in the results directory after training. Here is an example of the :code:`target_class_id_mapping.json` file:

.. code::

   {"0": ["foreground"], "1": ["background"]}
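For a multi-class model, the same file simply contains more entries. A hypothetical three-class mapping might look like this (the class names are illustrative):

.. code::

   {"0": ["background"], "1": ["vehicle"], "2": ["road"]}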
Here is an example of the :code:`unet_labels.txt` file corresponding to the two-class mapping above. The order in :code:`unet_labels.txt` should match the order of the :code:`target_class_id_mapping.json` keys:

.. code::

   foreground
   background

DeepStream Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The segmentation model is typically used as a primary inference engine. It can also be used as a secondary inference engine. Download :code:`ds-tlt` from `DeepStream tlt apps`_.

.. _DeepStream tlt apps: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps

Follow these steps to use the TensorRT engine file with :code:`ds-tlt`:

1. Generate the TensorRT engine using :code:`tlt-converter`. Detailed instructions are provided in the :ref:`Generating an engine using tlt-converter ` section.

2. Once the engine file is generated successfully, do the following to set up :code:`ds-tlt` with DeepStream 5.1:

   * Set :code:`NVDS_VERSION:=5.1` in :code:`apps/Makefile` and :code:`post_processor/Makefile` inside the :code:`deepstream_tlt_apps` directory. This repository is downloaded from `DeepStream tlt apps`_.
   * Follow the instructions here to install :code:`ds-tlt`: `DS Tlt installation`_.
   * Change the output dimensions for UNet according to your model here: `deepstream source code`_. You need to change :code:`MODEL_OUTPUT_WIDTH` and :code:`MODEL_OUTPUT_HEIGHT` in the above source code to your model output dimensions. For example, for the ResNet18 3-channel model mentioned in this documentation, the lines should be changed to:

     .. code::

        #define MODEL_OUTPUT_WIDTH 320
        #define MODEL_OUTPUT_HEIGHT 320

.. _DS Tlt installation: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps#2-build-sample-application

.. _deepstream source code: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/blob/release/tlt3.0/apps/tlt_segmentation/deepstream_seg_app.c#L46

To run this model in the sample :code:`ds-tlt` app, you must modify the existing :code:`pgie_unet_tlt_config.txt` file (see `unet tlt config`_) to point to this model. For all options, see the configuration file below. To learn more about the parameters, refer to the `DeepStream Development Guide`_.

.. _DeepStream Development Guide: https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html

.. _unet tlt config: https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/blob/master/configs/unet_tlt/pgie_unet_tlt_config.txt

.. code::

   [property]
   gpu-id=0
   net-scale-factor=0.007843
   model-color-format=2
   offsets=127.5
   labelfile-path=
   ##Replace following path to your model file
   model-engine-file=
   #current DS cannot parse the UNet etlt model, so you need to
   #convert the etlt model to a TensorRT engine first using tlt-converter
   infer-dims=c;h;w
   # where c = number of channels, h = height of the model input, w = width of the model input
   batch-size=1
   ## 0=FP32, 1=INT8, 2=FP16 mode
   network-mode=2
   num-detected-classes=2
   interval=0
   gie-unique-id=1
   network-type=2
   output-blob-names=softmax_1
   segmentation-threshold=0.0
   ##specify the output tensor order, 0(default value) for CHW and 1 for HWC
   segmentation-output-order=1

   [class-attrs-all]
   roi-top-offset=0
   roi-bottom-offset=0
   detected-min-w=0
   detected-min-h=0
   detected-max-w=0
   detected-max-h=0

An example of the modified config file for the :code:`resnet18` 3-channel model trained on the ISBI dataset is provided below:

.. code::

   [property]
   gpu-id=0
   net-scale-factor=0.007843
   # Since the model input channel is 3, using RGB color format.
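   # net-scale-factor and offsets implement the normalization y = net-scale-factor * (x - offsets);
   # with net-scale-factor=0.007843 (~1/127.5) and offsets of 127.5, input pixels in [0, 255]
   # are mapped to approximately [-1, 1] before inference.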
   model-color-format=0
   offsets=127.5;127.5;127.5
   labelfile-path=/home/nvidia/deepstream_tlt_apps/configs/unet_tlt/unet_labels.txt
   ##Replace following path to your model file
   model-engine-file=/home/nvidia/deepstream_tlt_apps/models/unet/unet_resnet18_isbi.engine
   #current DS cannot parse the ONNX-based etlt model, so you need to
   #convert the etlt model to a TensorRT engine first using tlt-converter
   infer-dims=3;320;320
   batch-size=1
   ## 0=FP32, 1=INT8, 2=FP16 mode
   network-mode=2
   num-detected-classes=2
   interval=0
   gie-unique-id=1
   network-type=2
   output-blob-names=softmax_1
   segmentation-threshold=0.0
   ##specify the output tensor order, 0(default value) for CHW and 1 for HWC
   segmentation-output-order=1

   [class-attrs-all]
   roi-top-offset=0
   roi-bottom-offset=0
   detected-min-w=0
   detected-min-h=0
   detected-max-w=0
   detected-max-h=0

Below is a sample :code:`ds-tlt` command for inference on one image:

.. code::

   ds-tlt configs/unet_tlt/pgie_unet_tlt_config.txt image_isbi_rgb.jpg

.. Note:: The :code:`png` image format is not supported by DeepStream, so inference images need to be converted to :code:`.jpg`. Also make sure to convert grayscale images to 3-channel images if :code:`model_input_channels` is set to 3.
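For example, assuming ImageMagick is installed, a grayscale PNG can be converted to a 3-channel JPEG as follows (the file names are illustrative):

.. code::

   convert image_isbi.png -colorspace sRGB -type TrueColor image_isbi_rgb.jpg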