# BYOM UNET

Note

There are differences in running some of the subtasks for BYOM UNet. Most commands will be similar to the regular TAO UNet model.

UNet is a semantic segmentation model that supports the following tasks:

• train

• prune

• evaluate

• inference

• export

These tasks may be invoked from the TAO Toolkit Launcher using this convention from the command line:


tao unet <sub_task> <args_per_subtask>


Where args_per_subtask are the command line arguments required for a given subtask. Each of these subtasks is explained in detail below.

## Creating a Configuration File

To perform training, evaluation, pruning, and inference for UNet, you will need to configure several components, each with their own parameters. The train, evaluate, prune, and inference tasks for a UNet experiment share the same configuration file.

The specification file for UNet training configures these components for the training pipeline:

• Model

• Trainer

• Dataset

### Model Config

The BYOM segmentation model can be configured using the model_config option in the spec file.

The following is a sample BYOM model config to instantiate a BYOM model from the TAO BYOM Converter:


# Sample model config to instantiate a BYOM model.

model_config {
  arch: "byom"
  # Pass the path of the converted BYOM model
  byom_model: "/path/to/your/byom/.tltb"
  training_precision {
    backend_floatx: FLOAT32
  }
  # The input image size should match that of your original ONNX model.
  model_input_height: 320
  model_input_width: 320
  model_input_channels: 3
}


The Trainer and Dataset configs for BYOM models are identical to those of the regular TAO UNet model. For more information, refer to the UNet page.

## Training the Model

After preparing input data as described in these instructions and setting up a spec file, you are now ready to start training a semantic segmentation network.

The following is the UNet training command:


tao unet train [-h] -k <key>
-r <result directory>
-e <spec_file>
[-m <Pre-trained weights to initialize>]
[-n <name of the model>]
[--gpus <num GPUs>]
[--gpu_index <space-separated GPU indices>]
[--use_amp]


### Required Arguments

• -r, --results_dir: The path to a folder where experiment outputs should be written

• -k, --key: A user-specific encoding key to save or load a .tlt model

• -e, --experiment_spec_file: The path to the spec file

### Optional Arguments

• -m, --pretrained_model_file: The path to a pre-trained model used to initialize the weights. The default value is None. When re-training after pruning, point this parameter to the pruned model.

• -n, --model_name: The name that the final checkpoint will be saved as in the weights directory. The default value is model.tlt.

• --gpus: The number of GPUs to use and processes to launch for training. The default value is 1.

• --gpu_index: The indices of the GPUs to use for training. The GPU indices are described in the ./deviceQuery CUDA samples.

• --use_amp: A flag that enables Automatic Mixed Precision mode

• -h, --help: Prints this help message.

Note

BYOM UNet does not support updating the number of classes. You must convert an ONNX model that has an output size of (N, C, H, W), where C stands for number of classes in the target dataset.
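
The constraint in this note can be checked before conversion. The following is a minimal sketch, assuming you have already read the output shape of your ONNX model by some other means; the function name `output_matches_classes` is hypothetical and not part of TAO.

```python
def output_matches_classes(output_shape, num_classes):
    """Check that an (N, C, H, W) output shape has C equal to num_classes.

    output_shape: 4-item sequence; N may be None or a string for a dynamic batch.
    num_classes: number of classes in the target dataset.
    """
    if len(output_shape) != 4:
        raise ValueError("expected a 4-D (N, C, H, W) output shape")
    _, channels, _, _ = output_shape
    return channels == num_classes


# Hypothetical example: a two-class model with 320x320 output.
print(output_matches_classes((1, 2, 320, 320), num_classes=2))
```

If the check fails, the note above implies that the ONNX model itself has to be changed, since the number of classes cannot be updated on the TAO side.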

### Input Requirement

• Input size: C * W * H (where C = 3 or 1, W = 572, H = 572 for vanilla unet and W >= 128, H >= 128 and W, H are multiples of 32 for other archs).

• Image format: JPG, JPEG, PNG, BMP
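
The input-size rules above can be expressed as a small check. This is an illustrative sketch only; `validate_input_size` is a hypothetical helper, not a TAO function, and it encodes nothing beyond the constraints listed in this section.

```python
def validate_input_size(channels, width, height, vanilla=False):
    """Return a list of problems with a model input size (empty if valid).

    vanilla=True applies the fixed 572x572 rule for vanilla UNet;
    otherwise W and H must be >= 128 and multiples of 32.
    """
    problems = []
    if channels not in (1, 3):
        problems.append(f"C must be 1 or 3, got {channels}")
    if vanilla:
        if (width, height) != (572, 572):
            problems.append("vanilla UNet requires W = 572 and H = 572")
    else:
        for name, value in (("W", width), ("H", height)):
            if value < 128:
                problems.append(f"{name} must be >= 128, got {value}")
            if value % 32 != 0:
                problems.append(f"{name} must be a multiple of 32, got {value}")
    return problems


print(validate_input_size(3, 320, 320))  # the size used in the sample model config
```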

Note

The images and masks need not be equal to the model input size. The images/masks will be resized to the model input size during training.

### Sample Usage

Here is an example of a command for two-GPU training:


tao unet train -e </path/to/spec/file>
-r </path/to/experiment/output>
-k <key>
-n <name_string_for_the_model>
-m <Pre-trained weights to initialize the model>
--gpus 2
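
When training is launched from a script, the same command can be assembled as an argument list. The sketch below uses only the flags documented in this section; the helper name `build_train_command` and the example paths are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_train_command(spec_file, results_dir, key, pretrained_model=None,
                        model_name=None, gpus=1, use_amp=False):
    """Assemble a `tao unet train` argument list from the documented flags."""
    cmd = ["tao", "unet", "train",
           "-e", spec_file,
           "-r", results_dir,
           "-k", key]
    if pretrained_model:
        cmd += ["-m", pretrained_model]
    if model_name:
        cmd += ["-n", model_name]
    if gpus != 1:
        cmd += ["--gpus", str(gpus)]
    if use_amp:
        cmd.append("--use_amp")
    return cmd


# Hypothetical paths and key, for illustration only.
cmd = build_train_command("specs/unet_train.txt", "results/unet", "my_key", gpus=2)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the training, assuming the TAO Toolkit Launcher is installed and on the PATH.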


Note

UNet supports resuming training from intermediate checkpoints. If a previously running training experiment is stopped prematurely, you can restart the training from the last checkpoint by re-running the UNet training command with the same command-line arguments as before. The trainer for UNet finds the last saved checkpoint in the results directory and resumes the training from there. The interval at which the checkpoints are saved is defined by the checkpoint_interval parameter under training_config for UNet. Do not use a pre-trained weights argument when resuming training.

Note

UNet supports TensorBoard visualization for losses. The TensorBoard logs are saved in the output directory, from which they can be visualized.

## Pruning the Model

Pruning, performed with the prune command, removes parameters from the model to reduce the model size without compromising the integrity of the model itself.

The prune task includes these parameters:


tao unet prune [-h] -m <pretrained_model>
-e <spec_file>
-o <output_file>
-k <key>
[-n <normalizer>]
[-eq <equalization_criterion>]
[-pg <pruning_granularity>]
[-pth <pruning threshold>]
[-nf <min_num_filters>]
[-el <excluded_list>]


### Required Arguments

• -m, --pretrained_model: The path to the model to be pruned. Usually, the last epoch model is used.

• -e, --experiment_spec_file: The path to the spec file

• -o, --output_file: The path to the pruned model

• -k, --key: The key to load a .tlt model

### Optional Arguments

• -h, --help: Show this help message and exit.

• -n, --normalizer: Specify max to normalize by dividing each norm by the maximum norm within a layer; specify L2 to normalize by dividing by the L2 norm of the vector comprising all kernel norms. The default value is max.

• -eq, --equalization_criterion: Criteria to equalize the stats of inputs to an element-wise op layer or depth-wise convolutional layer. This parameter is useful for resnets and mobilenets. The options are arithmetic_mean, geometric_mean, union, and intersection. The default value is union.

• -pg, --pruning_granularity: The number of filters to remove at a time. The default value is 8.

• -pth: The threshold to compare the normalized norm against. The default value is 0.1.

• -nf, --min_num_filters: The minimum number of filters to keep per layer. The default value is 16.

• -el, --excluded_layers: A list of excluded layers (e.g. -el item1 item2). The default value is [].
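
To make the interaction of these options more concrete, here is a simplified illustration for a single layer. It is one plausible reading of the option descriptions above, not TAO's actual pruning implementation; the function `filters_to_keep` and the way granularity and the minimum are combined are assumptions.

```python
import math


def filters_to_keep(norms, threshold=0.1, normalizer="max",
                    granularity=8, min_num_filters=16):
    """Estimate how many filters of one layer survive pruning.

    norms: per-filter kernel norms for the layer.
    Assumption: filters whose normalized norm is below the threshold are
    removable, removed in groups of `granularity`, and at least
    `min_num_filters` are always kept.
    """
    if normalizer == "max":
        scale = max(norms)
    elif normalizer == "L2":
        scale = math.sqrt(sum(n * n for n in norms))
    else:
        raise ValueError("normalizer must be 'max' or 'L2'")
    if scale == 0:
        return min(len(norms), min_num_filters)

    removable = sum(1 for n in norms if n / scale < threshold)
    removed = (removable // granularity) * granularity  # whole groups only
    kept = len(norms) - removed
    return min(len(norms), max(kept, min_num_filters))


# 32 filters: 20 strong, 12 weak. Only one whole group of 8 can be removed.
print(filters_to_keep([1.0] * 20 + [0.01] * 12))
```

Under these assumptions, a higher -pth makes more filters removable, while -pg and -nf bound how many are actually dropped.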

After pruning, the model needs to be retrained. See Re-training the Pruned Model for more details.

Note

Evaluation and inference are not directly supported for pruned models. You must re-train a pruned model before performing evaluation and inference.

### Using the Prune Command

Here’s an example of using the prune task:


tao unet prune -e </path/to/spec/file>
-m </path/to/weights to be pruned>
-o </path/to/pruned weights>
-eq union
-pth 0.7
-k $KEY


## Re-training the Pruned Model

Once the model has been pruned, there might be a slight decrease in accuracy because some previously useful weights may have been removed. To regain the accuracy, we recommend retraining the pruned model over the same dataset using the train command, as documented in the Training the Model section, with the -m, --pretrained_model_file argument pointing to the newly pruned model.

We recommend setting the regularizer weight to zero in the training_config for UNet to recover the accuracy when retraining a pruned model. All other parameters may be retained in the spec file from the previous training.

To load the pruned model, as well as for re-training, set the load_graph flag under model_config to true.

## Evaluating the Model

Execute evaluate on a UNet model as follows:

tao unet evaluate [-h] -e <experiment_spec>
-m <model_file>
-o <output folder>
-k <key>
[--gpu_index]


### Required Arguments

• -e, --experiment_spec_file: The experiment spec file for setting up the evaluation experiment. This should be the same as the training spec file.

• -m, --model_path: The path to the model file to use for evaluation. This can be a .tlt model file or a TensorRT engine generated using the export tool.

• -o, --output_dir: The output directory where the evaluation metrics are saved as a JSON file. TAO inference results are saved to output_dir/results_tlt.json and TRT inference results are saved to output_dir/results_trt.json. The results JSON file has the precision, recall, f1-score, and IOU for every class. It also provides the weighted average, macro average, and micro average for these metrics. For more information on the averaging metric, see the classification report.

• -k, --key: The encryption key to decrypt the model. This argument is only required with a .tlt model file.

### Optional Arguments

• -h, --help: Show this help message and exit.

• --gpu_index: The index of the GPU to run evaluation on

If you have followed the example in Training a UNet Model, you may now evaluate the model using the following command:

### Sample Usage

Here is an example of a command for evaluating the model:

tao unet evaluate -e </path/to/training/spec/file>
-m </path/to/the/model>
-o </path/to/evaluation/output>
-k <key to load the model>


Note

This command runs evaluation using the images and masks that are provided to val_images_path and val_masks_path or the text files provided under val_data_sources in dataset_config.

## Using Inference on the Model

The inference task for UNet may be used to visualize segmentation and generate frame-by-frame PNG format labels on a directory of images. An example of the command for this task is shown below:

tao unet inference [-h] -e <experiment_spec>
-m <model_file>
-o <output folder to save inference images>
-k <key>
[--gpu_index]


### Required Parameters

• -e, --experiment_spec_file: The path to an inference spec file

• -o, --output_dir: The directory to the output annotated images and labels. The annotated images are in vis_overlay_tlt and labels are in mask_labels_tlt. The annotated images are saved in vis_overlay_trt and predicted labels in mask_labels_trt if the TRT engine is used for inference.

• -k, --enc_key: The key to load the model

The tool automatically generates segmentation overlaid images in output_dir/vis_overlay_tlt. The labels are generated in output_dir/mask_labels_tlt. The annotated, segmented images and labels for TRT inference are saved in output_dir/vis_overlay_trt and output_dir/mask_labels_trt, respectively.

## Exporting the Model

The UNet model application in the TAO Toolkit includes an export sub-task to export and prepare a trained UNet model for deploying to DeepStream. The export sub-task optionally generates the calibration cache for TensorRT INT8 engine calibration.

Exporting the model decouples the training process from deployment and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. This may be interchangeably referred to as the .trt or .engine file. The same exported TAO model may be used universally across training and deployment hardware. This is referred to as the .etlt file, or encrypted TAO file. During model export, the TAO model is encrypted with a private key. This key is required when you deploy this model for inference.

### INT8 Mode Overview

TensorRT engines can be generated in INT8 mode to run with lower precision, and thus improve performance. This process requires a cache file that contains scale factors for the tensors to help combat quantization errors, which may arise due to low-precision arithmetic. The calibration cache is generated using a calibration tensorfile when export is run with the --data_type flag set to int8. Pre-generating the calibration information and caching it removes the need for calibrating the model on the inference machine. Moving the calibration cache is usually much more convenient than moving the calibration tensorfile since it is a much smaller file and can be moved with the exported model. Using the calibration cache also speeds up engine creation, as building the cache can take several minutes depending on the size of the tensorfile and the model itself.

The export tool can generate an INT8 calibration cache by ingesting training data. You will need to point the tool to a directory of images to use for calibrating the model. You will also need to create a sub-sampled directory of random images that best represent your training dataset.

### FP16/FP32 Model

The calibration.bin is only required if you need to run inference at INT8 precision. For FP16/FP32 based inference, the export step is much simpler.
All that is required is to provide a model from the train step to export to convert into an encrypted TAO model.

### Exporting the BYOM UNet Model

Here’s an example of the command line arguments for the export command:

tao unet export [-h] -m </path/to the .tlt model file generated by tao train>
-k <key>
-e </path/to/experiment/spec_file>
[-o </path/to/output/file>]
[-s <strict_type_constraints>]
[--cal_data_file </path/to/tensor/file>]
[--cal_image_dir </path/to/the/directory/images/to/calibrate/the/model>]
[--cal_cache_file </path/to/output/calibration/file>]
[--data_type <Data type for the TensorRT backend during export>]
[--batches <Number of batches to calibrate over>]
[--max_batch_size <maximum trt batch size>]
[--max_workspace_size <maximum workspace size>]
[--batch_size <batch size to TensorRT engine>]
[--engine_file </path/to/the/TensorRT/engine_file>]
[--gen_ds_config <Flag to generate ds config and label file>]
[--verbose <Verbosity of the logger>]


#### Required Arguments

• -m, --model: The path to the .tlt model file to be exported using export

• -k, --key: The key used to save the .tlt model file

• -e, --experiment_spec: The path to the spec file

#### Optional Arguments

• -o, --output_file: The path to save the exported model to. The default path is ./<input_file>.etlt.

• --data_type: The engine data type for generating calibration cache if in INT8 mode. The options are fp32, fp16, and int8. The default value is fp32. If using int8, the INT8 export arguments are required.

• --gen_ds_config: A Boolean flag indicating whether to generate the template DeepStream related configuration (nvinfer_config.txt) as well as a label file (labels.txt) in the same directory as the output_file. Note that the config file is not a complete configuration file and requires the user to update the sample config files in DeepStream with the parameters generated.

• -s, --strict_type_constraints: A Boolean flag to indicate whether or not to apply the TensorRT strict_type_constraints when building the TensorRT engine. Note this is only for applying the strict type of INT8 mode.

### INT8 Export Mode Required Arguments

• --cal_data_file: The output file used with --cal_image_dir.

• --cal_image_dir: The directory of images to use for calibration.

Note

If a valid path is provided to the --cal_data_file argument over the command line, the export tool produces an intermediate TensorFile for re-use from random batches of images in the --cal_image_dir directory of images. This tensorfile is used for calibration. If --cal_image_dir is not provided, random input tensors are used for calibration. The number of batches in the generated tensorfile is obtained from the value set to the --batches parameter, and the batch_size is obtained from the value set to the --batch_size parameter. Ensure that the directory mentioned in --cal_image_dir has at least batch_size * batches number of images in it. The valid image extensions are .jpg, .jpeg, and .png. In this case, the input_dimensions of the calibration tensors are derived from the input layer of the .tlt model.

### INT8 Export Optional Arguments

• --cal_cache_file: The path to save the calibration cache file. The default value is ./cal.bin.

• --batches: The number of batches to use for calibration and inference testing. The default value is 10.

• --batch_size: The batch size to use for calibration. The default value is 8.

• --max_batch_size: The maximum batch size of the TensorRT engine. The default value is 1.

• --min_batch_size: The minimum batch size of the TensorRT engine. The default value is 1.

• --opt_batch_size: The optimum batch size of the TensorRT engine. The default value is 1.

• --max_workspace_size: The maximum workspace size of the TensorRT engine. The default value is 1073741824 = 1<<30.

• --experiment_spec: The experiment_spec for training/inference/evaluation.

• --engine_file: The path to the serialized TensorRT engine file. Note that this file is hardware specific and cannot be generalized across GPUs. The engine file allows you to quickly test your model accuracy using TensorRT on the host. Since a TensorRT engine file is hardware specific, you cannot use an engine file for deployment unless the deployment GPU is identical to the training GPU.

Note

UNet BYOM does not support QAT.

### Sample Usage for the Export Subtask

Here’s a sample command using the --cal_image_dir option for a UNet model.

tao unet export -m $USER_EXPERIMENT_DIR/unet/model.tlt
-o $USER_EXPERIMENT_DIR/unet/model.int8.etlt
-e $SPECS_DIR/unet_train_spec.txt
--key $KEY
--cal_image_dir $USER_EXPERIMENT_DIR/data/isbi/images/val
--data_type int8
--batch_size 8
--batches 10
--cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt
--cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin
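
Before running an INT8 export like the one above, it can help to confirm that the calibration directory holds at least batch_size * batches images with a valid extension. The following is a minimal sketch; the helper names are hypothetical, and treating extensions case-insensitively is an assumption rather than documented behavior of the export tool.

```python
from pathlib import Path

# Extensions listed as valid for --cal_image_dir in this document.
VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_calibration_images(image_dir):
    """Count files in image_dir whose extension is a valid image extension."""
    return sum(
        1
        for path in Path(image_dir).iterdir()
        if path.is_file() and path.suffix.lower() in VALID_EXTENSIONS
    )


def has_enough_calibration_images(image_dir, batch_size=8, batches=10):
    """Return True if image_dir has at least batch_size * batches images."""
    return count_calibration_images(image_dir) >= batch_size * batches
```

With the sample values above (--batch_size 8 and --batches 10), the directory needs at least 80 images.
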
$ export TRT_INC_PATH="/usr/include/x86_64-linux-gnu"


1. Run the tao-converter using the sample command below and generate the engine.

Note

Make sure to follow the output node names as mentioned in the Exporting the Model section of the respective model.

### Instructions for Jetson

For the Jetson platform, the tao-converter is available to download in the NVIDIA developer zone. You may choose the version you wish to download as listed in the overview section. Once the tao-converter is downloaded, follow the instructions below to generate a TensorRT engine.

1. Unzip the zip file on the target machine.

2. Install the OpenSSL package using the command:

sudo apt-get install libssl-dev


3. Export the following environment variables:

$ export TRT_LIB_PATH="/usr/lib/aarch64-linux-gnu"
$ export TRT_INC_PATH="/usr/include/aarch64-linux-gnu"


4. For Jetson devices, TensorRT comes pre-installed with JetPack. If you are using an older JetPack, upgrade to JetPack-5.0DP.

5. Run the tao-converter using the sample command below and generate the engine.

Note

Make sure to follow the output node names as mentioned in the Exporting the Model section of the respective model.

#### Using the tao-converter

tao-converter [-h] -k <encryption_key>
-p <optimization_profiles>
[-d <input_dimensions>]
[-o <comma separated output nodes>]
[-c </path/to/calibration/cache_file>]
[-e </path/to/output/engine>]
[-b <calibration batch size>]
[-m <maximum batch size of the TRT engine>]
[-t <engine datatype>]
[-w <maximum workspace size of the TRT Engine>]
[-i <input dimension ordering>]
[-s]
[-u <DLA_core>]
input_file


##### Required Arguments

• input_file: The path to the .etlt model exported using export

• -p: Optimization profiles for .etlt models with dynamic shape. Use a comma-separated list of optimization profile shapes in the format <input_name>,<min_shape>,<opt_shape>,<max_shape>, where each shape has the format: <n>x<c>x<h>x<w>.

• -k: The key used to encode the .tlt model during training

##### Optional Arguments

• -e: The path to save the engine to. The default path is ./saved.engine. Use .engine or .trt as an extension for the engine path.

• -t: The desired engine data type. This option generates a calibration cache if in INT8 mode. The default value is fp32. The options are fp32, fp16, and int8.

• -w: The maximum workspace size for the TensorRT engine. The default value is 1073741824 (1<<30).

• -i: The input dimension ordering. The default value is nchw. The options are nchw, nhwc, nc. For UNet, you can omit this argument.

• -s: A Boolean value specifying whether to apply TensorRT strict type constraints when building the TensorRT engine

• -u: Specifies the DLA core index when building the TensorRT engine on Jetson devices

• -d: A comma-separated list of input dimensions that should match the dimensions used for export

• -o: A comma-separated list of output blob names that should match the output configuration used for export

##### INT8 Mode Arguments

• -c: The path to the calibration cache file for INT8 mode. The default path is ./cal.bin.

• -b: The batch size used during the export step for INT8 calibration cache generation (default: 8).

• -m: The maximum batch size for the TensorRT engine. The default value is 16. If you encounter out-of-memory issues, decrease the batch size accordingly. This parameter is not required for .etlt models generated with dynamic shape (which is only possible for new models introduced in TAO Toolkit 3.21.08 or later).

##### Sample Output Log

Here is a sample log for exporting a BYOM UNet model.

tao-converter -k $KEY
-c $USER_EXPERIMENT_DIR/export/isbi_cal.bin
-e $USER_EXPERIMENT_DIR/export/trt.int8.tlt.isbi.engine
-t int8
-p input_1:0,1x1x572x572,4x1x572x572,16x1x572x572
/workspace/tao-experiments/faster_rcnn/resnet18_pruned.epoch45.etlt
..
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
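
The -p value in the command above follows the <input_name>,<min_shape>,<opt_shape>,<max_shape> format, with each shape written as <n>x<c>x<h>x<w>. The sketch below builds such a string; `optimization_profile` is a hypothetical helper, not part of tao-converter.

```python
def optimization_profile(input_name, channels, height, width,
                         min_batch, opt_batch, max_batch):
    """Build a -p argument: <input_name>,<min_shape>,<opt_shape>,<max_shape>."""
    if not (min_batch <= opt_batch <= max_batch):
        raise ValueError("expected min_batch <= opt_batch <= max_batch")
    shapes = [
        f"{batch}x{channels}x{height}x{width}"
        for batch in (min_batch, opt_batch, max_batch)
    ]
    return ",".join([input_name] + shapes)


# Reproduces the profile used in the sample command above.
print(optimization_profile("input_1:0", 1, 572, 572, 1, 4, 16))
```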


Note

To use the default tao-converter available in the TAO Toolkit package, prefix the sample tao-converter usage shown above with tao.

Once the model and/or TensorRT engine file has been generated, two additional files are required:

• Label file

• DeepStream configuration file

### Label File

The label file is a text file containing the names of the classes that the UNet model is trained to segment. The order in which the classes are listed here must match the order in which the model predicts the output. This order is derived from the target_class_id_mapping.json file that is saved in the results directory after training. Here is an example of the target_class_id_mapping.json file:


{"0": ["foreground"], "1": ["background"]}


Here is an example of the corresponding unet_labels.txt file. The order in the unet_labels.txt should match the order in the target_class_id_mapping.json keys:


foreground
background
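
The label file can be derived from the mapping rather than written by hand. This is a minimal sketch under two assumptions that the text above does not state: keys are ordered numerically, and the first name in each list is used as the label. The function name `labels_from_mapping` is hypothetical.

```python
import json


def labels_from_mapping(mapping_json):
    """Return class labels ordered by the numeric keys of the mapping.

    mapping_json: contents of target_class_id_mapping.json as a string.
    Assumption: the first name listed for each id is used as its label.
    """
    mapping = json.loads(mapping_json)
    ordered_ids = sorted(mapping, key=int)
    return [mapping[class_id][0] for class_id in ordered_ids]


example = '{"0": ["foreground"], "1": ["background"]}'
print("\n".join(labels_from_mapping(example)))
```

Writing the joined string to unet_labels.txt gives one label per line, in the same order as the mapping keys.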


### DeepStream Configuration File

The segmentation model is typically used as a primary inference engine. It can also be used as a secondary inference engine. Download ds-tlt from the deepstream_tao_apps repository.

Follow these steps to use the TensorRT engine file with ds-tlt:

1. Generate the TensorRT engine using tao-converter. Detailed instructions are provided in the Generating an engine using tao-converter section.

2. Once the engine file is generated successfully, do the following to set up ds-tlt with DS 5.1:

   1. Set NVDS_VERSION:=5.1 in apps/Makefile and post_processor/Makefile inside the deepstream_tlt_apps directory. This repository is downloaded from deepstream_tao_apps.

   2. Follow the DeepStream TAO installation instructions to install ds-tlt.

3. Change the output dimensions for UNet according to your model in the deepstream source code: set MODEL_OUTPUT_WIDTH and MODEL_OUTPUT_HEIGHT to your model output dimensions.

For example, for the Resnet18 3-channel model mentioned in this documentation, the lines will be changed as follows:


#define MODEL_OUTPUT_WIDTH 320
#define MODEL_OUTPUT_HEIGHT 320


To run this model in the sample ds-tlt, you must modify the existing pgie_unet_tlt_config.txt file to point to this model. For all options, see the configuration file below. To learn more about the parameters, refer to the DeepStream Development Guide.


[property]
gpu-id=0
net-scale-factor=0.007843
model-color-format=2
offsets=127.5
labelfile-path=</Path/to/unet_labels.txt>
##Replace following path to your model file
model-engine-file=<Path/to/tensorrt engine generated by tao-converter>
#current DS cannot parse the unet etlt model, so you need to
#convert the etlt model to a TensorRT engine first using tao-converter
infer-dims=c;h;w # where c = number of channels, h = height of the model input, w = width of model input.
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1
network-type=2
output-blob-names=softmax_1
segmentation-threshold=0.0
##specify the output tensor order, 0(default value) for CHW and 1 for HWC
segmentation-output-order=1

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
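
The net-scale-factor and offsets values in this file work together. Assuming the per-pixel preprocessing described in the DeepStream Gst-nvinfer documentation, y = net-scale-factor * (x - offset), the values above map 8-bit pixel values to roughly the range [-1, 1]. The sketch below only illustrates that arithmetic; it is not DeepStream code.

```python
NET_SCALE_FACTOR = 0.007843  # approximately 1 / 127.5
OFFSET = 127.5


def preprocess(pixel):
    """Apply y = net-scale-factor * (x - offset) to one pixel value."""
    return NET_SCALE_FACTOR * (pixel - OFFSET)


for value in (0, 127.5, 255):
    print(value, preprocess(value))
```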


An example of the modified config file for the Resnet18 3-channel model trained on ISBI dataset is provided below:


[property]
gpu-id=0
net-scale-factor=0.007843

# Since the model input channel is 3, using RGB color format.
model-color-format=0
offsets=127.5;127.5;127.5
labelfile-path=/home/nvidia/deepstream_tlt_apps/configs/unet_tlt/unet_labels.txt
##Replace following path to your model file
model-engine-file=/home/nvidia/deepstream_tlt_apps/models/unet/unet_resnet18_isbi.engine
#current DS cannot parse the onnx etlt model, so you need to
#convert the etlt model to a TensorRT engine first using tao-converter
infer-dims=3;320;320
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1
network-type=2
output-blob-names=softmax_1
segmentation-threshold=0.0
##specify the output tensor order, 0(default value) for CHW and 1 for HWC
segmentation-output-order=1

[class-attrs-all]
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
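
Three properties in these files depend on the number of input channels: model-color-format, offsets, and infer-dims. The sketch below derives them from the model input shape, following the two configurations shown above (a single offset with model-color-format=2 in the template, three offsets with model-color-format=0 for the 3-channel model). The helper `channel_properties` is hypothetical, and reading format 2 as grayscale is an assumption based on the DeepStream nvinfer settings.

```python
def channel_properties(channels, height, width, offset=127.5):
    """Return the channel-dependent [property] entries as a dict of strings."""
    if channels == 3:
        color_format = 0  # RGB, as in the 3-channel example above
    elif channels == 1:
        color_format = 2  # assumed to mean grayscale
    else:
        raise ValueError("channels must be 1 or 3")
    return {
        "model-color-format": str(color_format),
        "offsets": ";".join([str(offset)] * channels),
        "infer-dims": f"{channels};{height};{width}",
    }


for key, value in channel_properties(3, 320, 320).items():
    print(f"{key}={value}")
```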


Below is the sample ds-tlt command for inference on one image:


ds-tlt configs/unet_tlt/pgie_unet_tlt_config.txt image_isbi_rgb.jpg


Note

The .png image format is not supported by DeepStream, so the inference image needs to be converted to .jpg. If model_input_channels is set to 3, ensure grayscale images are converted to 3-channel images.