Migrating from TAO Toolkit 3.x to TAO Toolkit 4.0

NVIDIA TAO Release 4.0.1

Container Mapping

TAO 4.0 consolidates all containers under one name with different tags. If you are using TAO directly at the container level, you need to change the name and tag to use the latest version. If you are using the TAO launcher CLI, then the containers will be upgraded automatically when you upgrade the launcher.

The old containers are no longer displayed on NGC, but you can still pull them. They might be deprecated in the future. There are two new containers, and two existing containers whose functionality has been merged into other containers.

Public Display Name             Current Container Name  Current Container Tag (most recent)  New Display Name  New Container Name  New Container Tag
------------------------------  ----------------------  -----------------------------------  ----------------  ------------------  --------------------------
TAO Toolkit for CV              tao-toolkit-tf          v3.22.05-tf1.15.4-py3                TAO Toolkit       tao-toolkit         Merged into 4.0.0-tf1.15.5
TAO Toolkit for CV              tao-toolkit-tf          v3.22.05-tf1.15.5-py3                TAO Toolkit       tao-toolkit         4.0.0-tf1.15.5
TAO Toolkit for CV              tao-toolkit-tf          v3.22.05-beta-api                    TAO Toolkit       tao-toolkit         4.0.0-api
Didn’t exist                    Didn’t exist            N/A                                  TAO Toolkit       tao-toolkit         4.0.0-tf2.9.1 (new)
TAO Toolkit for ConvAI          tao-toolkit-pyt         v3.22.05-py3                         TAO Toolkit       tao-toolkit         4.0.0-pyt
TAO Toolkit for Language model  tao-toolkit-lm          v3.22.05-py3                         TAO Toolkit       tao-toolkit         Merged into 4.0.0-pyt
Didn’t exist                    Didn’t exist            N/A                                  TAO Toolkit       tao-toolkit         4.0.0-deploy (new)
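
If you reference containers by name in your own scripts, the mapping above can be captured in a small lookup. The following is a minimal sketch, not part of TAO: the function name new_tao_image is made up for illustration, and the nvcr.io/nvidia/tao/tao-toolkit prefix is taken from the container list later on this page.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TAO): map a TAO 3.x container
# name and tag to the consolidated TAO 4.0 image from the mapping table.
new_tao_image() {
    old_name=$1
    old_tag=$2
    repo="nvcr.io/nvidia/tao/tao-toolkit"
    case "$old_name:$old_tag" in
        # tf1.15.4 was merged into the tf1.15.5 image
        tao-toolkit-tf:v3.22.05-tf1.15.4-py3) echo "$repo:4.0.0-tf1.15.5" ;;
        tao-toolkit-tf:v3.22.05-tf1.15.5-py3) echo "$repo:4.0.0-tf1.15.5" ;;
        tao-toolkit-tf:v3.22.05-beta-api)     echo "$repo:4.0.0-api" ;;
        tao-toolkit-pyt:v3.22.05-py3)         echo "$repo:4.0.0-pyt" ;;
        # the language model container was merged into the PyTorch image
        tao-toolkit-lm:v3.22.05-py3)          echo "$repo:4.0.0-pyt" ;;
        *)
            echo "no 4.0 mapping for $old_name:$old_tag" >&2
            return 1
            ;;
    esac
}

# Example: prints nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
new_tao_image tao-toolkit-lm v3.22.05-py3
```

The two containers marked "(new)" have no 3.x counterpart, so they do not appear in the lookup.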

There are minor interface changes from TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05) to TAO Toolkit 4.0. This may affect you if you are using older notebooks, have the TAO workflow integrated into your own applications, or are training directly in the containers. If you use the newer notebooks from the TAO Getting Started, then this doesn’t apply, as these notebooks have already been updated.

TAO 4.0 has split the hybrid training-deployment container into separate training and deployment containers. Since the libraries for training and deployment are completely different, this allows for rapid development and updates to individual components.

The training container contains deep learning frameworks like TensorFlow and PyTorch, but the libraries and entrypoints that make trained models ready for deployment and inference have now been moved to the new deploy container. The deploy container now handles the generation of TensorRT engines and INT8 calibration caches, as well as TensorRT model evaluation and inference.

The image below highlights the changes related to INT8 calibration generation and TensorRT model evaluation. If you are training directly from the containers, then you will need to separately pull the tao-deploy container to run TensorRT conversion and evaluation. If you are using the launcher CLI or API, then this will be handled automatically by the CLI or API.

[Figure: tao_deploy_workflow.jpg]

  • TAO TensorFlow1 Training container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5

  • TAO TensorFlow1 Training container for MaskRCNN and UNet: nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5

  • TAO TensorFlow2 Training container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1

  • TAO PyTorch Training Container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt

  • TAO deploy container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-deploy
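
When training directly in the containers, the extra step is to fetch the deploy image alongside the training image. A minimal sketch of that step follows. The function name pull_tao_images and the DRY_RUN variable are assumptions made for this example; with DRY_RUN left at its default the script only prints the docker commands instead of running them.

```shell
#!/bin/sh
# Sketch: fetch one training image plus the separate deploy image.
# DRY_RUN is a variable invented for this example. It defaults to 1, so
# the commands are only printed; set DRY_RUN=0 to actually pull.
TAO_REPO="nvcr.io/nvidia/tao/tao-toolkit"

pull_tao_images() {
    # $1: training tag, e.g. 4.0.0-tf1.15.5, 4.0.0-tf2.9.1 or 4.0.0-pyt
    for tag in "$1" 4.0.0-deploy; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "docker pull $TAO_REPO:$tag"
        else
            docker pull "$TAO_REPO:$tag" || return 1
        fi
    done
}

# Prints the two pull commands for a TensorFlow 1 workflow.
pull_tao_images 4.0.0-tf1.15.5
```

For MaskRCNN and UNet, pass 4.0.1-tf1.15.5 as the training tag, as listed above.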

This change only affects the models in the table below. For other models, the deploy artifacts are still contained in the training container and will be migrated out in the future.

TensorFlow 1.x            TensorFlow 2.x  PyTorch
------------------------  --------------  ---------------
Classification            Classification  Deformable DETR
DetectNet_v2              EfficientDet    Segformer
DSSD
EfficientDet
Faster RCNN
LPRNet
Mask RCNN
Multitask Classification
RetinaNet
SSD
UNet
YOLOv3
YOLOv4
YOLOv4_tiny
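
If a script has to decide which container to run the TensorRT steps in, the table above can be turned into a lookup. This is a sketch under assumptions: the function name uses_deploy_container and the framework keys tf1, tf2, and pyt are labels chosen for this example, and the model names are spelled exactly as in the table rather than as CLI task names.

```shell
#!/bin/sh
# Hypothetical lookup built from the table above. Returns success when the
# framework/model pair has its TensorRT steps in the deploy container.
# The keys tf1, tf2 and pyt are labels chosen for this sketch only.
uses_deploy_container() {
    case "$1:$2" in
        tf1:Classification|tf1:DetectNet_v2|tf1:DSSD|tf1:EfficientDet) return 0 ;;
        tf1:"Faster RCNN"|tf1:LPRNet|tf1:"Mask RCNN") return 0 ;;
        tf1:"Multitask Classification"|tf1:RetinaNet|tf1:SSD|tf1:UNet) return 0 ;;
        tf1:YOLOv3|tf1:YOLOv4|tf1:YOLOv4_tiny) return 0 ;;
        tf2:Classification|tf2:EfficientDet) return 0 ;;
        pyt:"Deformable DETR"|pyt:Segformer) return 0 ;;
        *) return 1 ;;
    esac
}

if uses_deploy_container tf1 UNet; then
    echo "UNet (TF1): run the TensorRT steps in the deploy container"
fi
```

Any pair not listed falls through to the default branch, which matches the statement that other models still keep their deploy artifacts in the training container.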

The detailed changes per network are provided in the table below. The commands are taken from the TAO Jupyter notebooks. The most representative networks are included; models introduced in 4.0 are not.

For each network, the TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05) commands are listed first, followed by the TAO Toolkit 4.0 commands.

Classification

TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05):

    tao classification export \
        -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
        -o $USER_EXPERIMENT_DIR/export/final_model.etlt \
        -k $KEY \
        --cal_data_file $USER_EXPERIMENT_DIR/export/calibration.tensor \
        --data_type int8 \
        --batches 10 \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
        --classmap_json $USER_EXPERIMENT_DIR/output_retrain/classmap.json \
        --gen_ds_config

    tao converter $USER_EXPERIMENT_DIR/export/final_model.etlt \
        -k $KEY \
        -c $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
        -o predictions/Softmax \
        -d 3,224,224 \
        -i nchw \
        -m 64 \
        -t int8 \
        -e $USER_EXPERIMENT_DIR/export/final_model.trt \
        -b 64

TAO Toolkit 4.0:

    tao classification_tf1 export \
        -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
        -o $USER_EXPERIMENT_DIR/export/final_model.etlt \
        -k $KEY \
        --classmap_json $USER_EXPERIMENT_DIR/output_retrain/classmap.json \
        --gen_ds_config

    tao-deploy classification_tf1 gen_trt_engine \
        -m $USER_EXPERIMENT_DIR/export/final_model.etlt \
        -e $SPECS_DIR/classification_retrain_spec.cfg \
        -k $KEY \
        --batch_size 64 \
        --max_batch_size 64 \
        --batches 10 \
        --data_type int8 \
        --cal_data_file $USER_EXPERIMENT_DIR/export/calibration.tensor \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
        --cal_image_dir $DATA_DOWNLOAD_DIR/split/test/ \
        --engine_file $USER_EXPERIMENT_DIR/export/final_model.trt

DetectNet_v2

TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05):

    tao detectnet_v2 export \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
        -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
        -k $KEY \
        --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
        --data_type int8 \
        --batches 10 \
        --batch_size 4 \
        --max_batch_size 4 \
        --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
        --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
        --verbose

    tao converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
        -k $KEY \
        -c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
        -o output_cov/Sigmoid,output_bbox/BiasAdd \
        -d 3,384,1248 \
        -i nchw \
        -m 64 \
        -t int8 \
        -e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
        -b 4

TAO Toolkit 4.0:

    tao detectnet_v2 export \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
        -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
        -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
        -k $KEY \
        --gen_ds_config

    tao-deploy detectnet_v2 gen_trt_engine \
        -m $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
        -k $KEY \
        --data_type int8 \
        --batches 10 \
        --batch_size 4 \
        --max_batch_size 64 \
        --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
        --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
        -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
        --verbose

EfficientDet

TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05):

    tao efficientdet export \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.tlt \
        -o $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
        -k $KEY \
        -e $SPECS_DIR/efficientdet_d0_retrain.txt \
        --batch_size 8 \
        --data_type int8 \
        --cal_image_dir $DATA_DOWNLOAD_DIR/raw-data/val2017 \
        --batches 10 \
        --max_batch_size 1 \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/efficientdet_d0.cal

    tao converter \
        -k $KEY \
        -c $USER_EXPERIMENT_DIR/export/trt.int8.cal \
        -p image_arrays:0,1x512x512x3,8x512x512x3,16x512x512x3 \
        -e $USER_EXPERIMENT_DIR/export/trt.int8.engine \
        -t int8 \
        -b 8 \
        $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt

TAO Toolkit 4.0:

    tao efficientdet_tf1 export \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.tlt \
        -o $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
        -k $KEY \
        -e $SPECS_DIR/efficientdet_d0_retrain.txt

    tao-deploy efficientdet_tf1 gen_trt_engine \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
        -k $KEY \
        --batch_size 8 \
        --data_type int8 \
        --cal_image_dir $DATA_DOWNLOAD_DIR/raw-data/val2017 \
        --batches 10 \
        --min_batch_size 1 \
        --opt_batch_size 8 \
        --max_batch_size 16 \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/efficientdet_d0.cal \
        --engine_file $USER_EXPERIMENT_DIR/export/trt.int8.engine

SSD

TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05):

    tao ssd export --gpu_index=$GPU_INDEX \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
        -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
        -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
        -k $KEY \
        --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
        --data_type int8 \
        --batch_size 16 \
        --batches 10 \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
        --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
        --gen_ds_config

    tao converter \
        -k $KEY \
        -d 3,300,300 \
        -o NMS \
        -c $USER_EXPERIMENT_DIR/export/cal.bin \
        -e $USER_EXPERIMENT_DIR/export/trt.engine \
        -b 8 \
        -m 16 \
        -t int8 \
        -i nchw \
        $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt

TAO Toolkit 4.0:

    tao ssd export --gpu_index=$GPU_INDEX \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
        -k $KEY \
        -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
        -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
        --batch_size 16 \
        --gen_ds_config

    tao-deploy ssd gen_trt_engine --gpu_index=$GPU_INDEX \
        -m $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
        -k $KEY \
        -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
        --engine_file $USER_EXPERIMENT_DIR/export/trt.engine \
        --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
        --data_type int8 \
        --max_batch_size 16 \
        --batch_size 16 \
        --batches 10 \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
        --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

UNet

TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05):

    tao unet export --gpu_index=$GPU_INDEX \
        -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.tlt \
        -k $KEY \
        -e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
        --data_type int8 \
        --engine_file $USER_EXPERIMENT_DIR/export/int8.isbi.retrained.engine \
        --data_type int8 \
        --cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
        --cal_image_dir $DATA_DOWNLOAD_DIR/isbi/images/val \
        --max_batch_size 3 \
        --batch_size 1 \
        --gen_ds_config

    tao converter \
        -k $KEY \
        -c $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
        -e $USER_EXPERIMENT_DIR/export/trt.int8.tlt.isbi.engine \
        -i nchw \
        -t int8 \
        -p input_1:0,1x1x320x320,4x1x320x320,16x1x320x320 \
        $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.etlt

TAO Toolkit 4.0:

    tao unet export --gpu_index=$GPU_INDEX \
        -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.tlt \
        -k $KEY \
        -e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
        --gen_ds_config

    tao-deploy unet gen_trt_engine --gpu_index=$GPU_INDEX \
        -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.etlt \
        -k $KEY \
        -e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
        --data_type int8 \
        --engine_file $USER_EXPERIMENT_DIR/export/int8.isbi.retrained.engine \
        --data_type int8 \
        --cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
        --cal_image_dir $DATA_DOWNLOAD_DIR/isbi/images/val \
        --max_batch_size 3 \
        --batch_size 1

YOLOv3

TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05):

    tao yolo_v3 export \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov3_resnet18_epoch_$EPOCH.tlt \
        -o $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
        -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
        -k $KEY \
        --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
        --data_type int8 \
        --batch_size 16 \
        --batches 10 \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
        --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
        --gen_ds_config

    tao converter \
        -k $KEY \
        -p Input,1x3x384x1248,8x3x384x1248,16x3x384x1248 \
        -c $USER_EXPERIMENT_DIR/export/cal.bin \
        -e $USER_EXPERIMENT_DIR/export/trt.engine \
        -b 8 \
        -t int8 \
        $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt

TAO Toolkit 4.0:

    tao yolo_v3 export \
        -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov3_resnet18_epoch_$EPOCH.tlt \
        -k $KEY \
        -o $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
        -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
        --gen_ds_config

    tao-deploy yolo_v3 gen_trt_engine \
        -m $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
        -k $KEY \
        -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
        --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
        --data_type int8 \
        --batch_size 16 \
        --min_batch_size 1 \
        --opt_batch_size 8 \
        --max_batch_size 16 \
        --batches 10 \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
        --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
        --engine_file $USER_EXPERIMENT_DIR/export/trt.engine.int8
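
Across the networks above, the short flags of tao converter line up with long options of gen_trt_engine. The sketch below records the pairs that can be read off the example commands. It is inferred from these examples only and is not an official mapping, and the function name deploy_option_for is made up for illustration. Note that -e and -m change meaning: in tao converter they are the engine path and the maximum batch size, while in gen_trt_engine they are the spec file and the model.

```shell
#!/bin/sh
# Sketch only: translate a `tao converter` short flag into the
# `gen_trt_engine` long option seen in the 4.0 example commands.
# Pairs are inferred from those examples, not from an official mapping.
deploy_option_for() {
    case "$1" in
        -c) printf '%s\n' "--cal_cache_file" ;;
        -e) printf '%s\n' "--engine_file" ;;
        -t) printf '%s\n' "--data_type" ;;
        -b) printf '%s\n' "--batch_size" ;;
        -m) printf '%s\n' "--max_batch_size" ;;
        -k) printf '%s\n' "-k" ;;
        *)
            # -o, -d, -i and -p have no direct counterpart in the examples
            printf '%s\n' "no counterpart shown for $1" >&2
            return 1
            ;;
    esac
}

# Prints --cal_cache_file
deploy_option_for -c
```

In the EfficientDet and YOLOv3 examples, the three shapes passed to -p correspond to the --min_batch_size, --opt_batch_size, and --max_batch_size options, so a single flag lookup does not cover them.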

© Copyright 2023, NVIDIA. Last updated on Aug 2, 2023.