Migrating from TAO Toolkit 3.x to TAO Toolkit 4.0

Container Mapping

TAO 4.0 consolidates all containers under one name with different tags. If you are using TAO directly at the container level, you need to change the name and tag to use the latest version. If you are using the TAO launcher CLI, then the containers will be upgraded automatically when you upgrade the launcher.

The old containers will no longer be displayed on NGC, but you can still pull them. They might be deprecated in the future. TAO 4.0 adds two new containers, and two old containers have had their functionality merged into other containers.

| Public Display Name | Current Container Name | Current Container Tag (most recent) | New Display Name | New Container Name | New Container Tag |
|---|---|---|---|---|---|
| TAO Toolkit for CV | tao-toolkit-tf | v3.22.05-tf1.15.4-py3 | TAO Toolkit | tao-toolkit | Merged into 4.0.0-tf1.15.5 |
| TAO Toolkit for CV | tao-toolkit-tf | v3.22.05-tf1.15.5-py3 | TAO Toolkit | tao-toolkit | 4.0.0-tf1.15.5 |
| TAO Toolkit for CV | tao-toolkit-tf | v3.22.05-beta-api | TAO Toolkit | tao-toolkit | 4.0.0-api |
| Did not exist | Did not exist | N/A | TAO Toolkit | tao-toolkit | 4.0.0-tf2.9.1 (new) |
| TAO Toolkit for ConvAI | tao-toolkit-pyt | v3.22.05-py3 | TAO Toolkit | tao-toolkit | 4.0.0-pyt |
| TAO Toolkit for Language model | tao-toolkit-lm | v3.22.05-py3 | TAO Toolkit | tao-toolkit | Merged into 4.0.0-pyt |
| Did not exist | Did not exist | N/A | TAO Toolkit | tao-toolkit | 4.0.0-deploy (new) |
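
If you reference containers directly in scripts, the mapping above can be encoded as a small lookup. The following is a minimal sketch, not an official tool: the function name `map_tao_image` is hypothetical, and the `nvcr.io/nvidia/tao/` prefix is taken from the container list later in this guide. It only prints the pull command and does not pull anything.

```shell
#!/bin/sh
# Map a TAO 3.x image reference (name:tag) to its TAO 4.0 replacement,
# following the container mapping table in this guide.
# map_tao_image is a hypothetical helper name used for illustration.
map_tao_image() {
    case "$1" in
        tao-toolkit-tf:v3.22.05-tf1.15.4-py3|tao-toolkit-tf:v3.22.05-tf1.15.5-py3)
            echo "tao-toolkit:4.0.0-tf1.15.5" ;;
        tao-toolkit-tf:v3.22.05-beta-api)
            echo "tao-toolkit:4.0.0-api" ;;
        tao-toolkit-pyt:v3.22.05-py3|tao-toolkit-lm:v3.22.05-py3)
            echo "tao-toolkit:4.0.0-pyt" ;;
        *)
            echo "unknown image: $1" >&2
            return 1 ;;
    esac
}

# Example: print the pull command for the replacement image (nothing is pulled).
new_image=$(map_tao_image "tao-toolkit-tf:v3.22.05-tf1.15.5-py3")
echo "docker pull nvcr.io/nvidia/tao/$new_image"
```

The two new tags (4.0.0-tf2.9.1 and 4.0.0-deploy) have no 3.x predecessor, so they do not appear as outputs of this lookup.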

TAO Model Export and INT8 Calibration Changes

There are minor interface changes from TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05) to TAO Toolkit 4.0. This may affect you if you are using older notebooks, have the TAO workflow integrated into your own applications, or are training directly in the containers. If you use the newer notebooks from the TAO Getting Started, then this doesn’t apply, as these notebooks have already been updated.

TAO 4.0 has split the hybrid training-deployment container into separate training and deployment containers. Since the libraries for training and deployment are completely different, this allows for rapid development and updates to individual components.

The training container contains deep learning frameworks like TensorFlow and PyTorch, but the libraries and entrypoint that make trained models ready for deployment and inference have now been moved to the new deploy container. The deploy container now handles the generation of TensorRT engines and INT8 calibration caches, as well as TensorRT model evaluation and inference.

The image below highlights the changes related to INT8 calibration generation and TensorRT model evaluation. If you are training directly from the containers, then you will need to separately pull the tao-deploy container to run TensorRT conversion and evaluation. If you are using the launcher CLI or API, then this will be handled automatically by the CLI or API.

[Figure: TAO deploy workflow (../_images/tao_deploy_workflow.jpg)]

  • TAO TensorFlow1 training container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
  • TAO TensorFlow2 training container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1
  • TAO PyTorch training container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
  • TAO deploy container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-deploy
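
When working at the container level, a workflow therefore needs two images: one training image and the deploy image. The sketch below builds the image references from the list above and prints the corresponding `docker pull` commands. The helper names `tao_image` and `pull_image` and the `DRY_RUN` variable are assumptions introduced for illustration; they are not part of TAO.

```shell
#!/bin/sh
# Registry path and version as given in the container list above.
TAO_REPO="nvcr.io/nvidia/tao/tao-toolkit"
TAO_VERSION="4.0.0"

# Build the full image reference for a flavor: tf1.15.5, tf2.9.1, pyt, or deploy.
tao_image() {
    echo "${TAO_REPO}:${TAO_VERSION}-$1"
}

# DRY_RUN=1 (the default in this sketch) only prints the command;
# set DRY_RUN=0 to actually pull the image with docker.
DRY_RUN="${DRY_RUN:-1}"
pull_image() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "docker pull $(tao_image "$1")"
    else
        docker pull "$(tao_image "$1")"
    fi
}

# A TensorFlow 1 workflow needs its training image plus the deploy image.
pull_image tf1.15.5
pull_image deploy
```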

This change only affects the models in the table below. For other models, the deploy artifacts are still contained in the training container and will be migrated out in the future.

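| TensorFlow 1.x | TensorFlow 2.x | PyTorch |
|---|---|---|
| Classification | Classification | Deformable DETR |
| DetectNet_v2 | EfficientDet | Segformer |
| DSSD | | |
| EfficientDet | | |
| Faster RCNN | | |
| LPRNet | | |
| Mask RCNN | | |
| Multitask Classification | | |
| RetinaNet | | |
| SSD | | |
| UNet | | |
| YOLOv3 | | |
| YOLOv4 | | |
| YOLOv4_tiny | | |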

The detailed changes for each network are provided below. The commands are taken from the TAO Jupyter notebooks. The most representative networks are included; models introduced in 4.0 are omitted.
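
Before migrating, it can help to locate commands in older scripts that still use the 3.x form. The following is a rough sketch, assuming the commands are written one per logical line with backslash continuations, as in the notebooks. The function name `find_legacy_tao_calls` is hypothetical. It flags `tao converter` calls and `tao <network> export` calls that still carry calibration or engine flags; treat the matches as candidates to review, not as a definitive audit.

```shell
#!/bin/sh
# Print every logical command in the given files that looks like a TAO 3.x
# export-with-calibration call or a `tao converter` call.
# find_legacy_tao_calls is a hypothetical helper, not part of TAO.
find_legacy_tao_calls() {
    awk '
        # Strip trailing whitespace so a backslash is the last character.
        { sub(/[ \t]+$/, "") }
        # Join backslash-continued lines into one logical command.
        /\\$/ { sub(/\\$/, ""); cmd = cmd $0 " "; next }
        {
            cmd = cmd $0
            gsub(/[ \t]+/, " ", cmd)
            # A leading "!" is allowed so notebook shell cells are matched too.
            if (cmd ~ /^[ \t!]*tao converter / ||
                (cmd ~ /^[ \t!]*tao [a-z0-9_]+ export / &&
                 cmd ~ /--(cal_data_file|cal_cache_file|cal_image_dir|data_type|batches|engine_file|max_batch_size)/))
                print FILENAME ": " cmd
            cmd = ""
        }
    ' "$@"
}
```

For example, `find_legacy_tao_calls my_pipeline.sh` (a placeholder file name) prints one line per suspect command. Commands that start with `tao-deploy` are not reported, because those flags are expected there.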

For each network, the TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05) commands are listed first, followed by the TAO Toolkit 4.0 commands.

Classification

TAO Toolkit 3.x:

tao classification export \
        -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
        -o $USER_EXPERIMENT_DIR/export/final_model.etlt \
        -k $KEY \
        --cal_data_file $USER_EXPERIMENT_DIR/export/calibration.tensor \
        --data_type int8 \
        --batches 10 \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
        --classmap_json $USER_EXPERIMENT_DIR/output_retrain/classmap.json \
        --gen_ds_config

tao converter $USER_EXPERIMENT_DIR/export/final_model.etlt \
           -k $KEY \
           -c $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
           -o predictions/Softmax \
           -d 3,224,224 \
           -i nchw \
           -m 64 -t int8 \
           -e $USER_EXPERIMENT_DIR/export/final_model.trt \
           -b 64

TAO Toolkit 4.0:

tao classification_tf1 export \
        -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
        -o $USER_EXPERIMENT_DIR/export/final_model.etlt \
        -k $KEY \
        --classmap_json $USER_EXPERIMENT_DIR/output_retrain/classmap.json \
        --gen_ds_config

tao-deploy classification_tf1 gen_trt_engine \
        -m $USER_EXPERIMENT_DIR/export/final_model.etlt \
        -e $SPECS_DIR/classification_retrain_spec.cfg \
        -k $KEY \
        --batch_size 64 \
        --max_batch_size 64 \
        --batches 10 \
        --data_type int8 \
        --cal_data_file $USER_EXPERIMENT_DIR/export/calibration.tensor \
        --cal_cache_file $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
        --cal_image_dir $DATA_DOWNLOAD_DIR/split/test/ \
        --engine_file $USER_EXPERIMENT_DIR/export/final_model.trt

DetectNet_v2

TAO Toolkit 3.x:

tao detectnet_v2 export \
              -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
              -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
              -k $KEY  \
              --cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
              --data_type int8 \
              --batches 10 \
              --batch_size 4 \
              --max_batch_size 4 \
              --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
              --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
              --verbose

tao converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
               -k $KEY \
                -c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
               -o output_cov/Sigmoid,output_bbox/BiasAdd \
               -d 3,384,1248 \
               -i nchw \
               -m 64 \
               -t int8 \
               -e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
               -b 4

TAO Toolkit 4.0:

tao detectnet_v2 export \
              -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
              -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
              -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
              -k $KEY \
              --gen_ds_config

tao-deploy detectnet_v2 gen_trt_engine \
              -m $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
              -k $KEY  \
              --data_type int8 \
              --batches 10 \
              --batch_size 4 \
              --max_batch_size 64 \
              --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
              --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
              -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
              --verbose

EfficientDet

TAO Toolkit 3.x:

tao efficientdet export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.tlt \
                     -o $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
                     -k $KEY \
                     -e $SPECS_DIR/efficientdet_d0_retrain.txt \
                     --batch_size 8 \
                     --data_type int8 \
                     --cal_image_dir $DATA_DOWNLOAD_DIR/raw-data/val2017 \
                     --batches 10 \
                     --max_batch_size 1 \
                     --cal_cache_file $USER_EXPERIMENT_DIR/export/efficientdet_d0.cal

tao converter -k $KEY  \
           -c $USER_EXPERIMENT_DIR/export/trt.int8.cal \
           -p image_arrays:0,1x512x512x3,8x512x512x3,16x512x512x3 \
           -e $USER_EXPERIMENT_DIR/export/trt.int8.engine \
           -t int8 \
           -b 8 \
           $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt

TAO Toolkit 4.0:

tao efficientdet_tf1 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.tlt \
                     -o $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
                     -k $KEY \
                     -e $SPECS_DIR/efficientdet_d0_retrain.txt

tao-deploy efficientdet_tf1 gen_trt_engine -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
                                        -k $KEY \
                                        --batch_size 8 \
                                        --data_type int8 \
                                        --cal_image_dir $DATA_DOWNLOAD_DIR/raw-data/val2017 \
                                        --batches 10 \
                                        --min_batch_size 1 \
                                        --opt_batch_size 8 \
                                        --max_batch_size 16 \
                                        --cal_cache_file $USER_EXPERIMENT_DIR/export/efficientdet_d0.cal \
                                        --engine_file $USER_EXPERIMENT_DIR/export/trt.int8.engine

SSD

TAO Toolkit 3.x:

tao ssd export --gpu_index=$GPU_INDEX \
             -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt  \
             -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
             -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
             -k $KEY \
             --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
             --data_type int8 \
             --batch_size 16 \
             --batches 10 \
             --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
             --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
             --gen_ds_config

tao converter -k $KEY  \
                -d 3,300,300 \
                -o NMS \
                -c $USER_EXPERIMENT_DIR/export/cal.bin \
                -e $USER_EXPERIMENT_DIR/export/trt.engine \
                -b 8 \
                -m 16 \
                -t int8 \
                -i nchw \
                $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt

TAO Toolkit 4.0:

tao ssd export --gpu_index=$GPU_INDEX \
            -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
            -k $KEY \
            -o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
            -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
            --batch_size 16 \
            --gen_ds_config

tao-deploy ssd gen_trt_engine --gpu_index=$GPU_INDEX \
                           -m $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
                           -k $KEY \
                           -e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
                           --engine_file $USER_EXPERIMENT_DIR/export/trt.engine \
                           --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
                           --data_type int8 \
                           --max_batch_size 16 \
                           --batch_size 16 \
                           --batches 10 \
                           --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
                           --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile

UNet

TAO Toolkit 3.x:

tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.tlt \
            -k $KEY \
            -e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
            --data_type int8 \
            --engine_file $USER_EXPERIMENT_DIR/export/int8.isbi.retrained.engine \
            --cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt \
            --cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
            --cal_image_dir $DATA_DOWNLOAD_DIR/isbi/images/val \
            --max_batch_size 3 \
            --batch_size 1 \
            --gen_ds_config

tao converter -k $KEY  \
           -c $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
           -e $USER_EXPERIMENT_DIR/export/trt.int8.tlt.isbi.engine \
           -i nchw \
           -t int8 \
           -p input_1:0,1x1x320x320,4x1x320x320,16x1x320x320 \
           $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.etlt

TAO Toolkit 4.0:

tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.tlt \
           -k $KEY \
           -e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
           --gen_ds_config

tao-deploy unet gen_trt_engine --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.etlt \
                            -k $KEY \
                            -e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
                            --data_type int8 \
                            --engine_file $USER_EXPERIMENT_DIR/export/int8.isbi.retrained.engine \
                            --cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt \
                            --cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
                            --cal_image_dir $DATA_DOWNLOAD_DIR/isbi/images/val \
                            --max_batch_size 3 \
                            --batch_size 1

YOLOv3

TAO Toolkit 3.x:

tao yolo_v3 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov3_resnet18_epoch_$EPOCH.tlt  \
                 -o $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
                 -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
                 -k $KEY \
                 --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
                 --data_type int8 \
                 --batch_size 16 \
                 --batches 10 \
                 --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                 --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
                 --gen_ds_config

tao converter -k $KEY  \
                -p Input,1x3x384x1248,8x3x384x1248,16x3x384x1248 \
                -c $USER_EXPERIMENT_DIR/export/cal.bin \
                -e $USER_EXPERIMENT_DIR/export/trt.engine \
                -b 8 \
                -t int8 \
                $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt

TAO Toolkit 4.0:

tao yolo_v3 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov3_resnet18_epoch_$EPOCH.tlt \
                -k $KEY \
                -o $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
                -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
                --gen_ds_config

tao-deploy yolo_v3 gen_trt_engine -m $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
                               -k $KEY \
                               -e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
                               --cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
                               --data_type int8 \
                               --batch_size 16 \
                               --min_batch_size 1 \
                               --opt_batch_size 8 \
                               --max_batch_size 16 \
                               --batches 10 \
                               --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                               --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
                               --engine_file $USER_EXPERIMENT_DIR/export/trt.engine.int8
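
The 4.0 commands above share one pattern: a plain `export` to `.etlt`, followed by a `tao-deploy ... gen_trt_engine` call that carries the INT8 calibration and engine flags. The sketch below wraps that pattern in one function. The names `run`, `export_and_build`, and `DRY_RUN` are hypothetical, and the flag set mirrors the SSD and YOLOv3 commands above; other networks may need different or additional flags (for example `--min_batch_size` or `--opt_batch_size`), so check the command for your network before reusing it.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
KEY="${KEY:-}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Usage: export_and_build NETWORK TLT ETLT SPEC CAL_IMAGE_DIR CAL_CACHE ENGINE
export_and_build() {
    network=$1; tlt=$2; etlt=$3; spec=$4; cal_dir=$5; cal_cache=$6; engine=$7

    # Step 1: export to .etlt. In TAO 4.0 this step takes no calibration
    # or engine flags.
    run tao "$network" export -m "$tlt" -o "$etlt" -e "$spec" -k "$KEY" || return 1

    # Step 2: INT8 calibration and TensorRT engine generation via tao-deploy.
    run tao-deploy "$network" gen_trt_engine -m "$etlt" -e "$spec" -k "$KEY" \
        --data_type int8 \
        --cal_image_dir "$cal_dir" \
        --batch_size 16 \
        --batches 10 \
        --cal_cache_file "$cal_cache" \
        --engine_file "$engine"
}
```

Stopping after a failed export (the `|| return 1`) avoids building an engine from a stale `.etlt` file left over from an earlier run.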