There are minor interface changes from TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05) to TAO Toolkit 4.0. These changes may affect
you if you are using older notebooks, have integrated the TAO workflow into your own applications, or are
training directly in the containers. If you use the newer notebooks from TAO Getting Started,
they don't apply, because those notebooks have already been updated.
TAO 4.0 splits the hybrid training-deployment container into separate training and deployment containers.
Because the training and deployment libraries are completely different, this split allows each component
to be developed and updated independently and more rapidly.
The training container includes deep learning frameworks such as TensorFlow and PyTorch, while the libraries and
entrypoint that make trained models ready for deployment and inference have moved to the new deploy container.
The deploy container now handles TensorRT engine generation and INT8 calibration cache generation, as well as
TensorRT model evaluation and inference.
The image below highlights the changes related to INT8 calibration generation and TensorRT model evaluation.
If you are training directly from the containers, you need to pull the tao-deploy container separately
to run TensorRT conversion and evaluation. If you are using the launcher CLI or API, this is
handled automatically.
TAO TensorFlow1 training container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
TAO TensorFlow1 training container for MaskRCNN and UNet: nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5
TAO TensorFlow2 training container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1
TAO PyTorch training container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
TAO deploy container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-deploy
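When you train directly from the containers, each workflow needs the training image for its framework plus the deploy image. The following is a minimal sketch, assuming Docker is installed on the host. The helper names `training_image` and `print_pull_commands` and the framework labels (`tf1`, `tf1-maskrcnn-unet`, `tf2`, `pyt`) are hypothetical; only the image names come from the list above. The script is a dry run: it prints the `docker pull` commands instead of executing them.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a framework label (an assumption of this sketch,
# not a TAO identifier) to the TAO 4.0 training image listed above.
training_image() {
  case "$1" in
    tf1)               printf '%s\n' "nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5" ;;
    tf1-maskrcnn-unet) printf '%s\n' "nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5" ;;
    tf2)               printf '%s\n' "nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1" ;;
    pyt)               printf '%s\n' "nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt" ;;
    *)                 return 1 ;;
  esac
}

# The deploy image handles TensorRT engine generation, INT8 calibration,
# and TensorRT evaluation/inference for the affected networks.
DEPLOY_IMAGE="nvcr.io/nvidia/tao/tao-toolkit:4.0.0-deploy"

# Dry run: print (do not execute) the pull commands for one framework
# plus the deploy image.
print_pull_commands() {
  local image
  if ! image="$(training_image "$1")"; then
    printf 'unknown framework label: %s\n' "$1" >&2
    return 1
  fi
  # printf reuses its format for each remaining argument: one line per image.
  printf 'docker pull %s\n' "$image" "$DEPLOY_IMAGE"
}

print_pull_commands tf2
```

Once the printed commands look right, you can pipe the output to `bash` to perform the pulls.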
This change affects only the models in the table below. For other models, the deploy artifacts are still contained
in the training container and will be migrated out in the future.
| TensorFlow 1.x | TensorFlow 2.x | PyTorch |
|---|---|---|
| Classification | Classification | Deformable DETR |
| DetectNet_v2 | EfficientDet | Segformer |
| DSSD | | |
| EfficientDet | | |
| Faster RCNN | | |
| LPRNet | | |
| Mask RCNN | | |
| Multitask Classification | | |
| RetinaNet | | |
| SSD | | |
| UNet | | |
| YOLOv3 | | |
| YOLOv4 | | |
| YOLOv4_tiny | | |
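To check whether a given network is affected, you could encode the table above in a small shell function. This is a minimal sketch: `uses_tao_deploy` is a hypothetical name, the list simply mirrors the table entries, and `MyCustomNet` is a made-up name standing in for any model that is not listed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit status 0) if, per the table above, the
# network's TensorRT engine generation and INT8 calibration moved to the
# tao-deploy container in TAO 4.0. Names are written as in the table.
uses_tao_deploy() {
  case "$1" in
    "Classification" | "Deformable DETR" | "DetectNet_v2" | "DSSD" | \
    "EfficientDet" | "Faster RCNN" | "LPRNet" | "Mask RCNN" | \
    "Multitask Classification" | "RetinaNet" | "Segformer" | "SSD" | \
    "UNet" | "YOLOv3" | "YOLOv4" | "YOLOv4_tiny")
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example: "MyCustomNet" is a made-up name standing in for any unlisted model.
for net in "SSD" "YOLOv4_tiny" "MyCustomNet"; do
  if uses_tao_deploy "$net"; then
    printf '%s: run TensorRT steps in the tao-deploy container\n' "$net"
  else
    printf '%s: TensorRT steps remain in the training container\n' "$net"
  fi
done
```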
The detailed changes for each network are listed below, using commands taken from the TAO
Jupyter notebooks. The most representative networks are included; models introduced in 4.0 are not
included.
Classification
TAO Toolkit 3.x (21.08, 21.11, 22.02, 22.05):
tao classification export \
-m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
-o $USER_EXPERIMENT_DIR/export/final_model.etlt \
-k $KEY \
--cal_data_file $USER_EXPERIMENT_DIR/export/calibration.tensor \
--data_type int8 \
--batches 10 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
--classmap_json $USER_EXPERIMENT_DIR/output_retrain/classmap.json \
--gen_ds_config
tao converter $USER_EXPERIMENT_DIR/export/final_model.etlt \
-k $KEY \
-c $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
-o predictions/Softmax \
-d 3,224,224 \
-i nchw \
-m 64 -t int8 \
-e $USER_EXPERIMENT_DIR/export/final_model.trt \
-b 64
TAO Toolkit 4.0:
tao classification_tf1 export \
-m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
-o $USER_EXPERIMENT_DIR/export/final_model.etlt \
-k $KEY \
--classmap_json $USER_EXPERIMENT_DIR/output_retrain/classmap.json \
--gen_ds_config
tao-deploy classification_tf1 gen_trt_engine \
-m $USER_EXPERIMENT_DIR/export/final_model.etlt \
-e $SPECS_DIR/classification_retrain_spec.cfg \
-k $KEY \
--batch_size 64 \
--max_batch_size 64 \
--batches 10 \
--data_type int8 \
--cal_data_file $USER_EXPERIMENT_DIR/export/calibration.tensor \
--cal_cache_file $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
--cal_image_dir $DATA_DOWNLOAD_DIR/split/test/ \
--engine_file $USER_EXPERIMENT_DIR/export/final_model.trt
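The Classification commands show two changes that recur in these examples: the launcher task `classification` becomes `classification_tf1`, and the separate `tao converter` step becomes `tao-deploy <task> gen_trt_engine`. The following is a minimal sketch of that mapping, assuming only the renames visible in these examples (`classification` and `efficientdet` gain a `_tf1` suffix, while `detectnet_v2`, `ssd`, `unet`, and `yolo_v3` keep their names). The helper names are hypothetical, and any task not covered here should be checked against the TAO 4.0 documentation.

```shell
#!/usr/bin/env bash

# Hypothetical helper: map a TAO 3.x launcher task name to its TAO 4.0 name.
# Only the renames visible in these examples are encoded; any other task is
# assumed unchanged and should be verified.
tao4_task() {
  case "$1" in
    classification) printf '%s\n' "classification_tf1" ;;
    efficientdet)   printf '%s\n' "efficientdet_tf1" ;;
    *)              printf '%s\n' "$1" ;;   # e.g. detectnet_v2, ssd, unet, yolo_v3
  esac
}

# Print the two 4.0 command prefixes that replace
# "tao <task> export" followed by "tao converter".
print_tao4_prefixes() {
  local task
  task="$(tao4_task "$1")"
  printf 'tao %s export\n' "$task"
  printf 'tao-deploy %s gen_trt_engine\n' "$task"
}

print_tao4_prefixes classification
```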
DetectNet_v2
TAO Toolkit 3.x:
tao detectnet_v2 export \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
--cal_data_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.tensor \
--data_type int8 \
--batches 10 \
--batch_size 4 \
--max_batch_size 4 \
--engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
--cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
--verbose
tao converter $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
-c $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,384,1248 \
-i nchw \
-m 64 \
-t int8 \
-e $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt \
-b 4
TAO Toolkit 4.0:
tao detectnet_v2 export \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
--gen_ds_config
tao-deploy detectnet_v2 gen_trt_engine \
-m $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY \
--data_type int8 \
--batches 10 \
--batch_size 4 \
--max_batch_size 64 \
--engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
--cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
-e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
--verbose
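Comparing the DetectNet_v2 `tao converter` options with the `tao-deploy detectnet_v2 gen_trt_engine` options above suggests a mechanical mapping. One trap: for `tao converter`, `-e` is the output engine path, but for `gen_trt_engine`, `-e` is the experiment spec file and the engine path goes to `--engine_file`. The following is a minimal sketch that encodes only the correspondences visible in these examples; the helper name is hypothetical, and `-o`, `-d`, and `-i` have no counterpart in the 4.0 commands shown here.

```shell
#!/usr/bin/env bash

# Hypothetical helper: translate a tao-converter option into the
# tao-deploy gen_trt_engine option that plays the same role in the
# commands in these examples. Prints nothing for options with no 4.0
# counterpart here (-o output nodes, -d input dims, -i input order).
# The .etlt file, positional for tao converter, is passed with -m to
# gen_trt_engine.
converter_to_gen_trt_engine() {
  case "$1" in
    -k) printf '%s\n' "-k" ;;               # encryption key: unchanged
    -c) printf '%s\n' "--cal_cache_file" ;; # INT8 calibration cache
    -e) printf '%s\n' "--engine_file" ;;    # engine output path (-e now means the spec file)
    -t) printf '%s\n' "--data_type" ;;      # fp32, fp16, or int8
    -b) printf '%s\n' "--batch_size" ;;     # calibration batch size
    -m) printf '%s\n' "--max_batch_size" ;; # maximum engine batch size
    *)  ;;                                  # no counterpart in these examples
  esac
}

# Example: the options of the DetectNet_v2 3.x tao converter command.
for opt in -k -c -o -d -i -m -t -e -b; do
  new="$(converter_to_gen_trt_engine "$opt")"
  printf '%s -> %s\n' "$opt" "${new:-(dropped)}"
done
```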
EfficientDet
TAO Toolkit 3.x:
tao efficientdet export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
-k $KEY \
-e $SPECS_DIR/efficientdet_d0_retrain.txt \
--batch_size 8 \
--data_type int8 \
--cal_image_dir $DATA_DOWNLOAD_DIR/raw-data/val2017 \
--batches 10 \
--max_batch_size 1 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/efficientdet_d0.cal
tao converter -k $KEY \
-c $USER_EXPERIMENT_DIR/export/efficientdet_d0.cal \
-p image_arrays:0,1x512x512x3,8x512x512x3,16x512x512x3 \
-e $USER_EXPERIMENT_DIR/export/trt.int8.engine \
-t int8 \
-b 8 \
$USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt
TAO Toolkit 4.0:
tao efficientdet_tf1 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
-k $KEY \
-e $SPECS_DIR/efficientdet_d0_retrain.txt
tao-deploy efficientdet_tf1 gen_trt_engine -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.etlt \
-k $KEY \
--batch_size 8 \
--data_type int8 \
--cal_image_dir $DATA_DOWNLOAD_DIR/raw-data/val2017 \
--batches 10 \
--min_batch_size 1 \
--opt_batch_size 8 \
--max_batch_size 16 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/efficientdet_d0.cal \
--engine_file $USER_EXPERIMENT_DIR/export/trt.int8.engine
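Where `tao converter` took an optimization profile through `-p`, as in the EfficientDet command above (`image_arrays:0,1x512x512x3,8x512x512x3,16x512x512x3`), the 4.0 command passes the minimum, optimum, and maximum batch sizes as separate options. The following is a minimal sketch of that translation, assuming the profile has the form `<input_name>,<min_shape>,<opt_shape>,<max_shape>` with the batch dimension first in each shape, as in the EfficientDet and YOLOv3 examples here; `profile_to_batch_flags` is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: convert a tao-converter "-p" optimization profile
# (<input_name>,<min_shape>,<opt_shape>,<max_shape>) into the
# --min_batch_size/--opt_batch_size/--max_batch_size options used by
# tao-deploy gen_trt_engine. Assumes the batch size is the first
# "x"-separated field of each shape, e.g. "8x512x512x3" -> 8.
profile_to_batch_flags() {
  # input_name is parsed but not needed by gen_trt_engine in these examples.
  local input_name min_shape opt_shape max_shape
  IFS=',' read -r input_name min_shape opt_shape max_shape <<< "$1"
  # ${var%%x*} strips everything from the first "x", leaving the batch size.
  printf '%s\n' "--min_batch_size ${min_shape%%x*} --opt_batch_size ${opt_shape%%x*} --max_batch_size ${max_shape%%x*}"
}

# EfficientDet profile from the 3.x tao converter command.
profile_to_batch_flags "image_arrays:0,1x512x512x3,8x512x512x3,16x512x512x3"
# YOLOv3 profile from the 3.x tao converter command.
profile_to_batch_flags "Input,1x3x384x1248,8x3x384x1248,16x3x384x1248"
```

The translation is mechanical. The notebooks do not always keep the same values between versions (the UNet 4.0 command, for example, uses `--max_batch_size 3` rather than the 3.x profile maximum of 16), so review the resulting batch sizes for your deployment.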
SSD
TAO Toolkit 3.x:
tao ssd export --gpu_index=$GPU_INDEX \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
-o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
-e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
-k $KEY \
--cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
--data_type int8 \
--batch_size 16 \
--batches 10 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
--cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
--gen_ds_config
tao converter -k $KEY \
-d 3,300,300 \
-o NMS \
-c $USER_EXPERIMENT_DIR/export/cal.bin \
-e $USER_EXPERIMENT_DIR/export/trt.engine \
-b 8 \
-m 16 \
-t int8 \
-i nchw \
$USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt
TAO Toolkit 4.0:
tao ssd export --gpu_index=$GPU_INDEX \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_resnet18_epoch_$EPOCH.tlt \
-k $KEY \
-o $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
-e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
--batch_size 16 \
--gen_ds_config
tao-deploy ssd gen_trt_engine --gpu_index=$GPU_INDEX \
-m $USER_EXPERIMENT_DIR/export/ssd_resnet18_epoch_$EPOCH.etlt \
-k $KEY \
-e $SPECS_DIR/ssd_retrain_resnet18_kitti.txt \
--engine_file $USER_EXPERIMENT_DIR/export/trt.engine \
--cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
--data_type int8 \
--max_batch_size 16 \
--batch_size 16 \
--batches 10 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
--cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile
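Across these examples, the INT8 calibration and engine options that 3.x passed to `export` (for SSD: `--cal_image_dir`, `--data_type`, `--batches`, `--cal_cache_file`, and `--cal_data_file`) move to `gen_trt_engine` in 4.0, while `export` keeps options such as `-o` and `--gen_ds_config`. The following is a minimal sketch that sorts a 3.x `export` option accordingly, based only on the commands shown here; the helper name and its output labels are hypothetical. `--batch_size` is reported as needing a per-network check because the SSD 4.0 `export` command still passes it while the other 4.0 `export` commands shown here do not.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report which TAO 4.0 command takes an option that a
# TAO 3.x "export" command used, based only on the commands in these examples.
tao4_stage_for_export_flag() {
  case "$1" in
    # INT8 calibration and engine-building options move to gen_trt_engine.
    --data_type | --batches | --max_batch_size | --engine_file | \
    --cal_image_dir | --cal_data_file | --cal_cache_file | --verbose)
      printf '%s\n' "gen_trt_engine" ;;
    # Key, spec file, and GPU index are passed to both commands in most rows.
    -k | -e | --gpu_index)
      printf '%s\n' "both" ;;
    # SSD's 4.0 export still takes --batch_size; the other networks shown do not.
    --batch_size)
      printf '%s\n' "check per network" ;;
    # Everything else (-m for the .tlt model, -o, --gen_ds_config,
    # --classmap_json) stays on export. gen_trt_engine takes its own -m,
    # pointing at the .etlt file that export wrote with -o.
    *)
      printf '%s\n' "export" ;;
  esac
}

# Example: the options of the SSD 3.x export command.
for flag in --gpu_index -m -o -e -k --cal_image_dir --data_type \
            --batch_size --batches --cal_cache_file --cal_data_file --gen_ds_config; do
  printf '%-16s %s\n' "$flag" "$(tao4_stage_for_export_flag "$flag")"
done
```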
UNet
TAO Toolkit 3.x:
tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.tlt \
-k $KEY \
-e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
--data_type int8 \
--engine_file $USER_EXPERIMENT_DIR/export/int8.isbi.retrained.engine \
--cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt \
--cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
--cal_image_dir $DATA_DOWNLOAD_DIR/isbi/images/val \
--max_batch_size 3 \
--batch_size 1 \
--gen_ds_config
tao converter -k $KEY \
-c $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
-e $USER_EXPERIMENT_DIR/export/trt.int8.tlt.isbi.engine \
-i nchw \
-t int8 \
-p input_1:0,1x1x320x320,4x1x320x320,16x1x320x320 \
$USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.etlt
TAO Toolkit 4.0:
tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.tlt \
-k $KEY \
-e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
--gen_ds_config
tao-deploy unet gen_trt_engine --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/isbi_experiment_retrain/weights/model_isbi_retrained.etlt \
-k $KEY \
-e $SPECS_DIR/unet_train_resnet_unet_isbi_retrain.txt \
--data_type int8 \
--engine_file $USER_EXPERIMENT_DIR/export/int8.isbi.retrained.engine \
--cal_data_file $USER_EXPERIMENT_DIR/export/isbi_cal_data_file.txt \
--cal_cache_file $USER_EXPERIMENT_DIR/export/isbi_cal.bin \
--cal_image_dir $DATA_DOWNLOAD_DIR/isbi/images/val \
--max_batch_size 3 \
--batch_size 1
YOLOv3
TAO Toolkit 3.x:
tao yolo_v3 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov3_resnet18_epoch_$EPOCH.tlt \
-o $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
-e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
-k $KEY \
--cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
--data_type int8 \
--batch_size 16 \
--batches 10 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
--cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
--gen_ds_config
tao converter -k $KEY \
-p Input,1x3x384x1248,8x3x384x1248,16x3x384x1248 \
-c $USER_EXPERIMENT_DIR/export/cal.bin \
-e $USER_EXPERIMENT_DIR/export/trt.engine \
-b 8 \
-t int8 \
$USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt
TAO Toolkit 4.0:
tao yolo_v3 export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov3_resnet18_epoch_$EPOCH.tlt \
-k $KEY \
-o $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
-e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
--gen_ds_config
tao-deploy yolo_v3 gen_trt_engine -m $USER_EXPERIMENT_DIR/export/yolov3_resnet18_epoch_$EPOCH.etlt \
-k $KEY \
-e $SPECS_DIR/yolo_v3_retrain_resnet18_tfrecord.txt \
--cal_image_dir $DATA_DOWNLOAD_DIR/testing/image_2 \
--data_type int8 \
--batch_size 16 \
--min_batch_size 1 \
--opt_batch_size 8 \
--max_batch_size 16 \
--batches 10 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
--cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
--engine_file $USER_EXPERIMENT_DIR/export/trt.engine.int8