TAO Toolkit Source Code

NVIDIA TAO Toolkit is a low-code AI toolkit containing solutions to train, fine-tune, and optimize Deep Learning models for various computer vision use cases. These Deep Learning solutions are implemented across many popular training frameworks, such as TensorFlow (versions 1.15.x and 2.x), PyTorch (including PyTorch Lightning), and NVIDIA TensorRT. As of TAO Toolkit version 5.0.0, the source code for all the core deep learning network implementations has been open sourced, giving you visibility into the workings of the different networks and allowing you to customize them to suit your use cases.

The source code is organized into the following core repositories, from which the TAO Toolkit Deep Learning containers are built.

  1. tao_tensorflow1_backend: TAO Toolkit deep learning networks with TensorFlow 1.x backend

  2. tao_tensorflow2_backend: TAO Toolkit deep learning networks with TensorFlow 2.x backend

  3. tao_pytorch_backend: TAO Toolkit deep learning networks with PyTorch backend

  4. tao_dataset_suite: A set of advanced data augmentation and analytics tools. The source code in this repository maps to the routines contained within the data services arm of TAO Toolkit.

  5. tao_deploy: A package that uses TensorRT to both optimize TAO Toolkit trained models and run inference and evaluation.

There are also lightweight repositories with supplementary tooling:

  1. tao_launcher: A pip-installable Python CLI for interacting with the TAO Toolkit containers

  2. tao_front_end_services: TAO Toolkit as a stand-alone service and TAO Client CLI package

  3. tao_tutorials: Quick start scripts and tutorial notebooks to get started with TAO Toolkit

The diagrams below illustrate how commands issued by the user flow through the system.

Running a TensorFlow 1.x network

[Image: tao_tf_user_interaction.jpg]

Running a PyTorch network

[Image: tao_pt_user_interaction.jpg]

Along with the source code, as of TAO Toolkit 5.0.0, each repository also includes a pre-built development container (referred to as the base container) that contains all the pre-built GPU dependency libraries and 3rd party Python packages required to interact with the source code, so you don’t have to build and install the dependencies from source or worry about package versions.

NVIDIA strongly encourages developers to use this execution model. To interact with the base container, each repository is packaged with a default runner, which is exported as a shell function. To export this runner, simply run the following:


source $REPOSITORY_ROOT/scripts/envsetup.sh

A sample output of this command is shown below (from the TensorFlow 1.x repository).


TAO Toolkit TensorFlow build environment set up.

The following environment variables have been set:
    NV_TAO_TF_TOP    /path/to/the/root/of/tao_tensorflow1_backend

The following functions have been added to your environment:
    tao_tf    Run command inside the container.

Once you run this command, you can simply execute the required script in the repository as follows:


tao_tf <runner_args> -- python path/to/script.py --<script_args>
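The `--` token separates arguments intended for the runner itself from the command that is executed inside the container. As a rough illustration of this convention, the following POSIX shell function splits its argument list at `--`; it is a hypothetical sketch, not the actual runner source, and the function name is invented:

```shell
#!/bin/sh
# Hypothetical sketch (not from the TAO repositories): split an argument
# list into runner arguments (before `--`) and the container command
# (after `--`), following the convention the exported runner uses.
split_at_separator() {
    runner_args=""
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        runner_args="${runner_args} $1"
        shift
    done
    if [ "$#" -gt 0 ]; then
        shift  # drop the `--` separator itself
    fi
    echo "runner args:${runner_args}"
    echo "container command: $*"
}
```

Calling `split_at_separator --gpus 2 -- python train.py` would report `--gpus 2` as runner arguments and `python train.py` as the container command; the real runner forwards the latter into the base container.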

For example, to run the detectnet_v2 entrypoint from the source code repository, you can run the following command:


$ tao_tf -- python nvidia_tao_tf1/cv/detectnet_v2/entrypoint/detectnet_v2.py --help
usage: detectnet_v2 [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS]
                    [--gpu_index GPU_INDEX [GPU_INDEX ...]] [--use_amp]
                    [--log_file LOG_FILE]
                    {train,prune,inference,export,evaluate,dataset_convert,calibration_tensorfile}
                    ...

Transfer Learning Toolkit

optional arguments:
  -h, --help            show this help message and exit
  --num_processes NUM_PROCESSES, -np NUM_PROCESSES
                        The number of horovod child processes to be spawned.
                        Default is -1(equal to --gpus).
  --gpus GPUS           The number of GPUs to be used for the job.
  --gpu_index GPU_INDEX [GPU_INDEX ...]
                        The indices of the GPU's to be used.
  --use_amp             Flag to enable Auto Mixed Precision.
  --log_file LOG_FILE   Path to the output log file.

tasks:
  {train,prune,inference,export,evaluate,dataset_convert,calibration_tensorfile}
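Conceptually, a runner such as `tao_tf` behaves like a shell function that mounts the repository root into the base container and executes the supplied command there. The sketch below is purely illustrative; the image name, mount point, and `DRY_RUN` switch are invented for this example, and it is not the actual envsetup.sh:

```shell
#!/bin/sh
# Hypothetical sketch of a repository runner; NOT the actual envsetup.sh.
# NV_TAO_TF_TOP points at the repository root, as in the sample output above.
NV_TAO_TF_TOP="${NV_TAO_TF_TOP:-$PWD}"
BASE_IMAGE="nvcr.io/example/tao-tf1-base:latest"  # placeholder image name

tao_tf_sketch() {
    # Mount the repository at /workspace and run the given command there.
    cmd="docker run --rm --gpus all -v ${NV_TAO_TF_TOP}:/workspace -w /workspace ${BASE_IMAGE} $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"  # show the invocation instead of executing it
    else
        eval "$cmd"
    fi
}
```

With `DRY_RUN=1`, the function prints the container invocation it would run, which is a convenient way to inspect such wrappers without Docker or a GPU available.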

Each repository walks through the process of building, interacting with, and even upgrading the base containers if needed. For more information about the runner and its configurable parameters, refer to the individual repositories listed in this section.

© Copyright 2023, NVIDIA. Last updated on Dec 8, 2023.