TAO Toolkit Source Code

NVIDIA TAO Toolkit is a low-code AI toolkit containing solutions to train, fine-tune, and optimize Deep Learning models for various computer vision use cases. These Deep Learning solutions are implemented across many popular training frameworks, such as TensorFlow (versions 1.15.x and 2.x), PyTorch (including PyTorch Lightning), and NVIDIA TensorRT. As of TAO Toolkit version 5.0.0, the source code for all the core deep learning network implementations has been open sourced, giving you visibility into the workings of the different networks and allowing you to customize them to suit your use cases.

The source code is organized into the following core repositories, from which the TAO Toolkit Deep Learning containers are built.

  1. tao_tensorflow1_backend: TAO Toolkit deep learning networks with TensorFlow 1.x backend

  2. tao_tensorflow2_backend: TAO Toolkit deep learning networks with TensorFlow 2.x backend

  3. tao_pytorch_backend: TAO Toolkit deep learning networks with PyTorch backend

  4. tao_dataset_suite: A set of advanced data augmentation and analytics tools. The source code in this repository maps to the routines contained within the data services arm of TAO Toolkit.

  5. tao_deploy: A package that uses TensorRT to both optimize TAO Toolkit trained models and run inference and evaluation.

There are also lightweight repositories with supplementary tooling:

  1. tao_launcher: A pip-installable Python CLI for interacting with the TAO Toolkit containers

  2. tao_front_end_services: TAO Toolkit as a stand-alone service and TAO Client CLI package

  3. tao_tutorials: Quick start scripts and tutorial notebooks to get started with TAO Toolkit

The diagrams below illustrate how commands issued by the user flow through the system.

Running a TensorFlow 1.x network

[Image: tao_tf_user_interaction.jpg]

Running a PyTorch network

[Image: tao_pt_user_interaction.jpg]

Along with the source code, as of TAO Toolkit 5.0.0, each repository also includes a pre-built development container (referred to as the base container) that contains all the pre-built GPU dependency libraries and 3rd party Python packages required to interact with the source code, so you don’t have to build and install the dependencies from source or worry about package versions.

NVIDIA strongly encourages developers to use this execution model. To interact with the base container, each repository is packaged with a default runner, which is exported as a shell function. To export this runner, simply run the following:


source $REPOSITORY_ROOT/scripts/envsetup.sh

A sample output of this command is shown below (from the TensorFlow 1.x repository).


TAO Toolkit TensorFlow build environment set up.

The following environment variables have been set:
    NV_TAO_TF_TOP    /path/to/the/root/of/tao_tensorflow1_backend

The following functions have been added to your environment:
    tao_tf    Run command inside the container.

Once you run this command, you can simply execute the required script in the repository as follows:


tao_tf <runner_args> -- python path/to/script.py --<script_args>
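The `--` token separates arguments intended for the runner itself from the command that is executed inside the container. As a rough illustration of this convention, the following POSIX shell function splits its argument list at `--`; it is a hypothetical sketch, not the actual runner source, and the function name is invented:

```shell
#!/bin/sh
# Hypothetical sketch (not from the TAO repositories): split an argument
# list into runner arguments (before `--`) and the container command
# (after `--`), following the convention the exported runner uses.
split_at_separator() {
    runner_args=""
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        runner_args="${runner_args} $1"
        shift
    done
    if [ "$#" -gt 0 ]; then
        shift  # drop the `--` separator itself
    fi
    echo "runner args:${runner_args}"
    echo "container command: $*"
}
```

Calling `split_at_separator --gpus 2 -- python train.py` would report `--gpus 2` as runner arguments and `python train.py` as the container command; the real runner forwards the latter into the base container.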

For example, to run the detectnet_v2 entrypoint from the source code repository, you can run the following command:


$ tao_tf -- python nvidia_tao_tf1/cv/detectnet_v2/entrypoint/detectnet_v2.py --help
usage: detectnet_v2 [-h] [--num_processes NUM_PROCESSES] [--gpus GPUS]
                    [--gpu_index GPU_INDEX [GPU_INDEX ...]] [--use_amp]
                    [--log_file LOG_FILE]
                    {train,prune,inference,export,evaluate,dataset_convert,calibration_tensorfile}
                    ...

Transfer Learning Toolkit

optional arguments:
  -h, --help            show this help message and exit
  --num_processes NUM_PROCESSES, -np NUM_PROCESSES
                        The number of horovod child processes to be spawned.
                        Default is -1(equal to --gpus).
  --gpus GPUS           The number of GPUs to be used for the job.
  --gpu_index GPU_INDEX [GPU_INDEX ...]
                        The indices of the GPU's to be used.
  --use_amp             Flag to enable Auto Mixed Precision.
  --log_file LOG_FILE   Path to the output log file.

tasks:
  {train,prune,inference,export,evaluate,dataset_convert,calibration_tensorfile}
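Conceptually, a runner such as `tao_tf` behaves like a shell function that mounts the repository root into the base container and executes the supplied command there. The sketch below is purely illustrative; the image name, mount point, and `DRY_RUN` switch are invented for this example, and it is not the actual envsetup.sh:

```shell
#!/bin/sh
# Hypothetical sketch of a repository runner; NOT the actual envsetup.sh.
# NV_TAO_TF_TOP points at the repository root, as in the sample output above.
NV_TAO_TF_TOP="${NV_TAO_TF_TOP:-$PWD}"
BASE_IMAGE="nvcr.io/example/tao-tf1-base:latest"  # placeholder image name

tao_tf_sketch() {
    # Mount the repository at /workspace and run the given command there.
    cmd="docker run --rm --gpus all -v ${NV_TAO_TF_TOP}:/workspace -w /workspace ${BASE_IMAGE} $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"  # show the invocation instead of executing it
    else
        eval "$cmd"
    fi
}
```

With `DRY_RUN=1`, the function prints the container invocation it would run, which is a convenient way to inspect such wrappers without Docker or a GPU available.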

Each repository walks through the process of building, interacting with, and even upgrading the base containers if needed. For more information about the runner and its configurable parameters, refer to the individual repositories listed in this section.

© Copyright 2023, NVIDIA. Last updated on Dec 8, 2023.