Advanced Users
NVIDIA TAO is a low-code AI toolkit containing solutions to train, fine-tune, and optimize Deep Learning models for various computer vision use cases. These Deep Learning solutions are implemented across many popular training frameworks, such as TensorFlow (versions 1.x and 2.x), PyTorch (including PyTorch Lightning), and NVIDIA TensorRT. As of TAO version 5.0.0, the source code for all the core deep learning network implementations has been open sourced, giving you visibility into the inner workings of the different networks and allowing you to customize them to suit your use cases.
The source code is organized into the following core repositories, from which the TAO Deep Learning containers are built.
tao_tensorflow1_backend: TAO deep learning networks with TensorFlow 1.x backend
tao_tensorflow2_backend: TAO deep learning networks with TensorFlow 2.x backend
tao_pytorch_backend: TAO deep learning networks with PyTorch backend
tao_dataset_suite: A set of advanced data augmentation and analytics tools. The source code in this repository maps to the routines contained within the Data Services component of TAO.
tao_deploy: A package that uses TensorRT to optimize TAO-trained models and to run inference and evaluation.
tao_launcher: A pip-installable Python CLI for interacting with the TAO containers (see the installation sketch after this list).
tao_front_end_services: TAO as a stand-alone service and TAO Client CLI package
There is also a lightweight repository with supplementary tooling and tutorials:
tao_tutorials: Quick start scripts and tutorial notebooks to get started with TAO
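For reference, the launcher can be installed directly with pip. The sketch below assumes the package is published on PyPI as nvidia-tao and exposes a tao command; verify both against the tao_launcher repository.

# Install the TAO launcher CLI (assumed PyPI package name: nvidia-tao)
pip install nvidia-tao
# Confirm that the launcher's entrypoint is on your PATH (assumed command name: tao)
tao --help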
The diagram below illustrates how the commands issued by the user flow through the system.
Running a PyTorch network
Along with the source code, as of TAO 5.0.0, each repository also includes a pre-built development container (referred to as the base container). The base container ships with all the pre-built GPU dependency libraries and third-party Python packages required to interact with the source code, removing the need to manage package versions and Python environments yourself.
NVIDIA strongly encourages developers to use this execution model. To interact with the base container, all the repositories are packaged with a default runner, which is exported as a shell function. To export this runner, run the following command:
source $REPOSITORY_ROOT/scripts/envsetup.sh
A sample output of this command is shown below (from the PyTorch repository):
TAO pytorch build environment set up.
The following environment variables have been set:
NV_TAO_PYTORCH_TOP /localhome/local-vpraveen/Software/tao_gitlab/tlt-pytorch
The following functions have been added to your environment:
tao_pt Run command inside the container.
After you run this command, execute the required script in the repository as follows:
tao_pt <runner_args> -- python path/to/script.py --<script_args>
For example, to run the grounding_dino entrypoint from the source code repository, you can run the following command:
$ tao_pt -- python nvidia_tao_pytorch/cv/grounding_dino/entrypoint/grounding_dino.py --help
usage: grounding_dino [-h] [-e EXPERIMENT_SPEC_FILE] {evaluate,export,inference,train}
Train Adapt Optimize entrypoint for grounding_dino
positional arguments:
{evaluate,export,inference,train}
Subtask for a given task/model.
options:
-h, --help show this help message and exit
-e EXPERIMENT_SPEC_FILE, --experiment_spec_file EXPERIMENT_SPEC_FILE
Path to the experiment spec file.
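Based on the help output above, a train invocation might look like the following sketch; the experiment spec path is a placeholder for your own configuration file.

# Launch grounding_dino training inside the base container (spec path is a placeholder)
tao_pt -- python nvidia_tao_pytorch/cv/grounding_dino/entrypoint/grounding_dino.py train \
    -e /path/to/experiment_spec.yaml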
Each repository documents how to build, interact with, and even upgrade the base containers if needed. For more information about the runner and its configurable parameters, refer to the individual repositories linked in this section.
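Putting the steps together, an end-to-end session might look like the following sketch. The clone URL assumes the repository is hosted under the NVIDIA organization on GitHub; adapt the paths to your environment.

# Fetch the PyTorch backend source (assumed GitHub location)
git clone https://github.com/NVIDIA/tao_pytorch_backend.git
cd tao_pytorch_backend
# Export the tao_pt runner function into the current shell
source scripts/envsetup.sh
# Run a script from the repository inside the base container
tao_pt -- python nvidia_tao_pytorch/cv/grounding_dino/entrypoint/grounding_dino.py --help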