Running TAO from Source#

NVIDIA TAO is a low-code AI toolkit that contains solutions to train, fine-tune, and optimize deep learning models for various computer vision use cases. These deep learning solutions are implemented across many popular training frameworks, such as TensorFlow (version 1.15.x and version 2.x), PyTorch (including PyTorch Lightning), and NVIDIA® TensorRT. The source code for all the core deep learning network implementations has been open-sourced as of TAO version 5.0.0, giving you visibility into the workings of the different networks and allowing you to customize them to suit your use cases.

The TAO deep learning containers are built from the following core repositories:

  1. tao_tensorflow1_backend: TAO deep learning networks with TensorFlow 1.x backend.

  2. tao_tensorflow2_backend: TAO deep learning networks with TensorFlow 2.x backend.

  3. tao_pytorch_backend: TAO deep learning networks with PyTorch backend.

  4. tao_dataset_suite: A set of advanced data augmentation and analytics tools. The source code in this repository maps to the routines contained within the data services arm of TAO.

  5. tao_deploy: A package that uses TensorRT to both optimize TAO trained models and run inference and evaluation.

  6. tao_launcher: A Python CLI to interact with the TAO containers that can be installed using pip.

  7. tao_front_end_services: TAO as a stand-alone service and TAO Client CLI package.

tao_tutorials is an additional lightweight repository with supplementary tooling and tutorials. It contains start scripts and tutorial notebooks to get started with TAO.
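For example, cloning one of the repositories and setting up its environment might look like the following sketch. The GitHub organization and repository URL are assumptions based on the repository names above; the `scripts/envsetup.sh` path is the one used later in this section.

```shell
# Clone the TAO PyTorch backend source (repository URL assumed)
git clone https://github.com/NVIDIA/tao_pytorch_backend.git
cd tao_pytorch_backend

# Export the default runner (tao_pt) into the current shell
source scripts/envsetup.sh
```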

The diagram below shows how the commands issued by the user flow through the system.

[Figure: tao_pt_user_interaction.jpg — flow of user commands through the TAO system]

Running a PyTorch network#

Quick Start Instructions#

As of TAO 5.0.0, each repository includes a pre-built development container (called the base container) along with the source code. The base container provides all the pre-built GPU dependency libraries and third-party Python packages required to interact with the source code, and installs the necessary dependencies from source, removing the need to pin package versions or manage Python environments yourself.

NVIDIA strongly encourages developers to use this execution model. To interact with the base container, all the repositories are packaged with a default runner, which is exported as a binary. To export this runner, run the following command:

source $REPOSITORY_ROOT/scripts/envsetup.sh

The output of the command (from the PyTorch repository) looks like this:

TAO pytorch build environment set up.

The following environment variables have been set:

NV_TAO_PYTORCH_TOP       /localhome/local-vpraveen/Software/tao_gitlab/tlt-pytorch

The following functions have been added to your environment:

tao_pt                 Run command inside the container.

After you run this command, execute the required script in the repository like this:

tao_pt <runner_args> -- python path/to/script.py --<script_args>

For example, to run the grounding_dino entry point from the source code repository, you would enter this command:

$ tao_pt -- python nvidia_tao_pytorch/cv/grounding_dino/entrypoint/grounding_dino.py --help

usage: grounding_dino [-h] [-e EXPERIMENT_SPEC_FILE] {evaluate,export,inference,train}

Train Adapt Optimize entrypoint for grounding_dino

positional arguments:
{evaluate,export,inference,train}
                        Subtask for a given task/model.

options:
-h, --help            show this help message and exit
-e EXPERIMENT_SPEC_FILE, --experiment_spec_file EXPERIMENT_SPEC_FILE
                        Path to the experiment spec file.
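Following the help output above, a typical invocation passes one of the listed subtasks plus an experiment spec file via `-e`. This is a sketch only; the spec file path is a placeholder you would replace with your own:

```shell
# Launch a training run inside the base container (spec path is hypothetical)
tao_pt -- python nvidia_tao_pytorch/cv/grounding_dino/entrypoint/grounding_dino.py \
  train -e /path/to/experiment_spec.yaml
```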

Each repository walks through the process of building, interacting with, and, if needed, upgrading the base containers. For more information about the runner and its configurable parameters, refer to the individual repositories linked in this section.