Running TAO via the Containers#

TAO encapsulates DNN training pipelines that may be developed across different training frameworks. To isolate dependencies and training environments, these DNN applications are housed in different containers. The TAO Launcher abstracts the details of which network is associated with which container; however, it requires you to run TAO from an environment where the launcher can instantiate Docker containers. This requires elevated user privileges, or a Docker-in-Docker (DinD) setup to invoke Docker from within a container. This may be less than ideal in several scenarios, such as:

  • Running on a remote cluster where SLURM instantiates a container on the provisioned cluster node

  • Running on a machine without elevated user privileges

  • Running multi-node training jobs

  • Running on a Multi-Instance GPU (MIG) partition

Invoking the Containers Directly#

To run the DNNs from one of the enclosed containers, you first need to know which networks are housed in which container. A simple way to get this information is to install the TAO Launcher on your local machine and run tao info --verbose.

This is sample output from TAO 6.0.0:

Configuration of the TAO Toolkit Instance

task_group:
model:
        dockers:
        nvidia/tao/tao-toolkit:
                6.0.0-tf2:
                docker_registry: nvcr.io
                tasks:
                        1. classification_tf2
                        2. efficientdet_tf2
                6.0.0-pyt:
                docker_registry: nvcr.io
                tasks:
                        1. action_recognition
                        2. centerpose
                        3. classification_pyt
                        4. deformable_detr
                        5. dino
                        6. grounding_dino
                        7. mask_grounding_dino
                        8. mask2former
                        9. mal
                        10. mae
                        11. ml_recog
                        12. nvdinov2
                        13. ocdnet
                        14. ocrnet
                        15. optical_inspection
                        16. pointpillars
                        17. pose_classification
                        18. re_identification
                        19. rtdetr
                        20. segformer
                        21. stylegan_xl
                        22. visual_changenet
        nvidia/tao/tao-toolkit:
                5.5.0-pyt:
                docker_registry: nvcr.io
                tasks:
                        1. bevfusion
dataset:
        dockers:
        nvidia/tao/tao-toolkit:
                6.0.0-data-services:
                docker_registry: nvcr.io
                tasks:
                        1. augmentation
                        2. auto_label
                        3. annotations
                        4. analytics
deploy:
        dockers:
        nvidia/tao/tao-toolkit:
                6.0.0-deploy:
                docker_registry: nvcr.io
                tasks:
                        5. centerpose
                        6. classification_pyt
                        7. classification_tf1
                        8. classification_tf2
                        9. deformable_detr
                        10. detectnet_v2
                        11. dino
                        12. dssd
                        13. efficientdet_tf1
                        14. efficientdet_tf2
                        15. faster_rcnn
                        16. grounding_dino
                        17. lprnet
                        18. mask2former
                        19. mask_grounding_dino
                        20. mask_rcnn
                        21. mae
                        22. model_agnostic
                        23. ml_recog
                        24. multitask_classification
                        25. nvdinov2
                        26. ocdnet
                        27. ocrnet
                        28. optical_inspection
                        29. retinanet
                        30. rtdetr
                        31. segformer
                        32. ssd
                        33. trtexec
                        34. unet
                        35. visual_changenet
                        36. yolo_v3
                        37. yolo_v4
                        38. yolo_v4_tiny
format_version: 3.0
toolkit_version: 6.0.0
published_date: 05/08/2025
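If you need this task-to-container mapping in a script rather than by eye, a small shell helper can look up which container tag lists a given task. This is an illustrative sketch that parses a saved copy of the tao info --verbose output; the parsing is naive and assumes the indentation style shown above, and the here-document below is an abridged copy of that output, not a command you need to run.

```shell
# Look up the docker tag whose task list contains the given task name.
# Naive parsing: assumes tag lines look like "6.0.0-pyt:" and task lines
# look like "1. task_name", as in the tao info --verbose output above.
find_tag() {
    awk -v task="$1" '
        /^[[:space:]]+[0-9.]+-[a-z0-9-]+:$/ { tag = $1; sub(/:$/, "", tag) }
        $2 == task { print tag; exit }
    ' info.txt
}

# Abridged copy of the sample output; in practice: tao info --verbose > info.txt
cat > info.txt <<'EOF'
        6.0.0-tf2:
        tasks:
                1. classification_tf2
                2. efficientdet_tf2
        6.0.0-pyt:
        tasks:
                1. action_recognition
                2. grounding_dino
EOF

find_tag grounding_dino   # prints: 6.0.0-pyt
```

The same lookup works against the full output, since every tag line and task line in it follows the pattern the awk script matches.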

The container name associated with a task can be derived as $DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG. For example, from the log above, the container name to run grounding_dino can be derived like this:

export DOCKER_REGISTRY="nvcr.io"
export DOCKER_NAME="nvidia/tao/tao-toolkit"
export DOCKER_TAG="6.0.0-pyt"

export DOCKER_CONTAINER=$DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG

Once you have the container name, invoke the container by running the commands defined by the network without the tao prefix. For example, this command runs a grounding_dino training job with four GPUs:

docker run -it --rm --gpus all \
-v /path/in/host:/path/in/docker \
$DOCKER_CONTAINER \
grounding_dino train -e /path/to/experiments.yaml \
results_dir=/path/to/results \
train.num_gpus=4

Running Multi-Node Training#

As of 6.0.0, TAO supports multi-node training for all PyTorch models. TAO uses the PyTorch Lightning Trainer to orchestrate multi-GPU and multi-node training.

To invoke multi-node training, simply add the train.num_nodes argument to the train command to specify the number of nodes, and the train.num_gpus argument to specify the number of GPUs per node.

For example, you could express the multi-GPU training command shown above as a multi-node command:

grounding_dino train -e /path/to/experiments.yaml \
results_dir=/path/to/results \
train.num_nodes=2 \
train.num_gpus=4

This command specifies two nodes and four GPUs per node.
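On a SLURM cluster (the first scenario listed at the top of this page), the scheduler, rather than the TAO Launcher, typically instantiates the container on each provisioned node. The batch script below is a hypothetical sketch of such a submission using Singularity: the bind paths, .sif location, and resource directives are assumptions that depend on your cluster, and it assumes PyTorch Lightning picks up rank and world-size information from the SLURM environment when one task is launched per GPU.

```shell
#!/bin/bash
#SBATCH --nodes=2                  # matches train.num_nodes
#SBATCH --ntasks-per-node=4        # one task per GPU; Lightning reads the SLURM env
#SBATCH --gpus-per-node=4          # matches train.num_gpus

# Hypothetical paths; adjust the bind mounts and .sif location for your cluster.
srun singularity run --nv \
    -B /path/to/workspace:/path/to/workspace \
    tao-toolkit-pyt:6.0.0-pyt.sif \
    grounding_dino train -e /path/to/experiments.yaml \
    results_dir=/path/to/results \
    train.num_nodes=2 \
    train.num_gpus=4
```

If your cluster provides a different container plugin (for example pyxis/enroot), substitute its launch syntax for the singularity invocation while keeping the same train arguments.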

Running Training on a Multi-Instance GPU#

NVIDIA Multi-Instance GPUs (MIGs) expand the performance and value of data-center class GPUs (NVIDIA H100, A100, and A30 Tensor Core GPUs) by letting you partition a single GPU into as many as seven instances, each with its own fully isolated high-bandwidth memory, cache, and compute cores. For more information on setting up MIG, refer to the NVIDIA Multi-Instance GPU User Guide.

Note

Read the discussion of supported configurations in the MIG documentation to understand the best way to partition the GPU and improve utilization.

This sample command runs a MAE training session on a MIG-enabled GPU:

docker run -it --rm --runtime=nvidia \
        -e NVIDIA_VISIBLE_DEVICES="MIG-<DEVICE_UUID>,MIG-<DEVICE_UUID>" \
        -v /path/in/host:/path/in/container \
        nvcr.io/nvidia/tao/tao-toolkit:6.0.0-pyt \
        mae train -e /path/to/experiments.yaml \
        results_dir=/path/to/results \
        train.num_gpus=2

Note

You must add the --runtime=nvidia flag to the docker command and export the NVIDIA_VISIBLE_DEVICES environment variable with the UUID of the GPU instance. You can get the UUID of the specific MIG instance by running the nvidia-smi -L command.
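When scripting MIG selection, the instance UUIDs can be pulled out of the nvidia-smi -L listing with standard text tools. The snippet below runs against an illustrative copy of that listing — the device names and UUIDs are placeholders, not real output; on a real machine you would pipe nvidia-smi -L directly instead of using the here-document.

```shell
# Illustrative nvidia-smi -L listing; on a real machine: nvidia-smi -L > gpus.txt
cat > gpus.txt <<'EOF'
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-xxxxxxxx)
  MIG 3g.20gb     Device  0: (UUID: MIG-aaaaaaaa)
  MIG 3g.20gb     Device  1: (UUID: MIG-bbbbbbbb)
EOF

# Extract only the MIG instance UUIDs and join them into the comma-separated
# form expected by NVIDIA_VISIBLE_DEVICES.
MIG_UUIDS=$(grep -o 'MIG-[a-zA-Z0-9-]*' gpus.txt | paste -sd, -)
echo "$MIG_UUIDS"   # MIG-aaaaaaaa,MIG-bbbbbbbb
```

You could then pass -e NVIDIA_VISIBLE_DEVICES="$MIG_UUIDS" to the docker run command shown above.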

Running without Elevated User Privileges#

To run TAO via the TAO Launcher, you must first install docker-ce, because the launcher interacts with the Docker service on the local host to run the commands. Installing Docker requires elevated (root) privileges. If you don't have elevated privileges on your compute machine, you can run TAO using Singularity containers instead. This requires you to bypass the tao-launcher and interact directly with the component containers. For information on which tasks are implemented in which container, run the tao info --verbose command. Once you have derived the task-to-container mapping, you can run the tasks like this:

  1. Enter this singularity command to pull the required container image:

      singularity pull tao-toolkit-pyt:6.0.0-pyt.sif docker://nvcr.io/nvidia/tao/tao-toolkit:6.0.0-pyt
    
    Note

    For this command to work, the latest version of singularity must be installed.
    
  2. Enter this command to instantiate the container:

    singularity run --nv -B /path/to/workspace:/path/to/workspace tao-toolkit-pyt:6.0.0-pyt.sif
    
  3. Run the commands inside the container without the tao model prefix. For example, to run a grounding_dino training in the tao-toolkit container, use the following command:

    grounding_dino train -e /path/to/experiments.yaml \
        results_dir=/path/to/results \
        train.num_gpus=4
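As a variation, steps 2 and 3 can be collapsed into a single non-interactive invocation with singularity exec, which is convenient inside batch scripts. This is a sketch using the same assumed paths as the steps above:

```shell
singularity exec --nv \
    -B /path/to/workspace:/path/to/workspace \
    tao-toolkit-pyt:6.0.0-pyt.sif \
    grounding_dino train -e /path/to/experiments.yaml \
        results_dir=/path/to/results \
        train.num_gpus=4
```

Unlike singularity run, this does not drop you into an interactive shell; the command exits when training finishes, which makes it suitable for unattended jobs.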