Running TAO via the Containers
TAO encapsulates DNN training pipelines that may be developed across different training frameworks. To isolate dependencies and training environments, these DNN applications are housed in different containers. The TAO Launcher abstracts the details of which network is associated with which container. However, it requires you to run TAO from an environment where the launcher can instantiate Docker containers. This requires elevated user privileges or a Docker-in-Docker (DinD) setup to call Docker from within your container. This may be less than ideal in several scenarios, such as:
Running on a remote cluster where SLURM instantiates a container on the provisioned cluster node
Running on a machine without elevated user privileges
Running multi-node training jobs
Running on an instance of a Multi-Instance GPU (MIG)
Invoking the Containers Directly
To run the DNNs from one of the enclosed containers, you first need to know which networks are housed in which container. A simple way to get this information is to install the TAO Launcher on your local machine and run tao info --verbose, which lists the tasks housed in each container.
This is sample output from TAO 6.0.0:
Configuration of the TAO Toolkit Instance
task_group:
    model:
        dockers:
            nvidia/tao/tao-toolkit:
                6.0.0-tf2:
                    docker_registry: nvcr.io
                    tasks:
                        1. classification_tf2
                        2. efficientdet_tf2
                6.0.0-pyt:
                    docker_registry: nvcr.io
                    tasks:
                        1. action_recognition
                        2. centerpose
                        3. classification_pyt
                        4. deformable_detr
                        5. dino
                        6. grounding_dino
                        7. mask_grounding_dino
                        8. mask2former
                        9. mal
                        10. mae
                        11. ml_recog
                        12. nvdinov2
                        13. ocdnet
                        14. ocrnet
                        15. optical_inspection
                        16. pointpillars
                        17. pose_classification
                        18. re_identification
                        19. rtdetr
                        20. segformer
                        21. stylegan_xl
                        22. visual_changenet
            nvidia/tao/tao-toolkit:
                5.5.0-pyt:
                    docker_registry: nvcr.io
                    tasks:
                        1. bevfusion
    dataset:
        dockers:
            nvidia/tao/tao-toolkit:
                6.0.0-data-services:
                    docker_registry: nvcr.io
                    tasks:
                        1. augmentation
                        2. auto_label
                        3. annotations
                        4. analytics
    deploy:
        dockers:
            nvidia/tao/tao-toolkit:
                6.0.0-deploy:
                    docker_registry: nvcr.io
                    tasks:
                        5. centerpose
                        6. classification_pyt
                        7. classification_tf1
                        8. classification_tf2
                        9. deformable_detr
                        10. detectnet_v2
                        11. dino
                        12. dssd
                        13. efficientdet_tf1
                        14. efficientdet_tf2
                        15. faster_rcnn
                        16. grounding_dino
                        17. lprnet
                        18. mask2former
                        19. mask_grounding_dino
                        20. mask_rcnn
                        21. mae
                        22. model_agnostic
                        23. ml_recog
                        24. multitask_classification
                        25. nvdinov2
                        26. ocdnet
                        27. ocrnet
                        28. optical_inspection
                        29. retinanet
                        30. rtdetr
                        31. segformer
                        32. ssd
                        33. trtexec
                        34. unet
                        35. visual_changenet
                        36. yolo_v3
                        37. yolo_v4
                        38. yolo_v4_tiny
format_version: 3.0
toolkit_version: 6.0.0
published_date: 05/08/2025
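In the listing, each version key (for example, 6.0.0-pyt) sits above the tasks that its container houses. For long listings, a lookup can be scripted. The following is a minimal sketch, not part of TAO: tao_find_tag is a hypothetical helper, and it assumes the output has been saved as text in the layout shown in the sample, with the three task groups model, dataset, and deploy. It ignores indentation, so it does not depend on how the output is wrapped.

```shell
# Hypothetical helper (not part of TAO): read saved `tao info --verbose`
# output on stdin and print the tag that lists TASK under GROUP.
# Usage: tao_find_tag GROUP TASK < saved_output.txt
tao_find_tag() {
  awk -v group="$1" -v task="$2" '
    # Strip leading and trailing whitespace so indentation does not matter.
    { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
    # Remember which task group is being listed.
    /^(model|dataset|deploy):$/ { current = substr($0, 1, length($0) - 1); next }
    # Remember the most recent version key, such as "6.0.0-pyt:".
    /^[0-9]+\.[0-9]+\.[0-9]+[^ ]*:$/ { tag = substr($0, 1, length($0) - 1); next }
    # Task lines look like "5. dino"; print the tag on a match.
    /^[0-9]+\. / { if (current == group && $2 == task) print tag }
  '
}

# Abbreviated excerpt of the sample output, used only to exercise the helper.
sample='task_group:
model:
dockers:
nvidia/tao/tao-toolkit:
6.0.0-tf2:
docker_registry: nvcr.io
tasks:
1. classification_tf2
2. efficientdet_tf2
6.0.0-pyt:
docker_registry: nvcr.io
tasks:
4. deformable_detr
5. dino
deploy:
dockers:
nvidia/tao/tao-toolkit:
6.0.0-deploy:
docker_registry: nvcr.io
tasks:
11. dino'

printf '%s\n' "$sample" | tao_find_tag model dino
```

With the excerpt above, the lookup for the dino task in the model group prints 6.0.0-pyt.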
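The container name associated with a task can be derived as $DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG. For example, the Docker name to run detectnet_v2 can be derived like this. Note that detectnet_v2 training is not among the model tasks in the 6.0.0 output above, so this example uses the 5.0.0-tf1.15.5 tag: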
export DOCKER_REGISTRY="nvcr.io"
export DOCKER_NAME="nvidia/tao/tao-toolkit"
export DOCKER_TAG="5.0.0-tf1.15.5"
export DOCKER_CONTAINER=$DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG
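If one of the three variables is missing, the concatenation above silently produces a malformed name such as nvcr.io/nvidia/tao/tao-toolkit: with no tag. A minimal sketch of a guard follows; tao_image is a hypothetical helper name, not a TAO command, and it relies only on standard shell parameter expansion.

```shell
# Hypothetical helper (not part of TAO): compose the full container name and
# fail early if any of the three parts is missing.
tao_image() {
  # ${VAR:?message} aborts with the message when VAR is unset or empty.
  printf '%s/%s:%s\n' \
    "${DOCKER_REGISTRY:?set DOCKER_REGISTRY}" \
    "${DOCKER_NAME:?set DOCKER_NAME}" \
    "${DOCKER_TAG:?set DOCKER_TAG}"
}

export DOCKER_REGISTRY="nvcr.io"
export DOCKER_NAME="nvidia/tao/tao-toolkit"
export DOCKER_TAG="5.0.0-tf1.15.5"

DOCKER_CONTAINER="$(tao_image)"
export DOCKER_CONTAINER
echo "$DOCKER_CONTAINER"
```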
Once you have the Docker name, invoke the container by running the commands defined by the network without the tao model prefix. For example, this command runs a detectnet_v2 training job with four GPUs:
docker run -it --rm --gpus all \
-v /path/in/host:/path/in/docker \
$DOCKER_CONTAINER \
detectnet_v2 train -e /path/to/experiment/spec.txt \
-r /path/to/results/dir \
-k $KEY --gpus 4
Running Multi-Node Training
As of 6.0.0, TAO supports multi-node training for all PyTorch models. TAO uses the PyTorch Lightning Trainer to orchestrate multi-GPU and multi-node training.
To invoke multi-node training, add the train.num_nodes argument to the train command to specify the number of nodes, and the train.num_gpus argument to specify the number of GPUs per node.
For example, this command runs a grounding_dino training job as a multi-node job:
grounding_dino train -e /path/to/experiments.yaml \
results_dir=/path/to/results \
train.num_nodes=2 \
train.num_gpus=4
This command specifies two nodes and four GPUs per node.
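Because train.num_gpus counts GPUs per node, the total number of GPUs a job occupies is the product of the two values. The sketch below makes that arithmetic explicit when sizing a cluster request; tao_total_gpus is a hypothetical helper name, not a TAO command.

```shell
# Hypothetical helper (not part of TAO): print the total number of GPUs used
# by a job, given the node count and the number of GPUs per node.
tao_total_gpus() {
  for n in "$1" "$2"; do
    case "$n" in
      # Reject empty or non-numeric arguments.
      ''|*[!0-9]*)
        echo "usage: tao_total_gpus NUM_NODES NUM_GPUS_PER_NODE" >&2
        return 1 ;;
    esac
  done
  echo "$(( $1 * $2 ))"
}

NUM_NODES=2
NUM_GPUS=4
echo "train.num_nodes=${NUM_NODES} train.num_gpus=${NUM_GPUS} (total GPUs: $(tao_total_gpus "$NUM_NODES" "$NUM_GPUS"))"
```

For the two-node, four-GPU command above, the total is eight GPUs.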
Running Training on a Multi-Instance GPU
NVIDIA Multi-Instance GPU (MIG) expands the performance and value of data-center class GPUs (NVIDIA H100, A100, and A30 Tensor Core GPUs) by letting you partition a single GPU into as many as seven instances, each with its own fully isolated high-bandwidth memory, cache, and compute cores. For more information on setting up MIG, refer to the NVIDIA Multi-Instance GPU User Guide.
Note
Read the discussion of supported configurations in the MIG documentation to understand the best way to split and improve utilization.
This sample command runs an MAE training session on a MIG-enabled GPU:
docker run -it --rm --runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES="MIG-<DEVICE_UUID>,MIG-<DEVICE_UUID>" \
-v /path/in/host:/path/in/container \
nvcr.io/nvidia/tao/tao-toolkit:6.0.0-pyt \
mae train -e /path/to/experiments.yaml \
results_dir=/path/to/results \
train.num_gpus=2
Note
You must add the --runtime=nvidia flag to the docker command and set the NVIDIA_VISIBLE_DEVICES environment variable to the UUID of each MIG instance you want to use, separated by commas. You can get the UUID of a specific MIG instance by running the nvidia-smi -L command.
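Copying UUIDs by hand is error-prone, so the comma-separated value can be built from the nvidia-smi -L listing. The following is a minimal sketch under stated assumptions: mig_uuid_list is a hypothetical helper name, the sample listing and its UUIDs are made-up placeholders, and the parsing assumes the "(UUID: MIG-...)" layout printed by recent drivers. Check the output on your own machine before relying on it.

```shell
# Hypothetical helper (not part of TAO): read `nvidia-smi -L` output on stdin
# and print the MIG instance UUIDs as one comma-separated line.
mig_uuid_list() {
  # Keep only lines with "(UUID: MIG-...)", extract the UUID, then join.
  sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p' | paste -sd, -
}

# Made-up sample listing; the UUIDs are placeholders, not real devices.
sample='GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-00000000-0000-0000-0000-000000000000)
  MIG 3g.20gb     Device  0: (UUID: MIG-11111111-1111-1111-1111-111111111111)
  MIG 3g.20gb     Device  1: (UUID: MIG-22222222-2222-2222-2222-222222222222)'

printf '%s\n' "$sample" | mig_uuid_list

# On a real machine, the value could be captured like this:
#   NVIDIA_VISIBLE_DEVICES="$(nvidia-smi -L | mig_uuid_list)"
```

The helper skips the parent GPU line because its UUID starts with GPU- rather than MIG-, so only instance UUIDs reach the list.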
Running without Elevated User Privileges
To run TAO via the TAO Launcher, you must first install docker-ce because the launcher interacts with the Docker service on the local host to run the commands. Installing Docker requires elevated user privileges to run as root. If you don’t have elevated user privileges on your compute machine, you can run TAO using a Singularity container. This requires you to bypass the tao-launcher and interact directly with the component Docker containers. For information on which tasks are implemented in which Docker containers, run the tao info --verbose command. Once you have derived the task-to-Docker mapping, you can run the tasks like this:
Enter this singularity command to pull the required Docker:
singularity pull tao-toolkit-pyt:6.0.0-pyt.sif docker://nvcr.io/nvidia/tao/tao-toolkit:6.0.0-pyt
Note
For this command to work, the latest version of singularity must be installed.
Enter this command to instantiate the Docker:
singularity run --nv -B /path/to/workspace:/path/to/workspace tao-toolkit-pyt:6.0.0-pyt.sif
Run the commands inside the container without the tao model prefix. For example, to run a grounding_dino training job in the tao-toolkit container, use the following command:
grounding_dino train -e /path/to/experiments.yaml \
results_dir=/path/to/results \
train.num_gpus=4
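For batch jobs, where an interactive session is not available, the same task can in principle be started in one step with singularity exec, which runs a single command inside the image. The following is a minimal sketch rather than a documented TAO workflow: tao_singularity and the TAO_EXECUTE switch are hypothetical names, and it assumes the task entrypoint is on the PATH inside the image. By default it only prints the command so that you can review it first.

```shell
# Hypothetical helper (not part of TAO): build a `singularity exec` command
# that runs one task inside a pulled image. By default the command is only
# printed; set TAO_EXECUTE=1 to run it.
# Usage: tao_singularity IMAGE.sif WORKSPACE_DIR TASK [ARGS...]
tao_singularity() {
  sif="$1"; workspace="$2"; shift 2
  # --nv exposes the host GPUs; -B bind-mounts the workspace at the same path.
  set -- singularity exec --nv -B "$workspace:$workspace" "$sif" "$@"
  if [ "${TAO_EXECUTE:-0}" = "1" ]; then
    "$@"
  else
    # Printed for review only; arguments containing spaces are not re-quoted.
    printf '%s\n' "$*"
  fi
}

tao_singularity tao-toolkit-pyt:6.0.0-pyt.sif /path/to/workspace \
  grounding_dino train -e /path/to/experiments.yaml \
  results_dir=/path/to/results \
  train.num_gpus=4
```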