TAO Toolkit WandB Integration

NVIDIA TAO Release 4.0.1

The following networks in TAO Toolkit interface with Weights & Biases to help you continuously iterate, visualize and track multiple training experiments, and compile meaningful insights into the training use case.

  1. DetectNet-v2

  2. FasterRCNN

  3. Image Classification - TF2

  4. RetinaNet

  5. YOLOv4/YOLOv4-Tiny

  6. YOLOv3

  7. SSD

  8. DSSD

  9. EfficientDet - TF2

In TAO Toolkit 4.0.1, the Weights & Biases visualization suite synchronizes with the data rendered in TensorBoard. Therefore, to see rendered data on the Weights & Biases server, you need to enable TensorBoard visualization. The integration can also send you alerts via Slack or email for training runs that have failed.


Enabling MLOps integration does not require you to install TensorBoard.

These are the broad steps involved with setting up Weights & Biases for TAO Toolkit:

  1. Setting up a Weights & Biases account

  2. Acquiring a Weights & Biases API key

  3. Logging in to Weights & Biases

  4. Setting configurable data for the Weights & Biases experiment

Setting up a Weights & Biases account

Sign up for a free account at the Weights & Biases website and then log in to your account.


Wandb login screen

Acquiring a Weights & Biases API key

Once you have logged in to your Weights & Biases account, find your API key here.


Wandb credentials page

Install the wandb library

Install the wandb library on your local machine in a Python 3 environment.


python3 -m pip install wandb
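To confirm that the install landed in the Python environment you intend to use, a quick check such as the following may help. This is a minimal sketch using only the Python standard library; the helper name `describe_package` is hypothetical and is not part of TAO Toolkit or wandb.

```python
import importlib.util
from importlib import metadata


def describe_package(name: str) -> str:
    """Return a one-line install status for a package in the current environment."""
    # find_spec returns None when the top-level module cannot be imported.
    if importlib.util.find_spec(name) is None:
        return f"{name}: not installed"
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a stdlib module).
        version = "unknown version"
    return f"{name}: installed ({version})"


if __name__ == "__main__":
    print(describe_package("wandb"))
```

Run it with the same `python3` interpreter that you used for the `pip install` command above.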

Log in to the wandb client in the TAO Toolkit Container

To send data from the local compute unit and render it on the Weights & Biases server dashboard, the wandb client in the TAO Toolkit container must be logged in and synchronized with your profile. To log in the wandb client in the container, set the WANDB_API_KEY environment variable in the TAO Toolkit containers to the API key you received when setting up your Weights & Biases account.

To set the environment variable via the TAO Toolkit launcher, use the sample JSON file below for reference and replace the value field under the Envs element of the ~/.tao_mounts.json file.


Weights & Biases requires access to the /config directory in the container. Therefore, you must instantiate the container with root access. Make sure to unset the user field under the DockerOptions settings in the ~/.tao_mounts.json file.


{
    "Mounts": [
        {
            "source": "/path/to/your/data",
            "destination": "/workspace/tao-experiments/data"
        },
        {
            "source": "/path/to/your/local/results",
            "destination": "/workspace/tao-experiments/results"
        },
        {
            "source": "/path/to/config/files",
            "destination": "/workspace/tao-experiments/specs"
        }
    ],
    "Envs": [
        {
            "variable": "WANDB_API_KEY",
            "value": "<api_key_value_from_wandb>"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "ports": {
            "8888": 8888
        }
    }
}
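If you prefer to script this change rather than edit the file by hand, the following is a minimal sketch, assuming the ~/.tao_mounts.json layout shown in this section. The function names `set_wandb_key` and `update_mounts_file` are hypothetical helpers, not part of the TAO Toolkit launcher.

```python
import json
from pathlib import Path


def set_wandb_key(config: dict, api_key: str) -> dict:
    """Return a copy of a launcher config with WANDB_API_KEY set and no user field."""
    updated = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    # Drop any existing WANDB_API_KEY entry, then append the new one.
    envs = [e for e in updated.get("Envs", []) if e.get("variable") != "WANDB_API_KEY"]
    envs.append({"variable": "WANDB_API_KEY", "value": api_key})
    updated["Envs"] = envs
    # Weights & Biases needs root access to /config, so remove any "user" override.
    updated.get("DockerOptions", {}).pop("user", None)
    return updated


def update_mounts_file(path: Path, api_key: str) -> None:
    """Rewrite the launcher config at `path` in place (creates it if missing)."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(set_wandb_key(config, api_key), indent=4) + "\n")
```

For example, `update_mounts_file(Path.home() / ".tao_mounts.json", "<api_key_value_from_wandb>")` would update the file used by the launcher. Back up the file first, because the sketch overwrites it.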


When running the networks from TAO Toolkit containers directly, use the -e flag of the docker command. For example, to run detectnet_v2 with Weights & Biases directly via the container, use the following command.


docker run -it --rm --gpus all \
    -v /path/in/host:/path/in/docker \
    -e WANDB_API_KEY=<api_key_value> nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5 \
    detectnet_v2 train -e /path/to/experiment/spec.txt \
    -r /path/to/results/dir \
    -k $KEY --gpus 4
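When the docker command is assembled by a script, it can be useful to fail early if the API key is missing instead of discovering it after the container starts. The following is a minimal sketch of that idea. The helper names `build_docker_command` and `format_command` are hypothetical, and the sketch uses only the docker flags that appear in the command above.

```python
import shlex
from typing import List, Sequence


def build_docker_command(
    api_key: str,
    image: str,
    host_dir: str,
    container_dir: str,
    task_args: Sequence[str],
) -> List[str]:
    """Build the argument list for a `docker run` call that passes WANDB_API_KEY."""
    if not api_key:
        raise ValueError("WANDB_API_KEY is empty; the wandb client cannot log in")
    return [
        "docker", "run", "-it", "--rm", "--gpus", "all",
        "-v", f"{host_dir}:{container_dir}",
        "-e", f"WANDB_API_KEY={api_key}",
        image,
        *task_args,
    ]


def format_command(argv: Sequence[str]) -> str:
    """Render the argument list as a single shell-quoted string for logging."""
    return shlex.join(argv)
```

The list returned by `build_docker_command` can be passed to `subprocess.run` without a shell. Avoid logging the formatted command in shared locations, because it contains the API key.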

TAO Toolkit provides the following options to configure the wandb client:

  1. project: A string containing the name of the project that the experiment data is uploaded to

  2. entity: A string containing the name of the entity (group) under which the project is created

  3. tags: A list of strings that can be used to tag the experiment

  4. notes: A short description of the experiment

  5. name: The name of the experiment. In order to maintain a unique name per run, TAO Toolkit appends to the name string a timestamp indicating when the experiment run was created.
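The five options above share their names with keyword arguments of `wandb.init` in the wandb Python library. The sketch below shows one way such settings could be assembled, including a timestamp suffix on the run name. It assumes, without confirmation from this page, that the values are passed through to wandb unchanged. The timestamp format is hypothetical, and TAO Toolkit's actual suffix may differ.

```python
from datetime import datetime
from typing import Dict, Optional, Sequence


def build_wandb_settings(
    project: str,
    entity: str,
    name: str,
    tags: Sequence[str] = (),
    notes: str = "",
    now: Optional[datetime] = None,
) -> Dict[str, object]:
    """Collect the configurable fields and make the run name unique per run."""
    now = now or datetime.now()
    # Hypothetical suffix format, used here for illustration only.
    unique_name = f"{name}_{now.strftime('%Y%m%d_%H%M%S')}"
    return {
        "project": project,
        "entity": entity,
        "tags": list(tags),
        "notes": notes,
        "name": unique_name,
    }
```

With the wandb library installed and the client logged in, a dictionary like this could be passed as `wandb.init(**settings)` in a standalone script. Inside TAO Toolkit, you set these values in the spec file instead.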

Depending on the schema the network follows, the snippet to add to the network's spec file may vary slightly.

For DetectNet_v2, FasterRCNN, YOLOv3/YOLOv4/YOLOv4-Tiny, RetinaNet, SSD/DSSD, MaskRCNN, and UNet, add the following snippet under the training_config config element of the network.


visualizer{
  enabled: true
  wandb_config{
    project: "name_of_project"
    entity: "name_of_entity"
    tags: "training"
    tags: "tao_toolkit"
    name: "training_experiment_name"
    notes: "short description of experiment"
  }
}
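Note that tags is a repeated field in this snippet: each tag gets its own `tags:` line rather than a bracketed list. If you generate spec files from a script, the following minimal sketch renders the block from Python values. The function name `render_visualizer` is hypothetical, and the sketch assumes the values contain no double quotes or backslashes, since it does no escaping.

```python
from typing import Sequence


def render_visualizer(
    project: str,
    entity: str,
    name: str,
    notes: str,
    tags: Sequence[str],
) -> str:
    """Render the visualizer block as text, with one `tags:` line per tag."""
    lines = [
        "visualizer{",
        "  enabled: true",
        "  wandb_config{",
        f'    project: "{project}"',
        f'    entity: "{entity}"',
    ]
    # Repeated field: emit the field name once per value.
    lines.extend(f'    tags: "{tag}"' for tag in tags)
    lines.extend([
        f'    name: "{name}"',
        f'    notes: "{notes}"',
        "  }",
        "}",
    ])
    return "\n".join(lines)
```

Paste the rendered text under the training_config element of the spec file, as described above.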

For EfficientDet-TF2 and Classification-TF2, add the following snippet under the train config element in the train.yaml file.


wandb:
  entity: "name_of_entity"
  name: "name_of_the_experiment"
  project: "name_of_the_project"

The following are sample images from a successful visualization run for DetectNet_v2.


Image showing intermediate inference images with bounding boxes before clustering using DBSCAN or NMS.


Image showing system utilization plots.


Configuration of the given experiment was saved for records.


Metrics plotted during training


Streaming logs from the local machine running the training.


Weight histograms of the trained model.

© Copyright 2023, NVIDIA. Last updated on Aug 2, 2023.