TAO WandB Integration

The following networks in TAO interface with Weights & Biases to help you continuously iterate, visualize and track multiple training experiments, and compile meaningful insights into the training use case.

DetectNet-v2
FasterRCNN
Image Classification - TF2
RetinaNet
YOLOv4/YOLOv4-Tiny
YOLOv3
SSD
DSSD
EfficientDet - TF2
MaskRCNN
UNet
Data Analytics

In TAO 4.0.1, the Weights & Biases visualization suite synchronizes with the data rendered in TensorBoard. Therefore, to see rendered data on the Weights & Biases server, you need to enable TensorBoard visualization. The integration can also send you alerts via Slack or email when a training run fails.

Note

Enabling MLOPS integration does not require you to install TensorBoard.

Quick Start

These are the broad steps involved with setting up Weights & Biases for TAO:

Setting up a Weights & Biases account
Acquiring a Weights & Biases API key
Logging in to Weights & Biases
Setting configurable data for the Weights & Biases experiment

Setting up a Weights & Biases account

Wandb login screen

Acquiring a Weights & Biases API key

Once you have logged in to your Weights & Biases account, find your API key here.

Wandb credentials page

Install the wandb library

Install the wandb library on your local machine in a Python3 environment.

            python3 -m pip install wandb

Log in to the wandb client in the TAO Container

To send data from the local compute unit to the Weights & Biases server dashboard, the wandb client in the TAO container must be logged in and synchronized with your profile. To log the client in, set the WANDB_API_KEY environment variable in the TAO container to the API key you received when setting up your Weights & Biases account.

To set the environment variable via the TAO launcher, use the sample JSON file below for reference and replace the value field under the Envs element of the ~/.tao_mounts.json file.

Warning

Weights & Biases requires access to the /config directory in the container, so you must instantiate the container with root access. Make sure to unset the user field under the DockerOptions settings in the ~/.tao_mounts.json file.

            {
    "Mounts": [
        {
            "source": "/path/to/your/data",
            "destination": "/workspace/tao-experiments/data"
        },
        {
            "source": "/path/to/your/local/results",
            "destination": "/workspace/tao-experiments/results"
        },
        {
            "source": "/path/to/config/files",
            "destination": "/workspace/tao-experiments/specs"
        }
    ],
    "Envs": [
        {
            "variable": "WANDB_API_KEY",
            "value": "<api_key_value_from_wandb>"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "ports": {
            "8888": 8888
        }
    }
}
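
The edit described above can also be scripted. The following is a minimal sketch, assuming the file follows the layout of the sample JSON; the helper names `with_wandb_key` and `update_mounts_file` are hypothetical and not part of the TAO launcher. It sets the WANDB_API_KEY entry under Envs and drops the user field under DockerOptions, as the warning above requires.

```python
import copy
import json
from pathlib import Path


def with_wandb_key(mounts: dict, api_key: str) -> dict:
    """Return a copy of a tao_mounts-style dict with WANDB_API_KEY set.

    Hypothetical helper: it assumes the "Envs" / "DockerOptions" layout
    shown in the sample JSON above.
    """
    updated = copy.deepcopy(mounts)
    # Drop any existing WANDB_API_KEY entry, then append the new value.
    envs = [
        env for env in updated.get("Envs", [])
        if env.get("variable") != "WANDB_API_KEY"
    ]
    envs.append({"variable": "WANDB_API_KEY", "value": api_key})
    updated["Envs"] = envs
    # The container needs root access, so remove the "user" field if present.
    updated.get("DockerOptions", {}).pop("user", None)
    return updated


def update_mounts_file(path: Path, api_key: str) -> None:
    """Rewrite the mounts file in place with the key applied."""
    mounts = json.loads(path.read_text())
    path.write_text(json.dumps(with_wandb_key(mounts, api_key), indent=4))
```

A call such as `update_mounts_file(Path.home() / ".tao_mounts.json", key)` would apply the change; back up the file first, since the sketch overwrites it.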

Note

When running the networks from TAO containers directly, use the -e flag of the docker command. For example, to run detectnet_v2 with Weights & Biases directly via the container, use the following command.

            docker run -it --rm --gpus all \
                -v /path/in/host:/path/in/docker \
                -e WANDB_API_KEY=<api_key_value> \
                nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 \
                detectnet_v2 train -e /path/to/experiment/spec.txt \
                -r /path/to/results/dir \
                -k $KEY --gpus 4
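
If the container is launched from a script instead of an interactive shell, the same command can be assembled as an argument list. This is only a sketch: `build_docker_cmd` is a hypothetical helper, and the paths and image tag are the placeholders from the command above.

```python
import os


def build_docker_cmd(api_key, image, host_dir, container_dir, task_args):
    """Assemble a `docker run` argument list that passes WANDB_API_KEY via -e."""
    return [
        "docker", "run", "-it", "--rm", "--gpus", "all",
        "-v", f"{host_dir}:{container_dir}",
        "-e", f"WANDB_API_KEY={api_key}",
        image,
        *task_args,
    ]


cmd = build_docker_cmd(
    # Read the key from the host environment instead of hard-coding it.
    api_key=os.environ.get("WANDB_API_KEY", "<api_key_value>"),
    image="nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5",
    host_dir="/path/in/host",
    container_dir="/path/in/docker",
    task_args=[
        "detectnet_v2", "train",
        "-e", "/path/to/experiment/spec.txt",
        "-r", "/path/to/results/dir",
        "-k", os.environ.get("KEY", ""),
        "--gpus", "4",
    ],
)
```

The list can then be handed to `subprocess.run(cmd)`. Passing a list avoids shell quoting issues; avoid printing it to logs, because it contains the API key.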

Configuring the wandb element in the training spec

TAO provides the following options to configure the wandb client:

project: A string containing the name of the project that the experiment data is uploaded to
entity: A string containing the name of the entity (group) under which the project is created
tags: A list of strings that can be used to tag the experiment
notes: A short description of the experiment
name: The name of the experiment. To keep the name unique per run, TAO appends a timestamp, marking when the experiment run was created, to the name string.
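
These options share their names with keyword arguments of the wandb library's `wandb.init` function (`project`, `entity`, `tags`, `notes`, `name`). The sketch below illustrates how such a set of options, including a timestamped name, could be assembled. It is not TAO's actual implementation: the helper `wandb_init_kwargs` is hypothetical, and the timestamp format is an assumption, since the exact format TAO uses is not stated here.

```python
from datetime import datetime
from typing import Optional, Sequence


def wandb_init_kwargs(
    project: str,
    entity: str,
    name: str,
    tags: Sequence[str] = (),
    notes: str = "",
    now: Optional[datetime] = None,
) -> dict:
    """Build keyword arguments suitable for wandb.init (illustrative only)."""
    # Assumed timestamp format; TAO's real suffix may look different.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return {
        "project": project,
        "entity": entity,
        "tags": list(tags),
        "notes": notes,
        # Appending the creation time keeps the run name unique per run.
        "name": f"{name}_{stamp}",
    }
```

With the wandb library installed and the client logged in, the result could be passed on as `wandb.init(**wandb_init_kwargs(...))`.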

Depending upon the schema the network follows, the spec file snippet to be added to the network may vary slightly.

For DetectNet_v2, UNet, FasterRCNN, YOLOv3/YOLOv4/YOLOv4-Tiny, RetinaNet, and SSD/DSSD, add the following snippet under the training_config config element of the network.

            visualizer{
    enabled: true
    wandb_config{
        project: "name_of_project"
        entity: "name_of_entity"
        tags: "training"
        tags: "tao_toolkit"
        name: "training_experiment_name"
        notes: "short description of experiment"
    }
}
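
Note that each tag gets its own `tags:` line in this snippet, so the field is repeated once per tag. As a rough sketch, the block can be generated from Python values; `render_visualizer` is a hypothetical helper, not a TAO utility, and it does not escape quotes inside the values.

```python
from typing import Sequence


def render_visualizer(
    project: str, entity: str, name: str, notes: str, tags: Sequence[str]
) -> str:
    """Render a visualizer block like the one above (illustrative only)."""
    lines = [
        "visualizer{",
        "    enabled: true",
        "    wandb_config{",
        f'        project: "{project}"',
        f'        entity: "{entity}"',
    ]
    # One "tags" line per tag, matching the repeated field in the snippet.
    lines.extend(f'        tags: "{tag}"' for tag in tags)
    lines += [
        f'        name: "{name}"',
        f'        notes: "{notes}"',
        "    }",
        "}",
    ]
    return "\n".join(lines)


print(render_visualizer(
    "name_of_project", "name_of_entity", "training_experiment_name",
    "short description of experiment", ["training", "tao_toolkit"],
))
```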

For MaskRCNN, add the following snippet in the network’s training configuration.

            wandb_config{
    project: "name_of_project"
    entity: "name_of_entity"
    tags: "training"
    tags: "tao_toolkit"
    name: "training_experiment_name"
    notes: "short description of experiment"
}

For EfficientDet-TF2 and Classification-TF2, add the following snippet under the train config element in the train.yaml file.

            wandb:
    entity: "name_of_entity"
    name: "name_of_the_experiment"
    project: "name_of_the_project"

Visualization output

The following are sample images from a successful visualization run for DetectNet_v2.

Image showing intermediate inference images with bounding boxes before clustering using DBSCAN or NMS.

Image showing system utilization plots.

Configuration of the given experiment was saved for records.

Metrics plotted during training

Streaming logs from the local machine running the training.

Weight histograms of the trained model.
