TAO Toolkit WandB Integration

The following networks in TAO Toolkit integrate with Weights & Biases, helping you iterate quickly, visualize and track multiple training experiments, and draw meaningful insights for your training use case.

  1. DetectNet-v2

  2. FasterRCNN

  3. Image Classification - TF2

  4. RetinaNet

  5. YOLOv4/YOLOv4-Tiny

  6. YOLOv3

  7. SSD

  8. DSSD

  9. EfficientDet - TF2

  10. MaskRCNN

  11. UNet

  12. Data Analytics

In TAO Toolkit 4.0.1, the Weights & Biases visualization suite synchronizes with the data rendered in TensorBoard. Therefore, to see rendered data on the Weights & Biases server, you must enable TensorBoard visualization. The integration can also send you alerts via Slack or email for training runs that have failed.

Note

Enabling MLOps integration does not require you to install TensorBoard.

These are the broad steps involved with setting up Weights & Biases for TAO Toolkit:

  1. Setting up a Weights & Biases account

  2. Acquiring a Weights & Biases API key

  3. Logging in to Weights & Biases

  4. Setting configurable data for the Weights & Biases experiment

Setting up a Weights & Biases account

Sign up for a free account at the Weights & Biases website and then log in to your account.

wandb_login.png

Wandb login screen

Acquiring a Weights & Biases API key

Once you have logged in to your Weights & Biases account, you can find your API key on the Weights & Biases authorization page at https://wandb.ai/authorize.

wandb_credentials.png

Wandb credentials page

Install the wandb library

Install the wandb library on your local machine in a Python3 environment.

python3 -m pip install wandb
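
Optionally, you can confirm that your API key is valid from your local Python environment before configuring the container. The following is a minimal sketch; the placeholder key string is an assumption and should be replaced with your own API key.

import wandb

# Verify the API key obtained from your Weights & Biases account.
# Replace the placeholder with your actual key, or omit `key=` to be
# prompted interactively or to use the WANDB_API_KEY environment variable.
logged_in = wandb.login(key="<api_key_value_from_wandb>")
print("wandb login successful:", logged_in)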

Log in to the wandb client in the TAO Toolkit Container

To send data from the local compute unit and render it on the Weights & Biases server dashboard, the wandb client in the TAO Toolkit container must be logged in and synchronized with your profile. To log in the wandb client inside the container, set the WANDB_API_KEY environment variable in the TAO Toolkit containers to the API key you obtained when setting up your Weights & Biases account.

To set the environment variable via the TAO Toolkit launcher, use the sample JSON file below for reference and replace the value field under the Envs element of the ~/.tao_mounts.json file.

Warning

Weights & Biases requires access to the /config directory in the container, so you must instantiate the container with root access. Make sure to unset the user field under the DockerOptions settings in the ~/.tao_mounts.json file.

{ "Mounts": [ { "source": "/path/to/your/data", "destination": "/workspace/tao-experiments/data" }, { "source": "/path/to/your/local/results", "destination": "/workspace/tao-experiments/results" }, { "source": "/path/to/config/files", "destination": "/workspace/tao-experiments/specs" } ], "Envs": [ { "variable": "WANDB_API_KEY", "value": "<api_key_value_from_wandb>" } ], "DockerOptions": { "shm_size": "16G", "ulimits": { "memlock": -1, "stack": 67108864 }, "ports": { "8888": 8888 } } }

Note

When running the networks from TAO Toolkit containers directly, use the -e flag of the docker command. For example, to run detectnet_v2 with Weights & Biases directly via the container, use the following code.

docker run -it --rm --gpus all \
    -v /path/in/host:/path/in/docker \
    -e WANDB_API_KEY=<api_key_value> nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 \
    detectnet_v2 train -e /path/to/experiment/spec.txt \
                       -r /path/to/results/dir \
                       -k $KEY --gpus 4

TAO Toolkit provides the following options to configure the wandb client:

  1. project: A string containing the name of the project that the experiment data is uploaded to

  2. entity: A string containing the name of the entity (group) under which the project is created

  3. tags: A list of strings that can be used to tag the experiment

  4. notes: A short description of the experiment

  5. name: The name of the experiment. To keep each run's name unique, TAO Toolkit appends a timestamp to the name string indicating when the experiment run was created.
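
These options are passed to the wandb client when TAO Toolkit initializes a run. As a point of reference, the sketch below shows how the same fields map onto a plain wandb.init() call; TAO Toolkit performs this initialization (including the timestamping of name) internally, so you would not normally write this yourself.

import wandb

# Illustration only: how the spec-file fields above map to wandb.init() arguments.
run = wandb.init(
    project="name_of_project",                 # project
    entity="name_of_entity",                   # entity
    tags=["training", "tao_toolkit"],          # tags
    notes="short description of experiment",   # notes
    name="training_experiment_name",           # name (TAO appends a timestamp)
)
run.finish()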

Depending on the schema that the network's spec file follows, the snippet you add may vary slightly.

For DetectNet_v2, UNet, FasterRCNN, YOLOv3/YOLOv4/YOLOv4-Tiny, RetinaNet, and SSD/DSSD, add the following snippet under the training_config element of the network.

visualizer {
  enabled: true
  wandb_config {
    project: "name_of_project"
    entity: "name_of_entity"
    tags: "training"
    tags: "tao_toolkit"
    name: "training_experiment_name"
    notes: "short description of experiment"
  }
}

For MaskRCNN, add the following snippet in the network's training configuration.

wandb_config {
  project: "name_of_project"
  entity: "name_of_entity"
  tags: "training"
  tags: "tao_toolkit"
  name: "training_experiment_name"
  notes: "short description of experiment"
}

For EfficientDet-TF2 and Classification-TF2, add the following snippet under the train config element in the train.yaml file.

wandb:
  entity: "name_of_entity"
  name: "name_of_the_experiment"
  project: "name_of_the_project"
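
Once training starts, you can optionally confirm that run data is reaching your project with the Weights & Biases public API. The following is a minimal sketch, assuming the placeholder entity and project names used in the snippets above and that WANDB_API_KEY is set in your local environment.

import wandb

# Query the W&B public API for runs in the project configured above.
# "name_of_entity" and "name_of_project" are the placeholder values from the spec snippets.
api = wandb.Api()
runs = api.runs("name_of_entity/name_of_project")

for run in runs:
    print(f"{run.name}: state={run.state}")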

The following are sample images from a successful visualization run for DetectNet_v2.

rich_media_images1.png

Image showing intermediate inference images with bounding boxes before clustering using DBScan or NMS

system_utilization1.png

Image showing system utilization plots.

experiment_config.png

Configuration of the given experiment was saved for records.

metric_plots1.png

Metrics plotted during training

logging1.png

Streaming logs from the local machine running the training.

histograms1.png

Weight histograms of the trained model.
