TAO Toolkit WandB Integration
The following networks in TAO Toolkit interface with Weights & Biases to help you continuously iterate, visualize and track multiple training experiments, and compile meaningful insights into the training use case.
DetectNet-v2
FasterRCNN
Image Classification - TF2
RetinaNet
YOLOv4/YOLOv4-Tiny
YOLOv3
SSD
DSSD
EfficientDet - TF2
MaskRCNN
UNet
Data Analytics
In TAO Toolkit 4.0.1, the Weights & Biases visualization suite synchronizes with the data rendered in TensorBoard. Therefore, to see rendered data on the Weights & Biases server, you need to enable TensorBoard visualization. The integration can also send you alerts via Slack or email for training runs that have failed.
Enabling MLOps integration does not require you to install TensorBoard.
These are the broad steps involved with setting up Weights & Biases for TAO Toolkit:
Setting up a Weights & Biases account
Acquiring a Weights & Biases API key
Logging in to Weights & Biases
Setting configurable data for the Weights & Biases experiment
Setting up a Weights & Biases account
Sign up for a free account at the Weights & Biases website and then log in to your account.
Wandb login screen
Acquiring a Weights & Biases API key
Once you have logged in to your Weights & Biases account, find your API key here.
Wandb credentials page
Install the wandb library
Install the wandb library on your local machine in a Python3 environment.
python3 -m pip install wandb
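As a quick sanity check after installation, the following minimal sketch reports whether the wandb package is importable in the current Python 3 environment. The helper name wandb_is_installed is hypothetical and used only for illustration; it relies solely on the standard library.

```python
import importlib.util


def wandb_is_installed() -> bool:
    """Return True if the `wandb` package can be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("wandb") is not None


if __name__ == "__main__":
    if wandb_is_installed():
        print("wandb is available in this environment.")
    else:
        print("wandb was not found; run: python3 -m pip install wandb")
```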
Log in to the wandb client in the TAO Toolkit Container
To send data from the local compute unit and render it on the Weights & Biases server dashboard, the wandb client in the TAO Toolkit container must be logged in and synchronized with your profile. To log the client in, set the WANDB_API_KEY environment variable in the TAO Toolkit container to the API key you received when setting up your Weights & Biases account.
To set the environment variable via the TAO Toolkit launcher, use the sample JSON file below for reference and replace the value field under the Envs element of the ~/.tao_mounts.json file.
Weights & Biases requires access to the /config directory in the container. Therefore, you must instantiate the container with root access. Make sure to unset the user field under the DockerOptions settings in the ~/.tao_mounts.json file.
{
    "Mounts": [
        {
            "source": "/path/to/your/data",
            "destination": "/workspace/tao-experiments/data"
        },
        {
            "source": "/path/to/your/local/results",
            "destination": "/workspace/tao-experiments/results"
        },
        {
            "source": "/path/to/config/files",
            "destination": "/workspace/tao-experiments/specs"
        }
    ],
    "Envs": [
        {
            "variable": "WANDB_API_KEY",
            "value": "<api_key_value_from_wandb>"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "ports": {
            "8888": 8888
        }
    }
}
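If you prefer to script this change rather than edit the file by hand, the two adjustments described above (setting WANDB_API_KEY under Envs and unsetting user under DockerOptions) can be sketched as follows. This is an illustrative helper, not part of TAO Toolkit: the function names configure_wandb and update_mounts_file are hypothetical, and the sketch assumes the file follows the layout of the sample shown here.

```python
import json
from pathlib import Path


def configure_wandb(mounts: dict, api_key: str) -> dict:
    """Return a copy of a launcher config with WANDB_API_KEY set and `user` unset."""
    # Deep-copy through JSON so the caller's dictionary is left untouched.
    config = json.loads(json.dumps(mounts))

    # Replace any existing WANDB_API_KEY entry instead of adding a duplicate.
    envs = [e for e in config.get("Envs", []) if e.get("variable") != "WANDB_API_KEY"]
    envs.append({"variable": "WANDB_API_KEY", "value": api_key})
    config["Envs"] = envs

    # Unset the `user` field so the container is instantiated with root access.
    config.get("DockerOptions", {}).pop("user", None)
    return config


def update_mounts_file(api_key: str, path=None) -> None:
    """Apply configure_wandb to ~/.tao_mounts.json (or to `path`, if given)."""
    path = Path(path) if path is not None else Path.home() / ".tao_mounts.json"
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(configure_wandb(current, api_key), indent=4) + "\n")
```

Keeping the transformation in a pure function makes it possible to inspect the resulting dictionary before anything is written to disk.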
When running the networks from TAO Toolkit containers directly, use the -e flag of the docker command. For example, to run detectnet_v2 with Weights & Biases directly via the container, use the following code.
docker run -it --rm --gpus all \
    -v /path/in/host:/path/in/docker \
    -e WANDB_API_KEY=<api_key_value> \
    nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 \
    detectnet_v2 train -e /path/to/experiment/spec.txt \
    -r /path/to/results/dir \
    -k $KEY --gpus 4
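When the container is launched from a script, assembling the command as an argument list avoids line-continuation mistakes and keeps every docker option, including -e, ahead of the image name. The sketch below is only an illustration: build_docker_command is a hypothetical helper, and the paths and key are the same placeholders used in the command above.

```python
import os
import shlex


def build_docker_command(api_key, image, network_args, volumes=()):
    """Assemble a `docker run` argument list that passes WANDB_API_KEY via -e."""
    cmd = ["docker", "run", "-it", "--rm", "--gpus", "all"]
    for host_path, container_path in volumes:
        cmd += ["-v", f"{host_path}:{container_path}"]
    # All docker options must come before the image name; everything after
    # the image is handed to the container as the network command.
    cmd += ["-e", f"WANDB_API_KEY={api_key}", image]
    cmd += list(network_args)
    return cmd


if __name__ == "__main__":
    command = build_docker_command(
        api_key=os.environ.get("WANDB_API_KEY", "<api_key_value>"),
        image="nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5",
        network_args=[
            "detectnet_v2", "train",
            "-e", "/path/to/experiment/spec.txt",
            "-r", "/path/to/results/dir",
            "-k", os.environ.get("KEY", "<key>"),
            "--gpus", "4",
        ],
        volumes=[("/path/in/host", "/path/in/docker")],
    )
    # Print a shell-quoted form for review; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```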
TAO Toolkit provides the following options to configure the wandb client:
project: A string containing the name of the project that the experiment data is uploaded to
entity: A string containing the name of the entity (group) under which the project is created
tags: A list of strings that can be used to tag the experiment
notes: A short description of the experiment
name: The name of the experiment. In order to maintain a unique name per run, TAO Toolkit appends to the name string a timestamp indicating when the experiment run was created.
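The options above correspond to keyword arguments of the same names accepted by wandb.init in the wandb Python client. The following sketch shows how such a set of arguments, including a timestamped run name, might be assembled. It is an assumption for illustration only: the helper names are hypothetical, and the timestamp format shown is not necessarily the one TAO Toolkit uses internally.

```python
from datetime import datetime


def unique_run_name(name, created_at=None):
    """Append a creation timestamp to `name` so each run gets a distinct name."""
    created_at = created_at or datetime.now()
    # Hypothetical format; the exact format TAO Toolkit appends is not documented here.
    return f"{name}_{created_at.strftime('%Y%m%d_%H%M%S')}"


def wandb_init_kwargs(project, entity, name, tags=(), notes="", created_at=None):
    """Collect the configurable options as keyword arguments for wandb.init."""
    return {
        "project": project,
        "entity": entity,
        "tags": list(tags),
        "notes": notes,
        "name": unique_run_name(name, created_at),
    }


if __name__ == "__main__":
    kwargs = wandb_init_kwargs(
        project="name_of_project",
        entity="name_of_entity",
        name="training_experiment_name",
        tags=["training", "tao_toolkit"],
        notes="short description of experiment",
    )
    # With the wandb client installed and logged in, these could be passed as
    # wandb.init(**kwargs); here they are only printed.
    print(kwargs)
```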
Depending upon the schema the network follows, the spec file snippet to be added to the network may vary slightly.
For DetectNet_v2, UNet, FasterRCNN, YOLOv3/YOLOv4/YOLOv4-Tiny, RetinaNet, and SSD/DSSD, add the following snippet under the training_config config element of the network.
visualizer{
  enabled: true
  wandb_config{
    project: "name_of_project"
    entity: "name_of_entity"
    tags: "training"
    tags: "tao_toolkit"
    name: "training_experiment_name"
    notes: "short description of experiment"
  }
}
For MaskRCNN, add the following snippet in the network’s training configuration.
wandb_config{
  project: "name_of_project"
  entity: "name_of_entity"
  tags: "training"
  tags: "tao_toolkit"
  name: "training_experiment_name"
  notes: "short description of experiment"
}
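When spec files are generated programmatically, the two snippets above can be produced from the same values. The sketch below is a hypothetical helper, not a TAO Toolkit API: it emits one tags line per tag, as in the samples, and optionally wraps the block in the visualizer element. It assumes the values contain no double quotes, since it performs no escaping.

```python
def render_wandb_config(project, entity, name, notes, tags=(), wrap_in_visualizer=True):
    """Render a wandb_config block in the text format used by the spec snippets."""
    lines = [
        "wandb_config{",
        f'  project: "{project}"',
        f'  entity: "{entity}"',
    ]
    # A list of tags is written as one repeated `tags` line per entry.
    lines += [f'  tags: "{tag}"' for tag in tags]
    lines += [f'  name: "{name}"', f'  notes: "{notes}"', "}"]
    if wrap_in_visualizer:
        # Networks that use the visualizer element nest wandb_config inside it.
        lines = ["visualizer{", "  enabled: true"] + ["  " + line for line in lines] + ["}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_wandb_config(
        project="name_of_project",
        entity="name_of_entity",
        name="training_experiment_name",
        notes="short description of experiment",
        tags=["training", "tao_toolkit"],
    ))
```

Pass wrap_in_visualizer=False to obtain the bare wandb_config block used for MaskRCNN.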
For EfficientDet-TF2 and Classification-TF2, add the following snippet under the train config element in the train.yaml file.
wandb:
  entity: "name_of_entity"
  name: "name_of_the_experiment"
  project: "name_of_the_project"
The following are sample images from a successful visualization run for DetectNet_v2.
Image showing intermediate inference images with bounding boxes before clustering using DBScan or NMS
Image showing system utilization plots.
Configuration of the given experiment was saved for records.
Metrics plotted during training
Streaming logs from the local machine running the training.
Weight histograms of the trained model.