TAO Toolkit ClearML Integration

TAO Toolkit v5.2.0

The following networks in TAO Toolkit interface with ClearML, allowing you to continuously iterate, visualize and track multiple training experiments, and compile meaningful insights into a training use case.

  1. DetectNet-v2

  2. FasterRCNN

  3. Image Classification - TF2

  4. RetinaNet

  5. YOLOv4/YOLOv4-Tiny

  6. YOLOv3

  7. SSD

  8. DSSD

  9. EfficientDet - TF2

  10. MaskRCNN

  11. UNet

In TAO Toolkit 4.0.1, the ClearML visualization suite synchronizes with the data rendered in TensorBoard. Therefore, to see rendered data on the ClearML server, you need to enable TensorBoard visualization. The integration can also send you alerts via Slack or email for training runs that have failed.


Note: Enabling MLOps integration does not require you to install TensorBoard.

These are the broad steps involved with setting up ClearML for TAO Toolkit:

  1. Setting up a ClearML account

  2. Acquiring ClearML credentials

  3. Logging into the ClearML client

  4. Setting the configurable data for the ClearML experiment

Setting up a ClearML Account

Sign up for a free account at the ClearML website and then log in to your ClearML account.

Acquiring ClearML API Credentials

Once you have logged in to your ClearML account, generate new credentials by navigating to the settings pane in the top-right corner of this window and clicking on Generate New Credentials.


NVIDIA recommends getting the credentials in the form of environment variables for maximum ease of use. You can get these variables by clicking on the Jupyter Notebook tab and copying the environment variables shown there.


Jupyter notebook tab from the credentials under Settings/Workspace

Install clearml Library

Install the clearml library on your local machine in a Python 3 environment.


python3 -m pip install clearml
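
To confirm that the installation succeeded, a quick check along the following lines can help. This is a minimal sketch that uses only the Python standard library; the helper name is_installed is illustrative and is not part of TAO Toolkit or ClearML.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be located for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("clearml"):
        print("clearml is available in this environment")
    else:
        print("clearml not found; run: python3 -m pip install clearml")
```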

Log in to the ClearML Client in the TAO Toolkit Container

To send data from the local compute unit and render it on the ClearML server dashboard, the ClearML client in the TAO Toolkit container must be logged in and synchronized with your profile. To log in the ClearML client in the container, set the following environment variables with the values you received when generating your ClearML credentials.


%env CLEARML_WEB_HOST=https://app.clear.ml
%env CLEARML_API_HOST=https://api.clear.ml
%env CLEARML_FILES_HOST=https://files.clear.ml
%env CLEARML_API_ACCESS_KEY=<API_ACCESS_KEY>
%env CLEARML_API_SECRET_KEY=<API_SECRET_KEY>
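
Before launching a training run, it can be useful to verify that all five variables are actually set. The following is a minimal sketch under that assumption; the function name missing_clearml_vars and the rule that a value wrapped in angle brackets counts as an unfilled placeholder are illustrative choices, not TAO Toolkit or ClearML behavior.

```python
import os

# The five variables listed above.
REQUIRED_VARS = (
    "CLEARML_WEB_HOST",
    "CLEARML_API_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_clearml_vars(env=None):
    """Return the required names that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        # Treat "<API_ACCESS_KEY>"-style values as not yet filled in (assumption).
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_clearml_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Passing a plain dictionary instead of relying on os.environ makes the check easy to exercise in a notebook cell before the real credentials are exported.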

To set the environment variables via the TAO Toolkit launcher, use the sample JSON file below for reference and replace the value fields under the Envs element of the ~/.tao_mounts.json file.


{
    "Mounts": [
        {
            "source": "/path/to/your/data",
            "destination": "/workspace/tao-experiments/data"
        },
        {
            "source": "/path/to/your/local/results",
            "destination": "/workspace/tao-experiments/results"
        },
        {
            "source": "/path/to/config/files",
            "destination": "/workspace/tao-experiments/specs"
        }
    ],
    "Envs": [
        {
            "variable": "CLEARML_WEB_HOST",
            "value": "https://app.clear.ml"
        },
        {
            "variable": "CLEARML_API_HOST",
            "value": "https://api.clear.ml"
        },
        {
            "variable": "CLEARML_FILES_HOST",
            "value": "https://files.clear.ml"
        },
        {
            "variable": "CLEARML_API_ACCESS_KEY",
            "value": "<API_ACCESS_KEY>"
        },
        {
            "variable": "CLEARML_API_SECRET_KEY",
            "value": "<API_SECRET_KEY>"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "user": "1000:1000",
        "ports": {
            "8888": 8888
        }
    }
}
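
If you prefer to update the Envs element programmatically rather than by hand, the following sketch shows one way to do it on an in-memory copy of the configuration. The helper name upsert_envs is hypothetical; the only structure assumed is the list of "variable"/"value" entries shown in the sample file above.

```python
import json


def upsert_envs(config: dict, values: dict) -> dict:
    """Return a copy of a launcher config with each variable set under "Envs"."""
    updated = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    envs = updated.setdefault("Envs", [])
    by_name = {entry["variable"]: entry for entry in envs}
    for name, value in values.items():
        if name in by_name:
            by_name[name]["value"] = value  # replace the existing value field
        else:
            envs.append({"variable": name, "value": value})
    return updated


if __name__ == "__main__":
    sample = {"Mounts": [], "Envs": [{"variable": "CLEARML_API_HOST", "value": "old"}]}
    result = upsert_envs(
        sample,
        {
            "CLEARML_API_HOST": "https://api.clear.ml",
            "CLEARML_API_ACCESS_KEY": "<API_ACCESS_KEY>",
        },
    )
    print(json.dumps(result, indent=4))
```

The function returns a new dictionary and leaves its input untouched, so you can inspect the result before writing it back to ~/.tao_mounts.json with json.dump.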


When running the networks from TAO Toolkit containers directly, pass each variable with the -e flag of the docker command. For example, to run detectnet_v2 with ClearML directly via the container, use the following command.


docker run -it --rm --gpus all \
    -v /path/in/host:/path/in/docker \
    -e CLEARML_WEB_HOST="https://app.clear.ml" \
    -e CLEARML_API_HOST="https://api.clear.ml" \
    -e CLEARML_FILES_HOST="https://files.clear.ml" \
    -e CLEARML_API_ACCESS_KEY="<API_ACCESS_KEY>" \
    -e CLEARML_API_SECRET_KEY="<API_SECRET_KEY>" \
    nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 \
    detectnet_v2 train -e /path/to/experiment/spec.txt \
        -r /path/to/results/dir \
        -k $KEY --gpus 4
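
When the same variables are reused across several runs, the repeated -e flags can be generated from a single mapping. The sketch below only builds and prints the argument list; the helper names docker_env_flags and build_command are illustrative, and the volume mount and key arguments from the command above are omitted for brevity.

```python
import shlex


def docker_env_flags(env: dict) -> list:
    """Expand a mapping into repeated ['-e', 'NAME=value'] docker arguments."""
    flags = []
    for name, value in env.items():
        flags += ["-e", f"{name}={value}"]
    return flags


def build_command(image: str, env: dict, container_args: list) -> list:
    """Assemble a docker run argument list with the environment flags before the image."""
    return ["docker", "run", "-it", "--rm", "--gpus", "all",
            *docker_env_flags(env), image, *container_args]


if __name__ == "__main__":
    clearml_env = {
        "CLEARML_WEB_HOST": "https://app.clear.ml",
        "CLEARML_API_HOST": "https://api.clear.ml",
        "CLEARML_FILES_HOST": "https://files.clear.ml",
        "CLEARML_API_ACCESS_KEY": "<API_ACCESS_KEY>",
        "CLEARML_API_SECRET_KEY": "<API_SECRET_KEY>",
    }
    command = build_command(
        "nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5",
        clearml_env,
        ["detectnet_v2", "train", "-e", "/path/to/experiment/spec.txt",
         "-r", "/path/to/results/dir"],
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(command))
```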

TAO Toolkit provides a few options to configure the ClearML client:

  1. project: A string containing the name of the project the experiment data gets uploaded to

  2. tags: A list of strings that can be used to tag the experiment

  3. task: The name of the experiment. To keep the name unique for each run, TAO Toolkit appends a timestamp to the name string indicating when the experiment run was created.
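
The effect of the timestamp on the task name can be pictured with the sketch below. The exact format TAO Toolkit uses is not stated here, so the underscore separator and the timestamp layout are assumptions chosen purely for illustration, as is the function name unique_task_name.

```python
from datetime import datetime


def unique_task_name(task: str, created: datetime) -> str:
    """Append a creation timestamp to a task name (hypothetical format)."""
    # Separator and layout are illustrative, not TAO Toolkit's actual format.
    return f"{task}_{created.strftime('%Y-%m-%d_%H-%M-%S')}"


if __name__ == "__main__":
    print(unique_task_name("training_experiment_name", datetime.now()))
```

Because the suffix depends on when the run was created, two runs that share the same task string in the spec file still appear as separate experiments.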

Depending on the schema the network follows, the spec file snippet to be added to the network may vary slightly.

For DetectNet_v2, UNet, FasterRCNN, YOLOv3/YOLOv4/YOLOv4-Tiny, RetinaNet, and SSD/DSSD, add the following snippet under the training_config element of the network's spec file.


visualizer{
  enabled: true
  clearml_config{
    project: "name_of_project"
    tags: "training"
    tags: "tao_toolkit"
    task: "training_experiment_name"
  }
}

For MaskRCNN, add the following snippet in the network’s training configuration.


clearml_config{
  project: "name_of_project"
  tags: "training"
  tags: "tao_toolkit"
  task: "training_experiment_name"
}
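
Both snippets above share the same clearml_config block, in which every tag is written as its own tags entry. If you generate spec files from a script, a sketch such as the following can render that block. The function name render_clearml_config is hypothetical, and the code assumes the project, task, and tag strings contain no double quotes or backslashes, since it does no escaping.

```python
def render_clearml_config(project: str, task: str, tags=(), indent: str = "  ") -> str:
    """Render a clearml_config block with one tags entry per tag."""
    lines = ["clearml_config{", f'{indent}project: "{project}"']
    # The tags field is repeated: one line per tag rather than a bracketed list.
    lines += [f'{indent}tags: "{tag}"' for tag in tags]
    lines.append(f'{indent}task: "{task}"')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_clearml_config(
        project="name_of_project",
        task="training_experiment_name",
        tags=["training", "tao_toolkit"],
    ))
```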

For EfficientDet-TF2 and Classification-TF2, add the following snippet under the train config element in the train.yaml file.


clearml:
  task: "name_of_the_experiment"
  project: "name_of_the_project"

The following are sample images from a successful visualization run for DetectNet_v2.


Image showing intermediate inference images with bounding boxes before clustering using DBSCAN or NMS.


Image showing system utilization plots.


Metrics plotted during training


Streaming logs from the local machine running the training.


Weight histograms of the trained model.

© Copyright 2024, NVIDIA. Last updated on Mar 18, 2024.