TAO ClearML Integration#
TAO’s TensorFlow 2 networks integrate with ClearML, enabling you to continuously iterate, visualize, and track multiple training experiments, and compile meaningful insights into a training use case.
The ClearML visualization suite synchronizes with the data rendered in TensorBoard, so to see this data on the ClearML server, you must enable TensorBoard visualization. The integration can also send alerts via Slack or email for training runs that have failed.
Note
Enabling MLOps integration does not require you to install TensorBoard.
Quick Start#
These are the broad steps involved with setting up ClearML for TAO:
Set up a ClearML account
Acquire ClearML credentials
Log into the ClearML client
Configure experiment spec for ClearML
Set up a ClearML Account#
Sign up for a free account at the ClearML website and then log in to your ClearML account.
Acquire ClearML API Credentials#
Log in to your ClearML account
Navigate to the Configuration Page
Click on Create new credentials to generate API keys
Note
NVIDIA recommends getting the credentials in the form of environment variables for maximum ease of use. You can get these variables by clicking on the Jupyter Notebook tab and copying the environment variables.

Jupyter notebook tab from the credentials under Settings/Workspace#
Log in to the ClearML Client#
To send data from your local compute unit and display it on the ClearML server dashboard, log in to the ClearML client within the TAO Finetuning Microservice and synchronize it with your profile.
Add the following parameters to the docker_env_vars dictionary in your create_experiment request body. For example:
{
    "docker_env_vars": {
        "CLEARML_WEB_HOST": "<web_host>",
        "CLEARML_API_HOST": "<api_host>",
        "CLEARML_FILES_HOST": "<files_host>",
        "CLEARML_API_ACCESS_KEY": "<api_access_key>",
        "CLEARML_API_SECRET_KEY": "<api_secret_key>"
    }
}
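For reference, the following is a minimal sketch of sending such a request from Python with the requests library. The base URL and endpoint path are placeholders for your TAO Finetuning Microservice deployment and are not the exact API route; only the docker_env_vars payload comes from this documentation, and the sketch assumes the CLEARML_* variables are exported in your local environment.
import os
import requests

# Placeholders: substitute the base URL and route of your TAO Finetuning
# Microservice deployment. Only the docker_env_vars payload is taken from
# this documentation; the endpoint path below is hypothetical.
TAO_API_BASE_URL = "<tao_api_base_url>"
CREATE_EXPERIMENT_ENDPOINT = f"{TAO_API_BASE_URL}/experiments"

request_body = {
    "docker_env_vars": {
        # Assumes the ClearML credentials are exported in the local shell.
        "CLEARML_WEB_HOST": os.environ["CLEARML_WEB_HOST"],
        "CLEARML_API_HOST": os.environ["CLEARML_API_HOST"],
        "CLEARML_FILES_HOST": os.environ["CLEARML_FILES_HOST"],
        "CLEARML_API_ACCESS_KEY": os.environ["CLEARML_API_ACCESS_KEY"],
        "CLEARML_API_SECRET_KEY": os.environ["CLEARML_API_SECRET_KEY"],
    }
}

response = requests.post(CREATE_EXPERIMENT_ENDPOINT, json=request_body)
response.raise_for_status()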
Configure the ClearML Element in the Training Specification#
The TAO Toolkit provides the following configuration options for ClearML:
project: String specifying the project name where experiment data is uploaded
tags: List of strings for experiment tagging
deferred_init: Boolean to determine whether to wait for the experiment to be fully initialized
continue_last_task: Boolean to resume execution from a previous experiment
reuse_last_task_id: Forces new experiment creation with an existing task ID
task: Name of the experiment (TAO appends a timestamp to ensure unique names per run)
Add these configurations to the clearml element in your specs request body. For example:
{
    "specs": {
        "train": {
            "clearml": {
                "project": "tao_toolkit",
                "tags": ["training", "tao_toolkit"],
                "deferred_init": true,
                "continue_last_task": true,
                "task": "training_experiment_name"
            }
        }
    }
}
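For context, these option names roughly correspond to keyword arguments of the ClearML SDK’s Task.init call. The sketch below shows an approximate standalone equivalent of the spec above; the timestamp suffix only approximates the one TAO appends to keep task names unique.
from datetime import datetime

from clearml import Task

# Rough standalone equivalent of the "clearml" element above when calling
# the ClearML SDK directly; the timestamp suffix mimics, but may not match
# exactly, what TAO appends per run.
task = Task.init(
    project_name="tao_toolkit",
    task_name=f"training_experiment_name_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
    tags=["training", "tao_toolkit"],
    deferred_init=True,
    continue_last_task=True,
)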
When running through the TAO launcher or directly from the TAO containers, the ClearML client in the TAO container must be logged in and synchronized with your profile to send data from the local compute unit and render it on the ClearML server dashboard. To log in the ClearML client in the container, set the following environment variables to the credentials you received when setting up your ClearML account.
%env CLEARML_WEB_HOST=https://app.clear.ml
%env CLEARML_API_HOST=https://api.clear.ml
%env CLEARML_FILES_HOST=https://files.clear.ml
%env CLEARML_API_ACCESS_KEY=<API_ACCESS_KEY>
%env CLEARML_API_SECRET_KEY=<API_SECRET_KEY>
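If you are configuring the client from a plain Python script rather than a notebook, the same variables can be set with os.environ before any ClearML-aware code runs. This is a minimal sketch; the placeholder keys must be replaced with your own credentials.
import os

# Same credentials as the %env magics above; replace the placeholders with
# the values from your ClearML workspace settings.
os.environ["CLEARML_WEB_HOST"] = "https://app.clear.ml"
os.environ["CLEARML_API_HOST"] = "https://api.clear.ml"
os.environ["CLEARML_FILES_HOST"] = "https://files.clear.ml"
os.environ["CLEARML_API_ACCESS_KEY"] = "<API_ACCESS_KEY>"
os.environ["CLEARML_API_SECRET_KEY"] = "<API_SECRET_KEY>"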
To set the environment variables via the TAO launcher, use the sample JSON file below for reference and replace the value field under the Envs element of the ~/.tao_mounts.json file.
{
    "Mounts": [
        {
            "source": "/path/to/your/data",
            "destination": "/workspace/tao-experiments/data"
        },
        {
            "source": "/path/to/your/local/results",
            "destination": "/workspace/tao-experiments/results"
        },
        {
            "source": "/path/to/config/files",
            "destination": "/workspace/tao-experiments/specs"
        }
    ],
    "Envs": [
        {
            "variable": "CLEARML_WEB_HOST",
            "value": "https://app.clear.ml"
        },
        {
            "variable": "CLEARML_API_HOST",
            "value": "https://api.clear.ml"
        },
        {
            "variable": "CLEARML_FILES_HOST",
            "value": "https://files.clear.ml"
        },
        {
            "variable": "CLEARML_API_ACCESS_KEY",
            "value": "<API_ACCESS_KEY>"
        },
        {
            "variable": "CLEARML_API_SECRET_KEY",
            "value": "<API_SECRET_KEY>"
        }
    ],
    "DockerOptions": {
        "shm_size": "16G",
        "ulimits": {
            "memlock": -1,
            "stack": 67108864
        },
        "user": "1000:1000",
        "ports": {
            "8888": 8888
        }
    }
}
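If you prefer not to edit the file by hand, a small script along the lines of the sketch below can merge the ClearML variables from your current shell environment into the Envs list. It assumes ~/.tao_mounts.json already exists with the layout shown above and that the five CLEARML_* variables are exported in your environment.
import json
import os
from pathlib import Path

MOUNTS_FILE = Path.home() / ".tao_mounts.json"
CLEARML_VARS = [
    "CLEARML_WEB_HOST",
    "CLEARML_API_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
]

# Load the existing launcher config and index its Envs entries by name.
config = json.loads(MOUNTS_FILE.read_text())
envs = {entry["variable"]: entry["value"] for entry in config.get("Envs", [])}

# Overwrite or add the ClearML entries from the current environment.
envs.update({name: os.environ[name] for name in CLEARML_VARS})
config["Envs"] = [{"variable": k, "value": v} for k, v in envs.items()]

MOUNTS_FILE.write_text(json.dumps(config, indent=4))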
Note
When running the networks from TAO containers directly, use the -e flag with the docker command. For example, to run classification_tf2 with ClearML directly via the container, use the following command:
docker run -it --rm --gpus all \
    -v /path/in/host:/path/in/docker \
    -e CLEARML_WEB_HOST="https://app.clear.ml" \
    -e CLEARML_API_HOST="https://api.clear.ml" \
    -e CLEARML_FILES_HOST="https://files.clear.ml" \
    -e CLEARML_API_ACCESS_KEY="<API_ACCESS_KEY>" \
    -e CLEARML_API_SECRET_KEY="<API_SECRET_KEY>" \
    nvcr.io/nvidia/tao/tao-toolkit:6.0.0-tf2 \
    classification_tf2 train -e /path/to/experiment/spec.yaml
Configure the ClearML Element in the Training Spec#
TAO provides the same options described above (project, tags, deferred_init, continue_last_task, reuse_last_task_id, and task) to configure the ClearML client in the training spec file.
For EfficientDet-TF2 and Classification-TF2, add the following snippet under the train config element in the train.yaml file.
clearml:
  task: "name_of_the_experiment"
  project: "name_of_the_project"
Visualization Output#
The following are sample images from a successful visualization run for DetectNet_v2.

Image showing intermediate inference images with bounding boxes before clustering using DBSCAN or NMS#

Image showing system utilization plots.#

Metrics plotted during training#

Streaming logs from the local machine running the training.#

Weight histograms of the trained model.#