TAO Python SDK and CLI#

TAO Toolkit provides two integrated tools for interacting with the TAO API v2:

  1. Python SDK - The nvidia-tao-client package for programmatic access

  2. CLI - The tao command-line interface for terminal-based workflows

Both tools use environment variables for authentication (set automatically by the SDK’s login method or the CLI’s login command) and provide unified access to the TAO API v2’s job-centric architecture.

Note

Datasets provided in these examples are subject to the following license: Dataset License.

Installation#

Setting up Your Python Environment#

We recommend setting up a Python environment using Miniconda. The following instructions show how to set up a conda environment.

  1. Follow the instructions to set up a Conda environment using Miniconda.

  2. After you have installed Miniconda, create a new environment and set the Python version to 3.12.

    conda create -n tao python=3.12
    
  3. Activate the conda environment that you have just created.

    conda activate tao
    
  4. Verify that the command prompt shows the name of your Conda environment.

    (tao) desktop:
    

When you are done with your session, you can deactivate your conda environment using the conda deactivate command:

conda deactivate

You may reactivate this conda environment later using the following command:

conda activate tao

Install the TAO SDK and CLI#

After you set up and activate the Python environment, install the TAO client using the following command:

pip install nvidia-tao-client

This installs both the Python SDK and the CLI tool.
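
To verify the installation from Python, you can query the installed distribution version with the standard library (a quick sanity check; the distribution name matches the pip command above):

from importlib.metadata import version

# Print the installed nvidia-tao-client version
print(version("nvidia-tao-client"))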

Python SDK Usage#

The TAO Python SDK provides programmatic access to TAO Toolkit using the TAO API v2. The SDK uses environment variables for authentication (set by the login method) and offers a unified, job-centric interface for all TAO operations.

Quick Start with SDK#

Import and Initialize#

from tao_sdk.client import TaoClient

# Initialize client (loads credentials from environment variables set by login)
client = TaoClient()

# Or login directly to obtain credentials
client = TaoClient()
client.login(
    ngc_key="your_ngc_key",
    ngc_org_name="your_org"
)

Authentication#

Login programmatically

# Login and save credentials to environment variables (Python process only)
credentials = client.login(
    ngc_key="your_ngc_key",
    ngc_org_name="your_org"
)

# Login with custom TAO base URL
credentials = client.login(
    ngc_key="your_ngc_key",
    ngc_org_name="your_org",
    tao_base_url="https://custom.tao.endpoint.com/api/v2"
)

print(f"Logged in as: {client.org_name}")

Logout

# Clear credentials from current Python process
result = client.logout()
print(result["message"])
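
At any point, you can check whether the client currently holds credentials by combining the is_authenticated() method and org_name attribute used elsewhere in this guide (a minimal sketch):

# Check whether credentials are loaded before making API calls
if client.is_authenticated():
    print(f"Authenticated as org: {client.org_name}")
else:
    print("Not authenticated; call client.login() first")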

SDK Core Operations#

Workspace Management#

# List workspaces
workspaces = client.list_workspaces()
print(f"Found {len(workspaces)} workspaces")

# Create workspace
workspace_config = {
    "bucket_name": "my-tao-bucket",
    "access_key": "AKIAIOSFODNN7EXAMPLE",
    "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "region": "us-west-2"
}

workspace = client.create_workspace(
    name="my_workspace",
    cloud_type="aws",
    cloud_specific_details=workspace_config
)
workspace_id = workspace["id"]

# Get workspace metadata
metadata = client.get_workspace_metadata(workspace_id)

# Delete workspace
client.delete_workspace(workspace_id)

Dataset Operations#

# List datasets
datasets = client.list_datasets()

# Create dataset
dataset = client.create_dataset(
    dataset_type="object_detection",
    dataset_format="coco",
    workspace=workspace_id,
    cloud_file_path="/path/to/dataset"
)
dataset_id = dataset["id"]

# Get dataset metadata
dataset_metadata = client.get_dataset_metadata(dataset_id)

# Delete dataset
client.delete_dataset(dataset_id)

Job Management (Unified v2 API)#

# Create experiment job
job = client.create_job(
    kind="experiment",
    name="image_classification_job",
    network_arch="classification_pyt",
    encryption_key="my_encryption_key",
    workspace=workspace_id,
    action="train",
    specs={
        "epochs": 100,
        "batch_size": 32,
        "learning_rate": 0.001,
        "model": {
            "backbone": "resnet18"
        }
    },
    train_datasets=[dataset_id],
    eval_dataset=dataset_id,
    automl_settings={
        "automl_enabled": True,
        "automl_algorithm": "bayesian"
    }
)

job_id = job["id"]

# Get job status
status = client.get_job_status(job_id)

# Job control operations
client.pause_job(job_id)
client.resume_job(job_id, parent_job_id="", specs={})
client.cancel_job(job_id)

# Delete completed job
client.delete_job(job_id)

Job Files and Artifacts#

# List job files
files = client.list_job_files(
    job_id=job_id,
    retrieve_logs=True,
    retrieve_specs=True
)

# Download selective files
client.download_job_files(
    job_id=job_id,
    workdir="./downloads",
    best_model=True,
    latest_model=False
)

# Download entire job
client.download_entire_job(
    job_id=job_id,
    workdir="./downloads"
)

# Get job logs
logs = client.get_job_logs(job_id)

Inference Microservices#

# Start inference microservice
microservice = client.start_inference_microservice(
    docker_image="nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf2.11.0",
    gpu_type="a100",
    num_gpus=1,
    parent_job_id="job_12345",
    kind="experiment",
    model_path="/workspace/models/best_model.pth",
    workspace="workspace_789",
    checkpoint_choose_method="best_model",
    network_arch="classification_pyt"
)

microservice_id = microservice["id"]

# Make inference request
result = client.inference_request(
    microservice_job_id=microservice_id,
    input=["data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEA..."],
    model="my_classification_model"
)

# Stop microservice
client.stop_inference_microservice(microservice_id)
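
Before sending inference requests, you typically need to wait for the microservice to report that it is ready. The following is a minimal polling sketch; the get_inference_microservice_status method name is an assumption mirroring the get-inference-microservice-status CLI command, and the "Running" status value mirrors the CLI workflow later in this guide:

import time

# Poll until the microservice is ready to serve requests
# (status method name assumed from the corresponding CLI command)
while True:
    status = client.get_inference_microservice_status(microservice_id)["status"]
    if status == "Running":
        print("Microservice is ready for inference")
        break
    print(f"Waiting for microservice... status: {status}")
    time.sleep(10)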

Complete SDK Workflow Example#

from tao_sdk.client import TaoClient
import time

def complete_classification_workflow():
    """Complete image classification workflow example"""

    # 1. Initialize and authenticate
    client = TaoClient()
    if not client.is_authenticated():
        client.login(
            ngc_key="your_ngc_key",
            ngc_org_name="your_org"
        )

    # 2. Setup resources
    workspaces = client.list_workspaces()
    workspace_id = workspaces[0]["id"]

    datasets = client.list_datasets()
    dataset_id = datasets[0]["id"]

    # 3. Create training job
    job = client.create_job(
        kind="experiment",
        name="production_classifier",
        network_arch="classification_pyt",
        encryption_key="prod_key",
        workspace=workspace_id,
        action="train",
        specs={
            "epochs": 100,
            "batch_size": 32,
            "learning_rate": 0.001,
            "model": {
                "backbone": "resnet18"
            }
        },
        train_datasets=[dataset_id],
        eval_dataset=dataset_id
    )

    job_id = job["id"]
    print(f"Created training job: {job_id}")

    # 4. Monitor training
    while True:
        status = client.get_job_status(job_id)["status"]
        if status == "Done":
            print("Training completed!")
            break
        elif status == "Error":
            print("Training failed!")
            return None
        else:
            print(f"Training status: {status}")
            time.sleep(60)

    # 5. Download results
    client.download_job_files(
        job_id=job_id,
        workdir="./production_model",
        best_model=True
    )

    return job_id

# Run the workflow
if __name__ == "__main__":
    result = complete_classification_workflow()
    print(f"Workflow completed: {result}")

Command-Line Interface (CLI)#

The TAO CLI provides command-line access to all TAO Toolkit functionality using the TAO API v2. The CLI uses environment variables for authentication and is organized around network architectures.

CLI Authentication#

The TAO CLI uses environment variables for authentication, which are set automatically by the login command.

Login Command

# Interactive login (prompts for credentials and TAO base URL if not set)
tao login --ngc-key YOUR_NGC_KEY --ngc-org-name YOUR_ORG

# Login with custom base URL
tao login --ngc-key YOUR_NGC_KEY --ngc-org-name YOUR_ORG --tao-base-url https://custom.tao.endpoint.com/api/v2

Check Authentication Status#

# Check if you're authenticated and which org you're using
tao whoami

# Clear authentication (logout)
tao logout

CLI Architecture#

The TAO CLI is organized around network architectures. Each network provides a consistent set of commands:

tao <network_name> <command> [options]

Supported Networks

The CLI supports 36+ network architectures, including:

  • Classification: classification_pyt, nvdinov2

  • Object Detection: rtdetr, deformable_detr, grounding_dino

  • Segmentation: mask2former, segformer, oneformer

  • Pose Estimation: centerpose, pose_classification

  • Action Recognition: action_recognition

  • OCR: ocdnet, ocrnet

  • Autonomous Driving: bevfusion, pointpillars, sparse4d

  • Data Services: annotations, auto_label, augmentation, analytics

CLI Command Organization#

Within each network architecture, commands are logically organized into groups:

JOB-RELATED Commands
  • create-job - Create experiment or dataset jobs

  • list-jobs - List jobs with filtering options

  • delete-job - Delete jobs (with confirmation)

  • get-job-status - Get job execution status

  • get-job-metadata - Get job metadata

  • get-job-schema - Get job specifications schema

  • get-job-logs - Download job logs and files

  • list-base-experiments - List available base experiments

DATASET-RELATED Commands
  • dataset-create - Create datasets

  • list-datasets - List available datasets

  • dataset-delete - Delete datasets

  • get-dataset-metadata - Get dataset metadata

WORKSPACE-RELATED Commands
  • workspace-create - Create workspaces

  • list-workspaces - List available workspaces

  • delete-workspace - Delete workspaces

  • get-workspace-metadata - Get workspace metadata

INFERENCE MICROSERVICE Commands
  • start-inference-microservice - Start inference microservice

  • inference-request - Make inference requests to running microservice

  • get-inference-microservice-status - Get microservice status

  • stop-inference-microservice - Stop running microservice

Common CLI Commands#

Workspace Management#

# List workspaces
tao classification_pyt list-workspaces

# Create workspace
tao classification_pyt workspace-create \
  --name "my_workspace" \
  --cloud_type aws \
  --cloud_details '{"bucket_name": "my-bucket", "access_key": "key", "secret_key": "secret", "region": "us-west-2"}'

# Get workspace metadata
tao classification_pyt get-workspace-metadata --workspace-id "workspace_id"

# Delete workspace
tao classification_pyt delete-workspace --workspace-id "workspace_id" --confirm

Dataset Operations#

# List datasets
tao classification_pyt list-datasets

# Create dataset
tao classification_pyt dataset-create \
  --dataset-type object_detection \
  --dataset-format coco \
  --workspace workspace_id \
  --cloud-file-path "/path/to/dataset"

# Get dataset metadata
tao classification_pyt get-dataset-metadata --dataset-id "dataset_id"

# Delete dataset
tao classification_pyt dataset-delete --id "dataset_id"

Job Management (Unified v2 API)#

# List all jobs
tao classification_pyt list-jobs

# Create experiment job
tao classification_pyt create-job \
  --kind experiment \
  --name "my_experiment" \
  --encryption-key "my_key" \
  --workspace "workspace_id" \
  --action train \
  --specs '{"epochs": 100, "learning_rate": 0.001, "model": {"backbone": "resnet18"}}' \
  --train-datasets '["train-dataset-id"]' \
  --eval-dataset "eval-dataset-id" \
  --automl-settings '{"automl_enabled": true, "automl_algorithm": "bayesian"}'

# Get job status
tao classification_pyt get-job-status --job-id "job_id"

# Job control operations
tao classification_pyt job-pause --job-id "job_id"
tao classification_pyt job-resume --job-id "job_id" --parent_job_id "" --specs '{}'
tao classification_pyt job-cancel --job-id "job_id"
tao classification_pyt delete-job --job-id "job_id" --confirm

Job Files and Logs#

# Get job logs
tao classification_pyt get-job-logs --job-id "job_id"

# Download job files
tao classification_pyt download-job-files \
  --job-id "job_id" \
  --workdir "./downloads" \
  --best-model true \
  --latest-model false

# Download entire job
tao classification_pyt download-entire-job \
  --job-id "job_id" \
  --workdir "./downloads"

CLI Workflow Examples#

Classification Workflow#

# Complete image classification workflow with v2 API
export WORKSPACE_ID="workspace_123"
export DATASET_ID="dataset_456"

# 1. Create classification experiment job
JOB_ID=$(tao classification_pyt create-job \
  --kind experiment \
  --name "image_classification_v1" \
  --encryption-key "my_encryption_key" \
  --workspace $WORKSPACE_ID \
  --action train \
  --specs '{
    "epochs": 100,
    "batch_size": 32,
    "learning_rate": 0.001,
    "model": {
      "backbone": "resnet18"
    }
  }' \
  --train-datasets '["'$DATASET_ID'"]' \
  --eval-dataset "$DATASET_ID" \
  --automl-settings '{
    "automl_enabled": true,
    "automl_algorithm": "bayesian",
    "max_iterations": 10
  }' | jq -r '.id')

# 2. Monitor training progress
tao classification_pyt get-job-status --job-id $JOB_ID
tao classification_pyt get-job-logs --job-id $JOB_ID

# 3. Run evaluation after training completes
EVAL_JOB_ID=$(tao classification_pyt create-job \
  --kind experiment \
  --name "evaluation_job" \
  --encryption-key "my_encryption_key" \
  --workspace $WORKSPACE_ID \
  --action evaluate \
  --parent-job-id $JOB_ID \
  --eval-dataset "$DATASET_ID" \
  --specs '{
    "checkpoint_path": "/workspace/models/latest.pth",
    "batch_size": 64
  }' | jq -r '.id')

# 4. Export model for deployment
EXPORT_JOB_ID=$(tao classification_pyt create-job \
  --kind experiment \
  --name "export_job" \
  --encryption-key "my_encryption_key" \
  --workspace $WORKSPACE_ID \
  --action export \
  --parent-job-id $JOB_ID \
  --specs '{
    "output_format": "onnx",
    "batch_size": 1
  }' | jq -r '.id')

Object Detection Workflow#

# Complete object detection workflow with v2 API
export WORKSPACE_ID="workspace_789"
export DATASET_ID="coco_dataset_001"

# Create object detection experiment
JOB_ID=$(tao rtdetr create-job \
  --kind experiment \
  --name "rtdetr_detection_v1" \
  --encryption-key "my_encryption_key" \
  --workspace $WORKSPACE_ID \
  --action train \
  --train-datasets '["'$DATASET_ID'"]' \
  --eval-dataset "$DATASET_ID" \
  --specs '{
    "epochs": 150,
    "batch_size": 16,
    "learning_rate": 0.0001,
    "model": {
      "backbone": "resnet50"
    },
    "augmentation": {
      "horizontal_flip": true,
      "rotation": 15
    }
  }' | jq -r '.id')

# Monitor and manage training
tao rtdetr get-job-status --job-id $JOB_ID

# Download trained models
tao rtdetr download-job-files \
  --job-id $JOB_ID \
  --workdir "./models" \
  --best-model true \
  --latest-model true

Inference Microservices#

# Complete inference microservice workflow
export JOB_ID="job_12345"  # ID of trained model job
export WORKSPACE_ID="workspace_789"

# 1. Start inference microservice
MICROSERVICE_ID=$(tao classification_pyt start-inference-microservice \
  --docker-image "nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf2.11.0" \
  --gpu-type "a100" \
  --num-gpus 1 \
  --parent-job-id "$JOB_ID" \
  --kind "experiment" \
  --model-path "/workspace/models/best_model.pth" \
  --workspace "$WORKSPACE_ID" \
  --checkpoint-choose-method "best_model" \
  --network-arch "classification_pyt" | jq -r '.id')

# 2. Wait for microservice to be ready
while true; do
  STATUS=$(tao classification_pyt get-inference-microservice-status \
    --microservice-job-id "$MICROSERVICE_ID" | jq -r '.status')
  if [ "$STATUS" = "Running" ]; then
    echo "Microservice is ready for inference"
    break
  fi
  echo "Waiting for microservice... Status: $STATUS"
  sleep 10
done

# 3. Make inference request
tao classification_pyt inference-request \
  --microservice-job-id "$MICROSERVICE_ID" \
  --input '["data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEA..."]' \
  --model "my_classification_model"

# 4. Stop microservice when done
tao classification_pyt stop-inference-microservice \
  --microservice-job-id "$MICROSERVICE_ID"

Environment Variables Reference#

The SDK and CLI use these environment variables for configuration:

  • TAO_BASE_URL - TAO API base URL (example: https://api.tao.ngc.nvidia.com/api/v2)

  • TAO_ORG - NGC organization name (example: my_org)

  • TAO_TOKEN - JWT authentication token (set by tao login or client.login())

Legacy Support

For backward compatibility, these legacy variables are also supported:

  • BASE_URL → TAO_BASE_URL

  • ORG → TAO_ORG

  • TOKEN → TAO_TOKEN
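
From Python, these variables can be read with the standard os module. A minimal sketch that prefers the TAO_* names and falls back to the legacy names listed above:

import os

# Prefer the TAO_* variables, falling back to the legacy names
base_url = os.environ.get("TAO_BASE_URL") or os.environ.get("BASE_URL")
org = os.environ.get("TAO_ORG") or os.environ.get("ORG")
token = os.environ.get("TAO_TOKEN") or os.environ.get("TOKEN")

print(f"TAO endpoint: {base_url}, org: {org}, token set: {token is not None}")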

Migration from v1 to v2#

If you’re migrating from TAO SDK/CLI v1, note these key changes:

Command Changes

  • experiment-create → create-job --kind experiment

  • dataset-run-action → create-job --kind dataset --action <action>

  • get-action-status → get-job-status

  • get-spec → get-job-schema

  • get-metadata → get-dataset-metadata, get-workspace-metadata, get-job-metadata

Authentication Changes

  • File-based configuration (~/.tao/config) → Environment variables (TAO_*)

  • Automatic login on CLI usage → Explicit login with tao login

New Features in v2

  • Unified job management with create-job --kind parameter

  • Environment variable authentication

  • Job deletion capabilities

  • Resource-specific metadata commands

  • Improved job control (pause/resume/cancel/delete)

Troubleshooting#

Authentication Issues#

# Check if authenticated
tao whoami

# If not authenticated, login again
tao login --ngc-key YOUR_KEY --ngc-org-name YOUR_ORG

# Clear credentials if having issues
tao logout

Job Management#

# List all jobs to find stuck jobs
tao classification_pyt list-jobs

# Cancel problematic jobs
tao classification_pyt job-cancel --job-id "problematic_job_id"

# Delete jobs that are no longer needed
tao classification_pyt delete-job --job-id "old_job_id" --confirm

Getting Help#

# Get help for any command
tao --help
tao classification_pyt --help
tao classification_pyt create-job --help

Best Practices#

1. Authentication Management

# Always check authentication before operations
from tao_sdk.exceptions import TaoAuthenticationError

def safe_operation(client):
    if not client.is_authenticated():
        raise TaoAuthenticationError("Please authenticate first")

    return client.list_workspaces()

2. Resource Cleanup

# Clean up completed jobs regularly
import datetime

def cleanup_old_jobs(client, keep_days=7):
    jobs = client.list_jobs()
    cutoff_date = datetime.datetime.now() - datetime.timedelta(days=keep_days)

    for job in jobs:
        # Assumes "created_date" is a datetime; parse it first if your
        # deployment returns it as an ISO-8601 string
        if job["status"] in ["Done", "Failed"] and job["created_date"] < cutoff_date:
            try:
                client.delete_job(job["id"])
            except Exception as e:
                print(f"Failed to delete job {job['id']}: {e}")

3. Error Handling

from tao_sdk.exceptions import TaoAPIError, TaoAuthenticationError

try:
    client.require_authentication()
    workspaces = client.list_workspaces()

except TaoAuthenticationError as e:
    print(f"Authentication error: {e}")
    client.login(ngc_key="...", ngc_org_name="...")

except TaoAPIError as e:
    print(f"API error: {e.status_code} - {e.message}")

Resources#

For detailed technical documentation, consult the SDK source code and API v2 OpenAPI specifications.