Remote Client Overview and Examples#

TAO Remote Client is a command-line tool that lets you create and run experiments from the command line instead of making direct API calls.

Note

The datasets provided in these examples are subject to the Dataset License.

Installation#

Setting up Your Python Environment#

We recommend setting up a Python environment using Miniconda. The following instructions show how to set up a Python conda environment.

  1. Follow the Miniconda installation instructions to set up a Conda environment.

  2. After you have installed Miniconda, create a new environment with Python set to version 3.12.

    conda create -n tao python=3.12
    
  3. Activate the conda environment that you have just created.

    conda activate tao
    
  4. Verify that the command prompt shows the name of your Conda environment.

    (tao) desktop:
    

When you are done with your session, you can deactivate your conda environment using the deactivate command:

conda deactivate

You can reactivate this conda environment later using the following command:

conda activate tao

Install the TAO Client#

After you set up and activate the Python environment, you can install the TAO Client using the following command:

python -m pip install nvidia-tao-client

CLI Specs#

User authentication is based on the NGC Personal Key and is done with the following commands:

BASE_URL=https://IP_of_machine_deployed/api/v1

NGC_API_KEY=nvapi-zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyS

$ tao-client login --ngc-key $NGC_API_KEY --ngc-org-name ea-tlt

After authentication, the command line syntax is:

$ tao-client <network> <action> <args>

For example:

$ tao-client dino experiment-run-action --action train --id 042559ec-ab3e-438d-9c94-2cab38f76efc --specs '<json_loadable_specs_string_from_get_spec_action>'

Note

You can always use the --help argument to retrieve the command usage information.

To list supported networks:

$ tao-client --help

To list supported Dino actions:

$ tao-client dino --help

Each network has the following actions:

Command                     Description

workspace-create            Create a workspace and return the ID
dataset-create              Create a dataset and return the ID
dataset-delete              Delete a dataset
dataset-run-action          Run a dataset action
experiment-create           Create an experiment and return the ID
experiment-delete           Delete an experiment
experiment-run-action       Run an experiment action
get-action-status           Get the status of an action
get-job-logs                Get the logs of a job
get-metadata                Get the metadata of the specified artifact
get-spec                    Return the default spec of an action
job-cancel                  Cancel a running job
job-pause                   Pause a running job
job-resume                  Resume a paused job
list-base-experiments       Return the list of base experiments
list-datasets               Return the list of datasets
list-experiments            Return the list of experiments
list-job-files              List the files, specs, and logs of a job
download-entire-job         Download all files associated with a job
download-selective-files    Download job files based on the arguments passed
model-automl-defaults       Return the default AutoML parameters
patch-artifact-metadata     Patch the metadata of the specified artifact
publish-model               Publish a model
remove-published-model      Remove a published model
workspace-backup            Back up the MongoDB database to cloud storage
workspace-restore           Restore a workspace from cloud storage

Workspaces#

In TAO 6.0, cloud workspaces are used to pull datasets and store experiment results in popular cloud storage providers.

  • AWS - cloud_type: aws; cloud_specific_details needed: access_key, secret_key, aws_region, s3_bucket_name

  • Azure - cloud_type: azure; cloud_specific_details needed: account_name, access_key, azure_region, azure_blob_name

  • HuggingFace datasets - cloud_type: huggingface; cloud_specific_details needed: token (Not applicable for experiment results storage)
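As a sketch, the cloud_specific_details for a non-AWS provider can be assembled as JSON before creating the workspace. The field names below follow the Azure entry above; the account name, key, region, and container values are placeholders.

```shell
# Build the cloud details payload for an Azure workspace as JSON.
# Field names follow the Azure entry above; all values are placeholders.
AZURE_DETAILS=$(jq -n '{
  account_name: "my_storage_account",
  access_key: "my_access_key",
  azure_region: "eastus",
  azure_blob_name: "my-container"
}')
echo "$AZURE_DETAILS"

# The workspace is then created the same way as in the AWS example below:
# WORKSPACE_ID=$(tao-client dino workspace-create \
#   --name 'Azure Workspace' --cloud_type 'azure' --cloud_details "$AZURE_DETAILS")
```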

Datasets#

You can either use datasets stored in the cloud workspace (referenced with cloud_file_path) or public datasets (referenced with an HTTPS URL).
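Once datasets exist, the list-datasets command can be combined with jq to pick out dataset IDs by role. The response below is a mock for illustration, assuming an array of metadata objects with the id and use_for fields used elsewhere in this guide; inspect the real list-datasets output before relying on this shape.

```shell
# Mock response for illustration; the real list-datasets output shape may differ.
SAMPLE='[
  {"id": "aaaaaaaa-1111-2222-3333-444444444444", "use_for": ["training"]},
  {"id": "bbbbbbbb-5555-6666-7777-888888888888", "use_for": ["evaluation"]}
]'

# Select the IDs of datasets marked for training.
echo "$SAMPLE" | jq -r '.[] | select(.use_for | index("training")) | .id'
```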

Object Detection Use Case Example with CLI#

  1. Creating a Cloud Workspace

    This returns a UUID representing the workspace ID, which is used when creating datasets and experiments. The CLI argument parameters are:

    • name - an identifiable name for the workspace

    • cloud_type - one of the cloud storage types supported by TAO (aws, azure, huggingface)

    • cloud_details - a dictionary of the required cloud storage values, such as the bucket name and access credentials with write permissions

    WORKSPACE_ID=$(tao-client dino workspace-create \
    --name 'AWS Public' \
    --cloud_type 'aws' \
    --cloud_details '{
       "cloud_region": "us-west-1",
       "cloud_bucket_name": "bucket_name",
       "access_key": "access_key",
       "secret_key": "secret_key"
    }')
    echo $WORKSPACE_ID
    
  2. Creating the training dataset

    This returns a UUID representing the training dataset ID, which is used in later steps such as dataset_convert and train. The examples below cover both a private cloud dataset (using workspace) and a public dataset (using url). The CLI argument parameters are:

    • dataset_type - one of TAO’s supported dataset types

    • dataset_format - one of the formats supported for the type chosen above

    • use_for - a list of use cases for the dataset. Choose from [“training”, “evaluation”, “testing”]

    For private cloud dataset:

    • workspace - the ID from Step 1.

    • cloud_file_path - the path to the dataset folder in cloud.

    For public cloud dataset:

    • url - the url of the dataset.

    TRAIN_DATASET_ID=$(tao-client dino dataset-create --dataset_type object_detection --dataset_format coco --url https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/data/object_detection_train/ --use_for '["training"]')
    TRAIN_DATASET_ID=$(tao-client dino dataset-create --dataset_type object_detection --dataset_format coco --workspace $WORKSPACE_ID --cloud_file_path <path to dataset folder in cloud> --use_for '["training"]')
    echo $TRAIN_DATASET_ID
    

To monitor the status of the training dataset download, poll the get-metadata command:

TRAIN_DATASET_PULL_STATUS=$(tao-client dino get-metadata --id $TRAIN_DATASET_ID --job_type dataset)
echo $TRAIN_DATASET_PULL_STATUS
  3. Creating the validation dataset

    This returns a UUID representing the evaluation dataset ID, which is used in later steps such as dataset_convert and evaluate. The examples below cover both a private cloud dataset (using workspace) and a public dataset (using url). The possible CLI arguments are the same as for the training dataset.

    EVAL_DATASET_ID=$(tao-client dino dataset-create --dataset_type object_detection --dataset_format coco --url https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/data/object_detection_val/ --use_for '["evaluation"]')
    EVAL_DATASET_ID=$(tao-client dino dataset-create --dataset_type object_detection --dataset_format coco --workspace $WORKSPACE_ID --cloud_file_path /data/tao_od_synthetic_subset_val_no_convert --use_for '["evaluation"]')
    echo $EVAL_DATASET_ID
    

To monitor the status of the validation dataset download, poll the get-metadata command:

EVAL_DATASET_PULL_STATUS=$(tao-client dino get-metadata --id $EVAL_DATASET_ID --job_type dataset)
echo $EVAL_DATASET_PULL_STATUS
  4. Finding a base experiment

    The following command lists the base experiments available for use. Pick one that corresponds to DINO and use it when assigning a base experiment to the experiment below.

    BASE_EXP_RESPONSE=$(tao-client dino list-base-experiments --filter_params '{"network_arch": "dino"}')
    
    # Post Processing to convert bash output to a json string
    BASE_EXP_RESPONSE="${BASE_EXP_RESPONSE:1:-1}"
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed "s/'/\"/g")
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed -e "s/None/null/g" -e "s/True/true/g" -e "s/False/false/g")
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed 's/}, {/},\n{/g')
    BASE_EXP_RESPONSE="[$BASE_EXP_RESPONSE]"
    
    BASE_EXPERIMENT_ID=$(echo "$BASE_EXP_RESPONSE" | jq . | jq -r '[.[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
    echo $BASE_EXPERIMENT_ID
    
  5. Creating an experiment

    This returns a UUID representing the experiment ID, which is used in all of the subsequent steps, such as train, evaluate, and inference. The CLI arguments for creating an experiment are:

    • network_arch - one of TAO’s supported network architectures

    • workspace - the ID of the workspace from Step 1

    • encryption_key - encryption key for loading the Base Experiment

    EXPERIMENT_ID=$(tao-client dino experiment-create --network_arch dino --encryption_key nvidia_tlt --workspace $WORKSPACE_ID)
    echo $EXPERIMENT_ID
    
  6. Assign datasets and a base experiment to the experiment

    UPDATE_INFO=$(cat <<EOF
    {
        "base_experiment": ["$BASE_EXPERIMENT_ID"],
        "train_datasets": ["$TRAIN_DATASET_ID"],
        "eval_dataset": "$EVAL_DATASET_ID",
        "inference_dataset": "$EVAL_DATASET_ID",
        "calibration_dataset": "$TRAIN_DATASET_ID"
    }
    EOF
    )
    EXPERIMENT_METADATA=$(tao-client dino patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$UPDATE_INFO")
    echo $EXPERIMENT_METADATA | jq
    

    Key-value pairs:

    • base_experiment: The ID of the base experiment found earlier

    • train_datasets: The train dataset IDs

    • eval_dataset: The eval dataset ID

    • inference_dataset: The test dataset ID

    • calibration_dataset: The train dataset ID

    • docker_env_vars: Key-value pairs of MLOps settings: wandbApiKey, clearMlWebHost, clearMlApiHost, clearMlFilesHost, clearMlApiAccessKey, clearMlApiSecretKey.
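If you use an MLOps integration, the same patch-artifact-metadata call can carry docker_env_vars. A minimal sketch, assuming only the key names listed above; the API key value is a placeholder:

```shell
# Build an update payload that attaches a Weights & Biases API key.
# Only keys from the docker_env_vars list above are used; the value is a placeholder.
MLOPS_INFO=$(jq -n '{docker_env_vars: {wandbApiKey: "your_wandb_api_key"}}')
echo "$MLOPS_INFO" | jq

# Applied with the same patch command shown above:
# tao-client dino patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$MLOPS_INFO"
```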

  7. Training an experiment

  1. Get specs:

    Returns a JSON loadable string of specs to be used in the train step.

    TRAIN_SPECS=$(tao-client dino get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
    echo $TRAIN_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step, if necessary.

    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
    echo $TRAIN_SPECS | jq
    
  3. Run the train action:

    CLI arguments for running the train action are:

    • action - the action to be executed

    • specs - spec dictionary of the action to be executed

    TRAIN_ID=$(tao-client dino experiment-run-action --action train --id $EXPERIMENT_ID --specs "$TRAIN_SPECS")
    echo $TRAIN_ID
    
  4. Check the status of the training job:

    To monitor the status of the training job, poll the get-action-status command:

    The CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $TRAIN_ID | jq
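Rather than re-running the status command by hand, the check can be wrapped in a small polling loop. A minimal sketch: the status field name and the terminal values (Done, Error, Canceled) are assumptions about the get-action-status response; verify them against the actual output on your deployment and adjust the jq path if needed.

```shell
# Poll a status command until the job reaches a terminal state.
# Works with any command that prints JSON containing a top-level "status"
# field; the terminal state names below are assumptions.
poll_until_done() {
  while true; do
    status=$("$@" | jq -r '.status')
    echo "status: $status"
    case "$status" in
      Done|Error|Canceled) break ;;
    esac
    sleep 15
  done
}

# Usage:
# poll_until_done tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $TRAIN_ID
```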
    
  8. Evaluating an experiment

  1. Get specs:

    Returns a JSON loadable string of specs to be used in the evaluate step.

    EVALUATE_SPECS=$(tao-client dino get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)
    echo $EVALUATE_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step, if necessary.

  3. Run the evaluate action.

    CLI arguments for running the evaluate action are:

    • parent_job_id - the ID of the parent job, if any

    • action - the action to be executed

    • specs - spec dictionary of the action to be executed

    EVALUATE_ID=$(tao-client dino experiment-run-action --action evaluate --id $EXPERIMENT_ID --parent_job_id $TRAIN_ID --specs "$EVALUATE_SPECS")
    echo $EVALUATE_ID
    
  4. Check the status of the evaluate job:

    To monitor the status of the evaluation job, poll the get-action-status command:

    The CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $EVALUATE_ID | jq
    
  9. Inference for an experiment

  1. Get specs:

    Returns a JSON loadable string of specs to be used in the inference step.

    INFERENCE_SPECS=$(tao-client dino get-spec --action inference --job_type experiment --id $EXPERIMENT_ID)
    echo $INFERENCE_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step, if necessary.

  3. Run the inference action:

    CLI arguments for running the inference action are:

    • parent_job_id - the ID of the parent job, if any

    • action - the action to be executed

    • specs - spec dictionary of the action to be executed

    INFERENCE_ID=$(tao-client dino experiment-run-action --action inference --id $EXPERIMENT_ID --parent_job_id $TRAIN_ID --specs "$INFERENCE_SPECS")
    echo $INFERENCE_ID
    
  4. Check the status of the inference job:

    To monitor the status of the inference job, poll the get-action-status command:

    The CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $INFERENCE_ID | jq