Remote Client

The remote client lets you create and run experiments from the command line instead of calling the TAO API directly.

Note

The datasets used in these examples are subject to the Dataset License.

Installation

$ pip3 install nvidia-transfer-learning-client==5.3.1.dev0
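
To verify the installation, confirm that pip can find the package and that the CLI entry point responds:

$ pip3 show nvidia-transfer-learning-client
$ tao-client --help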

CLI Specs

User authentication is based on your NGC API key. Set the API endpoint and your key, then log in with the following commands:

BASE_URL=https://api-ea4.tao.ngc.nvidia.com/api/v1

NGC_API_KEY=zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyS

$ tao-client login --ngc-api-key $NGC_API_KEY --ngc-org-name ea-tlt

After authentication, the command line syntax is:

$ tao-client <network> <action> <args>

For example:

$ tao-client dino experiment-run-action --action train --id 042559ec-ab3e-438d-9c94-2cab38f76efc --specs '<json_loadable_specs_string_from_get_spec_action>'

Note

You can always use the --help argument to retrieve the command usage information.

To list supported networks:

$ tao-client --help

To list supported Dino actions:

$ tao-client dino --help

Object Detection Use Case Example with CLI

  1. Creating the training dataset

    This returns a UUID representing the training dataset ID, which is used in later steps such as dataset_convert and train. CLI argument parameters:

    • type - one of TAO's supported dataset types

    • format - one of the formats supported for the chosen type

    • cloud_details - dictionary of required cloud storage values, such as the bucket name and access credentials with write permissions

    TRAIN_DATASET_ID=$(tao-client dino dataset-create --dataset_type object_detection --dataset_format coco --cloud_details '{
        "cloud_type": "self_hosted",
        "cloud_file_type": "file",
        "cloud_specific_details": {
            "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_train_coco.tar.gz"
            }
        }')
    echo $TRAIN_DATASET_ID
    
    # Cloud details example for an AWS bucket:
    # {
    #     "cloud_type": "aws",
    #     "cloud_specific_details": {
    #         "cloud_region": "us-west-1",
    #         "cloud_bucket_name": "bucket_name",
    #         "access_key": "access_key",
    #         "secret_key": "secret_key"
    #     }
    # }
    

To monitor the status of the training dataset download (pull), poll get-metadata:

TRAIN_DATASET_PULL_STATUS=$(tao-client dino get-metadata --id $TRAIN_DATASET_ID --job_type dataset)
echo $TRAIN_DATASET_PULL_STATUS
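
If you would rather block until the download finishes, a small polling loop works. This is a minimal sketch: the status field name and the "pull_complete" terminal value are assumptions, so confirm them against the metadata your deployment returns. The same loop works for the validation dataset created next, with $EVAL_DATASET_ID substituted in.

# Hypothetical polling loop; the .status field and the "pull_complete"
# value are assumptions. If the output is a Python-style dict rather than
# JSON (as with list-base-experiments in step 3), apply the same quote
# conversion shown there before piping to jq.
while true; do
    STATUS=$(tao-client dino get-metadata --id $TRAIN_DATASET_ID --job_type dataset | jq -r '.status')
    echo "Training dataset status: $STATUS"
    [ "$STATUS" = "pull_complete" ] && break
    sleep 10
done
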
  2. Creating the validation dataset

    This returns a UUID representing the evaluation dataset ID, which is used in later steps such as dataset_convert and evaluate. The CLI arguments are the same as for the training dataset.

    EVAL_DATASET_ID=$(tao-client dino dataset-create --dataset_type object_detection --dataset_format coco --cloud_details '{
        "cloud_type": "self_hosted",
        "cloud_file_type": "file",
        "cloud_specific_details": {
            "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_val_coco.tar.gz"
            }
        }')
    echo $EVAL_DATASET_ID
    

To monitor the status of the validation dataset download (pull), poll get-metadata, or reuse the loop shown above:

EVAL_DATASET_PULL_STATUS=$(tao-client dino get-metadata --id $EVAL_DATASET_ID --job_type dataset)
echo $EVAL_DATASET_PULL_STATUS
  3. Finding a base experiment

    The following command lists the base experiments available for use. Pick the one that corresponds to Dino; its ID is used in step 5.

    BASE_EXP_RESPONSE=$(tao-client dino list-base-experiments --filter_params '{"network_arch": "dino"}')
    
    # Post-processing to convert the Python-style output into a JSON string
    BASE_EXP_RESPONSE="${BASE_EXP_RESPONSE:1:-1}"
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed "s/'/\"/g")
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed -e "s/None/null/g" -e "s/True/true/g" -e "s/False/false/g")
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed 's/}, {/},\n{/g')
    BASE_EXP_RESPONSE="[$BASE_EXP_RESPONSE]"
    
    BASE_EXPERIMENT_ID=$(echo "$BASE_EXP_RESPONSE" | jq . | jq -r '[.[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
    echo $BASE_EXPERIMENT_ID
    
  4. Creating an experiment

    This returns a UUID representing the experiment ID, which is used in all subsequent steps such as train, evaluate, and inference. CLI arguments for creating an experiment are:

    • network_arch - one of TAO’s supported network architectures

    • encryption_key - encryption key for loading the Base Experiment

    • cloud_type - the cloud storage provider, aws or azure (passed as a key inside the cloud_details dictionary, as in the example below)

    • cloud_details - dictionary of required cloud storage values like bucket name, access credentials with write permissions, etc.

    Note

    Replace the cloud_details values with your own cloud bucket credentials.

    EXPERIMENT_ID=$(tao-client dino experiment-create --network_arch dino --encryption_key nvidia_tlt --cloud_details '{
        "cloud_type": "aws",
        "cloud_specific_details": {
            "cloud_region": "us-west-1",
            "cloud_bucket_name": "bucket_name",
            "access_key": "access_key",
            "secret_key": "secret_key"
            }
        }')
    echo $EXPERIMENT_ID
    
  5. Assigning datasets and a base experiment to the experiment

    UPDATE_INFO=$(cat <<EOF
    {
        "base_experiment": ["$BASE_EXPERIMENT_ID"],
        "train_datasets": ["$TRAIN_DATASET_ID"],
        "eval_dataset": "$EVAL_DATASET_ID",
        "inference_dataset": "$EVAL_DATASET_ID",
        "calibration_dataset": "$TRAIN_DATASET_ID"
    }
    EOF
    )
    EXPERIMENT_METADATA=$(tao-client dino patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$UPDATE_INFO")
    echo $EXPERIMENT_METADATA | jq
    
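    As a quick sanity check, you can read the patched fields back from the returned metadata. This assumes the response echoes the keys you just set:

    echo $EXPERIMENT_METADATA | jq '{base_experiment, train_datasets, eval_dataset}'
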

    Key-value pairs:

    • base_experiment: The base experiment ID from step 3

    • train_datasets: The train dataset IDs

    • eval_dataset: The eval dataset ID

    • inference_dataset: The test dataset ID

    • calibration_dataset: The train dataset ID

    • docker_env_vars: Key-value pairs of MLOps settings: wandbApiKey, clearMlWebHost, clearMlApiHost, clearMlFilesHost, clearMlApiAccessKey, clearMlApiSecretKey (see the sketch below)
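
    For example, to pass a Weights & Biases API key, you could send a second patch with the same command. This is a sketch: the docker_env_vars and wandbApiKey names come from the list above, and the value is a placeholder to replace with your own key.

    MLOPS_UPDATE=$(cat <<EOF
    {
        "docker_env_vars": {
            "wandbApiKey": "your_wandb_api_key"
        }
    }
    EOF
    )
    tao-client dino patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$MLOPS_UPDATE"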

  6. Training an experiment

  1. Get specs:

    Returns a JSON-loadable string of specs to be used in the train step.

    TRAIN_SPECS=$(tao-client dino get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
    echo $TRAIN_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step, if necessary.

    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
    echo $TRAIN_SPECS | jq
    
  3. Run the train action:

    CLI arguments for running the train action are:

    • id - ID of the experiment

    • action - the action to be executed

    • specs - spec dictionary for the action to be executed

    TRAIN_ID=$(tao-client dino experiment-run-action --action train --id $EXPERIMENT_ID --specs "$TRAIN_SPECS")
    echo $TRAIN_ID
    
  4. Check the status of the training job:

    To monitor the status of the training job, poll get-action-status:

    CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $TRAIN_ID | jq
    
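    If you prefer to block until a job finishes, a loop like the one below can wrap get-action-status; it works for the evaluate and inference jobs later in this walkthrough by swapping in their job IDs. The status field and the "Done"/"Error" terminal values are assumptions; verify them against the real output.

    # Hypothetical wait loop; confirm the .status field and the terminal
    # values against what get-action-status actually returns.
    JOB_ID=$TRAIN_ID
    while true; do
        JOB_STATUS=$(tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $JOB_ID | jq -r '.status')
        echo "Job status: $JOB_STATUS"
        case "$JOB_STATUS" in
            Done|Error) break ;;
        esac
        sleep 30
    done
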
  7. Evaluating an experiment

  1. Get specs:

    Returns a JSON-loadable string of specs to be used in the evaluate step.

    EVALUATE_SPECS=$(tao-client dino get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)
    echo $EVALUATE_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step if necessary, as in the sketch below.
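
    For instance, a hypothetical tweak to the GPU count. The .evaluate.num_gpus key is an assumption modeled on the .train.num_gpus key used earlier; confirm the actual field names in the get-spec output.

    # Assumed key name; verify against the evaluate specs returned above
    EVALUATE_SPECS=$(echo $EVALUATE_SPECS | jq -r '.evaluate.num_gpus=2')
    echo $EVALUATE_SPECS | jq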

  3. Run the evaluate action:

    CLI arguments for running the evaluate action are:

    • id - ID of the experiment

    • parent_job_id - ID of the parent job, if any

    • action - the action to be executed

    • specs - spec dictionary for the action to be executed

    EVALUATE_ID=$(tao-client dino experiment-run-action --action evaluate --id $EXPERIMENT_ID --parent_job_id $TRAIN_ID --specs "$EVALUATE_SPECS")
    echo $EVALUATE_ID
    
  4. Check the status of the evaluate job:

    To monitor the status of the evaluation job, poll get-action-status:

    CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $EVALUATE_ID | jq
    
  8. Running inference for an experiment

  1. Get specs:

    Returns a JSON-loadable string of specs to be used in the inference step.

    INFERENCE_SPECS=$(tao-client dino get-spec --action inference --job_type experiment --id $EXPERIMENT_ID)
    echo $INFERENCE_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step if necessary, as in the sketch below.
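
    As with evaluate, here is a hypothetical adjustment. The .inference.num_gpus key is an assumption; confirm the real field names in the get-spec output.

    # Assumed key name; verify against the inference specs returned above
    INFERENCE_SPECS=$(echo $INFERENCE_SPECS | jq -r '.inference.num_gpus=1')
    echo $INFERENCE_SPECS | jq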

  3. Run the inference action:

    CLI arguments for running the inference action are:

    • id - ID of the experiment

    • parent_job_id - ID of the parent job, if any

    • action - the action to be executed

    • specs - spec dictionary for the action to be executed

    INFERENCE_ID=$(tao-client dino experiment-run-action --action inference --id $EXPERIMENT_ID --parent_job_id $TRAIN_ID --specs "$INFERENCE_SPECS")
    echo $INFERENCE_ID
    
  4. Check the status of the inference job:

    To monitor the status of the inference job, poll get-action-status:

    CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $INFERENCE_ID | jq
    

AutoML

AutoML is a TAO Toolkit API service that automatically selects deep learning hyperparameters for a chosen model and dataset.

See the AutoML docs for more details.