Remote Client

The remote client lets you create and run experiments from the command line instead of calling the TAO API directly.

Note

The datasets used in these examples are subject to the Dataset License.

Installation

$ pip3 install nvidia-transfer-learning-client==5.3.1.dev0
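
To verify the installation, confirm that pip can find the package and that the CLI entry point responds:

$ pip3 show nvidia-transfer-learning-client
$ tao-client --help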

CLI Specs

User authentication is based on your NGC API key. Set the API endpoint and your key, then log in with the following commands:

BASE_URL=https://api-ea4.tao.ngc.nvidia.com/api/v1

NGC_API_KEY=zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyS

$ tao-client login --ngc-api-key $NGC_API_KEY --ngc-org-name ea-tlt

After authentication, the command line syntax is:

$ tao-client <network> <action> <args>

For example:

$ tao-client dino experiment-run-action --action train --id 042559ec-ab3e-438d-9c94-2cab38f76efc --specs '<json_loadable_specs_string_from_get_spec_action>'

Note

You can always use the --help argument to retrieve the command usage information.

To list supported networks:

$ tao-client --help

To list supported Dino actions:

$ tao-client dino --help

Object Detection Use Case Example with CLI

  1. Creating the training dataset

    This returns a UUID representing the training dataset ID, which is used in later steps such as dataset_convert and train. CLI argument parameters:

    • type - one of TAO's supported dataset types

    • format - one of the formats supported for the chosen type

    • cloud_details - dictionary of required cloud storage values, such as the bucket name and access credentials with write permissions

    TRAIN_DATASET_ID=$(tao-client dino dataset-create --dataset_type object_detection --dataset_format coco --cloud_details '{
        "cloud_type": "self_hosted",
        "cloud_file_type": "file",
        "cloud_specific_details": {
            "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_train_coco.tar.gz"
            }
        }')
    echo $TRAIN_DATASET_ID
    
    # Cloud details example for an AWS bucket:
    # {
    #     "cloud_type": "aws",
    #     "cloud_specific_details": {
    #         "cloud_region": "us-west-1",
    #         "cloud_bucket_name": "bucket_name",
    #         "access_key": "access_key",
    #         "secret_key": "secret_key"
    #     }
    # }
    

To monitor the status of the training dataset download (pull), poll get-metadata:

TRAIN_DATASET_PULL_STATUS=$(tao-client dino get-metadata --id $TRAIN_DATASET_ID --job_type dataset)
echo $TRAIN_DATASET_PULL_STATUS
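
If you would rather block until the download finishes, a small polling loop works. This is a minimal sketch: the status field name and the "pull_complete" terminal value are assumptions, so confirm them against the metadata your deployment returns. The same loop works for the validation dataset created next, with $EVAL_DATASET_ID substituted in.

# Hypothetical polling loop; the .status field and the "pull_complete"
# value are assumptions. If the output is a Python-style dict rather than
# JSON (as with list-base-experiments in step 3), apply the same quote
# conversion shown there before piping to jq.
while true; do
    STATUS=$(tao-client dino get-metadata --id $TRAIN_DATASET_ID --job_type dataset | jq -r '.status')
    echo "Training dataset status: $STATUS"
    [ "$STATUS" = "pull_complete" ] && break
    sleep 10
done
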
  2. Creating the validation dataset

    This returns a UUID representing the evaluation dataset ID, which is used in later steps such as dataset_convert and evaluate. The CLI arguments are the same as for the training dataset.

    EVAL_DATASET_ID=$(tao-client dino dataset-create --dataset_type object_detection --dataset_format coco --cloud_details '{
        "cloud_type": "self_hosted",
        "cloud_file_type": "file",
        "cloud_specific_details": {
            "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_val_coco.tar.gz"
            }
        }')
    echo $EVAL_DATASET_ID
    

To monitor the status of the validation dataset download (pull), poll get-metadata, or reuse the loop shown above:

EVAL_DATASET_PULL_STATUS=$(tao-client dino get-metadata --id $EVAL_DATASET_ID --job_type dataset)
echo $EVAL_DATASET_PULL_STATUS
  3. Finding a base experiment

    The following command lists the base experiments available for use. Pick the one that corresponds to Dino; its ID is used in step 5.

    BASE_EXP_RESPONSE=$(tao-client dino list-base-experiments --filter_params '{"network_arch": "dino"}')
    
    # Post-processing to convert the Python-style output into a JSON string
    BASE_EXP_RESPONSE="${BASE_EXP_RESPONSE:1:-1}"
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed "s/'/\"/g")
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed -e "s/None/null/g" -e "s/True/true/g" -e "s/False/false/g")
    BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed 's/}, {/},\n{/g')
    BASE_EXP_RESPONSE="[$BASE_EXP_RESPONSE]"
    
    BASE_EXPERIMENT_ID=$(echo "$BASE_EXP_RESPONSE" | jq . | jq -r '[.[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
    echo $BASE_EXPERIMENT_ID
    
  4. Creating an experiment

    This returns a UUID representing the experiment ID, which is used in all subsequent steps such as train, evaluate, and inference. CLI arguments for creating an experiment are:

    • network_arch - one of TAO’s supported network architectures

    • encryption_key - encryption key for loading the Base Experiment

    • cloud_type - the cloud storage provider, aws or azure (passed as a key inside the cloud_details dictionary, as in the example below)

    • cloud_details - dictionary of required cloud storage values like bucket name, access credentials with write permissions, etc.

    Note

    Replace the cloud_details values with your own cloud bucket credentials.

    EXPERIMENT_ID=$(tao-client dino experiment-create --network_arch dino --encryption_key nvidia_tlt --cloud_details '{
        "cloud_type": "aws",
        "cloud_specific_details": {
            "cloud_region": "us-west-1",
            "cloud_bucket_name": "bucket_name",
            "access_key": "access_key",
            "secret_key": "secret_key"
            }
        }')
    echo $EXPERIMENT_ID
    
  5. Assigning datasets and a base experiment to the experiment

    UPDATE_INFO=$(cat <<EOF
    {
        "base_experiment": ["$BASE_EXPERIMENT_ID"],
        "train_datasets": ["$TRAIN_DATASET_ID"],
        "eval_dataset": "$EVAL_DATASET_ID",
        "inference_dataset": "$EVAL_DATASET_ID",
        "calibration_dataset": "$TRAIN_DATASET_ID"
    }
    EOF
    )
    EXPERIMENT_METADATA=$(tao-client dino patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$UPDATE_INFO")
    echo $EXPERIMENT_METADATA | jq
    
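    As a quick sanity check, you can read the patched fields back from the returned metadata. This assumes the response echoes the keys you just set:

    echo $EXPERIMENT_METADATA | jq '{base_experiment, train_datasets, eval_dataset}'
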

    Key-value pairs:

    • base_experiment: The base experiment ID from step 3

    • train_datasets: The train dataset IDs

    • eval_dataset: The eval dataset ID

    • inference_dataset: The test dataset ID

    • calibration_dataset: The train dataset ID

    • docker_env_vars: Key-value pairs of MLOps settings: wandbApiKey, clearMlWebHost, clearMlApiHost, clearMlFilesHost, clearMlApiAccessKey, clearMlApiSecretKey (see the sketch below)
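
    For example, to pass a Weights & Biases API key, you could send a second patch with the same command. This is a sketch: the docker_env_vars and wandbApiKey names come from the list above, and the value is a placeholder to replace with your own key.

    MLOPS_UPDATE=$(cat <<EOF
    {
        "docker_env_vars": {
            "wandbApiKey": "your_wandb_api_key"
        }
    }
    EOF
    )
    tao-client dino patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$MLOPS_UPDATE"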

  6. Training an experiment

  1. Get specs:

    Returns a JSON-loadable string of specs to be used in the train step.

    TRAIN_SPECS=$(tao-client dino get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
    echo $TRAIN_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step, if necessary.

    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
    echo $TRAIN_SPECS | jq
    
  3. Run the train action:

    CLI arguments for running the train action are:

    • id - ID of the experiment

    • action - the action to be executed

    • specs - spec dictionary for the action to be executed

    TRAIN_ID=$(tao-client dino experiment-run-action --action train --id $EXPERIMENT_ID --specs "$TRAIN_SPECS")
    echo $TRAIN_ID
    
  4. Check the status of the training job:

    To monitor the status of the training job, poll get-action-status:

    CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $TRAIN_ID | jq
    
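    If you prefer to block until a job finishes, a loop like the one below can wrap get-action-status; it works for the evaluate and inference jobs later in this walkthrough by swapping in their job IDs. The status field and the "Done"/"Error" terminal values are assumptions; verify them against the real output.

    # Hypothetical wait loop; confirm the .status field and the terminal
    # values against what get-action-status actually returns.
    JOB_ID=$TRAIN_ID
    while true; do
        JOB_STATUS=$(tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $JOB_ID | jq -r '.status')
        echo "Job status: $JOB_STATUS"
        case "$JOB_STATUS" in
            Done|Error) break ;;
        esac
        sleep 30
    done
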
  7. Evaluating an experiment

  1. Get specs:

    Returns a JSON-loadable string of specs to be used in the evaluate step.

    EVALUATE_SPECS=$(tao-client dino get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)
    echo $EVALUATE_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step if necessary, as in the sketch below.
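
    For instance, a hypothetical tweak to the GPU count. The .evaluate.num_gpus key is an assumption modeled on the .train.num_gpus key used earlier; confirm the actual field names in the get-spec output.

    # Assumed key name; verify against the evaluate specs returned above
    EVALUATE_SPECS=$(echo $EVALUATE_SPECS | jq -r '.evaluate.num_gpus=2')
    echo $EVALUATE_SPECS | jq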

  3. Run the evaluate action:

    CLI arguments for running the evaluate action are:

    • id - ID of the experiment

    • parent_job_id - ID of the parent job, if any

    • action - the action to be executed

    • specs - spec dictionary for the action to be executed

    EVALUATE_ID=$(tao-client dino experiment-run-action --action evaluate --id $EXPERIMENT_ID --parent_job_id $TRAIN_ID --specs "$EVALUATE_SPECS")
    echo $EVALUATE_ID
    
  4. Check the status of the evaluate job:

    To monitor the status of the evaluation job, poll get-action-status:

    CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $EVALUATE_ID | jq
    
  8. Running inference for an experiment

  1. Get specs:

    Returns a JSON-loadable string of specs to be used in the inference step.

    INFERENCE_SPECS=$(tao-client dino get-spec --action inference --job_type experiment --id $EXPERIMENT_ID)
    echo $INFERENCE_SPECS | jq
    
  2. Modify specs:

    Modify the specs from the previous step if necessary, as in the sketch below.
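
    As with evaluate, here is a hypothetical adjustment. The .inference.num_gpus key is an assumption; confirm the real field names in the get-spec output.

    # Assumed key name; verify against the inference specs returned above
    INFERENCE_SPECS=$(echo $INFERENCE_SPECS | jq -r '.inference.num_gpus=1')
    echo $INFERENCE_SPECS | jq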

  3. Run the inference action:

    CLI arguments for running the inference action are:

    • id - ID of the experiment

    • parent_job_id - ID of the parent job, if any

    • action - the action to be executed

    • specs - spec dictionary for the action to be executed

    INFERENCE_ID=$(tao-client dino experiment-run-action --action inference --id $EXPERIMENT_ID --parent_job_id $TRAIN_ID --specs "$INFERENCE_SPECS")
    echo $INFERENCE_ID
    
  4. Check the status of the inference job:

    To monitor the status of the inference job, poll get-action-status:

    CLI arguments for getting action metadata are:

    • id - ID of the experiment

    • job - Job for which action metadata is to be retrieved

    • job_type - experiment

    tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $INFERENCE_ID | jq
    

AutoML

AutoML is a TAO Toolkit API service that automatically selects deep learning hyperparameters for a chosen model and dataset.

See the AutoML docs for more details.