TAO REST API

The TAO (Train, Adapt, Optimize) API exposes dataset and experiment endpoints for setting up and running actions.

Examples in this section are based on cURL commands and jq JSON processing, and assume a Linux machine with curl and jq pre-installed.

User Authentication

User authentication is based on the NGC API key. For more details, see the API reference.

For example:

BASE_URL=https://api-ea4.tao.ngc.nvidia.com/api/v1

NGC_API_KEY=zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyS

CREDS=$(curl -s -X POST $BASE_URL/login -d '{"ngc_api_key": "'"$NGC_API_KEY"'"}')

TOKEN=$(echo $CREDS | jq -r '.token')
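
As a quick sanity check, you can verify that the login call returned a token before proceeding:

# Fail early if login did not return a usable token
if [ -z "$TOKEN" ] || [ "$TOKEN" = "null" ]; then
  echo "Login failed: $CREDS" >&2
fi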

For example, an API call for listing datasets might be:

curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets -H "Authorization: Bearer $TOKEN"
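
To make the output easier to scan, you can extract just the ID and name of each dataset. The top-level datasets array used below is an assumption about the response shape; adjust the jq filter if your deployment returns a different structure.

# Print one "id <tab> name" line per dataset (assumes a .datasets array in the response)
curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets -H "Authorization: Bearer $TOKEN" | jq -r '.datasets[] | "\(.id)\t\(.name)"'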

API Specs

The TAO API service includes methods for managing the contents of experiment workspaces, such as user datasets and experiments. It also includes methods for executing TAO actions on the data and specifications stored in those workspaces.

Typically, you create a dataset for a specific network type, create an experiment that points to this dataset, pick a base experiment, and customize the specs before executing network-related actions.

/api/v1/orgs/ea-tlt/datasets

/api/v1/orgs/ea-tlt/experiments

  • List datasets for a user

  • List experiments for a user

  • Retrieve a dataset

  • Retrieve an experiment

  • Delete a dataset

  • Delete an experiment

  • Create a dataset

  • Create a new experiment

  • Update dataset metadata

  • Update experiment metadata

  • Update dataset metadata partially

  • Update experiment metadata partially

  • Dataset upload

  • Retrieve default experiment action specs

  • Retrieve default dataset action specs

  • Retrieve experiment action specs

  • Retrieve dataset action specs

  • Update experiment action specs

  • Update dataset action specs

  • Run experiment actions

  • Run dataset actions

  • List experiment jobs

  • List dataset jobs

  • Retrieve experiment job

  • Retrieve dataset job

  • Early stop / cancel experiment job

  • Cancel dataset job

  • Delete an experiment job

  • Delete a dataset job

  • Download experiment action job

  • List files of a dataset job

  • List files of an experiment job

  • Download selective files of a dataset job

  • Download selective files of an experiment job

  • Download dataset action job

  • Resume training

See the TAO API Reference for more details.

Datasets

This example workflow uses object detection data in the COCO dataset format. For more details about the COCO format, refer to the COCO dataset page. If you are using a custom dataset, it must follow the structure shown below.

$DATA_DIR
├── annotations.json
└── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    └── ...
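
If you package a custom dataset yourself, a minimal sketch for creating the upload archive (assuming $DATA_DIR follows the structure above; the archive name is hypothetical) is:

# Bundle annotations.json and the images directory into a single tar.gz
tar -czf my_dataset.tar.gz -C $DATA_DIR annotations.json images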

Cloud Storage

With TAO hosted on EA4, datasets must be accessible through one of the following methods so that the service can pull the data and run actions on it.

  • User-provided cloud storage (AWS or Azure) - private buckets only; if the bucket is public, use the self_hosted cloud_type option (described in the third point)

    • AWS - cloud_type: aws; cloud_specific_details needed: access_key, secret_key, aws_region, s3_bucket_name, file_path_within_storage

    • Azure - cloud_type: azure; cloud_specific_details needed: account_name, access_key, azure_region, azure_blob_name, file_path_within_storage

  • HuggingFace datasets - cloud_type: huggingface; cloud_specific_details needed: url, token (for private datasets)

  • Public dataset tar files distributed as HTTPS links - cloud_type: self_hosted; cloud_specific_details needed: url

For experiments, you must provide cloud storage with write access, so that action artifacts, such as training checkpoints, can be pushed to it. Datasets also require cloud storage with write access, as TAO may need to convert your dataset to a compatible format before training.

Object Detection Use Case Example with API

The following example walks you through a typical TAO use case.

Note

Datasets provided in these examples are subject to the following license: Dataset License.

  1. Create the training dataset.

    Request body parameters:

    • name (optional) - Appropriate name for the dataset

    • description (optional) - Description of the dataset

    • type (mandatory) - One of TAO’s supported dataset types

    • format (mandatory) - One of the formats supported for the type chosen above

    • cloud_details (mandatory) - Dictionary of required cloud storage values, like bucket name or access credentials with write permissions

    Note

    Modify the cloud_details keys based on your cloud bucket credentials.

TRAIN_DATASET_ID=$(curl -s -X POST \
 $BASE_URL/orgs/ea-tlt/datasets \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $TOKEN" \
 -d '{
   "type": "object_detection",
   "format": "coco",
   "cloud_details": {
       "cloud_type": "self_hosted",
       "cloud_file_type": "file",
       "cloud_specific_details": {
           "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_train_coco.tar.gz"
       }
   }
}' | jq -r '.id')
echo $TRAIN_DATASET_ID

# Cloud details example for AWS CSP (Cloud storage provider)
# "cloud_details": {
#     "cloud_type": "aws",
#     "cloud_file_type": "file",
#     "cloud_specific_details": {
#         "cloud_file_path": "file_name.tar.gz",
#         "cloud_region": "us-west-1",
#         "cloud_bucket_name": "bucket_name",
#         "access_key": "access_key",
#         "secret_key": "secret_key"
#     }
# }

# Cloud details example for Azure CSP
# "cloud_details": {
#     "cloud_type": "azure",
#     "cloud_file_type": "file",
#     "cloud_specific_details": {
#         "cloud_file_path": "file_name.tar.gz",
#         "account_name": "account_name",
#         "access_key": "access_key",
#         "cloud_bucket_name": "container_name",
#     }
# }

# Cloud details example for HuggingFace CSP
# "cloud_details": {
#     "cloud_type": "huggingface",
#     "cloud_specific_details": {
#         "token": "access_token",
#         "url": "https://huggingface.co/datasets/<huggingface_username>/<huggingface_dataset_name>",
#     }
# }

Note

For a public Hugging Face dataset, use self_hosted as cloud_type and provide the HTTPS URL.

To monitor the status of the training dataset download, poll with the following cURL request:

TRAIN_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$TRAIN_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo $TRAIN_DATASET_PULL_STATUS
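
If you prefer to block until the pull finishes, the sketch below wraps the status check in a polling loop; the terminal status string pull_complete is an assumption, so confirm the canonical values in the API reference.

wait_for_dataset () {
  local id=$1 status=""
  while true; do
    status=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$id -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo "dataset $id status: $status"
    [ "$status" = "pull_complete" ] && break   # assumed terminal value
    sleep 10
  done
}
wait_for_dataset $TRAIN_DATASET_ID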
  2. Create the validation dataset.

    The request body parameters are the same as for the training dataset.

    Note

    Modify the cloud_details keys based on your cloud bucket credentials.

EVAL_DATASET_ID=$(curl -s -X POST \
 $BASE_URL/orgs/ea-tlt/datasets \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $TOKEN" \
 -d '{
   "type": "object_detection",
   "format": "coco",
   "cloud_details": {
       "cloud_type": "self_hosted",
       "cloud_file_type": "file",
       "cloud_specific_details": {
           "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_val_coco.tar.gz"
       }
   }
}' | jq -r '.id')
echo $EVAL_DATASET_ID

To monitor the status of the validation dataset download, poll with the following cURL request:

EVAL_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$EVAL_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo $EVAL_DATASET_PULL_STATUS
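
The wait_for_dataset helper from the previous step applies here as well:

wait_for_dataset $EVAL_DATASET_ID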
  3. Find the base experiment.

BASE_EXPERIMENT_ID=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
echo $BASE_EXPERIMENT_ID

The response from this endpoint also provides the compatible dataset type and the list of compatible dataset formats.

DATASET_TYPE=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_type')
echo $DATASET_TYPE

DATASET_FORMATS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_formats')
echo $DATASET_FORMATS
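
Because the three commands above query the same endpoint, you can also fetch the base-experiment list once and filter it locally:

# Fetch once, then extract the ID, dataset type, and dataset formats from the same object
BASE_EXPERIMENT=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0]')
BASE_EXPERIMENT_ID=$(echo $BASE_EXPERIMENT | jq -r '.id')
DATASET_TYPE=$(echo $BASE_EXPERIMENT | jq -r '.dataset_type')
DATASET_FORMATS=$(echo $BASE_EXPERIMENT | jq -r '.dataset_formats')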
  4. Create an experiment.

    Request body parameters for creating an experiment are:

    • network_arch - one of TAO’s supported network architectures

    • encryption_key - encryption key for loading the base experiment

    • checkpoint_choose_method - best_model/latest_model/from_epoch_number

    • train_datasets - list of training dataset IDs, where each ID is obtained during the creation of the respective training dataset

    • eval_dataset - dataset ID obtained during creation of the validation dataset

    • inference_dataset - dataset ID obtained during creation of the test dataset

    • calibration_dataset - dataset ID obtained during creation of the training dataset (not a list)

    • docker_env_vars - dictionary of Docker environment variables for MLOps services such as WANDB and ClearML (an illustrative example follows the experiment-creation snippet below)

    • base_experiment - list of base experiment IDs, obtained from the "Find the base experiment" step

    • cloud_details - dictionary of required cloud storage values, like bucket name or access credentials with write permissions

    Note

    Modify the cloud_details keys based on your cloud bucket credentials.

EXPERIMENT_ID=$(curl -s -X POST \
 $BASE_URL/orgs/ea-tlt/experiments \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $TOKEN" \
 -d '{
   "network_arch": "dino",
   "encryption_key": "tlt_encode",
   "checkpoint_choose_method": "best_model",
   "train_datasets": ["'"$TRAIN_DATASET_ID"'"],
   "eval_dataset": "'"$EVAL_DATASET_ID"'",
   "inference_dataset": "'"$EVAL_DATASET_ID"'",
   "calibration_dataset": "'"$TRAIN_DATASET_ID"'",
   "docker_env_vars": {},
   "base_experiment": ["'"$BASE_EXPERIMENT_ID"'"],
   "cloud_details": {
       "cloud_type": "aws",
       "cloud_specific_details": {
           "cloud_region": "us-west-1",
           "cloud_bucket_name": "bucket_name",
           "access_key": "access_key",
           "secret_key": "secret_key"
       }
   }
}' | jq -r '.id')
echo $EXPERIMENT_ID

# Cloud details example for Azure CSP
# "cloud_details": {
#     "cloud_type": "azure",
#     "cloud_specific_details": {
#         "account_name": "account_name",
#         "access_key": "access_key",
#         "cloud_bucket_name": "optional_container_name",
#     }
# }
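
To stream training metrics to an MLOps service, pass the credentials through docker_env_vars. The variable name below is an illustrative assumption; check the API reference for the supported keys.

# Hypothetical docker_env_vars example for Weights & Biases logging
# "docker_env_vars": {
#     "WANDB_API_KEY": "your_wandb_api_key"
# }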

Note

Only AWS and Azure are supported for creating experiments and storing training artifacts, including checkpoints and logs.

  5. Train the DINO model.

    Request body parameters for running the train action are:

    • parent_job_id - ID of the parent job, if any

    • action - action to be executed

    • specs - spec dictionary of the action to be executed

TRAIN_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/train/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $TRAIN_SPECS | jq

TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
echo $TRAIN_SPECS | jq
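
The two jq edits above can equivalently be combined into a single expression:

# Set both training parameters in one jq pass
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10 | .train.num_gpus=2')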

TRAIN_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"train\", \"specs\": $TRAIN_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
  6. Check the status of the training job.

You can wait for the training job to complete before proceeding to other actions. To monitor the status of the training job, poll with the following cURL request:

curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRAIN_ID -H "Authorization: Bearer $TOKEN" | jq
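
To block until the job reaches a terminal state, you can wrap the same request in a wait loop. The terminal status strings below (Done, Error, Canceled) are assumptions; confirm the canonical values in the API reference.

wait_for_job () {
  local job_id=$1 status=""
  while true; do
    status=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$job_id -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo "job $job_id status: $status"
    case "$status" in
      Done|Error|Canceled) break ;;   # assumed terminal values
    esac
    sleep 30
  done
}
wait_for_job $TRAIN_ID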
  7. Evaluate the trained model.

    Request body parameters for running the evaluate action are:

    • parent_job_id - ID of the parent job, if any

    • action - action to be executed

    • specs - spec dictionary of the action to be executed

EVALUATE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/evaluate/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $EVALUATE_SPECS | jq

EVALUATE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"evaluate\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $EVALUATE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
  8. Check the status of the evaluation job.

To monitor the status of the evaluation job, poll with the following cURL request:

curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$EVALUATE_ID -H "Authorization: Bearer $TOKEN" | jq
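
The wait_for_job helper from the training step applies here as well:

wait_for_job $EVALUATE_ID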

  9. Run inference on the trained model.

    Request body parameters for running the inference action are:

    • parent_job_id - ID of the parent job, if any

    • action - action to be executed

    • specs - spec dictionary of the action to be executed

INFERENCE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/inference/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $INFERENCE_SPECS | jq

INFERENCE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"inference\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $INFERENCE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
  10. Check the status of the inference job.

To monitor the status of the inference job, poll with the following cURL request:

curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$INFERENCE_ID -H "Authorization: Bearer $TOKEN" | jq
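
And likewise for the inference job:

wait_for_job $INFERENCE_ID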

AutoML

AutoML is a TAO Toolkit API service that automatically selects deep learning hyperparameters for a chosen model and dataset.

See the AutoML docs for more details.