TAO REST API

The TAO (Train, Adapt, Optimize) API exposes dataset and experiment endpoints for setting up and running actions.

Examples in this section are based on cURL commands and jq JSON processing, and assume a Linux machine with curl and jq pre-installed.

User Authentication

User authentication is based on the NGC API key. For more details, see the API reference.

For example:

BASE_URL=https://api-ea4.tao.ngc.nvidia.com/api/v1

NGC_API_KEY=zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyS

CREDS=$(curl -s -X POST $BASE_URL/login -d '{"ngc_api_key": "'"$NGC_API_KEY"'"}')

TOKEN=$(echo $CREDS | jq -r '.token')
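
As a quick sanity check, you can verify that the login call returned a token before proceeding:

# Fail early if login did not return a usable token
if [ -z "$TOKEN" ] || [ "$TOKEN" = "null" ]; then
  echo "Login failed: $CREDS" >&2
fi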

For example, an API call for listing datasets might be:

curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets -H "Authorization: Bearer $TOKEN"
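
To make the output easier to scan, you can extract just the ID and name of each dataset. The top-level datasets array used below is an assumption about the response shape; adjust the jq filter if your deployment returns a different structure.

# Print one "id <tab> name" line per dataset (assumes a .datasets array in the response)
curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets -H "Authorization: Bearer $TOKEN" | jq -r '.datasets[] | "\(.id)\t\(.name)"'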

API Specs

The TAO API service includes methods for managing the contents of experiment workspaces, such as user datasets and experiments. It also includes methods for executing TAO actions on the data and specifications stored in those workspaces.

Typically, you create a dataset for a specific network type, create an experiment that points to this dataset, pick a base experiment, and customize the specs before executing network-related actions.

/api/v1/orgs/ea-tlt/datasets

/api/v1/orgs/ea-tlt/experiments

  • List datasets for a user

  • List experiments for a user

  • Retrieve a dataset

  • Retrieve an experiment

  • Delete a dataset

  • Delete an experiment

  • Create a dataset

  • Create a new experiment

  • Update dataset metadata

  • Update experiment metadata

  • Update dataset metadata partially

  • Update experiment metadata partially

  • Dataset upload

  • Retrieve default experiment action specs

  • Retrieve default dataset action specs

  • Retrieve experiment action specs

  • Retrieve dataset action specs

  • Update experiment action specs

  • Update dataset action specs

  • Run experiment actions

  • Run dataset actions

  • List experiment jobs

  • List dataset jobs

  • Retrieve experiment job

  • Retrieve dataset job

  • Early stop / cancel experiment job

  • Cancel dataset job

  • Delete an experiment job

  • Delete a dataset job

  • Download experiment action job

  • List files of a dataset job

  • List files of an experiment job

  • Download selective files of a dataset job

  • Download selective files of an experiment job

  • Download dataset action job

  • Resume training

See the TAO API Reference for more details.

Datasets

This example workflow uses object detection data in the COCO dataset format. For more details about the COCO format, refer to the COCO dataset page. If you are using a custom dataset, it must follow the structure shown below.

$DATA_DIR
├── annotations.json
└── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    └── ...
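
If you package a custom dataset yourself, a minimal sketch for creating the upload archive (assuming $DATA_DIR follows the structure above; the archive name is hypothetical) is:

# Bundle annotations.json and the images directory into a single tar.gz
tar -czf my_dataset.tar.gz -C $DATA_DIR annotations.json images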

Cloud Storage

With TAO hosted on EA4, datasets must be accessible through one of the following methods so that the service can pull the data and run actions on it.

  • User-provided cloud storage (AWS or Azure) - private buckets only; if the bucket is public, use the self_hosted cloud_type option (described in the third point)

    • AWS - cloud_type: aws; cloud_specific_details needed: access_key, secret_key, aws_region, s3_bucket_name, file_path_within_storage

    • Azure - cloud_type: azure; cloud_specific_details needed: account_name, access_key, azure_region, azure_blob_name, file_path_within_storage

  • HuggingFace datasets - cloud_type: huggingface; cloud_specific_details needed: url, token (for private datasets)

  • Public dataset tar files distributed as HTTPS links - cloud_type: self_hosted; cloud_specific_details needed: url

For experiments, you must provide cloud storage with write access, so that action artifacts, such as training checkpoints, can be pushed to it. Datasets also require cloud storage with write access, as TAO may need to convert your dataset to a compatible format before training.

Object Detection Use Case Example with API

The following example walks you through a typical TAO use case.

Note

Datasets provided in these examples are subject to the following license: Dataset License.

  1. Create the training dataset.

    Request body parameters:

    • name (optional) - Appropriate name for the dataset

    • description (optional) - Description of the dataset

    • type (mandatory) - One of TAO’s supported dataset types

    • format (mandatory) - One of the formats supported for the type chosen above

    • cloud_details (mandatory) - Dictionary of required cloud storage values, like bucket name or access credentials with write permissions

    Note

    Modify the cloud_details keys based on your cloud bucket credentials.

TRAIN_DATASET_ID=$(curl -s -X POST \
 $BASE_URL/orgs/ea-tlt/datasets \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $TOKEN" \
 -d '{
   "type": "object_detection",
   "format": "coco",
   "cloud_details": {
       "cloud_type": "self_hosted",
       "cloud_file_type": "file",
       "cloud_specific_details": {
           "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_train_coco.tar.gz"
       }
   }
}' | jq -r '.id')
echo $TRAIN_DATASET_ID

# Cloud details example for AWS CSP (Cloud storage provider)
# "cloud_details": {
#     "cloud_type": "aws",
#     "cloud_file_type": "file",
#     "cloud_specific_details": {
#         "cloud_file_path": "file_name.tar.gz",
#         "cloud_region": "us-west-1",
#         "cloud_bucket_name": "bucket_name",
#         "access_key": "access_key",
#         "secret_key": "secret_key"
#     }
# }

# Cloud details example for Azure CSP
# "cloud_details": {
#     "cloud_type": "azure",
#     "cloud_file_type": "file",
#     "cloud_specific_details": {
#         "cloud_file_path": "file_name.tar.gz",
#         "account_name": "account_name",
#         "access_key": "access_key",
#         "cloud_bucket_name": "container_name",
#     }
# }

# Cloud details example for HuggingFace CSP
# "cloud_details": {
#     "cloud_type": "huggingface",
#     "cloud_specific_details": {
#         "token": "access_token",
#         "url": "https://huggingface.co/datasets/<huggingface_username>/<huggingface_dataset_name>",
#     }
# }

Note

For a public Hugging Face dataset, use self_hosted as cloud_type and provide the HTTPS URL.

To monitor the status of the training dataset download, poll with the following cURL request:

TRAIN_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$TRAIN_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo $TRAIN_DATASET_PULL_STATUS
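
If you prefer to block until the pull finishes, the sketch below wraps the status check in a polling loop; the terminal status string pull_complete is an assumption, so confirm the canonical values in the API reference.

wait_for_dataset () {
  local id=$1 status=""
  while true; do
    status=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$id -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo "dataset $id status: $status"
    [ "$status" = "pull_complete" ] && break   # assumed terminal value
    sleep 10
  done
}
wait_for_dataset $TRAIN_DATASET_ID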
  2. Create the validation dataset.

    The request body parameters are the same as for the training dataset.

    Note

    Modify the cloud_details keys based on your cloud bucket credentials.

EVAL_DATASET_ID=$(curl -s -X POST \
 $BASE_URL/orgs/ea-tlt/datasets \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $TOKEN" \
 -d '{
   "type": "object_detection",
   "format": "coco",
   "cloud_details": {
       "cloud_type": "self_hosted",
       "cloud_file_type": "file",
       "cloud_specific_details": {
           "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_val_coco.tar.gz"
       }
   }
}' | jq -r '.id')
echo $EVAL_DATASET_ID

To monitor the status of the validation dataset download, poll with the following cURL request:

EVAL_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$EVAL_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo $EVAL_DATASET_PULL_STATUS
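
The wait_for_dataset helper from the previous step applies here as well:

wait_for_dataset $EVAL_DATASET_ID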
  3. Find the base experiment.

BASE_EXPERIMENT_ID=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
echo $BASE_EXPERIMENT_ID

The response from this endpoint also provides the compatible dataset type and the list of compatible dataset formats.

DATASET_TYPE=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_type')
echo $DATASET_TYPE

DATASET_FORMATS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_formats')
echo $DATASET_FORMATS
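
Because the three commands above query the same endpoint, you can also fetch the base-experiment list once and filter it locally:

# Fetch once, then extract the ID, dataset type, and dataset formats from the same object
BASE_EXPERIMENT=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0]')
BASE_EXPERIMENT_ID=$(echo $BASE_EXPERIMENT | jq -r '.id')
DATASET_TYPE=$(echo $BASE_EXPERIMENT | jq -r '.dataset_type')
DATASET_FORMATS=$(echo $BASE_EXPERIMENT | jq -r '.dataset_formats')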
  4. Create an experiment.

    Request body parameters for creating an experiment are:

    • network_arch - one of TAO’s supported network architectures

    • encryption_key - encryption key for loading the base experiment

    • checkpoint_choose_method - best_model/latest_model/from_epoch_number

    • train_datasets - list of training dataset IDs, where each ID is obtained during the creation of the respective training dataset

    • eval_dataset - dataset ID obtained during creation of the validation dataset

    • inference_dataset - dataset ID obtained during creation of the test dataset

    • calibration_dataset - dataset ID obtained during creation of the training dataset (not a list)

    • docker_env_vars - dictionary of Docker environment variables for MLOps services such as WANDB and ClearML (an illustrative example follows the experiment-creation snippet below)

    • base_experiment - list of base experiment IDs, obtained from the "Find the base experiment" step

    • cloud_details - dictionary of required cloud storage values, like bucket name or access credentials with write permissions

    Note

    Modify the cloud_details keys based on your cloud bucket credentials.

EXPERIMENT_ID=$(curl -s -X POST \
 $BASE_URL/orgs/ea-tlt/experiments \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $TOKEN" \
 -d '{
   "network_arch": "dino",
   "encryption_key": "tlt_encode",
   "checkpoint_choose_method": "best_model",
   "train_datasets": ["'"$TRAIN_DATASET_ID"'"],
   "eval_dataset": "'"$EVAL_DATASET_ID"'",
   "inference_dataset": "'"$EVAL_DATASET_ID"'",
   "calibration_dataset": "'"$TRAIN_DATASET_ID"'",
   "docker_env_vars": {},
   "base_experiment": ["'"$BASE_EXPERIMENT_ID"'"],
   "cloud_details": {
       "cloud_type": "aws",
       "cloud_specific_details": {
           "cloud_region": "us-west-1",
           "cloud_bucket_name": "bucket_name",
           "access_key": "access_key",
           "secret_key": "secret_key"
       }
   }
}' | jq -r '.id')
echo $EXPERIMENT_ID

# Cloud details example for Azure CSP
# "cloud_details": {
#     "cloud_type": "azure",
#     "cloud_specific_details": {
#         "account_name": "account_name",
#         "access_key": "access_key",
#         "cloud_bucket_name": "optional_container_name",
#     }
# }
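
To stream training metrics to an MLOps service, pass the credentials through docker_env_vars. The variable name below is an illustrative assumption; check the API reference for the supported keys.

# Hypothetical docker_env_vars example for Weights & Biases logging
# "docker_env_vars": {
#     "WANDB_API_KEY": "your_wandb_api_key"
# }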

Note

Only AWS and Azure are supported for creating experiments and storing training artifacts, including checkpoints and logs.

  5. Train the DINO model.

    Request body parameters for running the train action are:

    • parent_job_id - ID of the parent job, if any

    • action - action to be executed

    • specs - spec dictionary of the action to be executed

TRAIN_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/train/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $TRAIN_SPECS | jq

TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
echo $TRAIN_SPECS | jq
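
The two jq edits above can equivalently be combined into a single expression:

# Set both training parameters in one jq pass
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10 | .train.num_gpus=2')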

TRAIN_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"train\", \"specs\": $TRAIN_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
  6. Check the status of the training job.

You can wait for the training job to complete before proceeding to other actions. To monitor the status of the training job, poll with the following cURL request:

curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRAIN_ID -H "Authorization: Bearer $TOKEN" | jq
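
To block until the job reaches a terminal state, you can wrap the same request in a wait loop. The terminal status strings below (Done, Error, Canceled) are assumptions; confirm the canonical values in the API reference.

wait_for_job () {
  local job_id=$1 status=""
  while true; do
    status=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$job_id -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo "job $job_id status: $status"
    case "$status" in
      Done|Error|Canceled) break ;;   # assumed terminal values
    esac
    sleep 30
  done
}
wait_for_job $TRAIN_ID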
  7. Evaluate the trained model.

    Request body parameters for running the evaluate action are:

    • parent_job_id - ID of the parent job, if any

    • action - action to be executed

    • specs - spec dictionary of the action to be executed

EVALUATE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/evaluate/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $EVALUATE_SPECS | jq

EVALUATE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"evaluate\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $EVALUATE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
  8. Check the status of the evaluation job.

To monitor the status of the evaluation job, poll with the following cURL request:

curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$EVALUATE_ID -H "Authorization: Bearer $TOKEN" | jq
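
The wait_for_job helper from the training step applies here as well:

wait_for_job $EVALUATE_ID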

  9. Run inference on the trained model.

    Request body parameters for running the inference action are:

    • parent_job_id - ID of the parent job, if any

    • action - action to be executed

    • specs - spec dictionary of the action to be executed

INFERENCE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/inference/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $INFERENCE_SPECS | jq

INFERENCE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"inference\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $INFERENCE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
  10. Check the status of the inference job.

To monitor the status of the inference job, poll with the following cURL request:

curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$INFERENCE_ID -H "Authorization: Bearer $TOKEN" | jq
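
And likewise for the inference job:

wait_for_job $INFERENCE_ID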

AutoML

AutoML is a TAO Toolkit API service that automatically selects deep learning hyperparameters for a chosen model and dataset.

See the AutoML docs for more details.