TAO REST API
The TAO (Train, Adapt, Optimize) API exposes dataset and experiment endpoints for setting up and running actions.
Examples in this section use cURL commands with jq for JSON processing, and assume a Linux machine with curl and jq pre-installed.
User Authentication
User authentication is based on your NGC API key. For more details, see the API reference.
For example:
BASE_URL=https://api-ea4.tao.ngc.nvidia.com/api/v1
NGC_API_KEY=zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyS
CREDS=$(curl -s -X POST $BASE_URL/login -d '{"ngc_api_key": "'"$NGC_API_KEY"'"}')
TOKEN=$(echo $CREDS | jq -r '.token')
For example, an API call for listing datasets might be:
curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets -H "Authorization: Bearer $TOKEN"
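If subsequent calls fail with authorization errors, the token may have expired; repeat the login step. A quick sanity check right after login (a shell-side sketch, not part of the API):
# jq emits the literal string "null" when the token field is absent,
# so check for both an empty value and "null".
if [ -z "$TOKEN" ] || [ "$TOKEN" = "null" ]; then
  echo "Login failed; response was: $CREDS" >&2
fi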
API Specs
The TAO API service includes methods for dealing with the content of experimental workspaces, such as user datasets and experiments. It also includes methods for executing TAO actions applicable to data and specifications stored in experimental workspaces.
Typically, you create a dataset for a specific network type, create an experiment that points to this dataset, pick a base experiment, and customize specs before executing network-related actions.
See the TAO API Reference for more details.
Datasets
This example workflow uses object detection data in the COCO dataset format. For more details about the COCO format, refer to the COCO dataset page. If you are using a custom dataset, it must follow the structure shown below.
$DATA_DIR
├── annotations.json
└── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    └── ...
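If you package a custom dataset yourself, for example for the self_hosted option used later in this workflow, an archive can be built from this layout with standard tools. A minimal sketch, assuming my_dataset.tar.gz is a name of your choosing and the format expects annotations.json and images/ at the archive root:
# Package the dataset so annotations.json and images/ sit at the archive root.
tar -C $DATA_DIR -czf my_dataset.tar.gz annotations.json images
Host the resulting tarball behind an HTTPS URL to use it with the self_hosted cloud_type.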
Cloud Storage
With TAO hosted on EA4, datasets must be available through one of the following methods so that TAO can pull the data and run actions on it:
1. User-provided cloud storage (AWS or Azure) - private buckets only; if the bucket is public, use the self_hosted cloud_type option (described in the third point).
AWS - cloud_type: aws; cloud_specific_details needed: access_key, secret_key, aws_region, s3_bucket_name, file_path_within_storage
Azure - cloud_type: azure; cloud_specific_details needed: account_name, access_key, azure_region, azure_blob_name, file_path_within_storage
2. HuggingFace datasets - cloud_type: huggingface; cloud_specific_details needed: url, token (if the dataset is private)
3. Public dataset tar files distributed as HTTPS links - cloud_type: self_hosted; cloud_specific_details needed: url
For experiments, you must provide cloud storage with write access; action artifacts, such as training checkpoints, are pushed to this storage. Datasets also require cloud storage with write access, as TAO may need to convert your dataset to a compatible format before training.
Object Detection Use Case Example with API
The following example walks you through a typical TAO use case.
Note
Datasets provided in these examples are subject to the following license: Dataset License.
Create the training dataset.
Request body parameters:
name (optional) - Appropriate name for the dataset
description (optional) - Description of the dataset
type (mandatory) - One of TAO's supported dataset types
format (mandatory) - One of the formats supported for the type chosen above
cloud_details (mandatory) - Dictionary of required cloud storage values, like bucket name or access credentials with write permissions
Note
Modify the cloud_details keys based on your cloud bucket credentials.
TRAIN_DATASET_ID=$(curl -s -X POST \
$BASE_URL/orgs/ea-tlt/datasets \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"type": "object_detection",
"format": "coco",
"cloud_details": {
"cloud_type": "self_hosted",
"cloud_file_type": "file",
"cloud_specific_details": {
"url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_train_coco.tar.gz"
}
}
}' | jq -r '.id')
echo $TRAIN_DATASET_ID
# Cloud details example for AWS CSP (Cloud storage provider)
# "cloud_details": {
# "cloud_type": "aws",
# "cloud_file_type": "file",
# "cloud_specific_details": {
# "cloud_file_path": "file_name.tar.gz",
# "cloud_region": "us-west-1",
# "cloud_bucket_name": "bucket_name",
# "access_key": "access_key",
# "secret_key": "secret_key"
# }
# }
# Cloud details example for Azure CSP
# "cloud_details": {
# "cloud_type": "azure",
# "cloud_file_type": "file",
# "cloud_specific_details": {
# "cloud_file_path": "file_name.tar.gz",
# "account_name": "account_name",
# "access_key": "access_key",
# "cloud_bucket_name": "container_name",
# }
# }
# Cloud details example for HuggingFace CSP
# "cloud_details": {
# "cloud_type": "huggingface",
# "cloud_specific_details": {
# "token": "access_token",
# "url": "https://huggingface.co/datasets/<huggingface_username>/<huggingface_dataset_name>",
# }
# }
Note
For a public Hugging Face dataset, use self_hosted as cloud_type and provide the HTTPS URL.
To monitor the status of the training dataset download, poll with the following cURL request:
TRAIN_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$TRAIN_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo $TRAIN_DATASET_PULL_STATUS
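Rather than re-running the request by hand, you can poll until the dataset reaches a terminal state. A minimal sketch, assuming pull_complete and invalid_pull are the terminal status values; confirm the exact strings in the API reference:
# Poll a dataset until its status reaches an (assumed) terminal value.
wait_for_dataset () {
  local dataset_id=$1 status=""
  while true; do
    status=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$dataset_id \
      -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo "dataset $dataset_id status: $status"
    case $status in
      pull_complete|invalid_pull) break ;;  # assumed terminal states
    esac
    sleep 15
  done
}
wait_for_dataset $TRAIN_DATASET_ID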
Create the validation dataset.
Possible request body parameters are the same as for the training dataset.
Note
Modify the cloud_details keys based on your cloud bucket credentials.
EVAL_DATASET_ID=$(curl -s -X POST \
$BASE_URL/orgs/ea-tlt/datasets \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"type": "object_detection",
"format": "coco",
"cloud_details": {
"cloud_type": "self_hosted",
"cloud_file_type": "file",
"cloud_specific_details": {
"url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_val_coco.tar.gz"
}
}
}' | jq -r '.id')
echo $EVAL_DATASET_ID
To monitor the status of the validation dataset download, poll with the following cURL request:
EVAL_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$EVAL_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo $EVAL_DATASET_PULL_STATUS
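If you defined the wait_for_dataset helper above, it applies here unchanged:
wait_for_dataset $EVAL_DATASET_ID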
Find the base experiment.
BASE_EXPERIMENT_ID=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
echo $BASE_EXPERIMENT_ID
The response from this endpoint also provides the compatible dataset type and a list of compatible dataset formats.
DATASET_TYPE=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_type')
echo $DATASET_TYPE
DATASET_FORMATS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_formats')
echo $DATASET_FORMATS
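If the filter above returns null, it can help to list the available DINO base experiments first. The one-liner below reuses the same fields (id, ngc_path, network_arch) queried above:
# Print the ID and NGC path of every base experiment for the dino architecture.
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" \
  | jq -r '.experiments[] | select(.network_arch == "dino") | "\(.id)  \(.ngc_path)"'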
Create an experiment.
Request body parameters for creating an experiment are:
network_arch - one of TAO's supported network architectures
encryption_key - encryption key for loading the base experiment
checkpoint_choose_method - best_model/latest_model/from_epoch_number
train_datasets - list of train dataset IDs, where each ID is obtained during the creation of the respective train dataset
eval_dataset - dataset ID obtained during creation of the eval dataset
inference_dataset - dataset ID obtained during creation of the test dataset
calibration_dataset - dataset ID obtained during the creation of the train dataset (not a list)
docker_env_vars - dictionary of Docker environment variables pertaining to MLOps, like WANDB and CLEARML
base_experiment - list of base experiment IDs that can be obtained from the "Find the base experiment" step
cloud_details - dictionary of required cloud storage values, like bucket name or access credentials with write permissions
Note
Modify the cloud_details keys based on your cloud bucket credentials.
EXPERIMENT_ID=$(curl -s -X POST \
$BASE_URL/orgs/ea-tlt/experiments \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"network_arch": "dino",
"encryption_key": "tlt_encode",
"checkpoint_choose_method": "best_model",
"train_datasets": ["'"$TRAIN_DATASET_ID"'"],
"eval_dataset": "'"$EVAL_DATASET_ID"'",
"inference_dataset": "'"$EVAL_DATASET_ID"'",
"calibration_dataset": "'"$TRAIN_DATASET_ID"'",
"docker_env_vars": {},
"base_experiment": ["'"$BASE_EXPERIMENT_ID"'"],
"cloud_details": {
"cloud_type": "aws",
"cloud_specific_details": {
"cloud_region": "us-west-1",
"cloud_bucket_name": "bucket_name",
"access_key": "access_key",
"secret_key": "secret_key"
}
}
}' | jq -r '.id')
echo $EXPERIMENT_ID
# Cloud details example for Azure CSP
# "cloud_details": {
# "cloud_type": "azure",
# "cloud_specific_details": {
# "account_name": "account_name",
# "access_key": "access_key",
# "cloud_bucket_name": "optional_container_name",
# }
# }
Note
Only AWS and Azure are supported for creating experiments and storing training artifacts, including checkpoints and logs.
Train the DINO model.
Request body parameters for running the train action are:
parent_job_id - ID of the parent job, if any
action - action to be executed
specs - spec dictionary for the action to be executed
TRAIN_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/train/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $TRAIN_SPECS | jq
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
echo $TRAIN_SPECS | jq
TRAIN_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"train\", \"specs\": $TRAIN_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the training job.
You can wait for the training job to complete before proceeding to other actions. To monitor the status of the training job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRAIN_ID -H "Authorization: Bearer $TOKEN" | jq
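As with datasets, a small polling loop saves re-running the request by hand. A minimal sketch, assuming Done, Error, and Canceled are the terminal job status values; confirm the exact strings in the API reference:
# Poll a job until its status reaches an (assumed) terminal value.
wait_for_job () {
  local job_id=$1 status=""
  while true; do
    status=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$job_id \
      -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo "job $job_id status: $status"
    case $status in
      Done|Error|Canceled) break ;;  # assumed terminal states
    esac
    sleep 60
  done
}
wait_for_job $TRAIN_ID
The same helper can gate the evaluate and inference jobs below on their parent train job.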
Evaluate the trained model.
Request body parameters for running the evaluate action are:
parent_job_id - ID of the parent job, if any
action - action to be executed
specs - spec dictionary for the action to be executed
EVALUATE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/evaluate/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $EVALUATE_SPECS | jq
EVALUATE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"evaluate\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $EVALUATE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the evaluation job.
To monitor the status of the evaluation job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$EVALUATE_ID -H "Authorization: Bearer $TOKEN" | jq
Run inference on the trained model.
Request body parameters for running the inference action are:
parent_job_id - ID of the parent job, if any
action - action to be executed
specs - spec dictionary for the action to be executed
INFERENCE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/inference/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $INFERENCE_SPECS | jq
INFERENCE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"inference\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $INFERENCE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the inference job.
To monitor the status of the inference job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$INFERENCE_ID -H "Authorization: Bearer $TOKEN" | jq
AutoML
AutoML is a TAO Toolkit API service that automatically selects deep learning hyperparameters for a chosen model and dataset.
See the AutoML docs for more details.
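As a rough illustration only, AutoML is enabled on the experiment before launching training. The field names below (automl_settings, automl_enabled, automl_algorithm) are assumptions based on the AutoML docs; verify the authoritative schema there before use:
# Hypothetical sketch: enable AutoML on an existing experiment, then
# submit a train job as usual. Field names are assumptions.
curl -s -X PATCH $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "automl_settings": {
      "automl_enabled": true,
      "automl_algorithm": "bayesian"
    }
  }' | jq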