REST API Overview and Examples#
The TAO (Train, Adapt, Optimize) API exposes dataset and experiment endpoints for setting up and running actions.
The examples in this section use cURL commands and jq for JSON processing, and assume a Linux machine with both curl and jq pre-installed.
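Before running the examples, you can verify that both tools are available. The `require` helper below is a hypothetical convenience function for this guide, not part of the API:

```shell
# Hypothetical helper: fail fast if a required CLI tool is missing.
require() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing required tool: $1" >&2; return 1; }
}

require curl && require jq && echo "prerequisites ok" || true
```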
User Authentication#
User authentication is based on an NGC Personal Key. For more details, see the prerequisites in API Setup.
For example:
BASE_URL=<API_BASE_URL>
NGC_ORG_NAME=ea-tlt
NGC_API_KEY=<nvapi-******>
CREDS=$(curl -s -X POST $BASE_URL/login -d '{"ngc_key": "'"$NGC_API_KEY"'", "ngc_org_name": "'"$NGC_ORG_NAME"'"}')
TOKEN=$(echo $CREDS | jq -r '.token')
For example, an API call for listing datasets might be:
curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets -H "Authorization: Bearer $TOKEN"
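Before making further calls, it can help to confirm that the login actually returned a token. When jq extracts a missing field it prints the literal string `null`, so a minimal guard (a hypothetical snippet, not part of the API) looks like:

```shell
# Hypothetical guard: jq prints "null" when .token is absent from the response.
token_ok() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

if token_ok "$TOKEN"; then
  echo "authenticated"
else
  echo "login failed: check NGC_API_KEY and NGC_ORG_NAME" >&2
fi
```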
Note
The API base URL can be retrieved after the cluster is set up. For more details, see the TAO API Setup.
API Specs#
The TAO API service includes methods for dealing with the content of experimental workspaces, such as user datasets and experiments. It also includes methods for executing TAO actions applicable to data and specifications stored in experimental workspaces.
Typically, you create a dataset for a specific network type, create an experiment that is pointing to this dataset, pick a base experiment, and customize specs before executing network-related actions.
Note
See the TAO API Reference for more details.
Workspaces#
In TAO 6.0, cloud workspaces are used to pull datasets and store experiment results in popular cloud storage providers.
- AWS: cloud_type: aws; cloud_specific_details needed: access_key, secret_key, aws_region, s3_bucket_name
- Azure: cloud_type: azure; cloud_specific_details needed: account_name, access_key, azure_region, azure_blob_name
- HuggingFace datasets: cloud_type: huggingface; cloud_specific_details needed: token (not applicable for experiment results storage)
Creating a workspace.
Request body parameters:
name (optional) - Appropriate name for the workspace
description (optional) - Description of the workspace
cloud_type (mandatory) - One of TAO's supported cloud types, e.g. aws, azure, huggingface
cloud_specific_details (mandatory) - Cloud-specific details, e.g. access_key, secret_key, aws_region, s3_bucket_name (AWS); account_name, access_key, azure_region, azure_blob_name (Azure); token (HuggingFace)
shared (optional) - Whether the workspace is shared with other users in the org. Default is false.
This example creates an AWS cloud workspace.
WORKSPACE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/workspaces \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "name": "my_workspace",
    "cloud_type": "aws",
    "cloud_specific_details": {
      "access_key": "access_key",
      "secret_key": "secret_key",
      "cloud_region": "us-west-1",
      "cloud_bucket_name": "bucket_name"
    },
    "shared": false
  }' | jq -r '.id')
echo $WORKSPACE_ID

# Cloud details example for Azure CSP
# "cloud_details": {
#   "cloud_type": "azure",
#   "cloud_specific_details": {
#     "account_name": "account_name",
#     "access_key": "access_key",
#     "cloud_region": "azure_region",
#     "cloud_bucket_name": "container_name"
#   }
# }

# Cloud details example for HuggingFace
# "cloud_details": {
#   "cloud_type": "huggingface",
#   "cloud_specific_details": {
#     "token": "access_token"
#   }
# }
Note
For experiments, you must provide cloud storage with read and write access; action artifacts, such as training checkpoints, are pushed to this storage. Datasets also require cloud storage with read and write access, as TAO may need to convert your dataset to a compatible format before training.
Datasets#
You can either use a dataset stored in the cloud workspace by specifying cloud_file_path, or a public dataset by specifying an HTTPS URL.
This example workflow uses the object detection data based on the COCO dataset format. For more details about the COCO format, refer to the COCO dataset page. If you are using a custom dataset, it must follow the dataset structure as depicted below.
$DATA_DIR
├── annotations.json
└── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    └── ...
Note
Ensure that the dataset folder structure in cloud_file_path or url matches the model’s requirements. For details, refer to Data Annotation Format.
Object Detection Use Case Example with API#
The following example walks you through a typical TAO use case.
Note
Datasets provided in these examples are subject to the following license Dataset License.
Creating the training dataset.
Request body parameters:
name (optional) - Appropriate name for the dataset
description (optional) - Description of the dataset
type (mandatory) - One of TAO's supported dataset types
format (mandatory) - One of the formats supported for the type chosen above
workspace_id (mandatory) - ID of the workspace where the dataset is stored
cloud_file_path (mandatory for cloud_type: aws, azure) - Absolute path to the dataset in the cloud workspace
url - URL to a dataset in a private HuggingFace workspace or a public tar file
shared (optional) - Whether the dataset is shared with other users in the org. Default is false.
Note
When organizing datasets for use, consider the following structures based on the source:
Cloud Storage (e.g., S3 bucket):
Organize as:
bucket_folder/
├── images.tar.gz
└── annotations.json
The images.tar.gz should unzip to:
images/
├── 0001.jpg
└── 0002.jpg
Public URL:
Ensure the dataset is formatted as a single tar file.
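As a sketch of preparing both layouts above from a local COCO-format dataset (file names here are placeholders; the snippet works in a scratch directory so it is self-contained):

```shell
# Demonstration in a scratch directory with placeholder dataset files.
WORK=$(mktemp -d)
cd "$WORK"
mkdir images
touch annotations.json images/0001.jpg images/0002.jpg

# Cloud-storage layout: images packed separately, annotations.json alongside.
tar -czf images.tar.gz images/

# Public-URL layout: the whole dataset in a single tar file.
tar -czf my_dataset.tar.gz annotations.json images/

tar -tzf my_dataset.tar.gz
```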
TRAIN_DATASET_ID=$(curl -s -X POST \
  $BASE_URL/orgs/ea-tlt/datasets \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "type": "object_detection",
    "format": "coco",
    "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_train_coco.tar.gz"
  }' | jq -r '.id')
echo $TRAIN_DATASET_ID

# Example request body for a private dataset in a cloud workspace:
# {
#   "type": "object_detection",
#   "format": "coco",
#   "workspace_id": "$WORKSPACE_ID",
#   "cloud_file_path": "/path/to/dataset/in/cloud/workspace"
# }
Note
For a public Hugging Face dataset, provide the HTTPS URL.
To monitor the status of the training dataset download, poll with the following cURL request:
TRAIN_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$TRAIN_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo $TRAIN_DATASET_PULL_STATUS
Creating the validation dataset.
Possible request body parameters are the same as for the training dataset.
EVAL_DATASET_ID=$(curl -s -X POST \
  $BASE_URL/orgs/ea-tlt/datasets \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "type": "object_detection",
    "format": "coco",
    "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_val_coco.tar.gz"
  }' | jq -r '.id')
echo $EVAL_DATASET_ID
To monitor the status of the validation dataset download, poll with the following cURL request:
EVAL_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$EVAL_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
echo $EVAL_DATASET_PULL_STATUS
Find the base experiment.
BASE_EXPERIMENT_ID=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
echo $BASE_EXPERIMENT_ID
The response from this endpoint also provides the compatible dataset type and the list of compatible dataset formats.
DATASET_TYPE=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_type')
echo $DATASET_TYPE

DATASET_FORMATS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_formats')
echo $DATASET_FORMATS
Creating an experiment.
Request body parameters for creating an experiment are:
network_arch - One of TAO's supported network architectures
encryption_key - Encryption key for loading the base experiment
checkpoint_choose_method - best_model/latest_model/from_epoch_number
workspace_id - ID of the workspace where experiment artifacts are stored. You need write access to the cloud storage.
train_datasets - List of train dataset IDs, each obtained during creation of the respective train dataset
eval_dataset - Dataset ID obtained during creation of the eval dataset
inference_dataset - Dataset ID obtained during creation of the test dataset
calibration_dataset - Dataset ID obtained during creation of the train dataset (not a list)
docker_env_vars - Dictionary of Docker environment variables pertaining to MLOps, like WandB and ClearML
base_experiment - List of base experiment IDs, obtained from the "Find the base experiment" step
tensorboard_enabled - Boolean to enable a unique TensorBoard session for the experiment
Note
The TensorBoard session may not be immediately available after you create the workflow, so please be patient while metrics and charts are being generated.
Note that the TensorBoard feature is not supported for SegFormer and PyTorch-based classification models.
EXPERIMENT_ID=$(curl -s -X POST \
$BASE_URL/orgs/ea-tlt/experiments \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"network_arch": "dino",
"encryption_key": "tlt_encode",
"checkpoint_choose_method": "best_model",
"train_datasets": ["'"$TRAIN_DATASET_ID"'"],
"eval_dataset": "'"$EVAL_DATASET_ID"'",
"inference_dataset": "'"$EVAL_DATASET_ID"'",
"calibration_dataset": "'"$TRAIN_DATASET_ID"'",
"docker_env_vars": {},
"base_experiment": ["'"$BASE_EXPERIMENT_ID"'"],
"workspace_id": "'"$WORKSPACE_ID"'"
}' | jq -r '.id')
echo $EXPERIMENT_ID
Note
Only AWS and Azure are supported for creating experiments and storing training artifacts, including checkpoints and logs.
Train the DINO model.
Request body parameters for running the train action are:
parent_job_id - ID of the parent job, if any
action - Action to be executed: train
specs - Config parameters of the action in JSON format. To get the schema of the specs, use the endpoint in the example below. The most common sections are:
  dataset - Dataset parameters, e.g. batch_size
  model - Model parameters, e.g. learning_rate
  train - Training parameters, e.g. num_epochs, num_gpus
TRAIN_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/train/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $TRAIN_SPECS | jq

TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
echo $TRAIN_SPECS | jq

TRAIN_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"train\", \"specs\": $TRAIN_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the training job.
You can wait for the training job to complete before proceeding to other actions. To monitor the status of the training job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRAIN_ID -H "Authorization: Bearer $TOKEN" | jq
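Rather than re-running the GET by hand, you can poll until the job reaches a terminal state. The helper below is a sketch: `fetch_status` is a hypothetical wrapper around the cURL + jq call above, and the terminal state names are assumptions about the API's status values.

```shell
# Hypothetical polling helper: $1 is a command that prints the current job
# status; loop until a terminal state is reported, then echo that state.
poll_until_done() {
  status=""
  while true; do
    status=$("$1")
    case "$status" in
      Done|Error|Canceled) break ;;
    esac
    sleep 15   # avoid hammering the API
  done
  echo "$status"
}

# Usage sketch (fetch_status would wrap the GET request shown above):
# fetch_status() {
#   curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRAIN_ID \
#     -H "Authorization: Bearer $TOKEN" | jq -r '.status'
# }
# poll_until_done fetch_status
```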
Note
If TensorBoard is enabled, you can view the training logs and metrics at $BASE_URL/tensorboard/v1/orgs/ea-tlt/experiments/$EXPERIMENT_ID.
Evaluating a trained model.
Request body parameters for running the evaluate action are:
parent_job_id - ID of the parent job, if any
action - evaluate
specs - Spec dictionary of the action to be executed
EVALUATE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/evaluate/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $EVALUATE_SPECS | jq

EVALUATE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"evaluate\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $EVALUATE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the evaluation job.
To monitor the status of the evaluation job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$EVALUATE_ID -H "Authorization: Bearer $TOKEN" | jq
Run inference on a trained model.
Request body parameters for running the inference action are:
parent_job_id - ID of the parent job, if any
action - inference
specs - Spec dictionary of the action to be executed
INFERENCE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/inference/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $INFERENCE_SPECS | jq

INFERENCE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"inference\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $INFERENCE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the inference job.
To monitor the status of the inference job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$INFERENCE_ID -H "Authorization: Bearer $TOKEN" | jq
Export a trained model to standard ONNX format.
Request body parameters for running the export action are:
parent_job_id - ID of the parent job, if any
action - export
specs - Spec dictionary of the action to be executed
EXPORT_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/export/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $EXPORT_SPECS | jq

EXPORT_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"export\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $EXPORT_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the export job.
To monitor the status of the export job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$EXPORT_ID -H "Authorization: Bearer $TOKEN" | jq
Generate a TensorRT engine from the exported model.
Request body parameters for running the gen_trt_engine action are:
parent_job_id - ID of the parent job, if any
action - gen_trt_engine
specs - Spec dictionary of the action to be executed
GEN_TRT_ENGINE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/gen_trt_engine/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $GEN_TRT_ENGINE_SPECS | jq

GEN_TRT_ENGINE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"gen_trt_engine\", \"parent_job_id\": \"$EXPORT_ID\", \"specs\": $GEN_TRT_ENGINE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the gen_trt_engine job.
To monitor the status of the gen_trt_engine job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$GEN_TRT_ENGINE_ID -H "Authorization: Bearer $TOKEN" | jq
Run inference on the generated TensorRT engine.
Request body parameters for running the inference action are:
parent_job_id - ID of the parent job, if any
action - inference
specs - Spec dictionary of the action to be executed
TRT_INFERENCE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/inference/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
echo $TRT_INFERENCE_SPECS | jq

TRT_INFERENCE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"inference\", \"parent_job_id\": \"$GEN_TRT_ENGINE_ID\", \"specs\": $TRT_INFERENCE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
Check the status of the TRT inference job.
To monitor the status of the TRT inference job, poll with the following cURL request:
curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRT_INFERENCE_ID -H "Authorization: Bearer $TOKEN" | jq
(Optional) Back up metadata to the cloud.
Request body parameters for running the backup action are:
backup_file_name - Name of the backup file; default is mongodb_backup.gz. Recommended to set this if using a shared cloud workspace.
To back up all experiments, datasets, and workspaces to the cloud, run the following command:
curl -s -X POST $BASE_URL/orgs/ea-tlt/workspaces/$WORKSPACE_ID/backup -d "{\"backup_file_name\": \"mongodb_backup.gz\"}" -H "Authorization: Bearer $TOKEN" | jq
(Optional) Restore metadata from the cloud.
Request body parameters for running the restore action are:
backup_file_name - Name of the backup file; default is mongodb_backup.gz. Recommended to set this if using a shared cloud workspace.
To restore all experiments, datasets, and workspaces from the cloud, run the following command:
curl -s -X POST $BASE_URL/orgs/ea-tlt/workspaces/$WORKSPACE_ID/restore -d "{\"backup_file_name\": \"mongodb_backup.gz\"}" -H "Authorization: Bearer $TOKEN" | jq
Note
The restore action is recommended when reinstalling the FTMS Helm chart or if ptmPull is set to False. The workspace used for restore must refer to a cloud bucket that contains a backup file generated by the FTMS backup action.
AutoML#
AutoML is a TAO Toolkit API service that automatically selects deep learning hyperparameters for a chosen model and dataset.
See the AutoML docs for more details.