REST API Overview and Examples#

The TAO (Train, Adapt, Optimize) API exposes dataset and experiment endpoints for setting up and running actions.

The examples in this section use cURL commands and jq for JSON processing, and assume a Linux machine with cURL and jq pre-installed.

User Authentication#

User authentication is based on an NGC Personal Key. For more details, see the prerequisites in API Setup.

For example:

BASE_URL=<API_BASE_URL>
NGC_ORG_NAME=ea-tlt
NGC_API_KEY=<nvapi-******>

CREDS=$(curl -s -X POST $BASE_URL/login -d '{"ngc_key": "'"$NGC_API_KEY"'", "ngc_org_name": "'"$NGC_ORG_NAME"'"}')
TOKEN=$(echo $CREDS | jq -r '.token')
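
As a quick sanity check, you can verify that a token was returned before making further calls (a minimal sketch, reusing the $CREDS response captured above):

# Fail early if login did not return a token.
if [ -z "$TOKEN" ] || [ "$TOKEN" = "null" ]; then
    echo "Login failed; response was: $CREDS" >&2
fi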

For example, an API call for listing datasets might be:

curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets -H "Authorization: Bearer $TOKEN"

Note

The API base URL can be retrieved after the cluster is set up. For more details, see TAO API Setup.

API Specs#

The TAO API service includes methods for managing the content of user workspaces, such as datasets and experiments. It also includes methods for executing TAO actions on the data and specifications stored in those workspaces.

Typically, you create a dataset for a specific network type, create an experiment that points to this dataset, pick a base experiment, and customize specs before executing network-related actions.

/api/v1/orgs/ea-tlt/workspaces
/api/v1/orgs/ea-tlt/datasets
/api/v1/orgs/ea-tlt/experiments

  • List workspaces for a user

  • List datasets for a user

  • List experiments for a user

  • Retrieve a workspace

  • Retrieve a dataset

  • Retrieve an experiment

  • Delete a workspace

  • Delete a dataset

  • Delete an experiment

  • Create a workspace

  • Create a dataset

  • Create a new experiment

  • Update workspace metadata

  • Update dataset metadata

  • Update experiment metadata

  • Update workspace metadata partially

  • Update dataset metadata partially

  • Update experiment metadata partially

  • Retrieve dataset action specs

  • Retrieve experiment action specs

  • Update dataset action specs

  • Update current experiment specs

  • Run dataset actions

  • Run experiment actions

  • List dataset jobs

  • List experiment jobs

  • Retrieve dataset job

  • Retrieve experiment job

  • Cancel dataset job

  • Cancel experiment job

  • Delete a dataset job

  • Delete an experiment job

  • Download selective files of a dataset job

  • Download selective files of an experiment job

  • Download dataset action job

  • Download experiment action job

Note

See the TAO API Reference for more details.

Workspaces#

In TAO 6.0, cloud workspaces are used to pull datasets and store experiment results in popular cloud storage providers.

  • AWS - cloud_type: aws; cloud_specific_details needed: access_key, secret_key, aws_region, s3_bucket_name

  • Azure - cloud_type: azure; cloud_specific_details needed: account_name, access_key, azure_region, azure_blob_name

  • HuggingFace datasets - cloud_type: huggingface; cloud_specific_details needed: token (Not applicable for experiment results storage)

Creating a workspace.

Request body parameters:

  • name (optional) - Appropriate name for the workspace

  • description (optional) - Description of the workspace

  • cloud_type (mandatory) - One of TAO’s supported cloud types, e.g. aws, azure, huggingface

  • cloud_specific_details (mandatory) - Cloud-specific details, e.g. access_key, secret_key, aws_region, s3_bucket_name (AWS); account_name, access_key, azure_region, azure_blob_name (Azure); token (Hugging Face)

  • shared (optional) - Whether the workspace is shared with other users in the org. Default is false

This example creates an AWS cloud workspace.

WORKSPACE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/workspaces \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
    "name": "my_workspace",
    "cloud_type": "aws",
    "cloud_specific_details": {
        "access_key": "access_key",
        "secret_key": "secret_key",
        "cloud_region": "us-west-1",
        "cloud_bucket_name": "bucket_name"
    },
    "shared": false
}' | jq -r '.id')
echo $WORKSPACE_ID


# Request body fields for an Azure workspace:
# "cloud_type": "azure",
# "cloud_specific_details": {
#     "account_name": "account_name",
#     "access_key": "access_key",
#     "cloud_region": "azure_region",
#     "cloud_bucket_name": "container_name"
# }

# Request body fields for a Hugging Face workspace:
# "cloud_type": "huggingface",
# "cloud_specific_details": {
#     "token": "access_token"
# }
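
To confirm that a workspace was created, you can retrieve its metadata with the corresponding GET endpoint (the "Retrieve a workspace" method listed above, following the same call pattern as the dataset retrieval examples on this page):

curl -s -X GET $BASE_URL/orgs/ea-tlt/workspaces/$WORKSPACE_ID -H "Authorization: Bearer $TOKEN" | jq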

Note

For experiments, you must provide cloud storage with read and write access; action artifacts, such as training checkpoints, are pushed to this storage. Datasets also require cloud storage with read and write access, because TAO may need to convert your dataset to a compatible format before training.

Datasets#

Users can either use datasets stored in a cloud workspace (referenced by cloud_file_path) or a public dataset (referenced by an HTTPS URL).

This example workflow uses object detection data in the COCO dataset format. For more details about the COCO format, refer to the COCO dataset page. If you are using a custom dataset, it must follow the structure depicted below.

$DATA_DIR
├── annotations.json
└── images
    ├── image_name_1.jpg
    ├── image_name_2.jpg
    └── ...

Note

Ensure that the dataset folder structure in cloud_file_path or url matches the model’s requirements. For details, refer to Data Annotation Format.

Object Detection Use Case Example with API#

The following example walks you through a typical TAO use case.

Note

Datasets provided in these examples are subject to the Dataset License.

  1. Creating the training dataset.

    Request body parameters:

    • name (optional) - Appropriate name for the dataset

    • description (optional) - Description of the dataset

    • type (mandatory) - One of TAO’s supported dataset types

    • format (mandatory) - One of the formats supported for the type chosen above

    • workspace_id (mandatory) - ID of the workspace where dataset is stored

    • cloud_file_path (mandatory for cloud_type: aws, azure) - Absolute path to the dataset in the cloud workspace

    • url - URL to a dataset in a private Hugging Face workspace, or to a public tar file

    • shared (optional) - Whether the dataset is shared with other users in the org. Default is false

    Note

    When organizing datasets for use, consider the following structures based on the source:

    • Cloud Storage (e.g., S3 bucket):

      • Organize as:

        bucket_folder/
        ├── images.tar.gz
        └── annotations.json
        
      • The images.tar.gz should extract to:

        images/
        ├── 0001.jpg
        └── 0002.jpg
        
    • Public URL:

      • Ensure the dataset is formatted as a single tar file.
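
        For example, a dataset matching the structure above could be packaged into a single tar file as follows (file names are illustrative):

        tar -czf my_dataset.tar.gz annotations.json images/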

    TRAIN_DATASET_ID=$(curl -s -X POST \
        $BASE_URL/orgs/ea-tlt/datasets \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $TOKEN" \
        -d '{
            "type": "object_detection",
            "format": "coco",
            "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_train_coco.tar.gz"
        }' | jq -r '.id')
    echo $TRAIN_DATASET_ID
    
    # Request body fields for a private dataset stored in a cloud workspace:
    # "type": "object_detection",
    # "format": "coco",
    # "workspace_id": "$WORKSPACE_ID",
    # "cloud_file_path": "/path/to/dataset/in/cloud/workspace"
    

    Note

    For a public Hugging Face dataset, provide the HTTPS URL.

    To monitor the status of the training dataset download, run (poll) the cURL request:

    TRAIN_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$TRAIN_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo $TRAIN_DATASET_PULL_STATUS
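
    If you prefer to block until the download finishes, the check can be wrapped in a loop; the terminal status string used here (pull_complete) is an assumption to verify against the values your deployment reports:

    until [ "$TRAIN_DATASET_PULL_STATUS" = "pull_complete" ]; do
        sleep 10
        TRAIN_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$TRAIN_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
        echo "dataset status: $TRAIN_DATASET_PULL_STATUS"
    done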
    
  2. Creating the validation dataset.

    The possible request body parameters are the same as for the training dataset.

    EVAL_DATASET_ID=$(curl -s -X POST \
    $BASE_URL/orgs/ea-tlt/datasets \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    -d '{
        "type": "object_detection",
        "format": "coco",
        "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_val_coco.tar.gz"
    }' | jq -r '.id')
    echo $EVAL_DATASET_ID
    

    To monitor the status of the validation dataset download, run (poll) the cURL request:

    EVAL_DATASET_PULL_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets/$EVAL_DATASET_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo $EVAL_DATASET_PULL_STATUS
    
  3. Find the base experiment.

    BASE_EXPERIMENT_ID=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
    echo $BASE_EXPERIMENT_ID
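
    To browse the available base experiments before filtering, you can list each entry's ID, architecture, and NGC path (this uses only the fields already referenced in the filter above):

    curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '.experiments[] | "\(.id)  \(.network_arch)  \(.ngc_path)"'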
    

    The response from this endpoint also provides the compatible dataset type and the list of compatible dataset formats.

    DATASET_TYPE=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_type')
    echo $DATASET_TYPE
    
    DATASET_FORMATS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .dataset_formats')
    echo $DATASET_FORMATS
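
    Because the three calls above filter the same entry, you can also fetch it once and extract all three fields locally; this is just a convenience rewrite of the commands above:

    BASE_EXPERIMENT=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments:base -H "Authorization: Bearer $TOKEN" | jq -r '[.experiments[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0]')
    BASE_EXPERIMENT_ID=$(echo $BASE_EXPERIMENT | jq -r '.id')
    DATASET_TYPE=$(echo $BASE_EXPERIMENT | jq -r '.dataset_type')
    DATASET_FORMATS=$(echo $BASE_EXPERIMENT | jq -r '.dataset_formats')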
    
  4. Creating an experiment.

    Request body parameters for creating an experiment are:

    • network_arch - one of TAO’s supported network architectures

    • encryption_key - encryption key for loading the base experiment

    • checkpoint_choose_method - best_model/latest_model/from_epoch_number

    • workspace_id - ID of the workspace where experiment artifacts are stored. You need write access to the cloud storage.

    • train_datasets - list of training dataset IDs, each obtained during the creation of the respective training dataset

    • eval_dataset - dataset ID obtained during creation of the validation dataset

    • inference_dataset - dataset ID obtained during creation of the test dataset

    • calibration_dataset - dataset ID obtained during the creation of the training dataset (not a list)

    • docker_env_vars - dictionary of Docker environment variables for MLOps services, such as W&B and ClearML (see the commented example after the create call below)

    • base_experiment - list of base experiment IDs, obtained from the "Find the base experiment" step

    • tensorboard_enabled - Boolean that enables a dedicated TensorBoard session for the experiment

Note

  • The TensorBoard session may not be available immediately after you create the workflow; allow some time for metrics and charts to be generated.

  • The TensorBoard feature is not supported for SegFormer or PyTorch-based classification models.

EXPERIMENT_ID=$(curl -s -X POST \
$BASE_URL/orgs/ea-tlt/experiments \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d '{
"network_arch": "dino",
"encryption_key": "tlt_encode",
"checkpoint_choose_method": "best_model",
"train_datasets": ["'"$TRAIN_DATASET_ID"'"],
"eval_dataset": "'"$EVAL_DATASET_ID"'",
"inference_dataset": "'"$EVAL_DATASET_ID"'",
"calibration_dataset": "'"$TRAIN_DATASET_ID"'",
"docker_env_vars": {},
"base_experiment": ["'"$BASE_EXPERIMENT_ID"'"],
"workspace_id": "'"$WORKSPACE_ID"'"
}' | jq -r '.id')
echo $EXPERIMENT_ID
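
If you use an MLOps service, docker_env_vars can carry its credentials. A hypothetical W&B fragment is shown below; confirm which variables your deployment actually forwards to the training container.

# "docker_env_vars": {
#     "WANDB_API_KEY": "<your_wandb_api_key>"
# }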

Note

Only AWS and Azure are supported for creating experiments and storing training artifacts, including checkpoints and logs.

  5. Train the DINO model.

    Request body parameters for running the train action are:

    • parent_job_id - ID of the parent job, if any

    • action - the action to be executed: train

    • specs - configuration parameters of the action, in JSON format. To get the schema of the specs, use the endpoint shown in the example below. The most common sections are:

      • dataset - dataset parameters, e.g. batch_size, etc.

      • model - model parameters, e.g. learning_rate, etc.

      • train - training parameters, e.g. num_epochs, num_gpus, etc.

    TRAIN_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/train/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
    echo $TRAIN_SPECS | jq
    
    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
    echo $TRAIN_SPECS | jq
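
    Other sections can be overridden the same way before submitting the job; for example, a batch-size override might look like the following (the exact key path is an assumption; consult the schema printed above for the real field names):

    # Hypothetical override; verify the field name in your schema output.
    TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.dataset.batch_size=4')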
    
    TRAIN_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"train\", \"specs\": $TRAIN_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
    
  6. Check the status of the training job.

    You can wait for the training job to complete before proceeding to other actions. To monitor the status of the training job, run (poll) the cURL request:

    curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRAIN_ID -H "Authorization: Bearer $TOKEN" | jq
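
    To block until completion, the same check can be wrapped in a loop; the terminal status values used here (Done, Error, Canceled) are assumptions to verify against your API version. The identical pattern applies to the evaluate, inference, export, and gen_trt_engine jobs in the steps below.

    # Sketch: block until the training job reaches a terminal state.
    while true; do
        JOB_STATUS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRAIN_ID -H "Authorization: Bearer $TOKEN" | jq -r '.status')
        echo "train job status: $JOB_STATUS"
        case "$JOB_STATUS" in
            Done|Error|Canceled) break ;;
        esac
        sleep 30
    done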
    

Note

If TensorBoard is enabled, you can view the training logs and metrics at $BASE_URL/tensorboard/v1/orgs/ea-tlt/experiments/$EXPERIMENT_ID.

  7. Evaluating a trained model.

    Request body parameters for running the evaluate action are:

    • parent_job_id - ID of the parent job, if any

    • action - evaluate

    • specs - spec dictionary of the action to be executed

    EVALUATE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/evaluate/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
    echo $EVALUATE_SPECS | jq
    
    EVALUATE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"evaluate\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $EVALUATE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
    
  8. Check the status of the evaluation job.

    To monitor the status of an evaluation job, run (poll) the cURL request:

    curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$EVALUATE_ID -H "Authorization: Bearer $TOKEN" | jq
    
  9. Run inference on a trained model.

    Request body parameters for running the inference action are:

    • parent_job_id - ID of the parent job, if any

    • action - inference

    • specs - spec dictionary of the action to be executed

    INFERENCE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/inference/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
    echo $INFERENCE_SPECS | jq
    
    INFERENCE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"inference\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $INFERENCE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
    
  10. Check the status of the inference job.

    To monitor the status of the inference job, run (poll) the cURL request:

    curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$INFERENCE_ID -H "Authorization: Bearer $TOKEN" | jq
    
  11. Export a trained model to the standard ONNX format.

    Request body parameters for running the export action are:

    • parent_job_id - ID of the parent job, if any

    • action - export

    • specs - spec dictionary of the action to be executed

    EXPORT_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/export/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
    echo $EXPORT_SPECS | jq
    
    EXPORT_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"export\", \"parent_job_id\": \"$TRAIN_ID\", \"specs\": $EXPORT_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
    
  12. Check the status of the export job.

    To monitor the status of the export job, run (poll) the cURL request:

    curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$EXPORT_ID -H "Authorization: Bearer $TOKEN" | jq
    
  13. Generate a TensorRT engine from the exported model.

    Request body parameters for running the gen_trt_engine action are:

    • parent_job_id - ID of the parent job, if any

    • action - gen_trt_engine

    • specs - spec dictionary of the action to be executed

    GEN_TRT_ENGINE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/gen_trt_engine/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
    echo $GEN_TRT_ENGINE_SPECS | jq
    
    GEN_TRT_ENGINE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"gen_trt_engine\", \"parent_job_id\": \"$EXPORT_ID\", \"specs\": $GEN_TRT_ENGINE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
    
  14. Check the status of the gen_trt_engine job.

    To monitor the status of the gen_trt_engine job, run (poll) the cURL request:

    curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$GEN_TRT_ENGINE_ID -H "Authorization: Bearer $TOKEN" | jq
    
  15. Run inference with the generated TensorRT engine.

    Request body parameters for running the inference action are:

    • parent_job_id - ID of the parent job, if any

    • action - inference

    • specs - spec dictionary of the action to be executed

    TRT_INFERENCE_SPECS=$(curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/specs/inference/schema -H "Authorization: Bearer $TOKEN" | jq -r '.default')
    echo $TRT_INFERENCE_SPECS | jq
    
    TRT_INFERENCE_ID=$(curl -s -X POST $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs -d "{\"action\": \"inference\", \"parent_job_id\": \"$GEN_TRT_ENGINE_ID\", \"specs\": $TRT_INFERENCE_SPECS}" -H "Authorization: Bearer $TOKEN" | jq -r .)
    
  16. Check the status of the TRT inference job.

    To monitor the status of the TRT inference job, run (poll) the cURL request:

    curl -s -X GET $BASE_URL/orgs/ea-tlt/experiments/$EXPERIMENT_ID/jobs/$TRT_INFERENCE_ID -H "Authorization: Bearer $TOKEN" | jq
    
  17. (Optional) Back up metadata to the cloud.

    Request body parameters for running the backup action are:

    • backup_file_name - name of the backup file; the default is mongodb_backup.gz. Setting this explicitly is recommended when using a shared cloud workspace.

    To back up all experiments, datasets, and workspaces to the cloud, run the following command:

    curl -s -X POST $BASE_URL/orgs/ea-tlt/workspaces/$WORKSPACE_ID/backup -d "{\"backup_file_name\": \"mongodb_backup.gz\"}" -H "Authorization: Bearer $TOKEN" | jq
    
  18. (Optional) Restore metadata from the cloud.

    Request body parameters for running the restore action are:

    • backup_file_name - name of the backup file; the default is mongodb_backup.gz. Setting this explicitly is recommended when using a shared cloud workspace.

    To restore all experiments, datasets, and workspaces from the cloud, run the following command:

    curl -s -X POST $BASE_URL/orgs/ea-tlt/workspaces/$WORKSPACE_ID/restore -d "{\"backup_file_name\": \"mongodb_backup.gz\"}" -H "Authorization: Bearer $TOKEN" | jq
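
    After a restore completes, you can verify that the metadata came back by listing resources, for example:

    curl -s -X GET $BASE_URL/orgs/ea-tlt/datasets -H "Authorization: Bearer $TOKEN" | jq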
    

    Note

    • The restore action is recommended when reinstalling the FTMS Helm chart or if ptmPull is set to False.

    • The workspace used for restore must refer to a cloud bucket that contains a backup file generated by the FTMS backup action.

AutoML#

AutoML is a TAO Toolkit API service that automatically selects deep learning hyperparameters for a chosen model and dataset.

See the AutoML docs for more details.