Date: 2026-03-03

REST API Overview and Examples#

The TAO (Train, Adapt, Optimize) API v2 provides a unified, job-centric interface for managing workspaces, datasets, and training jobs. This version simplifies the API structure with consolidated endpoints and improved authentication.

The examples in this section use cURL commands and jq for JSON processing, and assume a Linux machine with both tools installed.

Note: For comprehensive API specifications, see the TAO API Reference.

API v2 Architecture#

The TAO API v2 introduces a unified architecture with the following key improvements:

Unified Jobs Endpoint: All experiment and dataset operations are handled through /api/v2/orgs/{org_name}/jobs with a kind parameter (experiment or dataset).

Environment Variable Authentication: Authentication uses JWT tokens with environment variable support for better security and CI/CD integration.

Resource-Specific Metadata: Dedicated endpoints for workspace, dataset, and job metadata provide clearer access to resource information.

Enhanced Job Control: Comprehensive job management with pause, resume, cancel, and delete operations.
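The URL layout described above can be sketched as two small shell helpers. This is only an illustration: resource_url and job_action_url are hypothetical names, and the default values for BASE_URL and NGC_ORG_NAME are placeholders to be replaced with your deployment's values.

```shell
# Placeholder defaults (assumptions); override them in your environment.
BASE_URL="${BASE_URL:-https://api.tao.ngc.nvidia.com/api/v2}"
NGC_ORG_NAME="${NGC_ORG_NAME:-your_org_name}"

# Hypothetical helper: collection URL for a resource type
# such as workspaces, datasets, or jobs.
resource_url() {
    printf '%s/orgs/%s/%s\n' "$BASE_URL" "$NGC_ORG_NAME" "$1"
}

# Hypothetical helper: URL for an action on a single job,
# for example pause, resume, or cancel.
job_action_url() {
    printf '%s/%s:%s\n' "$(resource_url jobs)" "$1" "$2"
}
```

For example, job_action_url "$TRAIN_JOB_ID" pause prints the URL that a pause request would be posted to.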

User Authentication#

User authentication is based on an NGC Personal Key. For more details, see the prerequisites in API Setup.

Login and Obtain JWT Token

    BASE_URL=https://api.tao.ngc.nvidia.com/api/v2
    NGC_ORG_NAME=your_org_name
    NGC_API_KEY=nvapi-******

    # Log in to get a JWT token
    CREDS=$(curl -s -X POST "$BASE_URL/login" -d '{
      "ngc_key": "'"$NGC_API_KEY"'",
      "ngc_org_name": "'"$NGC_ORG_NAME"'"
    }')

    TOKEN=$(echo "$CREDS" | jq -r '.token')
    echo "Token: $TOKEN"

Using the Token for API Calls

For all subsequent API calls, include the token in the Authorization header:

    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/workspaces" \
      -H "Authorization: Bearer $TOKEN"

Note: The API base URL can be retrieved after the cluster is set up. For more details, see the TAO API Setup.
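If the login call fails, TOKEN ends up empty or set to the literal string null (which is what jq -r prints for a missing field), and later requests would be sent without a usable token. The sketch below is one way to catch that early. require_token is a hypothetical helper, not part of the TAO API.

```shell
# Hypothetical helper: succeed only if the value looks like a usable token.
# jq -r prints the string "null" when the .token field is absent.
require_token() {
    case "$1" in
        ''|null)
            echo "login failed: no token in response" >&2
            return 1
            ;;
    esac
    return 0
}
```

A script could call require_token "$TOKEN" || exit 1 right after the login step.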

API v2 Endpoints Overview#

The TAO API v2 service is organized around these main resource types:

Workspaces (/api/v2/orgs/{org_name}/workspaces)

List workspaces

Create workspace

Get workspace metadata

Delete workspace

Backup workspace

Restore workspace

Datasets (/api/v2/orgs/{org_name}/datasets)

List datasets

Create dataset

Get dataset metadata

Delete dataset

Get dataset formats

Jobs - Unified (/api/v2/orgs/{org_name}/jobs)

List jobs

Create job (experiment or dataset)

Get job metadata

Get job status

Get job logs

Pause/Resume/Cancel job

Delete job

Download job files

List base experiments

Get job schema

Get GPU types

Publish/Remove published model

Inference Microservices (/api/v2/orgs/{org_name}/inference_microservices)

Start inference microservice

Get microservice status

Make inference request

Stop microservice

Workspaces#

In TAO 6.0+, cloud workspaces are used to pull datasets and store experiment results in popular cloud storage providers.

Supported Cloud Types:

AWS - cloud_type: aws; cloud_specific_details needed: access_key, secret_key, region, bucket_name

Azure - cloud_type: azure; cloud_specific_details needed: account_name, access_key, region, container_name

HuggingFace - cloud_type: huggingface; cloud_specific_details needed: token (datasets only, not for experiment storage)

Creating a Workspace

    WORKSPACE_ID=$(curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/workspaces" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "name": "my_workspace",
        "cloud_type": "aws",
        "cloud_specific_details": {
          "access_key": "AKIAIOSFODNN7EXAMPLE",
          "secret_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
          "region": "us-west-2",
          "bucket_name": "my-tao-bucket"
        },
        "shared": false
      }' | jq -r '.id')
    echo "$WORKSPACE_ID"

List Workspaces

    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/workspaces" \
      -H "Authorization: Bearer $TOKEN" | jq

Get Workspace Metadata

    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/workspaces/$WORKSPACE_ID" \
      -H "Authorization: Bearer $TOKEN" | jq

Delete Workspace

    curl -s -X DELETE "$BASE_URL/orgs/$NGC_ORG_NAME/workspaces/$WORKSPACE_ID" \
      -H "Authorization: Bearer $TOKEN" | jq

Note: For experiments, you must provide cloud storage with read and write access, because TAO pushes action artifacts, such as training checkpoints, to that storage. Datasets also require cloud storage with read and write access, as TAO may need to convert your dataset to a compatible format before training.

Datasets#

You can use either a dataset stored in the cloud workspace, referenced with cloud_file_path, or a public dataset, referenced with an HTTPS url.

This example workflow uses object detection data in the COCO dataset format. For more details about the COCO format, refer to the COCO dataset page. If you are using a custom dataset, it must follow the structure shown below.

    $DATA_DIR
    ├── annotations.json
    ├── images
        ├── image_name_1.jpg
        ├── image_name_2.jpg
        ├── ...

Note: Ensure that the dataset folder structure in cloud_file_path or url matches the model’s requirements. For details, refer to Data Annotation Format.
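Before uploading a custom dataset, it can help to verify locally that it matches the layout shown above. The following is a minimal sketch under that assumption. check_coco_layout is a hypothetical helper that only checks for annotations.json and a non-empty images directory; it does not validate the annotation contents.

```shell
# Hypothetical pre-upload check for the COCO-style layout shown above:
# an annotations.json file plus an images/ directory with at least one entry.
check_coco_layout() {
    ccl_dir=$1
    if [ ! -f "$ccl_dir/annotations.json" ]; then
        echo "missing $ccl_dir/annotations.json" >&2
        return 1
    fi
    if [ ! -d "$ccl_dir/images" ]; then
        echo "missing $ccl_dir/images/" >&2
        return 1
    fi
    # ls -A prints nothing for an empty directory.
    if [ -z "$(ls -A "$ccl_dir/images")" ]; then
        echo "$ccl_dir/images/ is empty" >&2
        return 1
    fi
    return 0
}
```

For example, check_coco_layout "$DATA_DIR" returns a non-zero status and prints the first problem it finds.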

Object Detection Use Case Example with API v2#

The following example walks you through a complete TAO workflow using the unified v2 API.

Note: Datasets provided in these examples are subject to the following license: Dataset License.

Creating the Training Dataset

    TRAIN_DATASET_ID=$(curl -s -X POST \
      "$BASE_URL/orgs/$NGC_ORG_NAME/datasets" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "type": "object_detection",
        "format": "coco",
        "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_train_coco.tar.gz"
      }' | jq -r '.id')
    echo "$TRAIN_DATASET_ID"

To monitor the status of the training dataset download:

    TRAIN_DATASET_PULL_STATUS=$(curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/datasets/$TRAIN_DATASET_ID" \
      -H "Authorization: Bearer $TOKEN" | jq -r '.status')
    echo "$TRAIN_DATASET_PULL_STATUS"

Creating the Validation Dataset

    EVAL_DATASET_ID=$(curl -s -X POST \
      "$BASE_URL/orgs/$NGC_ORG_NAME/datasets" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "type": "object_detection",
        "format": "coco",
        "url": "https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/tao_od_synthetic_val_coco.tar.gz"
      }' | jq -r '.id')
    echo "$EVAL_DATASET_ID"

List Base Experiments

    # List all base experiments
    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/jobs:list_base_experiments" \
      -H "Authorization: Bearer $TOKEN" | jq

    # Find a specific base experiment (e.g., RT-DETR with ResNet50)
    BASE_EXPERIMENT_ID=$(curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/jobs:list_base_experiments" \
      -H "Authorization: Bearer $TOKEN" | \
      jq -r '[.base_experiments[] | select(.network_arch == "rtdetr") | select(.ngc_path | contains("resnet50"))][0] | .id')
    echo "$BASE_EXPERIMENT_ID"

Create Training Job (Unified v2 API)

In API v2, you create jobs directly with all parameters in a single call:

    # Get job schema for train action
    TRAIN_SCHEMA=$(curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/jobs:schema?action=train&base_experiment_id=$BASE_EXPERIMENT_ID" \
      -H "Authorization: Bearer $TOKEN" | jq -r '.default')

    # Modify specs as needed
    TRAIN_SPECS=$(echo "$TRAIN_SCHEMA" | jq '.train.num_epochs=10 | .train.num_gpus=2')

    # Create training job
    TRAIN_JOB_ID=$(curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "experiment",
        "name": "rtdetr_training_job",
        "network_arch": "rtdetr",
        "encryption_key": "tlt_encode",
        "workspace": "'"$WORKSPACE_ID"'",
        "action": "train",
        "specs": '"$TRAIN_SPECS"',
        "train_datasets": ["'"$TRAIN_DATASET_ID"'"],
        "eval_dataset": "'"$EVAL_DATASET_ID"'",
        "inference_dataset": "'"$EVAL_DATASET_ID"'",
        "calibration_dataset": "'"$TRAIN_DATASET_ID"'",
        "base_experiment_ids": ["'"$BASE_EXPERIMENT_ID"'"],
        "automl_settings": {
          "automl_enabled": false
        }
      }' | jq -r '.id')
    echo "$TRAIN_JOB_ID"

Monitor Training Job Status

    # Get job status
    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID" \
      -H "Authorization: Bearer $TOKEN" | jq '.status'

    # Get detailed job metadata
    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID" \
      -H "Authorization: Bearer $TOKEN" | jq

    # Get job logs
    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:logs" \
      -H "Authorization: Bearer $TOKEN"

Job Control Operations

    # Pause a running job
    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:pause" \
      -H "Authorization: Bearer $TOKEN" | jq

    # Resume a paused job
    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:resume" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{"parent_job_id": "", "specs": {}}' | jq

    # Cancel a job
    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:cancel" \
      -H "Authorization: Bearer $TOKEN" | jq

Create Evaluation Job

After training completes, run evaluation:

    # Get evaluation schema
    EVAL_SCHEMA=$(curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/jobs:schema?action=evaluate&base_experiment_id=$BASE_EXPERIMENT_ID" \
      -H "Authorization: Bearer $TOKEN" | jq -r '.default')

    # Create evaluation job
    EVAL_JOB_ID=$(curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "experiment",
        "name": "rtdetr_evaluation_job",
        "network_arch": "rtdetr",
        "encryption_key": "tlt_encode",
        "workspace": "'"$WORKSPACE_ID"'",
        "action": "evaluate",
        "parent_job_id": "'"$TRAIN_JOB_ID"'",
        "specs": '"$EVAL_SCHEMA"',
        "eval_dataset": "'"$EVAL_DATASET_ID"'"
      }' | jq -r '.id')
    echo "$EVAL_JOB_ID"

Create Inference Job

    # Get inference schema
    INFERENCE_SCHEMA=$(curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/jobs:schema?action=inference&base_experiment_id=$BASE_EXPERIMENT_ID" \
      -H "Authorization: Bearer $TOKEN" | jq -r '.default')

    # Create inference job
    INFERENCE_JOB_ID=$(curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "experiment",
        "name": "rtdetr_inference_job",
        "network_arch": "rtdetr",
        "encryption_key": "tlt_encode",
        "workspace": "'"$WORKSPACE_ID"'",
        "action": "inference",
        "parent_job_id": "'"$TRAIN_JOB_ID"'",
        "specs": '"$INFERENCE_SCHEMA"',
        "inference_dataset": "'"$EVAL_DATASET_ID"'"
      }' | jq -r '.id')
    echo "$INFERENCE_JOB_ID"

Export Model to ONNX

    # Get export schema
    EXPORT_SCHEMA=$(curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/jobs:schema?action=export&base_experiment_id=$BASE_EXPERIMENT_ID" \
      -H "Authorization: Bearer $TOKEN" | jq -r '.default')

    # Create export job
    EXPORT_JOB_ID=$(curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "experiment",
        "name": "rtdetr_export_job",
        "network_arch": "rtdetr",
        "encryption_key": "tlt_encode",
        "workspace": "'"$WORKSPACE_ID"'",
        "action": "export",
        "parent_job_id": "'"$TRAIN_JOB_ID"'",
        "specs": '"$EXPORT_SCHEMA"'
      }' | jq -r '.id')
    echo "$EXPORT_JOB_ID"

Generate TensorRT Engine

    # Get TensorRT engine schema
    TRT_SCHEMA=$(curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/jobs:schema?action=gen_trt_engine&base_experiment_id=$BASE_EXPERIMENT_ID" \
      -H "Authorization: Bearer $TOKEN" | jq -r '.default')

    # Create TensorRT engine generation job
    TRT_JOB_ID=$(curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "experiment",
        "name": "rtdetr_trt_engine_job",
        "network_arch": "rtdetr",
        "encryption_key": "tlt_encode",
        "workspace": "'"$WORKSPACE_ID"'",
        "action": "gen_trt_engine",
        "parent_job_id": "'"$EXPORT_JOB_ID"'",
        "specs": '"$TRT_SCHEMA"'
      }' | jq -r '.id')
    echo "$TRT_JOB_ID"

Download Job Files

    # List job files
    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:list_files" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "retrieve_logs": true,
        "retrieve_specs": true
      }' | jq

    # Download selected files
    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:download_selective_files" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "best_model": true,
        "latest_model": false
      }' > job_files.tar.gz

    # Download entire job
    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:download" \
      -H "Authorization: Bearer $TOKEN" > job_complete.tar.gz

Publish Model

    # Publish trained model to NGC
    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:publish_model" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "display_name": "RT-DETR Production Model v1.0",
        "description": "Trained RT-DETR model for object detection",
        "team": "ml_team"
      }' | jq

    # Remove published model
    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$TRAIN_JOB_ID:remove_published_model" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "team": "ml_team"
      }' | jq
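Evaluation, inference, and export all depend on the training job having finished, so a script usually needs to wait between steps. The sketch below polls the job-status call from the monitoring step until a terminal status appears. fetch_job_status and wait_for_status are hypothetical helpers, and the status names Done, Error, and Canceled are assumptions; check the values your deployment actually reports.

```shell
# Hypothetical helper: print the current status of the job with the given ID.
fetch_job_status() {
    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$1" \
        -H "Authorization: Bearer $TOKEN" | jq -r '.status'
}

# Poll a command that prints a status until a terminal status appears.
# Usage: wait_for_status INTERVAL_SECONDS MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 0 on success, 1 on a failed or canceled job, 2 on timeout.
wait_for_status() {
    wfs_interval=$1
    wfs_max=$2
    shift 2
    wfs_attempt=0
    while [ "$wfs_attempt" -lt "$wfs_max" ]; do
        wfs_status=$("$@")
        case "$wfs_status" in
            # Assumed terminal status names; adjust as needed.
            Done) echo "$wfs_status"; return 0 ;;
            Error|Canceled) echo "$wfs_status"; return 1 ;;
        esac
        wfs_attempt=$((wfs_attempt + 1))
        sleep "$wfs_interval"
    done
    echo "timed out waiting for a terminal status" >&2
    return 2
}
```

For example, wait_for_status 30 120 fetch_job_status "$TRAIN_JOB_ID" checks every 30 seconds for up to an hour. Passing the status command as an argument keeps the loop reusable for dataset pull status as well.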

Inference Microservices#

Deploy trained models as inference microservices for scalable, real-time inference.

Start Inference Microservice

    MICROSERVICE_ID=$(curl -s -X POST \
      "$BASE_URL/orgs/$NGC_ORG_NAME/inference_microservices:start" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "docker_image": "nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf2.11.0",
        "gpu_type": "a100",
        "num_gpus": 1,
        "parent_job_id": "'"$TRAIN_JOB_ID"'",
        "kind": "experiment",
        "model_path": "/workspace/models/best_model.pth",
        "workspace": "'"$WORKSPACE_ID"'",
        "checkpoint_choose_method": "best_model",
        "network_arch": "rtdetr"
      }' | jq -r '.id')
    echo "$MICROSERVICE_ID"

Check Microservice Status

    curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/inference_microservices/$MICROSERVICE_ID:status" \
      -H "Authorization: Bearer $TOKEN" | jq

Make Inference Request

    # Base64-encoded image inference
    curl -s -X POST \
      "$BASE_URL/orgs/$NGC_ORG_NAME/inference_microservices/$MICROSERVICE_ID:inference" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "input": ["..."],
        "model": "rtdetr_model"
      }' | jq

    # Cloud media path inference
    curl -s -X POST \
      "$BASE_URL/orgs/$NGC_ORG_NAME/inference_microservices/$MICROSERVICE_ID:inference" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "media": "s3://my-bucket/path/to/image.jpg",
        "prompt": "Detect objects in this image"
      }' | jq

Stop Microservice

    curl -s -X POST \
      "$BASE_URL/orgs/$NGC_ORG_NAME/inference_microservices/$MICROSERVICE_ID:stop" \
      -H "Authorization: Bearer $TOKEN" | jq
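A microservice keeps running until the stop call is made, so a script that fails halfway through its inference requests can leave one behind. One way to guard against that is an EXIT trap, sketched below. stop_microservice and with_microservice_cleanup are hypothetical helper names; the stop URL is the one shown above.

```shell
# Hypothetical helper: issue the stop call for the current microservice.
stop_microservice() {
    curl -s -X POST \
        "$BASE_URL/orgs/$NGC_ORG_NAME/inference_microservices/$MICROSERVICE_ID:stop" \
        -H "Authorization: Bearer $TOKEN"
}

# Run a command in a subshell whose EXIT trap stops the microservice,
# whether the command succeeds or fails. The command's exit status is kept.
with_microservice_cleanup() {
    (
        trap stop_microservice EXIT
        "$@"
        # Exit explicitly so the command above is never the subshell's
        # last command and the trap always gets a chance to run.
        exit $?
    )
}
```

For example, wrapping a function that sends several inference requests as with_microservice_cleanup run_inference_batch (where run_inference_batch is your own function) stops the microservice even if one request fails.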

Dataset Processing Jobs#

Create dataset processing jobs using the unified jobs endpoint:

Dataset Conversion Job

    CONVERT_JOB_ID=$(curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "dataset",
        "dataset_id": "'"$TRAIN_DATASET_ID"'",
        "action": "convert",
        "specs": {
          "output_format": "tfrecords",
          "train_split": 0.8,
          "val_split": 0.2,
          "shuffle": true
        }
      }' | jq -r '.id')
    echo "$CONVERT_JOB_ID"

Workspace Backup and Restore#

Backup Workspace

    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/workspaces/$WORKSPACE_ID:backup" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "backup_file_name": "mongodb_backup_20251110.gz"
      }' | jq

Restore Workspace

    curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/workspaces/$WORKSPACE_ID:restore" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "backup_file_name": "mongodb_backup_20251110.gz"
      }' | jq

Note: The restore action is recommended when reinstalling the FTMS Helm Chart or if ptmPull is set to False.

The workspace used for a restore must refer to a cloud bucket that contains a backup file generated by the FTMS backup action.
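For scheduled backups, the file name can be generated instead of typed by hand. The sketch below mirrors the naming used in the example above (mongodb_backup_ followed by a YYYYMMDD date). backup_file_name is a hypothetical helper, and the naming pattern is a convention taken from the example, not a documented requirement.

```shell
# Hypothetical helper: print a dated backup file name such as
# mongodb_backup_20251110.gz, using today's date.
backup_file_name() {
    printf 'mongodb_backup_%s.gz\n' "$(date +%Y%m%d)"
}
```

The result can be embedded in the request body, for example -d '{"backup_file_name": "'"$(backup_file_name)"'"}'.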

Job Management Features#

Graceful Job Termination#

TAO FTMS supports graceful termination of training jobs, allowing them to complete their current checkpoint and upload results before shutting down. This ensures that no training progress is lost when pausing or stopping jobs.

Using Graceful Pause#

When pausing a job, you can set the graceful parameter in the request body to let the job finish its current training epoch and upload checkpoints:

    # Graceful pause (recommended) - allows checkpoint upload before stopping
    curl -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$JOB_ID:pause" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{"graceful": true}'

    # Abrupt pause - stops immediately without uploading
    curl -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$JOB_ID:pause" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{"graceful": false}'

Additional Termination Options#

When running training jobs, you can configure graceful termination behavior using these top-level parameters:

retain_checkpoints_for_resume (boolean): Retain intermediate checkpoints for resuming training later (useful for Hyperband AutoML)

early_stop_epoch (integer): Specify a predefined epoch number to stop training at, triggering graceful termination and checkpoint upload

These options are specified as top-level parameters in the job run request:

    curl -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "experiment",
        "action": "train",
        "specs": {
          "train": {
            "num_epochs": 10,
            "num_gpus": 2
          }
        },
        "retain_checkpoints_for_resume": true,
        "early_stop_epoch": 5
      }'

Job Timeout Configuration#

TAO FTMS provides per-job timeout management for long-running training jobs. Each job has its own timeout value (default: 60 minutes), providing fine-grained control over different types of operations.

Specifying Timeout When Running a Job#

The timeout_minutes parameter is specified as a top-level field in the job run request body, not inside the specs.

    # Training job with 3-hour timeout
    curl -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "experiment",
        "action": "train",
        "specs": {
          "train": {
            "num_epochs": 10,
            "num_gpus": 2
          }
        },
        "timeout_minutes": 180
      }'

Parameters#

timeout_minutes (integer, optional): Timeout in minutes for the job. Default: 60 minutes. Must be at least 1 minute.

This is a top-level parameter alongside action, specs, parent_job_id, etc., NOT inside specs.

Jobs that exceed their timeout are automatically terminated to prevent runaway processes from consuming cluster resources indefinitely.

Best Practices#

Adjust the timeout (default: 60 minutes) based on your workload.

Set realistic timeouts based on dataset size and model complexity.

Training jobs typically need longer timeouts than evaluation/inference jobs.

Consider hardware capabilities (GPU type and memory) when setting timeout values.

Monitor job progress through the status API to adjust timeouts if needed.

For large-scale training (large models or extensive datasets), increase the timeout accordingly.

Cloud File Operations with Progress Tracking#

TAO FTMS provides enhanced visibility into cloud storage operations with detailed progress tracking for uploads and downloads. Progress information is available when you fetch the job metadata through the status API.

Progress Tracking Features#

Real-time progress updates for dataset downloads

Upload progress tracking when saving checkpoints to cloud storage

File count and size information for large model checkpoints

Current file being transferred with individual file progress

Overall transfer progress across all files

Transfer speed and ETA estimation

Accessing Progress Information#

Progress updates are included in the job metadata when you query job status:

    # Get job status to see progress
    curl -s -X GET "$BASE_URL/orgs/$NGC_ORG_NAME/jobs/$JOB_ID" \
      -H "Authorization: Bearer $TOKEN" | jq

Download Progress Examples#

When downloading datasets or pretrained models, you’ll see progress updates like these in the job status response:

NGC Model Download#

    Current file download: NGC: nvdinov2_vitg (nvidia/tao)
    Current file download Progress: 1.2 GB
    Total Download Progress: 1/7 files (14.3%), 1.2 GB/25.7 GB (4.6%)
    Remaining: 6 files, 24.5 GB, ETA: 0:06:14

HuggingFace Model Download#

    Current file download: HF: nvidia/Cosmos-Reason1-7B
    Current file download Progress: 7.1 GB/15.5 GB (45.7%)
    Total Download Progress: 3/7 files (42.9%), 11.7 GB/25.7 GB (45.3%)
    Remaining: 4 files, 14.1 GB, ETA: 0:01:55

Dataset Download#

    Current file download: mvtec_mgcn_train/images.tar.gz
    Current file download Progress: 80.0 MB/5.0 GB (1.6%)
    Total Download Progress: 5/7 files (71.4%), 20.1 GB/25.7 GB (78.2%)
    Remaining: 2 files, 5.6 GB, ETA: 0:00:31

Upload Progress Example#

When uploading model checkpoints or results to cloud storage:

    Current file upload: train/model_epoch_000_step_00117.pth
    Current file upload Progress: 8.0 MB/1.0 GB (0.8%)
    Total Upload Progress: 2/4 files (50.0%), 8.0 MB/1.0 GB (0.8%)
    Remaining: 2 files, 1.0 GB, ETA: 0:44:39

Progress Information Includes#

Current file: Name and source of the file being transferred

Current file progress: Bytes transferred and percentage for the current file

Total progress: Overall completion across all files (count and size)

Remaining: Files and data size yet to be transferred

ETA: Estimated time to completion based on current transfer rate

Use Cases#

Monitoring large dataset downloads from cloud storage

Tracking model checkpoint uploads during training

Observing PTM (PreTrained Model) downloads from NGC or HuggingFace

Monitoring experiment artifact uploads to cloud workspaces

Verifying transfer completion and detecting stalled operations
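To act on the progress text in a script, for instance to flag a transfer whose ETA keeps growing, the ETA can be pulled out of a progress line and converted to seconds. The sketch below assumes the text format shown in the examples above. extract_eta and eta_to_seconds are hypothetical helpers, and which field of the job metadata carries this text is not specified here, so the line is passed in as an argument.

```shell
# Hypothetical helper: print the H:MM:SS value that follows "ETA:"
# in a progress line; prints nothing if no ETA is present.
extract_eta() {
    printf '%s\n' "$1" | sed -n 's/.*ETA: \([0-9:]*\).*/\1/p'
}

# Hypothetical helper: convert an H:MM:SS value into seconds.
eta_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}
```

For example, eta_to_seconds "$(extract_eta "Remaining: 6 files, 24.5 GB, ETA: 0:06:14")" prints 374.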

AutoML#

AutoML is a TAO Toolkit API service that automatically selects deep learning hyperparameters for a chosen model and dataset.

Create Training Job with AutoML

    AUTOML_JOB_ID=$(curl -s -X POST "$BASE_URL/orgs/$NGC_ORG_NAME/jobs" \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $TOKEN" \
      -d '{
        "kind": "experiment",
        "name": "automl_training_job",
        "network_arch": "classification_pyt",
        "encryption_key": "tlt_encode",
        "workspace": "'"$WORKSPACE_ID"'",
        "action": "train",
        "specs": '"$TRAIN_SPECS"',
        "train_datasets": ["'"$TRAIN_DATASET_ID"'"],
        "eval_dataset": "'"$EVAL_DATASET_ID"'",
        "base_experiment_ids": ["'"$BASE_EXPERIMENT_ID"'"],
        "automl_settings": {
          "automl_enabled": true,
          "automl_algorithm": "bayesian",
          "automl_max_recommendations": 20,
          "automl_delete_intermediate_ckpt": true
        }
      }' | jq -r '.id')
    echo "$AUTOML_JOB_ID"

Get AutoML Defaults

    curl -s -X GET \
      "$BASE_URL/orgs/$NGC_ORG_NAME/automl:get_param_details?base_experiment_id=$BASE_EXPERIMENT_ID" \
      -H "Authorization: Bearer $TOKEN" | jq

See the AutoML docs for more details.