Fine-Tuning Microservices Overview
The Fine-Tuning Micro-Service (FTMS) is TAO Toolkit's new interface for accelerating model training without the overhead of setting up and managing compute infrastructure. This interface makes it easy to offer a managed training service to your development teams. It automates model fine-tuning workflows and improves the experience for non-domain experts by hiding training flow intricacies and significantly reducing user input mistakes. It also integrates easily with other applications and MLOps services.
If you prefer to use this version of TAO Toolkit with the legacy TAO Launcher CLI, refer to TAO Launcher. To work directly with the DNN containers, refer to Working With The Containers.
The following diagram depicts the high-level architecture, in which a remote client accesses an API that lets you train, optimize, and test your model, as well as augment and annotate your data. This version of FTMS includes AutoML: given a dataset and a pretrained model, AutoML hyperparameter optimization searches for the parameters that yield the best accuracy using Bayesian or Hyperband algorithms.

The FTMS securely accesses remotely stored datasets and pushes experiment artifacts to your remote storage.
Actions such as train, evaluate, prune, retrain, export, and inference can be spawned through API calls. For each action, you can request the action's default parameters, update those parameters as needed, and then pass them when running the action. Specs are expressed in JSON format.
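The fetch-defaults, modify, submit flow described above can be sketched with a minimal client. The endpoint paths, the `BASE_URL` placeholder, and the field names below are hypothetical illustrations rather than the literal FTMS API; consult your deployment's /swagger page for the real schema.

```python
import json
import urllib.request

BASE_URL = "http://<ftms-host>/api/v1"  # hypothetical base path; replace with your deployment


def get_default_specs(action: str) -> dict:
    """Fetch the default JSON specs for an action (hypothetical endpoint)."""
    with urllib.request.urlopen(f"{BASE_URL}/actions/{action}/specs") as resp:
        return json.load(resp)


def merge_specs(defaults: dict, overrides: dict) -> dict:
    """Return a copy of the default specs with selected fields overridden."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_specs(merged[key], value)  # recurse into nested spec sections
        else:
            merged[key] = value
    return merged


def run_action(action: str, specs: dict) -> bytes:
    """Submit the action with the customized specs (hypothetical endpoint)."""
    req = urllib.request.Request(
        f"{BASE_URL}/actions/{action}",
        data=json.dumps(specs).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A typical call sequence would fetch the train defaults, override a few fields, such as `merge_specs(defaults, {"train": {"num_epochs": 50}})`, and pass the merged dict to `run_action("train", specs)`.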
The service exposes a Job API endpoint that allows you to cancel, download, and monitor jobs. Job API endpoints also provide useful information such as epoch number, accuracy, loss values, and ETA.
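A client typically polls the Job API endpoint until the job reaches a terminal state. The sketch below assumes a `get_status` callable standing in for a GET on the job endpoint; the field names (`status`, `epoch`, `eta`) and terminal states are illustrative, not the literal FTMS response schema.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], dict],
                 poll_seconds: float = 30.0,
                 max_polls: int = 1000) -> dict:
    """Poll a job until it finishes, printing the progress fields each round.

    `get_status` stands in for a GET on the Job API endpoint and returns
    a dict such as {"status": "Running", "epoch": 3, "eta": "00:12:45"}.
    """
    for _ in range(max_polls):
        info = get_status()
        print(f"status={info.get('status')} epoch={info.get('epoch')} "
              f"eta={info.get('eta')}")
        if info.get("status") in ("Done", "Error", "Canceled"):
            return info
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")
```

Cancellation and artifact download follow the same pattern: a single call to the corresponding Job API endpoint once you hold the job's identifier.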

The supported backend compute platforms are Local (bare-metal), public Cloud Service Providers (AWS, Azure, GCP, OCI) and NVIDIA’s own NVCF (dgx-cloud, run.ai).
The FTMS service is a secure multi-tenant service for NVIDIA NGC users that runs on a standard Kubernetes cluster and is deployed using a Helm chart.
Besides host machines and GPUs (nodes), FTMS depends on Kubernetes and NVIDIA's GPU Operator. These can come from a CSP's managed Kubernetes offering, from NVIDIA's own NVCF service, or from a manual installation on your local (bare-metal) cluster.
FTMS deployment depends on the chosen compute platform:

- For local (bare-metal) deployment, refer to Bare-Metal Setup.
- For Amazon AWS EKS, refer to EKS Setup.
- For Microsoft Azure AKS, refer to AKS Setup.
- For Google Cloud Platform GKE, refer to GKE Setup.
- For NVIDIA's own NVCF (dgx-cloud, run.ai), refer to NVCF Setup.
After deploying FTMS, you can access the API's OpenAPI specs from /swagger or /redoc and download the example notebooks from /tao_api_notebooks.zip.
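These well-known paths can be joined against your deployment's base URL. A small helper, assuming only that the service is reachable over HTTP at `base_url`, might look like:

```python
from urllib.parse import urljoin


def ftms_urls(base_url: str) -> dict:
    """Build the documentation and notebook URLs exposed by an FTMS deployment.

    Paths come from the FTMS docs: the OpenAPI UIs and the notebook archive.
    """
    return {
        "swagger": urljoin(base_url, "/swagger"),
        "redoc": urljoin(base_url, "/redoc"),
        "notebooks": urljoin(base_url, "/tao_api_notebooks.zip"),
    }
```

For example, `ftms_urls("http://ftms.example.com")` yields the /swagger, /redoc, and /tao_api_notebooks.zip URLs, which you can then open in a browser or fetch with any HTTP client.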
If deployed outside of NVIDIA’s NVCF platform, FTMS also offers a Remote Client CLI.