Helm Chart Deployment

Use the following steps to deploy or update the Fine-Tuning Microservice (FTMS) API on an existing Kubernetes cluster. The same steps let you enable HTTPS and enforce user authentication for secure multi-tenancy.

(Optional) Create a namespace to deploy the FTMS API.

kubectl create namespace nvidia-ftms

Note

Deploying the FTMS API in a non-default namespace, like nvidia-ftms, affects API paths. If ingress is enabled, you might need to access the API at https://<host>/<namespace>/api instead of https://<host>/api. Consider using the default namespace to avoid modifying paths in notebooks or commands.
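
The path rule in this note can be sketched as a small helper. This is only an illustration, assuming ingress is enabled and that your ingress exposes non-default namespaces under /<namespace>/api as described above; the function name ftms_api_base is hypothetical and not part of any FTMS tooling.

```python
def ftms_api_base(host: str, namespace: str = "default") -> str:
    """Return the base API URL for an ingress-enabled FTMS deployment.

    Assumption: non-default namespaces are served under /<namespace>/api,
    as the note suggests. Verify this against your own ingress rules.
    """
    host = host.rstrip("/")
    if namespace == "default":
        return f"https://{host}/api"
    return f"https://{host}/{namespace}/api"


print(ftms_api_base("10.10.10.10"))                 # default namespace
print(ftms_api_base("10.10.10.10", "nvidia-ftms"))  # non-default namespace
```

Keeping the path logic in one place like this makes it easier to adjust notebooks or scripts if you later move the deployment to another namespace.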

You can shut down an already deployed FTMS API.

helm delete tao-api --namespace nvidia-ftms

You must use the provided Helm chart to deploy FTMS resources.

helm fetch https://helm.ngc.nvidia.com/nvidia/tao/charts/tao-toolkit-api-6.0.0.tgz --username='$oauthtoken' --password=<YOUR API KEY>
mkdir tao-api && tar -zxvf tao-toolkit-api-6.0.0.tgz

You can customize the deployment if necessary by updating the chart’s tao-api/values.yaml.

Required values:

  • ngc_api_key: The admin NGC Personal Key to create imagepullsecret for nvcr.io access.

  • ptmApiKey: The NGC Legacy API key to pull pretrained models from across NGC orgs. A required value if ptmPull is true.

    Visit NGC to create your NGC Personal Key and Legacy API key (requires an NGC account).
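
Before installing, it can help to confirm that the required values are present. The sketch below works on a plain dictionary of values; how you load tao-api/values.yaml into that dictionary is left open, and the helper name missing_required_values is hypothetical. It relies on the documented default of ptmPull being true.

```python
def missing_required_values(values: dict) -> list[str]:
    """List required keys that are absent or empty in a values mapping."""
    required = ["ngc_api_key"]
    # ptmPull defaults to true, so ptmApiKey is required unless ptmPull is disabled.
    if values.get("ptmPull", True):
        required.append("ptmApiKey")
    return [key for key in required if not values.get(key)]


# Example: the NGC key is set, but ptmPull is left at its default of true.
print(missing_required_values({"ngc_api_key": "<YOUR_NGC_PERSONAL_KEY>"}))
```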

Optional values:

Deployment-related parameters:

  • backend: Platform used for training jobs. Options are local-k8s and NVCF. Defaults to local-k8s.

  • hostPlatform: Platform used for hosting the API orchestration service. Options are local and NVCF. Defaults to local.

  • ingressEnabled: Whether to enable ingress controller. Must be disabled when hostPlatform is NVCF. Defaults to true.

  • hostBaseUrl: Base URL of the API service. Format is https://<host>:<port>, for example https://10.10.10.10:32080.

  • serviceAdminUUID: UUID of the service admin user. This user has access to internal API endpoints.

Note

To obtain your serviceAdminUUID, run the following Python code:

import requests
import uuid

key = "<YOUR_NGC_PERSONAL_KEY>"  # Replace with your actual NGC Personal Key
url = 'https://api.ngc.nvidia.com/v3/keys/get-caller-info'

r = requests.post(
    url,
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
    data={'credentials': key},
    timeout=5
)

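r.raise_for_status()  # Fail early on an HTTP error instead of parsing an error body

ngc_user_id = r.json().get('user', {}).get('id')
if ngc_user_id is None:
    raise SystemExit("No user ID in the response; check your NGC Personal Key.")

# The UUID is derived deterministically from the NGC user ID (UUIDv5, nil namespace)
service_admin_uuid = str(uuid.uuid5(uuid.UUID(int=0), str(ngc_user_id)))
print(f"Your serviceAdminUUID is: {service_admin_uuid}")

  • host, tlsSecret: Set these to enable HTTPS and enforce user authentication for secure multi-tenancy.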

  • corsOrigin: For enabling CORS and setting origin.

  • authClientID: Reserved for future NVIDIA Starfleet authentication.
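
The constraints stated in the list above (allowed options, the NVCF ingress rule, and the hostBaseUrl format) can be checked with a short script. This is a minimal sketch under the defaults documented here; check_deployment_values is a hypothetical helper, and the chart itself may validate these values differently or not at all.

```python
from urllib.parse import urlparse

BACKENDS = {"local-k8s", "NVCF"}
HOST_PLATFORMS = {"local", "NVCF"}


def check_deployment_values(values: dict) -> list[str]:
    """Return human-readable problems found in deployment-related values."""
    problems = []
    # Fall back to the documented defaults when a key is not set.
    backend = values.get("backend", "local-k8s")
    host_platform = values.get("hostPlatform", "local")
    ingress_enabled = values.get("ingressEnabled", True)

    if backend not in BACKENDS:
        problems.append(f"backend must be one of {sorted(BACKENDS)}")
    if host_platform not in HOST_PLATFORMS:
        problems.append(f"hostPlatform must be one of {sorted(HOST_PLATFORMS)}")
    if host_platform == "NVCF" and ingress_enabled:
        problems.append("ingressEnabled must be false when hostPlatform is NVCF")

    base_url = values.get("hostBaseUrl")
    if base_url:
        try:
            parsed = urlparse(base_url)
            ok = bool(parsed.scheme == "https" and parsed.hostname
                      and parsed.port is not None)
        except ValueError:  # raised for a malformed host or non-numeric port
            ok = False
        if not ok:
            problems.append("hostBaseUrl should look like https://<host>:<port>")
    return problems


print(check_deployment_values({"hostPlatform": "NVCF"}))
```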

Container-related parameters:

  • image: Location of the TAO API container image.

  • ngcImagePullSecretName: Secret name set up to access the NVIDIA nvcr.io registry. Defaults to ‘imagePullSecret’.

  • imagePullPolicy: Set to always fetch from nvcr.io instead of using a locally cached image. Defaults to ‘Always’.

  • pythonVersion: Version of Python used in the container. Defaults to 3.12.

  • pythonBasePath: Path to the Python executable. Defaults to /usr/local/lib/python.

Other parameters:

  • ptmOrgTeams: List of org/teams that pretrained models are available for. Defaults to nvidia/tao,ea-tlt/tao_ea.

  • ptmPull: Whether to pull pretrained models from NGC when deploying API. Defaults to true.

  • maxNumGpuPerNode: Number of GPUs assigned to each job.

  • mongoOperatorEnabled: Whether to enable the MongoDB operator. Defaults to false.

  • telemetryOptOut: Set to true to opt out from NVIDIA collection of anonymous usage metrics.

We provide additional configurable parameters for dependent services:

  • mongo*: List of parameters for mongodb memory, CPU, and storage configuration.

  • community-operator: Configuration for the mongo community operator.

  • ingress-nginx: Configuration for ingress-nginx controller.

  • notebooksDir: Path to the notebooks directory in JupyterLab. Defaults to notebooks.

  • enableVault: Whether to enable vault for secrets management. Defaults to false.

  • vault: Configuration for the vault operator.

  • profiler: Whether to enable the Python profiler. Defaults to false.

  • kube-prometheus-stack.enabled: Whether to enable Prometheus in the cluster. Defaults to false.

  • kratosClientCert: Client certificate to export telemetry to Kratos.

  • kratosClientKey: Client key to export telemetry to Kratos.

Example for creating a tlsSecret:

openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=ec2-34-221-205-157.us-west-2.compute.amazonaws.com/O=ec2-34-221-205-157.us-west-2.compute.amazonaws.com" \
  -addext "subjectAltName = DNS:ec2-34-221-205-157.us-west-2.compute.amazonaws.com"
kubectl create secret tls tls-secret --key tls.key --cert tls.crt --namespace default

Then install the FTMS API service:

helm install tao-api tao-api/ --namespace nvidia-ftms

The FTMS deployment is complete when all pods are in the Running state, which can take 10 to 15 minutes.

kubectl get pods -n nvidia-ftms
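
If you prefer to script this check, the sketch below parses the JSON form of the same command (kubectl get pods -o json) and lists pods that are not yet Running. The helper names are hypothetical, and the check follows the "all pods Running" criterion stated above as written.

```python
import json
import subprocess


def pods_not_running(pod_list: dict) -> list[str]:
    """Return the names of pods whose phase is not Running."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") != "Running"
    ]


def fetch_pods(namespace: str = "nvidia-ftms") -> dict:
    """Run `kubectl get pods -o json` for a namespace and parse the output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling pods_not_running(fetch_pods()) on a machine with cluster access returns an empty list once every pod in the namespace reports the Running phase.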

To debug a deployment, look for events toward the bottom of the following command’s output:

kubectl describe pods tao-api -n nvidia-ftms

Next Steps

  • The swagger UI can be accessed at <host_url>/swagger

  • The notebooks can be downloaded at <host_url>/tao_api_notebooks.zip

  • host_url in the notebooks: The base URL of the API service. Format is http://<host>:<port>, for example http://10.10.10.10:32080

After successful deployment, you can start using the FTMS API through either:

  • The Remote Client CLI - A command-line interface for interacting with the API

  • The REST API - Direct HTTP endpoints for programmatic access

Alternatively, follow the tutorial notebook, which distills an RT-DETR model down to one quarter of its size while keeping the same accuracy.

Choose the interface that best suits your needs and refer to the corresponding documentation section for detailed usage instructions.

Quick Start: Log-In

The following diagram and examples show how to interact with the FTMS API quickly after a successful deployment, using either the Remote Client CLI or direct REST API calls.

graph TD
    User((User))
    CLI[Remote Client CLI]
    API[REST API]
    FTMS[FTMS API Service]
    User -->|CLI| CLI -->|HTTP| FTMS
    User -->|HTTP| API --> FTMS

User interaction flow with FTMS API

Log-In Example

  • Using Remote Client CLI:

    BASE_URL=<host_url>/default/api/v1 tao-client login --ngc-key <NGC_KEY> --ngc-org-name <NGC_ORG_NAME> --enable-telemetry
    
  • Using curl (REST API):

    curl -X POST "<host_url>/api/v1/login" \
      -H "Content-Type: application/json" \
      -d '{"ngc_org_name": "<NGC_ORG_NAME>", "ngc_key": "<NGC_KEY>", "enable_telemetry": true}'
    

Replace <host_url>, <NGC_ORG_NAME>, and <NGC_KEY> with your actual API endpoint, NGC org name, and NGC key.
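
The curl call can also be made from Python using only the standard library. This is a minimal sketch that mirrors the endpoint and JSON fields of the curl example; the function names are hypothetical, and the structure of the response is not assumed, so it is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request


def build_login_request(host_url, ngc_org_name, ngc_key, enable_telemetry=True):
    """Build the same POST request that the curl example sends."""
    body = json.dumps({
        "ngc_org_name": ngc_org_name,
        "ngc_key": ngc_key,
        "enable_telemetry": enable_telemetry,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{host_url.rstrip('/')}/api/v1/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(host_url, ngc_org_name, ngc_key, timeout=10):
    """Send the login request and return the decoded JSON response."""
    request = build_login_request(host_url, ngc_org_name, ngc_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

If the deployment uses a self-signed certificate, such as the one from the tlsSecret example, urlopen rejects it by default; in that case pass an ssl.SSLContext that trusts your certificate through the context argument of urlopen.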

For more details, see the Remote Client CLI and REST API documentation sections.

Common deployment issues include:

  • GPU Operator pods not in Ready or Completed states

  • Invalid values.yaml file

  • Missing or invalid imagepullsecret

  • Missing or invalid ngc_api_key

  • Missing or invalid ptmApiKey