Setup
The TAO API service can run on any Kubernetes platform. This section describes how to set up the TAO API service on the following platforms: a bare-metal server, AWS (Amazon Web Services) EKS, Azure AKS, and Google Cloud (GCP) GKE.
Hardware
Minimum Requirements
1 or more GPU node(s) where all GPUs within a given node match.
32 GB system RAM
32 GB of GPU RAM
8 core CPU
1 NVIDIA Discrete GPU: Volta, Turing, Ampere, Hopper architecture
16 GB of SSD space
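To compare a node against these minimums, it can help to print its CPU, memory, GPU, and disk figures in one place. The following is a minimal sketch, not part of the TAO tooling: the function names are hypothetical, it assumes a Linux host, and nvidia-smi is only available once the NVIDIA driver is installed.

```shell
# Convert a kB figure (the unit used in /proc/meminfo) to GiB, rounded up.
kib_to_gib() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

# Print the figures relevant to the minimum requirements for this node.
report_hardware() {
    echo "CPU cores: $(nproc)"
    mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "System RAM: $(kib_to_gib "$mem_kib") GiB (rounded up)"
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One line per GPU: model name and total GPU memory.
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
    else
        echo "nvidia-smi not found; GPU details unavailable"
    fi
    # Free space on the filesystem that holds the current directory.
    df -h .
}
```

Run report_hardware on each node and compare the output with the list above by eye; the sketch only reports values and does not enforce the thresholds.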
Software
OS Support
Ubuntu 22.04 (fresh install)
Deployment Steps
Download the necessary software using the NGC CLI.
ngc registry resource download-version "nvidia/tao/tao-getting-started:5.5.0"
Change current directory:
cd tao-getting-started_v5.5.0/setup/quickstart_api_bare_metal
Set up proxy and custom CA certificates.
If applicable, make sure your deployment machine has Internet access.
Make sure the following environment variables are properly set:
HTTP_PROXY, HTTPS_PROXY
http_proxy, https_proxy
NO_PROXY
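As an illustration, the variables can be exported together from a small helper. This is a sketch under assumptions: set_proxy_env is a hypothetical name, the proxy URL in the usage note is a placeholder, and the default NO_PROXY list is only a common starting point.

```shell
# set_proxy_env PROXY_URL [NO_PROXY_LIST]
# Export both upper- and lower-case spellings, since tools differ in
# which one they read.
set_proxy_env() {
    export HTTP_PROXY="$1" HTTPS_PROXY="$1"
    export http_proxy="$1" https_proxy="$1"
    # Hosts that should bypass the proxy (assumed default).
    export NO_PROXY="${2:-localhost,127.0.0.1}"
}
```

For example, set_proxy_env http://proxy.example.com:3128 sets all five variables in the current shell; replace the URL with your organization's proxy.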
If you are using a custom CA SSL Certificate, you need to copy the certificate bundle locally:
cp <path>/<certificate bundle file>.crt ./my-cert.crt
The remote node users must have sudo privileges.
Execute the following on each node (the following example assumes a user named ubuntu):
echo "ubuntu ALL=(ALL) NOPASSWD:ALL" | sudo tee -a /etc/sudoers
Add content to your inventory.
vi hosts
Use either a password (ansible_ssh_pass) or an SSH private key file (ansible_ssh_private_key_file) for credentials. The following is an example with user/password credentials.
[master]
127.0.0.2 ansible_ssh_user='ubuntu' ansible_ssh_pass='password' ansible_ssh_extra_args='-o StrictHostKeyChecking=no'
[nodes]
127.0.0.2 ansible_ssh_user='ubuntu' ansible_ssh_pass='password' ansible_ssh_extra_args='-o StrictHostKeyChecking=no'
The following is an example with an SSH key. You can generate a local SSH key using ssh-keygen, then populate your public key to the remote node(s) using ssh-copy-id.
[master]
1.1.1.1 ansible_ssh_user='ubuntu' ansible_ssh_private_key_file='/home/user/.ssh/id_rsa'
[nodes]
1.1.1.2 ansible_ssh_user='ubuntu' ansible_ssh_private_key_file='/home/user/.ssh/id_rsa'
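For clusters with several worker nodes, an inventory of the shape shown above can be generated rather than typed by hand. The helper below is a hypothetical sketch (write_inventory is not part of the quickstart scripts) and assumes that every host shares the same user and private key file.

```shell
# write_inventory MASTER_IP SSH_USER KEY_FILE NODE_IP...
# Print an inventory with one [master] entry and one [nodes] entry per
# remaining argument.
write_inventory() {
    master_ip=$1
    ssh_user=$2
    key_file=$3
    shift 3
    printf '[master]\n'
    printf "%s ansible_ssh_user='%s' ansible_ssh_private_key_file='%s'\n" \
        "$master_ip" "$ssh_user" "$key_file"
    printf '[nodes]\n'
    for node_ip in "$@"; do
        printf "%s ansible_ssh_user='%s' ansible_ssh_private_key_file='%s'\n" \
            "$node_ip" "$ssh_user" "$key_file"
    done
}
```

For example, write_inventory 1.1.1.1 ubuntu /home/user/.ssh/id_rsa 1.1.1.2 > hosts reproduces the SSH key example above.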
Use the following command to validate the SSH credentials for the remote node(s). A proper response would be “root”.
ssh ubuntu@127.0.0.2 'sudo whoami'
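With more than one node, the same check can be looped over every address in the inventory. This is a sketch only: the function names are hypothetical, it assumes that the first field of each non-header inventory line is the host address, and it uses BatchMode, so it suits key-based credentials rather than passwords.

```shell
# Print the unique host addresses listed in an Ansible INI inventory.
inventory_hosts() {
    awk '!/^\[/ && !/^#/ && NF {print $1}' "$1" | sort -u
}

# check_nodes INVENTORY SSH_USER
# Run the sudo check on every host; return non-zero if any host does
# not answer "root".
check_nodes() {
    failed=0
    for host in $(inventory_hosts "$1"); do
        # BatchMode makes ssh fail instead of prompting for input.
        answer=$(ssh -o BatchMode=yes "$2@$host" 'sudo whoami' 2>/dev/null)
        if [ "$answer" = "root" ]; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_nodes hosts ubuntu prints one line per host and exits with a non-zero status if any node fails the check.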
Set your deployment parameters, such as chart version, NGC credentials, etc.
vi deploy.yml
Below is an example.
ngc_api_key: YzZtczM5amdtdDcwNjk...
ngc_email: johndoe@mycorp.com
chart: https://helm.ngc.nvidia.com/nvidia/tao/charts/tao-api-5.5.0.tgz
chart_values: ./tao-toolkit-api-helm-values.yml
cluster_name: tao-api-demo
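Before starting a deployment, it can be worth checking that deploy.yml defines each key from the example. The sketch below is hypothetical and assumes that all five keys shown above are expected and that each is written as a top-level "key: value" line.

```shell
# missing_deploy_keys FILE
# Print the name of each expected key that is absent or has no value.
missing_deploy_keys() {
    for key in ngc_api_key ngc_email chart chart_values cluster_name; do
        grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$1" || echo "$key"
    done
}
```

For example, missing_deploy_keys deploy.yml prints nothing when every key is set and prints one key name per line otherwise.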
Optionally, you can add any values that you would like to override while installing the API chart. This is an uncommon use case.
vi tao-toolkit-api-helm-values.yml
Proceed with deployment.
bash setup.sh install
If you need to completely remove the installed Kubernetes services from your machine, you can use the following command.
bash setup.sh uninstall
Pre-Requisites
Step 1: AWS Account
If your organization has an AWS (Amazon Web Services) account that can be used to host the TAO API service, contact your AWS account administrator to perform the next steps.
If you do not have an AWS account, you can create one yourself.
To complete the following steps, log in to the AWS web console as either the AWS account root user or a user with Admin privileges.
Step 2: IAM User
Follow these steps to create an AWS IAM user and group, and to attach policies for automated deployment of the TAO API.
Once logged in to the AWS web console, search for and select the “IAM” service.
Select Users on the left panel and click Add users.
In the Add user wizard, provide an appropriate User name and select Access key - Programmatic access for the AWS credential type.
Navigate to Next: Permissions > Next: Tags > Next: Review and click Create user.
Click the Download button to download the access keys as a .csv file for use in setting up the TAO API using one-click scripts.
Note: Once you leave this screen, you will NOT be able to download the same credentials again.
Select User groups on the left panel and click Create group.
In the Create user group wizard, provide an appropriate name.
In the Add users to the group - Optional section, search for and select the user created in the previous step.
In the Attach permission policies - Optional section, search for and select the “AdministratorAccess” policy. Then click Create Group.
Step 3: S3 Bucket
Follow these steps to create an S3 bucket to store the state.
Search for and select the “S3” service.
Select Buckets on the left-hand panel and click Create bucket.
In the Create bucket wizard, provide an appropriate name for the Bucket and choose the region closest to you.
Ensure ACLs are disabled and all public access is blocked.
Enable Bucket Versioning and Server-side encryption with Amazon S3-managed keys, then click Create bucket.
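The console steps above have a rough AWS CLI equivalent. The following is a sketch rather than an official procedure: create_state_bucket is a hypothetical name, and it assumes that the AWS CLI is installed and already configured with credentials that are allowed to create buckets.

```shell
# create_state_bucket BUCKET REGION
create_state_bucket() {
    # BucketOwnerEnforced disables ACLs. Regions other than us-east-1
    # need an explicit LocationConstraint.
    if [ "$2" = "us-east-1" ]; then
        aws s3api create-bucket --bucket "$1" --region "$2" \
            --object-ownership BucketOwnerEnforced || return 1
    else
        aws s3api create-bucket --bucket "$1" --region "$2" \
            --object-ownership BucketOwnerEnforced \
            --create-bucket-configuration "LocationConstraint=$2" || return 1
    fi
    # Block all public access.
    aws s3api put-public-access-block --bucket "$1" \
        --public-access-block-configuration \
        "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" || return 1
    # Enable bucket versioning.
    aws s3api put-bucket-versioning --bucket "$1" \
        --versioning-configuration Status=Enabled || return 1
    # Default server-side encryption with Amazon S3-managed keys.
    aws s3api put-bucket-encryption --bucket "$1" \
        --server-side-encryption-configuration \
        '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
}
```

For example, create_state_bucket automation-for-tao-api us-west-1 creates a private, versioned, encrypted bucket; the function stops at the first command that fails.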
Deployment
Step 1: Download resource
Download the resource using the NGC CLI.
ngc registry resource download-version "nvidia/tao/tao-getting-started:5.5.0"
Step 2: Change directory
Change current directory.
cd tao-getting-started_v5.5.0/setup/quickstart_api_aws_eks
Step 3: Optional API parameters
Optionally add any values you would like to override while installing the API chart.
vi tao-toolkit-api-helm-values.yml
Step 4: Deploy
Proceed with deployment.
bash setup.sh install
You are asked to enter the following parameters:
S3 bucket name that you created manually from AWS console (for example, automation-for-tao-api)
S3 bucket region (for example, us-west-1)
Your choice of cluster name (for example, automation-for-tao-api)
Your choice of AWS region (for example, us-west-1)
Your choice of VPC CIDR (for example, 10.0.0.0/16)
Path to your SSH public key (for example, ~/.ssh/id_rsa.pub, as generated by the ssh-keygen command)
Your NGC API key
Your NGC account email address
K8s Cluster Version (defaults to 1.23)
AWS Instance Type (defaults to g4dn.12xlarge)
Number of instances of this type (defaults to 1)
URL of the TAO API helm chart (defaults to latest)
Helm values file to override any values of the TAO API Helm chart (for example, tao-toolkit-api-helm-values.yml)
AWS Access Key ID (from pre-requisites section above)
Pre-Requisites
Step 1: Create an Azure Subscription
In case you belong to an organization that has an Azure subscription that fits your purpose, reach out to the respective administrator to perform the next steps.
In case you are an individual without an Azure Subscription, you can create one for yourself via the link: https://learn.microsoft.com/en-us/training/modules/create-an-azure-account/
Step 2: Login as Administrator
Once logged in to the Azure web console, search for a service named App registrations to select the App registrations service.
Click + New registration in the service page to create a new app registration.
In the New registration wizard, provide an appropriate Name and click Register.
In the App registration page, make a note of the Application (client) ID and Directory (tenant) ID.
These will be needed later for running the automated one click deployment scripts.
In the App registration page, select the Certificates & secrets sub-menu, and click on the + New client secret.
In the wizard, provide a description and click Add.
Warning: Make a note of the Value. This will not be visible once you exit the screen. You will need this in the future for use in the automated one click deployment scripts.
Search for a service named Subscriptions to select the Subscriptions service.
Select the subscription in which you want to create your infrastructure.
In the Subscription page, make a note of the Subscription ID.
This will be needed later for running the automated one click deployment scripts.
In the subscription page, select the Access control (IAM) sub-menu, click + Add and then Add role assignment.
In the Role tab of the role assignment wizard, select the Contributor role.
In the Members tab of the role assignment wizard, Assign access to User, group or service principal, and click + Select members.
In the Select members wizard, search for the name of the app registration created earlier and click Select.
Click Review + assign.
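The app registration, client secret, and Contributor role assignment above can also be sketched with the Azure CLI. This is an illustrative alternative under assumptions: the function names are hypothetical, and the Azure CLI must be installed and logged in (az login) as a user who is allowed to create app registrations and role assignments.

```shell
# Print the ID of the subscription the Azure CLI is currently using.
current_subscription_id() {
    az account show --query id --output tsv
}

# create_deploy_principal NAME SUBSCRIPTION_ID
# Create an app registration with a service principal and grant it the
# Contributor role on the given subscription.
create_deploy_principal() {
    az ad sp create-for-rbac --name "$1" --role Contributor \
        --scopes "/subscriptions/$2"
}
```

The JSON printed by create-for-rbac contains appId, password, and tenant, which correspond to the Application (client) ID, the client secret Value, and the Directory (tenant) ID noted above. The password is shown only once, so store it securely.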
Step 3: Create Azure Storage Account and Container to Store State
Search for a service named Resource groups and select the Resource groups service.
Select + Create in the service page to create a new resource group.
In the Create a resource group page, choose a subscription, provide an appropriate name for the resource group, and choose the region where resources will be created by default.
Click Review + create.
Search for a service named Storage accounts and select the Storage accounts service.
Select + Create in the service page to create a new storage account.
In the Create a storage account page, choose a subscription and resource group, and provide an appropriate name for the storage account.
Choose the region where resources will be created by default. Provide acceptable values for performance and redundancy.
Click Review and then click Create.
In the Containers sub-menu of the created storage account, click + Container to create a new container.
In the New container wizard, provide an appropriate name for the container, select Private for Public access level and click Create.
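A rough Azure CLI equivalent of the resource group, storage account, and container steps is sketched below. It is not taken from the quickstart scripts: create_state_storage is a hypothetical name, Standard_LRS is only an assumed choice for performance and redundancy, and the Azure CLI must already be logged in with rights to create these resources.

```shell
# create_state_storage RESOURCE_GROUP LOCATION ACCOUNT CONTAINER
create_state_storage() {
    az group create --name "$1" --location "$2" || return 1
    # Standard_LRS is an assumed SKU; pick one that fits your needs.
    az storage account create --name "$3" --resource-group "$1" \
        --location "$2" --sku Standard_LRS || return 1
    # "off" keeps the container private (no anonymous access).
    az storage container create --name "$4" --account-name "$3" \
        --public-access off
}
```

For example, create_state_storage tao-rg westus2 taostate001 tfstate creates all three resources in order; the names and region here are placeholders, and storage account names must be globally unique.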
Deployment
Step 1: Download the Resource
Download the resource using the NGC CLI.
ngc registry resource download-version "nvidia/tao/tao-getting-started:5.5.0"
Step 2: Change Current Directory
Change current directory.
cd tao-getting-started_v5.5.0/setup/quickstart_api_azure_aks
Step 3: Optional API Parameters
Optionally add any values you would like to override while installing the API chart.
vi tao-toolkit-api-helm-values.yml
Step 4: Deploy
Proceed with deployment.
bash setup.sh install
Pre-Requisites
Step 1: Create a GCP Project
In case you belong to an organization that has a GCP project that fits your purpose, reach out to the respective administrator to perform the next steps.
In case you are an individual without a GCP project, you can create one for yourself via the link: https://cloud.google.com/resource-manager/docs/creating-managing-projects
Step 2: Login as Administrator
Log in to the GCP web console as a user with Admin privileges in order to complete the next steps.
Step 3: Create a GCP Service Account with Access Keys for Automated Deployment of TAO API
Once logged in to the GCP web console, search for a service named Service Accounts to select the Service Accounts service.
Click + CREATE SERVICE ACCOUNT in the service page to create a new service account.
In the Create service account wizard, provide an appropriate Service account name and click CREATE AND CONTINUE.
Add the Owner role for the project and click DONE.
Click on the created Service account.
In the Service Account page, under the KEYS tab, click the ADD KEY dropdown and click Create new key.
In the wizard, select JSON and click CREATE.
Warning: Make a note of the path where the key is downloaded and move it to a secure location. You will need this key in the future.
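The service account, Owner role, and JSON key can also be created with the gcloud CLI. The following is a sketch under assumptions: create_deploy_account is a hypothetical name, gcloud must be installed and authenticated as a project administrator, and the service account email is assumed to follow the usual NAME@PROJECT_ID.iam.gserviceaccount.com pattern.

```shell
# create_deploy_account PROJECT_ID ACCOUNT_NAME KEY_FILE
create_deploy_account() {
    sa_email="$2@$1.iam.gserviceaccount.com"
    gcloud iam service-accounts create "$2" --project "$1" || return 1
    # Grant the Owner role on the project to the new service account.
    gcloud projects add-iam-policy-binding "$1" \
        --member "serviceAccount:$sa_email" --role roles/owner || return 1
    # Write a new JSON key for the service account to KEY_FILE.
    gcloud iam service-accounts keys create "$3" --iam-account "$sa_email"
}
```

For example, create_deploy_account my-project tao-deployer ./tao-deployer-key.json writes the key to the given path; the project and account names are placeholders. Move the key file to a secure location afterwards.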
Step 4: Create GCS Bucket to Store State
Search for a service named Buckets and select the Buckets service.
Click + CREATE in the service page to create a new Bucket.
In the Create a bucket wizard, provide an appropriate name for the Bucket and choose the location closest to you. Click CREATE.
Click CONFIRM to ensure the bucket is not publicly accessible.
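As an alternative to the console, the state bucket can be created with the gcloud CLI. This is a minimal sketch: create_gcs_state_bucket is a hypothetical name, and it assumes a recent gcloud release that provides the gcloud storage command group.

```shell
# create_gcs_state_bucket BUCKET PROJECT_ID LOCATION
create_gcs_state_bucket() {
    # --public-access-prevention enforces that the bucket cannot be
    # made publicly accessible.
    gcloud storage buckets create "gs://$1" --project "$2" \
        --location "$3" --public-access-prevention
}
```

For example, create_gcs_state_bucket tao-api-state my-project us-west1 creates the bucket in the chosen location; all three values are placeholders, and bucket names must be globally unique.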
Deployment
Step 1: Download Resource
Download the resource using the NGC CLI.
ngc registry resource download-version "nvidia/tao/tao-getting-started:5.5.0"
Step 2: Change Directory
Change current directory.
cd tao-getting-started_v5.5.0/setup/quickstart_api_gcp_gks
Step 3: Optional API Parameters
Optionally add any values you would like to override while installing the API chart.
vi tao-toolkit-api-helm-values.yml
Step 4: Deploy
Proceed with deployment.
bash setup.sh install