Setup

The NVTL API service can run on any Kubernetes platform. This section describes how to set up the NVTL API service on the following platforms: a bare-metal server, Amazon Web Services (AWS) EKS, Microsoft Azure AKS, and Google Cloud Platform (GCP).

Bare-Metal Setup

Hardware

Minimum Requirements

One or more GPU nodes, where all GPUs within a given node match:

  • 32 GB system RAM

  • 32 GB of GPU RAM

  • 8 core CPU

  • 1 NVIDIA discrete GPU (Volta, Turing, Ampere, or Hopper architecture)

  • 16 GB of SSD space
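The minimums above can be checked on each node before deployment. The following is a minimal sketch with hypothetical helper names (check_minimums, collect_node_values) that are not part of the NVTL API scripts. It assumes GPU memory is summed across all GPUs in the node and that free disk space is measured on the root filesystem; note that nproc counts logical CPUs (hardware threads), not physical cores.

```bash
# Minimums from the Hardware list above.
MIN_RAM_GB=32
MIN_GPU_MEM_GB=32
MIN_CPU_CORES=8
MIN_DISK_GB=16

# check_minimums RAM_GB GPU_MEM_GB CPU_CORES DISK_GB
# Prints one line per unmet minimum and returns 1 if any check fails.
check_minimums() {
  local rc=0
  [ "$1" -ge "$MIN_RAM_GB" ]     || { echo "system RAM: $1 GB < $MIN_RAM_GB GB"; rc=1; }
  [ "$2" -ge "$MIN_GPU_MEM_GB" ] || { echo "GPU RAM: $2 GB < $MIN_GPU_MEM_GB GB"; rc=1; }
  [ "$3" -ge "$MIN_CPU_CORES" ]  || { echo "CPU cores: $3 < $MIN_CPU_CORES"; rc=1; }
  [ "$4" -ge "$MIN_DISK_GB" ]    || { echo "free disk: $4 GB < $MIN_DISK_GB GB"; rc=1; }
  return "$rc"
}

# collect_node_values -> prints "RAM_GB GPU_MEM_GB CPU_CORES DISK_GB" for this node.
# Requires nvidia-smi plus the procps/coreutils tools shipped with Ubuntu.
collect_node_values() {
  local ram gpu cpu disk
  # Round MiB up to whole GiB so small kernel reservations do not drop 32 GB to 31.
  ram=$(free -m | awk '/^Mem:/ {print int(($2 + 1023) / 1024)}')
  # Assumption: GPU RAM is the sum over all GPUs (nvidia-smi reports MiB per GPU).
  gpu=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
        awk '{s += $1} END {print int((s + 1023) / 1024)}')
  cpu=$(nproc)   # logical CPUs
  disk=$(df -BG --output=avail / | tail -n 1 | tr -dc '0-9')
  echo "$ram $gpu $cpu $disk"
}
```

On a node, `check_minimums $(collect_node_values) && echo "node meets the minimums"` runs both helpers; the unquoted substitution intentionally splits the four values into four arguments.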

Software

OS Support

  • Ubuntu 22.04 (fresh install)
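The OS requirement can be confirmed from /etc/os-release, which uses shell-compatible KEY=value lines. A small sketch with hypothetical function names; it cannot tell whether the system is a fresh install.

```bash
# os_id_version [FILE] -> prints "<ID> <VERSION_ID>" from an os-release file.
os_id_version() {
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "${1:-/etc/os-release}" && echo "$ID $VERSION_ID" )
}

# is_supported_os [FILE] -> succeeds only on Ubuntu 22.04.
is_supported_os() {
  [ "$(os_id_version "$1")" = "ubuntu 22.04" ]
}
```

Running `is_supported_os && echo "supported OS"` with no argument reads /etc/os-release.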

Deployment Steps

  1. Download the necessary software using the NGC CLI.


    ngc registry resource download-version "nvidia/tao/tao-getting-started:5.3.0"


  2. Change the current directory:

    cd tao-getting-started_v5.3.0/setup/quickstart_api_bare_metal


  3. Set up the proxy and custom CA certificates.

    1. If applicable, make sure your deployment machine has Internet access.

    2. Make sure the following environment variables are properly set:

      • HTTP_PROXY, HTTPS_PROXY

      • http_proxy, https_proxy

      • NO_PROXY

    3. If you are using a custom CA SSL certificate, copy the certificate bundle locally:

      cp <path>/<certificate bundle file>.crt ./my-cert.crt

  4. The remote node users must have sudo privileges. Run the following on each node (this example assumes the remote user is ubuntu):

    # Pipe through sudo tee: with "sudo echo ... >>" the redirection runs without root privileges.
    echo "ubuntu ALL=(ALL) NOPASSWD:ALL" | sudo tee -a /etc/sudoers

  5. Add your remote node(s) to the Ansible inventory file:

    vi hosts

    Use either a password (ansible_ssh_pass) or an SSH private key file (ansible_ssh_private_key_file) for credentials.

    1. The following is an example with user/password credentials.

      [master]
      127.0.0.2 ansible_ssh_user='ubuntu' ansible_ssh_pass='password' ansible_ssh_extra_args='-o StrictHostKeyChecking=no'

      [nodes]
      127.0.0.2 ansible_ssh_user='ubuntu' ansible_ssh_pass='password' ansible_ssh_extra_args='-o StrictHostKeyChecking=no'

    2. The following is an example with an SSH key. You can generate a local SSH key using ssh-keygen, then copy your public key to the remote node(s) using ssh-copy-id.

      [master]
      1.1.1.1 ansible_ssh_user='ubuntu' ansible_ssh_private_key_file='/home/user/.ssh/id_rsa'

      [nodes]
      1.1.1.2 ansible_ssh_user='ubuntu' ansible_ssh_private_key_file='/home/user/.ssh/id_rsa'


  6. Use the following command to validate the SSH credentials for the remote node(s). The expected response is “root”.

    ssh ubuntu@127.0.0.2 'sudo whoami'

  7. Set your deployment parameters, such as the chart version and NGC credentials.

    vi deploy.yml

    The following is an example:

    ngc_api_key: YzZtczM5amdtdDcwNjk...
    ngc_email: johndoe@mycorp.com
    chart: https://helm.ngc.nvidia.com/nvidia/tao/charts/nvtl-api-5.3.0.tgz
    chart_values: ./tao-toolkit-api-helm-values.yml
    cluster_name: nvtl-api-demo

  8. Optionally, add any values that you would like to override while installing the API chart. This is an uncommon use case.

    vi tao-toolkit-api-helm-values.yml

  9. Proceed with deployment.

    bash setup.sh install
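Before running `bash setup.sh install`, the proxy, SSH, and deploy.yml checks described in the steps above can be scripted. The following is a minimal sketch with hypothetical helper names; it is not part of setup.sh. It assumes the deploy.yml keys are exactly those in the example (setup.sh may accept others), and it only warns about unset proxy variables, since they matter only behind a proxy.

```bash
# missing_proxy_vars -> prints each proxy variable that is unset or empty.
missing_proxy_vars() {
  local name value rc=0
  for name in HTTP_PROXY HTTPS_PROXY http_proxy https_proxy NO_PROXY; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named $name
    if [ -z "$value" ]; then
      echo "$name"
      rc=1
    fi
  done
  return "$rc"
}

# check_remote_sudo USER@HOST -> succeeds only if passwordless sudo returns "root".
check_remote_sudo() {
  local out
  # sudo -n fails instead of prompting, so a missing NOPASSWD rule is reported.
  out="$(ssh "$1" 'sudo -n whoami' 2>/dev/null)" || return 1
  [ "$out" = "root" ]
}

# missing_deploy_keys FILE -> prints each expected key that is absent or empty.
missing_deploy_keys() {
  local file="$1" key rc=0
  for key in ngc_api_key ngc_email chart chart_values cluster_name; do
    # "key:" at the start of a line, followed by a non-blank value.
    if ! grep -Eq "^${key}:[[:space:]]*[^[:space:]]" "$file"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}

# preflight DEPLOY_YML USER@HOST... -> runs all checks; returns 1 on any error.
preflight() {
  local file="$1" target missing rc=0
  shift
  if ! missing="$(missing_proxy_vars)"; then
    echo "warning: unset proxy variables: $(printf '%s' "$missing" | tr '\n' ' ')"
  fi
  for target in "$@"; do
    if ! check_remote_sudo "$target"; then
      echo "error: passwordless sudo check failed for $target"
      rc=1
    fi
  done
  if ! missing="$(missing_deploy_keys "$file")"; then
    echo "error: $file is missing: $(printf '%s' "$missing" | tr '\n' ' ')"
    rc=1
  fi
  return "$rc"
}
```

For example, `preflight deploy.yml ubuntu@127.0.0.2 && bash setup.sh install` runs the checks and deploys only if they pass. Because deploy.yml holds your NGC API key, consider restricting it with `chmod 600 deploy.yml`.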

If you need to completely remove the installed Kubernetes services from your machine, you can use the following command.


bash setup.sh uninstall

AWS EKS Setup

Pre-Requisites

Step 1: AWS Account

If your organization has an AWS (Amazon Web Services) account that can be used to host the NVTL API service, contact your AWS account administrator to perform the next steps.

If you do not have an AWS account, you can create one yourself.

To complete the following steps, log in to the AWS web console as either the AWS account root user or a user with Admin privileges.

Step 2: IAM User

Follow these steps to create an AWS IAM user and group, and attach policies for the automated deployment of the NVTL API.

  1. Once logged in to the AWS web console, search for and select the “IAM” service.

    eks-image001.png

  2. Select Users on the left panel and click Add users.

    eks-image003.png

  3. In the Add user wizard, provide an appropriate User name and select Access key - Programmatic access for the AWS credential type.

    eks-image005.png

  4. Navigate to Next: Permissions > Next: Tags > Next: Review and click Create user.

    eks-image007.png

  5. Click the Download button to download the access keys as a .csv file for use when setting up the TAO API with the one-click scripts.

    Note

    Once you leave this screen, you will NOT be able to download the same credentials again.

    eks-image009.png

  6. Select User groups on the left panel and click Create group.

    eks-image011.png

  7. In the Create user group wizard, provide an appropriate name.

    eks-image013.png

  8. In the Add users to the group - Optional section, search for and select the user created in the previous step.

    eks-image015.png

  9. In the Attach permission policies - Optional section, search for and select the “AdministratorAccess” policy. Then click Create Group.

    eks-image017.png
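If you prefer the command line, the IAM console steps above map onto AWS CLI calls. This is a sketch, not the documented procedure: it assumes the AWS CLI is already configured with credentials that can manage IAM, and the function name and the user and group names you pass are hypothetical. The last call prints the new access key ID and secret access key once, just like the .csv download, so store them securely.

```bash
# create_deploy_iam_user USER GROUP
create_deploy_iam_user() {
  local user="$1" group="$2"
  aws iam create-user --user-name "$user" || return 1
  aws iam create-group --group-name "$group" || return 1
  aws iam add-user-to-group --user-name "$user" --group-name "$group" || return 1
  # Same managed policy as selected in the console.
  aws iam attach-group-policy --group-name "$group" \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess || return 1
  # Programmatic access key; the secret is only shown at creation time.
  aws iam create-access-key --user-name "$user" \
    --query 'AccessKey.[AccessKeyId,SecretAccessKey]' --output text
}
```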

Step 3: S3 Bucket

Follow these steps to create an S3 bucket to store the state.

  1. Search for and select the “S3” service.

    eks-image019.png

  2. Select Buckets on the left panel and click Create bucket.

    eks-image021.png

  3. In the Create bucket wizard, provide an appropriate name for the Bucket and choose the region closest to you.

    eks-image023.png

  4. Ensure ACLs are disabled and all public access is blocked.

    eks-image025.png

  5. Enable Bucket Versioning and Server-side encryption with Amazon S3-managed keys, then click Create bucket.

    eks-image027.png
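The same bucket settings can be applied with `aws s3api`. A minimal sketch, assuming a configured AWS CLI; the function name is hypothetical. Recent S3 defaults already cover some of these settings for new buckets, so the extra calls mainly make them explicit.

```bash
# create_state_bucket BUCKET REGION
# ACLs disabled, public access blocked, versioning on, SSE-S3 (AES256) encryption.
create_state_bucket() {
  local bucket="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 is the default location and rejects an explicit LocationConstraint.
    aws s3api create-bucket --bucket "$bucket" --region "$region" || return 1
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region" || return 1
  fi
  # "ACLs disabled" corresponds to the BucketOwnerEnforced ownership setting.
  aws s3api put-bucket-ownership-controls --bucket "$bucket" --region "$region" \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]' || return 1
  aws s3api put-public-access-block --bucket "$bucket" --region "$region" \
    --public-access-block-configuration \
    'BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true' || return 1
  aws s3api put-bucket-versioning --bucket "$bucket" --region "$region" \
    --versioning-configuration Status=Enabled || return 1
  # Server-side encryption with Amazon S3-managed keys.
  aws s3api put-bucket-encryption --bucket "$bucket" --region "$region" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
}
```

For example, `create_state_bucket automation-for-tao-api us-west-1` creates a bucket whose name and region you can then enter at the deployment prompts.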

Deployment

Step 1: Download resource

Download the resource using the NGC CLI.

ngc registry resource download-version "nvidia/tao/tao-getting-started:5.3.0"

Step 2: Change directory

Change the current directory.

cd tao-getting-started_v5.3.0/setup/quickstart_api_aws_eks

Step 3: Optional API parameters

Optionally add any values you would like to override while installing the API chart.


vi tao-toolkit-api-helm-values.yml

Step 4: Deploy

Proceed with deployment.


bash setup.sh install

You will be asked to enter the following parameters:

  • S3 bucket name that you created manually in the AWS console (e.g. automation-for-tao-api)

  • S3 bucket region (e.g. us-west-1)

  • Your choice of cluster name (e.g. automation-for-tao-api)

  • Your choice of AWS region (e.g. us-west-1)

  • Your choice of VPC CIDR (e.g. 10.0.0.0/16)

  • Path to your SSH public key (e.g. ~/.ssh/id_rsa.pub or generated from ssh-keygen command)

  • Your NGC API key

  • Your NGC account email address

  • K8s Cluster Version (defaults to 1.23)

  • AWS Instance Type (defaults to g4dn.12xlarge)

  • Number of instances of this type (defaults to 1)

  • URL of the NVTL API helm chart (defaults to latest)

  • Helm values file to override any values of the NVTL API Helm chart (e.g. tao-toolkit-api-helm-values.yml)

  • AWS Access Key ID (from pre-requisites section above)
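The last prompt asks for the Access Key ID from the .csv file downloaded while creating the IAM user. A small sketch (hypothetical function name) that reads a column by its header name; the header text, for example "Access key ID", is an assumption about the file's layout, so check the first line of your file. It assumes plain comma-separated values, which holds for AWS access keys because they contain no commas.

```bash
# csv_column FILE HEADER -> prints the HEADER column of the first data row.
csv_column() {
  awk -F',' -v name="$2" '
    { sub(/\r$/, "") }                  # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        h = $i
        gsub(/^[^A-Za-z]+/, "", h)      # drop a leading byte-order mark, if any
        if (h == name) col = i
      }
      if (!col) exit 1                  # header not found
      next
    }
    NR == 2 { print $col; exit }
  ' "$1"
}
```

For example, `csv_column ~/Downloads/credentials.csv 'Access key ID'` (file name hypothetical) prints the key ID to paste at the prompt.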

Azure AKS Setup

Pre-Requisites

Step 1: Create an Azure Subscription

If your organization has an Azure subscription that can be used to host the NVTL API service, contact your subscription administrator to perform the next steps.

If you do not have an Azure subscription, you can create one yourself by following: https://learn.microsoft.com/en-us/training/modules/create-an-azure-account/

Step 2: Log in as Administrator

Once logged in to the Azure web console, search for and select the App registrations service.

aks-image001.png

Click + New registration in the service page to create a new app registration.

aks-image002.png

In the New registration wizard, provide an appropriate Name and click Register.

aks-image003.png

In the App registration page, make a note of the Application (client) ID and Directory (tenant) ID.

These will be needed later for running the automated one-click deployment scripts.

aks-image004.png

In the App registration page, select the Certificates & secrets sub-menu and click + New client secret.

aks-image005.png

In the wizard, provide a description and click Add.

aks-image006.png

Warning

Make a note of the Value. It will not be visible once you leave this screen, and you will need it later for the automated one-click deployment scripts.

Search for and select the Subscriptions service.

aks-image007.png

Select the subscription in which you want to create your infrastructure.

In the Subscription page, make a note of the Subscription ID.

This will be needed later for running the automated one-click deployment scripts.

aks-image008.png

In the subscription page, select the Access control (IAM) sub-menu, click + Add and then Add role assignment.

aks-image009.png

In the Role tab of the role assignment wizard, select the Contributor role.

aks-image010.png

In the Members tab of the role assignment wizard, select User, group, or service principal under Assign access to, and click + Select members.

In the Select members wizard, search for the name of the app registration created earlier and click Select.

Click Review + assign.

aks-image011.png
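The portal steps above (app registration, client secret, and Contributor role assignment on the subscription) can also be done with a single Azure CLI command. A sketch, assuming you are signed in with `az login` as an administrator; the function name and the app name you pass are hypothetical. In the output, clientId is the Application (client) ID, clientSecret is the secret Value (shown only once), and tenantId is the Directory (tenant) ID.

```bash
# create_deploy_service_principal APP_NAME
create_deploy_service_principal() {
  local name="$1" sub_id
  # Subscription of the current az login context.
  sub_id="$(az account show --query id --output tsv)" || return 1
  # Creates the app registration and secret, and assigns Contributor on the subscription.
  az ad sp create-for-rbac --name "$name" --role Contributor \
    --scopes "/subscriptions/$sub_id" \
    --query '{clientId:appId, clientSecret:password, tenantId:tenant}' \
    --output json || return 1
  echo "subscriptionId: $sub_id"
}
```

Run `az account set --subscription <id>` first if the infrastructure should go into a subscription other than your current default.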

Step 3: Create Azure storage account and container to store state

Search for and select the Resource groups service.

aks-image012.png

Select + Create on the service page to create a new resource group.

aks-image013.png

In the Create a resource group page, choose a subscription, provide an appropriate name for the resource group, and choose the region where resources will be created by default.

Click Review + create.

aks-image014.png

Search for and select the Storage accounts service.

aks-image015.png

Select + Create on the service page to create a new storage account.

aks-image016.png

In the Create a storage account page, choose a subscription and resource group, and provide an appropriate name for the storage account.

Choose the region where resources will be created by default, and provide acceptable values for performance and redundancy.

Click Review and then click Create.

aks-image017.png

In the Containers sub-menu of the created storage account, click + Container to create a new container.

In the New container wizard, provide an appropriate name for the container, select Private for Public access level and click Create.

aks-image018.png
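An Azure CLI version of the resource group, storage account, and container steps is sketched below. It assumes you are signed in with `az login`; the function names are hypothetical, and Standard_LRS is only one example of a performance and redundancy choice. Storage account names must be 3 to 24 characters of lowercase letters and digits, so the sketch checks that first.

```bash
# valid_storage_account_name NAME -> 3-24 lowercase letters and digits.
valid_storage_account_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# create_state_storage RESOURCE_GROUP LOCATION ACCOUNT CONTAINER
create_state_storage() {
  local rg="$1" location="$2" account="$3" container="$4"
  valid_storage_account_name "$account" ||
    { echo "invalid storage account name: $account" >&2; return 1; }
  az group create --name "$rg" --location "$location" || return 1
  # Example SKU; choose the performance/redundancy you need.
  az storage account create --name "$account" --resource-group "$rg" \
    --location "$location" --sku Standard_LRS || return 1
  # "--public-access off" matches the Private access level in the portal;
  # "--auth-mode key" lets the CLI look up the account key for this call.
  az storage container create --name "$container" --account-name "$account" \
    --public-access off --auth-mode key
}
```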

Deployment

Step 1: Download the resource

Download the resource using the NGC CLI.

ngc registry resource download-version "nvidia/tao/tao-getting-started:5.3.0"

Step 2: Change current directory

Change the current directory.

cd tao-getting-started_v5.3.0/setup/quickstart_api_azure_aks

Step 3: Optional API parameters

Optionally add any values you would like to override while installing the API chart.


vi tao-toolkit-api-helm-values.yml

Step 4: Deploy

Proceed with deployment.


bash setup.sh install

Google Cloud (GCP) Setup

Pre-Requisites

Step 1: Create a GCP project

If your organization has a GCP project that can be used to host the NVTL API service, contact the project administrator to perform the next steps.

If you do not have a GCP project, you can create one yourself by following: https://cloud.google.com/resource-manager/docs/creating-managing-projects

Step 2: Log in as Administrator

Log in to the GCP web console as a user with Admin privileges in order to complete the next steps.

Step 3: Create a GCP service account with access keys for automated deployment of the TAO API

Once logged in to the GCP web console, search for and select the Service Accounts service.

gks-image001.png

Click + CREATE SERVICE ACCOUNT in the service page to create a new service account.

gks-image002.png

In the Create service account wizard, provide an appropriate Service account name and click CREATE AND CONTINUE.

gks-image003.png

Add the Owner role for the project and click DONE.

gks-image004.png

Click the service account you created.

gks-image005.png

In the Service Account page, under the KEYS tab, click the ADD KEY dropdown and click Create new key.

gks-image006.png

In the wizard, select JSON and click CREATE.

gks-image007.png

Warning

Make a note of the path where the key was downloaded and move the key to a secure location. You will need this key later.
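The service account steps above can also be scripted with the gcloud CLI. A sketch, assuming gcloud is authenticated as a project administrator; the function name, account name, and key path are hypothetical. Service account emails follow the NAME@PROJECT_ID.iam.gserviceaccount.com pattern.

```bash
# create_deploy_service_account PROJECT_ID NAME KEY_FILE
create_deploy_service_account() {
  local project="$1" name="$2" key_file="$3"
  local email="${name}@${project}.iam.gserviceaccount.com"
  gcloud iam service-accounts create "$name" --project "$project" \
    --display-name "$name" || return 1
  # Owner role on the project, as in the console steps.
  gcloud projects add-iam-policy-binding "$project" \
    --member "serviceAccount:$email" --role roles/owner || return 1
  # Create and download a JSON key, then make it readable only by you.
  gcloud iam service-accounts keys create "$key_file" --iam-account "$email" || return 1
  chmod 600 "$key_file"
}
```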

Step 4: Create GCS bucket to store state

Search for and select the Buckets service.

gks-image008.png

Click + CREATE in the service page to create a new Bucket.

gks-image009.png

In the Create a bucket wizard, provide an appropriate name for the Bucket and choose the location closest to you. Click CREATE.

gks-image010.png

Click CONFIRM to ensure the bucket is not publicly accessible.

gks-image011.png
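A gcloud sketch of the bucket step follows, with hypothetical function names. It assumes a recent Google Cloud SDK where `gcloud storage buckets create` accepts `--public-access-prevention`; confirm with `gcloud storage buckets create --help` on your version. The name check is a simplified form of the GCS naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit).

```bash
# valid_bucket_name NAME -> simplified GCS bucket naming check.
valid_bucket_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'
}

# create_state_gcs_bucket PROJECT_ID BUCKET LOCATION
create_state_gcs_bucket() {
  local project="$1" bucket="$2" location="$3"
  valid_bucket_name "$bucket" || { echo "invalid bucket name: $bucket" >&2; return 1; }
  # Public access prevention keeps the bucket from being made publicly accessible.
  gcloud storage buckets create "gs://$bucket" --project "$project" \
    --location "$location" --public-access-prevention
}
```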

Deployment

Step 1: Download resource

Download the resource using the NGC CLI.

ngc registry resource download-version "nvidia/tao/tao-getting-started:5.3.0"

Step 2: Change directory

Change the current directory.

cd tao-getting-started_v5.3.0/setup/quickstart_api_gcp_gks

Step 3: Optional API parameters

Optionally add any values you would like to override while installing the API chart.


vi tao-toolkit-api-helm-values.yml

Step 4: Deploy

Proceed with deployment.


bash setup.sh install

© Copyright 2024, NVIDIA. Last updated on Mar 22, 2024.