Step 2: Set Up Required Infrastructure


NVIDIA AI Workflows are designed to be deployed on a cloud-native, Kubernetes-based platform, which can run on premises or on a cloud service provider (CSP).

The infrastructure stack that will be set up for the workflow should follow the diagram below:

[Figure: NVIDIA AI Workflow infrastructure stack]

Follow the instructions in the sections below to set up the required infrastructure (denoted by the blue and grey boxes) that will be used in Step 3: Install Workflow Components (denoted by the green box).

NVIDIA AI Workflows require, at minimum, a single GPU-enabled node for running the provided example workload. Production deployments should be performed in a highly available (HA) environment.

The following hardware specification for the GPU-enabled node is recommended for this workflow:

  • 1x NVIDIA T4/A10/A30/A40/A100 GPU with 16 GB or more of GPU memory

  • 8 vCPU Cores

  • 32 GB RAM

  • 500 GB HDD

Make a note of these hardware specifications, as you will use them in the following sections to provision the nodes used in the Kubernetes cluster.
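Once candidate nodes have been provisioned, you can sanity-check them against these recommendations. The following is a minimal sketch, assuming the official kubernetes Python client, a working kubeconfig, and that the NVIDIA device plugin (or GPU Operator) is already installed so nodes advertise the nvidia.com/gpu resource; the threshold values simply mirror the list above.

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

def parse_cpu(quantity: str) -> float:
    # Kubernetes CPU quantities are whole cores ("8") or millicores ("7910m").
    return float(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)

def parse_memory_gib(quantity: str) -> float:
    # Allocatable memory is usually reported in Ki; handle common binary suffixes.
    for suffix, factor in (("Gi", 1.0), ("Mi", 1 / 1024), ("Ki", 1 / 1024 ** 2)):
        if quantity.endswith(suffix):
            return float(quantity[:-2]) * factor
    return float(quantity) / 1024 ** 3  # plain bytes

for node in v1.list_node().items:
    alloc = node.status.allocatable or {}
    # "nvidia.com/gpu" only appears once the NVIDIA device plugin is running.
    gpus = int(alloc.get("nvidia.com/gpu", "0"))
    cpus = parse_cpu(alloc.get("cpu", "0"))
    mem_gib = parse_memory_gib(alloc.get("memory", "0"))
    ok = gpus >= 1 and cpus >= 8 and mem_gib >= 32
    print(f"{node.metadata.name}: GPUs={gpus}, vCPUs={cpus:.0f}, "
          f"Memory={mem_gib:.0f} GiB -> {'meets spec' if ok else 'below recommended spec'}")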

Note

The Kubernetes cluster and Cloud Native Service Add-On Pack may have additional infrastructure requirements for networking, storage, services, etc. More detailed information can be found in the NVIDIA Cloud Native Service Add-On Pack Deployment Guide.


The workflow requires that a Kubernetes cluster supported by NVIDIA AI Enterprise be provisioned.

The Cloud Native Service Add-On Pack only supports a subset of NVIDIA AI Enterprise-supported Kubernetes distributions at this time. Specific supported distributions and the steps to provision a cluster can be found in the NVIDIA Cloud Native Service Add-On Pack Deployment Guide.

An example reference for provisioning a minimal cluster based on the NVIDIA AI Enterprise VMI with NVIDIA Cloud Native Stack can be found in the guide here.
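After provisioning, a quick way to confirm that the cluster is reachable and healthy is to query the API server version and node readiness. The sketch below assumes the official kubernetes Python client and a kubeconfig pointing at the new cluster; the list of supported distributions themselves is in the deployment guide, so the version printed here should be compared against it manually.

from kubernetes import client, config

config.load_kube_config()

# Report the server version; compare it against the distributions listed
# in the NVIDIA Cloud Native Service Add-On Pack Deployment Guide.
version = client.VersionApi().get_code()
print(f"Kubernetes server version: {version.git_version}")

# Confirm that every node reports a Ready condition of "True".
for node in client.CoreV1Api().list_node().items:
    ready = next((c.status for c in node.status.conditions if c.type == "Ready"), "Unknown")
    print(f"{node.metadata.name}: Ready={ready}")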

Once the Kubernetes cluster has been provisioned, proceed to the next step in the NVIDIA Cloud Native Service Add-On Pack Deployment Guide to deploy the add-on pack on the cluster.

An example reference, continuing from the previous section, can be found here.
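Once the add-on pack install completes, it is worth verifying that its components came up cleanly before continuing. Because the namespaces created vary by add-on pack version, the sketch below (again using the kubernetes Python client) deliberately sweeps all namespaces and flags any pod that is not yet Running or Succeeded, rather than assuming specific namespace names.

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Collect pods that have not reached a healthy terminal or running phase.
unhealthy = [
    (p.metadata.namespace, p.metadata.name, p.status.phase)
    for p in v1.list_pod_for_all_namespaces().items
    if p.status.phase not in ("Running", "Succeeded")
]

if unhealthy:
    for ns, name, phase in unhealthy:
        print(f"Not ready: {ns}/{name} ({phase})")
else:
    print("All pods Running or Succeeded -- ready to proceed to Step 3.")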

All of the workflow components are integrated and deployed on top of the previously described infrastructure stack as a starting point. The workflow can then be customized and integrated with your own environment as required.

After the add-on pack has been installed, proceed to Step 3: Install Workflow Components to continue setting up the workflow.
