Setting up Cloud Native Stack#
NVIDIA Cloud Native Stack
This section describes how to set up and deploy NVIDIA Cloud Native Stack on the NVIDIA AI Enterprise VMI, providing the Kubernetes foundation on which the NVIDIA Cloud Native Service Add-On Pack will be deployed.
Note
The upstream K8S deployment using NVIDIA Cloud Native Stack should be used for evaluation and development purposes only. It is not designed for production use.
Prerequisites#
First, provision an instance of the NVIDIA AI Enterprise VMI using the hardware specifications from the AI Workflows documentation. This VMI is available in the Marketplaces of the major CSPs; follow the instructions from the listing to provision an instance. More information can be found in the NVIDIA AI Enterprise Cloud Guide.
Once the instance has been provisioned, refer to the NVIDIA AI Enterprise Cloud Guide to authorize the instance and activate your subscription.
Once your subscription has been activated, ensure you can access the Enterprise Catalog, and create an NGC API Key if you have not done so already.
Once you have created an NGC API Key, install and configure the NGC CLI, if you have not done so already, using the instructions here.
Install NVIDIA Cloud Native Stack#
We will install NVIDIA Cloud Native Stack on the NVIDIA AI Enterprise VMI that has been provisioned.
Open an SSH web console to the VMI you provisioned via the cloud provider’s methods.
Once an SSH session has been established, create a local user with sudo access via the following commands:
sudo adduser nvidia
sudo usermod -aG sudo nvidia
Next, clone the NVIDIA Cloud Native Stack repo via the following command:
git clone https://github.com/NVIDIA/cloud-native-stack.git
Navigate to the playbooks directory within the repo:
cd cloud-native-stack/playbooks
Edit the hosts file: uncomment the localhost line and fill in the ansible_ssh_user, password, and sudo user fields with the credentials of the local user you created previously. Once this is done, save and exit the file.
nano hosts
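For reference, an uncommented master entry in the hosts inventory might look like the following. The nvidia user matches the account created above; the password placeholder and exact field names follow the repo's Ansible inventory template and may differ between releases:

```
[master]
localhost ansible_ssh_user=nvidia ansible_ssh_pass=<password> ansible_sudo_pass=<password> ansible_ssh_common_args='-o StrictHostKeyChecking=no'
```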
Next, edit the cnc_values_6.3.yaml file, updating the cnc_docker and cnc_nvidia_driver parameters from no to yes. Setting these values to yes ensures a compatible version of Docker and the NVIDIA GPU driver are installed and available for use in this workflow, helping developers evaluate the Docker experience with Cloud Native Stack. Save and exit the file.
nano cnc_values_6.3.yaml
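After editing, the two parameters should appear as follows; all other values in the file can be left at their defaults (parameter names are taken from the cnc_values_6.3.yaml template in the repo):

```yaml
# Install a compatible Docker version on the host
cnc_docker: yes
# Install the NVIDIA GPU driver on the host
cnc_nvidia_driver: yes
```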
Run the installer for NVIDIA Cloud Native Stack via the following command:
bash setup.sh install
Note
The installer may fail if dpkg did not run to completion during instance provisioning. If this occurs, run the following command to resolve the issue, then retry the installation.
sudo dpkg --configure -a
After the installation completes, create the kubeconfig for your user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Next, check that the Kubernetes cluster is up and running by running the following command:
kubectl get pods -A
Once NVIDIA Cloud Native Stack has been set up, we will install Local Path Provisioner to provide a storage class to use. Proceed to the next section below.
Installing Local Path Provisioner#
Once the cluster is up, install Local Path Provisioner via the following command to provide a storage class for the rest of the workflow:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.23/deploy/local-path-storage.yaml
Patch the cluster to set local-path as the default storage class:
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
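The patch above adds the is-default-class annotation to the existing StorageClass object. After patching, the relevant parts of the manifest look roughly like this (a sketch based on the upstream local-path-provisioner defaults; verify with kubectl get storageclass local-path -o yaml):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```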
Now we can install NVIDIA Cloud Native Service Add-on Pack. Return to the Deploying Cloud Native Service Add-On Pack on NVIDIA AI Enterprise VMI (NVIDIA Cloud Native Stack) section to continue.
More information about NVIDIA Cloud Native Stack can be found at NVIDIA/cloud-native-stack.
More information about NVIDIA Cloud Native Service Add-on Pack can be found here.