Step 3: Install Cloud Native Software Stack

Audio Transcription

NVIDIA AI Workflows are based on cloud-native platforms running Kubernetes. As a part of the workflow, we will deploy an example Kubernetes cluster meeting the hardware requirements, termed NVIDIA Cloud Native Stack, along with a set of components serving as implementation examples for integrating the AI application with Enterprise-like authentication, monitoring, reporting, and load balancing, termed NVIDIA Cloud Native Service Add-on Pack.

Ensure that the two VMIs provisioned from the previous Hardware Requirements section are accessible.
- One of the VMIs will be used for the training pipeline. No additional setup is needed for this VMI.
- The second VMI will be used for the inference pipeline. Proceed to the next step to continue setup for this VMI.
We will install NVIDIA Cloud Native Stack on the second NVIDIA AI Enterprise VMI that has been provisioned.
- Open an SSH web console to the VMI you provisioned via the cloud provider’s methods.
- Once an SSH session has been established, a local user with sudo access will need to be created. This can be done via the following commands:
  Copy
  
  Copied!
  
  adduser nvidia sudo adduser nvidia sudo usermod -aG sudo nvidia
- Next, clone the NVIDIA Cloud Native Stack repo via the following command:
  Copy
  
  Copied!
  
  git clone https://github.com/NVIDIA/cloud-native-stack.git
- Navigate to the playbooks directory within the repo:
  Copy
  
  Copied!
  
  cd cloud-native-stack/playbooks
- Edit the host file, uncomment the localhosts line and replace the ansible_ssh_user, password, and sudo user fields with the local user credentials you specified previously. Once this is done, save and exit the file.
  Copy
  
  Copied!
  
  nano hosts
- Next, edit the cnc_values_6.3.yaml file, updating the cnc_docker and cnc_nvidia_driver parameters from no to yes. Save and exit the file.
  Copy
  
  Copied!
  
  nano cnc_values_6.3.yaml
- Run the installer for NVIDIA Cloud Native Stack via the following command:
  Copy
  
  Copied!
  
  bash setup.sh install
  Note
  
  The installer may fail if dpkg does not run cleanly or entirely during the instance provisioning. If this occurs, run the following command to resolve the issue, then retry the installation.
  
  Copy
  
  Copied!
  
  sudo dpkg --configure -a
- After the installation completes, create the kubeconfig for your user:
  Copy
  
  Copied!
  
  mkdir -p $HOME/.kube sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config sudo chown $(id -u):$(id -g) $HOME/.kube/config
- Next, check that the Kubernetes cluster is up and running by running the following command:
  Copy
  
  Copied!
  
  kubectl get pods -A
- Finally, we’ll install Local Path Provisioner to provide a storage class to use for the rest of the workflow:
  Copy
  
  Copied!
  
  kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.23/deploy/local-path-storage.yaml
Once NVIDIA Cloud Native Stack has been set up, we will install NVIDIA Cloud Native Service Add-on Pack. Refer to the NVIDIA Cloud Native Service Add-on Pack Deployment Guide for instructions to set up the prerequisites and deploy the software stack.
After the add-on pack has been installed, proceed to the Deployment Steps section to continue setting up the workflow.

More information about NVIDIA Cloud Native Stack can be found here.

And more information about NVIDIA Cloud Native Service Add-on Pack can be found here.