Step 3: Install Cloud Native Software Stack
NVIDIA AI Workflows are built on cloud-native platforms running Kubernetes. As part of this workflow, we will deploy an example Kubernetes cluster that meets the hardware requirements, called NVIDIA Cloud Native Stack. We will also deploy NVIDIA Cloud Native Service Add-on Pack, a set of components that serve as implementation examples for integrating the AI application with enterprise-style authentication, monitoring, reporting, and load balancing.
Ensure that the two VMIs provisioned from the previous Hardware Requirements section are accessible.
One of the VMIs will be used for the training pipeline. No additional setup is needed for this VMI.
The second VMI will be used for the inference pipeline. Proceed to the next step to continue setup for this VMI.
We will install NVIDIA Cloud Native Stack on the second NVIDIA AI Enterprise VMI that has been provisioned.
Open an SSH web console to the VMI you provisioned via the cloud provider’s methods.
Once the SSH session is established, create a local user with sudo access using the following commands:
sudo adduser nvidia
sudo usermod -aG sudo nvidia
Next, clone the NVIDIA Cloud Native Stack repo via the following command:
git clone https://github.com/NVIDIA/cloud-native-stack.git
Navigate to the playbooks directory within the repo:
cd cloud-native-stack/playbooks
Edit the hosts file: uncomment the localhost line and replace the ansible_ssh_user, password, and sudo user fields with the credentials of the local user you created previously. Then save and exit the file.
Next, edit the cnc_values_6.3.yaml file, updating the cnc_docker and cnc_nvidia_driver parameters from no to yes. Save and exit the file.
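If you prefer to script this edit instead of using a text editor, the following is a minimal sketch. It assumes the parameters appear in the file as top-level lines of the form `cnc_docker: no`; check the file first, since the layout may differ between releases. The helper name `enable_cns_options` is hypothetical and not part of the repo, and the `sed -i` usage assumes GNU sed.

```shell
# Hypothetical helper (not part of the cloud-native-stack repo):
# flips the named keys from "no" to "yes" in a values file.
# Assumes each key is written as "key: no" at the start of a line.
enable_cns_options() {
  local file="$1"
  shift
  local key
  for key in "$@"; do
    # Rewrite "key: no" to "key: yes", tolerating extra whitespace.
    sed -i -E "s/^(${key}:[[:space:]]*)no[[:space:]]*$/\1yes/" "$file"
  done
}

# Usage, from the playbooks directory:
#   enable_cns_options cnc_values_6.3.yaml cnc_docker cnc_nvidia_driver
#   grep -E '^cnc_(docker|nvidia_driver):' cnc_values_6.3.yaml
```

Lines that do not match the assumed pattern are left unchanged, so confirm the result with the `grep` command before running the installer.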
Run the installer for NVIDIA Cloud Native Stack via the following command:
bash setup.sh install
Note
The installer may fail if dpkg did not finish running cleanly during instance provisioning. If this occurs, run the following command to resolve the issue, then retry the installation.
sudo dpkg --configure -a
After the installation completes, create the kubeconfig for your user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Next, check that the Kubernetes cluster is up and running by running the following command:
kubectl get pods -A
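The pod list can be long, so it may help to filter it down to pods that are not yet healthy. The sketch below assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...) and treats `Running` and `Completed` as healthy. The function name `pods_not_ready` is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and
# prints namespace/name for every pod whose STATUS column (field 4)
# is neither Running nor Completed. The header row is skipped.
pods_not_ready() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 }'
}

# Usage on the VMI:
#   kubectl get pods -A | pods_not_ready
```

Empty output suggests that every pod has reached a Running or Completed state. Pods still initializing shortly after installation will be listed until they start.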
Finally, we’ll install Local Path Provisioner to provide a storage class to use for the rest of the workflow:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.23/deploy/local-path-storage.yaml
Once NVIDIA Cloud Native Stack has been set up, we will install NVIDIA Cloud Native Service Add-on Pack. Refer to the NVIDIA Cloud Native Service Add-on Pack Deployment Guide for instructions to set up the prerequisites and deploy the software stack.
After the add-on pack has been installed, proceed to the Deployment Steps section to continue setting up the workflow.
More information about NVIDIA Cloud Native Stack can be found here.
More information about NVIDIA Cloud Native Service Add-on Pack can be found here.