Setting up Cloud Native Stack

NVIDIA Cloud Native Stack

This section describes how to set up and deploy NVIDIA Cloud Native Stack on the NVIDIA AI Enterprise VMI so that the NVIDIA Cloud Native Service Add-On Pack can be deployed on top of it.

Note

The upstream K8S deployment using NVIDIA Cloud Native Stack should be used for evaluation and development purposes only. It is not designed for production use.


  1. First, using the hardware specifications from the AI Workflows documentation, provision an instance of the NVIDIA AI Enterprise VMI. You can find this VMI under the Marketplaces for major CSPs; follow the instructions from the listing to provision an instance. More information can be found in the NVIDIA AI Enterprise Cloud Guide.

  2. Once the instance has been provisioned, refer to the NVIDIA AI Enterprise Cloud Guide to authorize the instance and activate your subscription.

  3. Once your subscription has been activated, ensure you can access the Enterprise Catalog, and create an NGC API Key if you have not done so already.

  4. Once you have created an NGC API Key, install and configure the NGC CLI using the instructions here, if you have not done so already.
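
     As an optional check, the API key can also be used to log in to the NGC container registry (nvcr.io) with Docker, so container images from the Enterprise Catalog can be pulled directly. This is a sketch: NGC_API_KEY is a placeholder environment variable assumed to hold your key, and the registry username for nvcr.io is always the literal string $oauthtoken.

     ```shell
     # Log in to the NGC container registry with the NGC API Key.
     # NGC_API_KEY is a placeholder variable assumed to hold your key;
     # the username for nvcr.io is always the literal string "$oauthtoken".
     echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
     ```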

  1. We will install NVIDIA Cloud Native Stack on the NVIDIA AI Enterprise VMI that has been provisioned.

    • Open an SSH web console to the VMI you provisioned via the cloud provider’s methods.

    • Once an SSH session has been established, a local user with sudo access will need to be created. This can be done via the following commands:


      sudo adduser nvidia


      sudo usermod -aG sudo nvidia
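
      If you are scripting the setup, the same user can be created without interactive prompts. This variant assumes a Debian/Ubuntu guest (adduser's --disabled-password and --gecos flags) and sets no password; add one later with sudo passwd nvidia if needed.

      ```shell
      # Create the "nvidia" user non-interactively (Debian/Ubuntu adduser),
      # then add it to the sudo group. No password is set; run
      # "sudo passwd nvidia" afterwards if password login is required.
      sudo adduser --disabled-password --gecos "" nvidia
      sudo usermod -aG sudo nvidia
      ```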


    • Next, clone the NVIDIA Cloud Native Stack repo via the following command:


      git clone https://github.com/NVIDIA/cloud-native-stack.git


    • Navigate to the playbooks directory within the repo:


      cd cloud-native-stack/playbooks


    • Edit the hosts file: uncomment the localhost line and replace the ansible_ssh_user, password, and sudo password fields with the credentials of the local user you created previously. Once this is done, save and exit the file.


      nano hosts
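
      For illustration, the uncommented line might look like the sketch below, using standard Ansible connection variables with placeholder credentials. The exact section and variable names may differ between Cloud Native Stack releases, so follow the comments in the shipped hosts file.

      ```
      [master]
      localhost ansible_ssh_user=nvidia ansible_ssh_pass=<password> ansible_sudo_pass=<password> ansible_ssh_common_args='-o StrictHostKeyChecking=no'
      ```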


    • Next, edit the cnc_values_6.3.yaml file, updating the cnc_docker and cnc_nvidia_driver parameters from no to yes. Setting these values to yes ensures that a compatible version of Docker and the NVIDIA GPU driver are installed and available for use in this workflow, which helps developers evaluate the Docker experience with Cloud Native Stack. Save and exit the file.


      nano cnc_values_6.3.yaml
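
      If you prefer a non-interactive edit, the two parameters can be flipped with sed. This sketch assumes the keys appear at the start of a line as cnc_docker: no and cnc_nvidia_driver: no; check the file first, since the formatting can vary between releases.

      ```shell
      # Switch cnc_docker and cnc_nvidia_driver from "no" to "yes" in place,
      # then print both lines to confirm the change.
      sed -i -e 's/^cnc_docker: no/cnc_docker: yes/' \
             -e 's/^cnc_nvidia_driver: no/cnc_nvidia_driver: yes/' cnc_values_6.3.yaml
      grep -E '^(cnc_docker|cnc_nvidia_driver):' cnc_values_6.3.yaml
      ```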


    • Run the installer for NVIDIA Cloud Native Stack via the following command:


      bash setup.sh install

      Note

      The installer may fail if dpkg did not run cleanly or to completion during instance provisioning. If this occurs, run the following command to resolve the issue, then retry the installation.


      sudo dpkg --configure -a


    • After the installation completes, create the kubeconfig for your user:


      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config


    • Next, check that the Kubernetes cluster is up and running by running the following command:


      kubectl get pods -A
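
      Pods can take several minutes to reach the Running state after installation. Instead of polling manually, you can block until the node and all pods report Ready; this is a sketch with arbitrary timeouts, and completed Jobs (if any) would need to be excluded.

      ```shell
      # Wait for the node to become Ready, then for all pods in all
      # namespaces to become Ready (timeouts are illustrative).
      kubectl wait --for=condition=Ready node --all --timeout=300s
      kubectl wait --for=condition=Ready pod --all --all-namespaces --timeout=600s
      ```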


    • Once the cluster is up, install Local Path Provisioner to provide a storage class to use for the rest of the workflow:


      kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.23/deploy/local-path-storage.yaml


    • Patch the cluster to set local-path as the default storage class:


      kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
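
      To confirm the patch took effect, list the storage classes; kubectl marks the default class with (default) next to its name.

      ```shell
      # local-path should now appear as "local-path (default)"
      kubectl get storageclass
      ```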


  2. Once NVIDIA Cloud Native Stack has been set up and local-path has been configured as the default storage class, the NVIDIA Cloud Native Service Add-on Pack can be installed. Return to the Deploying Cloud Native Service Add-On Pack on NVIDIA AI Enterprise VMI (NVIDIA Cloud Native Stack) section to continue.

More information about NVIDIA Cloud Native Stack can be found here.

More information about the NVIDIA Cloud Native Service Add-on Pack can be found here.

© Copyright 2022-2023, NVIDIA. Last updated on May 23, 2023.