Deploying Cloud Native Service Add-On Pack
NVIDIA Cloud Native Stack
This guide walks through the deployment and setup of the NVIDIA Cloud Native Service Add-On Pack on an upstream Kubernetes deployment, such as the NVIDIA Cloud Native Stack. This is the simplest deployment configuration: all provided components are installed directly on the cluster, with no integration with external services.
The NVIDIA Cloud Native Stack deployment using upstream Kubernetes should be used for evaluation and development purposes only. It is not designed for production use.
The following steps assume that the Requirements section has been met, and the NVIDIA Cloud Native Stack K8S cluster has already been set up from the steps in the previous section.
Ensure that a kubeconfig is available and set via the KUBECONFIG environment variable or your user’s default location (.kube/config).
Ensure that a FQDN and wildcard DNS entry are available and resolvable for the K8S cluster that was created.
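Both prerequisites can be sanity-checked from a shell before proceeding. The following is an illustrative sketch: the wildcard value is a placeholder for your own domain, and probe_host is a hypothetical helper, not part of the installer.

```shell
# Verify the active kubeconfig can reach the cluster
kubectl cluster-info || echo "Cluster not reachable - check KUBECONFIG"

# Derive a test hostname from the wildcard record and confirm it resolves;
# under a correctly configured wildcard entry, any label should resolve.
wildcard='*.my-cluster.my-domain.com'
probe_host() { echo "probe.${1#\*.}"; }   # strip the leading "*." prefix
nslookup "$(probe_host "$wildcard")" || echo "Wildcard DNS entry did not resolve"
```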
Download the NVIDIA Cloud Native Service Add-on Pack from the Enterprise Catalog onto the instance you have provisioned from here.
ngc registry resource download-version "nvaie/nvidia_cnpack:0.2.1"
Note: If you still need to install and set up the NGC CLI with your API key, please do so before downloading the resource. Instructions can be found here.
Navigate to the installer’s directory using the following command:
cd nvidia_cnpack_v0.2.1
Create a config file for the installation using the following template as a minimal config file. For full details on all the available configuration options, please reference the Advanced Usage section of the Appendix.
Note: Make sure to change the wildcardDomain field to match the DNS FQDN and wildcard record created as described in the Requirements section.

apiVersion: v1alpha1
kind: NvidiaPlatform
spec:
  platform:
    wildcardDomain: "*.my-cluster.my-domain.com"
    externalPort: 443
  ingress:
    enabled: true
  postgres:
    enabled: true
  certManager:
    enabled: true
  trustManager:
    enabled: true
  keycloak:
    databaseStorage:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1G
      storageClassName: local-path
      volumeMode: Filesystem
  prometheus:
    storage:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1G
      storageClassName: local-path
      volumeMode: Filesystem
  grafana:
    enabled: true
  elastic:
    enabled: true
Note: If you installed local-path-provisioner, the storageClassName can be left as shown: local-path
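Before running the installer, a quick sanity check can catch a config file that still carries the template's placeholder domain. This is an optional sketch, assuming the template above was saved as config.yaml in the current directory; check_placeholder is a hypothetical helper, not part of the installer.

```shell
# Hypothetical helper: warn if the config still contains the template's
# placeholder domain instead of your real wildcard domain
check_placeholder() {
    grep -q 'my-cluster\.my-domain\.com' "$1" && echo "warning: update wildcardDomain"
}

# Only check if the config file exists in the current directory
if [ -f config.yaml ]; then
    check_placeholder config.yaml
fi
```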
Make the installer executable via the following command:
chmod +x ./nvidia-cnpack-linux-x86_64
Run the following command on the instance to set up NVIDIA Cloud Native Service Add-on Pack:
./nvidia-cnpack-linux-x86_64 create -f config.yaml
Once the install is complete, check that all the pods are healthy via the following command:
kubectl get pods -A
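Rather than eyeballing the full pod list, you can let kubectl block until pods report Ready and then list any stragglers. This is a sketch: pods from completed Jobs never reach the Ready condition, so the wait is allowed to time out without aborting, and not_ready is a hypothetical awk filter over the standard "kubectl get pods -A" column layout (STATUS is the fourth column).

```shell
# Wait up to 5 minutes for all pods to report Ready; tolerate a timeout,
# since pods from completed Jobs never become Ready
kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=300s || true

# Hypothetical filter: show only pods whose STATUS (4th column of
# "kubectl get pods -A --no-headers") is neither Running nor Completed
not_ready() { awk '$4 != "Running" && $4 != "Completed"'; }
kubectl get pods -A --no-headers | not_ready
```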
The output should look similar to the screenshot below, with all pods in a Running or Completed state:
As a part of the installation, the installer will create nvidia-platform and nvidia-monitoring namespaces that contain most of the components and information required for interacting with the deployed services.
The default Keycloak instance URL is at: https://auth.my-cluster.my-domain.com
Default admin credentials can be found within the nvidia-platform namespace, in a secret called keycloak-initial-admin via the following commands:
kubectl get secret keycloak-initial-admin -n nvidia-platform -o jsonpath='{.data.username}' | base64 -d
kubectl get secret keycloak-initial-admin -n nvidia-platform -o jsonpath='{.data.password}' | base64 -d
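The two commands above share one pattern: read a key from the secret's .data map and base64-decode it (Kubernetes stores secret values base64-encoded). A small convenience function can make that reusable; secret_field is a hypothetical name for illustration, not something the add-on pack provides.

```shell
# Hypothetical helper: extract and decode one field of a Kubernetes secret
secret_field() {
    # $1 = namespace, $2 = secret name, $3 = data key
    kubectl get secret "$2" -n "$1" -o jsonpath="{.data.$3}" | base64 -d
}

secret_field nvidia-platform keycloak-initial-admin username
secret_field nvidia-platform keycloak-initial-admin password
```

The same helper works for the Grafana credentials described below, e.g. secret_field nvidia-monitoring grafana-admin-credentials GF_SECURITY_ADMIN_USER.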
The default Grafana instance URL is at: https://dashboards.my-cluster.my-domain.com
The default Grafana credentials can be found within the nvidia-monitoring namespace, in a secret called grafana-admin-credentials via the following commands:
kubectl get secret grafana-admin-credentials -n nvidia-monitoring -o jsonpath='{.data.GF_SECURITY_ADMIN_USER}' | base64 -d
kubectl get secret grafana-admin-credentials -n nvidia-monitoring -o jsonpath='{.data.GF_SECURITY_ADMIN_PASSWORD}' | base64 -d
You can configure the components and services installed on the cluster as required for your use case. Specific examples can be found in the NVIDIA AI Workflow Guides.