Deploying Cloud Native Service Add-On Pack
NVIDIA Cloud Native Stack
This guide walks through the deployment and setup of the NVIDIA Cloud Native Service Add-On Pack on an upstream Kubernetes deployment, such as NVIDIA Cloud Native Stack. This is the simplest deployment configuration: all provided components are installed directly on the cluster, with no integration with external services.
Note
The NVIDIA Cloud Native Stack deployment using upstream Kubernetes should be used for evaluation and development purposes only. It is not designed for production use.
The following steps assume that the Requirements section has been met and that the NVIDIA Cloud Native Stack Kubernetes cluster has already been set up following the steps in the previous section.
Ensure that a kubeconfig is available and set via the KUBECONFIG environment variable or placed in your user's default location (~/.kube/config).
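For example, a minimal sketch assuming the kubeconfig was written to the default path (adjust the path if yours differs):
# Point kubectl at the kubeconfig and confirm the cluster is reachable
export KUBECONFIG="$HOME/.kube/config"
kubectl get nodes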
Ensure that an FQDN and a wildcard DNS entry are available and resolvable for the Kubernetes cluster that was created.
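As a quick sanity check, any hostname under the wildcard record should resolve to the cluster's address. The domain below is the example value used in the configuration later in this guide; substitute your own:
# Both of these should resolve to the same address as the wildcard record
nslookup auth.my-cluster.my-domain.com
nslookup dashboards.my-cluster.my-domain.com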
Download the NVIDIA Cloud Native Service Add-on Pack from the Enterprise Catalog onto the instance you have provisioned using the following command:
ngc registry resource download-version "nvaie/nvidia_cnpack:0.2.1"
Note
If you still need to install and set up the NGC CLI with your API key, please do so before downloading the resource. Instructions can be found in the NGC CLI documentation.
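If the CLI is installed but not yet configured, a minimal sketch of the interactive setup (the prompts will ask for your API key, org, and team):
# Configure the NGC CLI with your API key before downloading the resource
ngc config set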
Navigate to the installer’s directory using the following command:
cd nvidia_cnpack_v0.2.1
Create a config file named config.yaml for the installation, using the following template as a minimal configuration. For full details on all the available configuration options, please reference the Advanced Usage section of the Appendix.
Note
Make sure to change the wildcardDomain field to match the DNS FQDN and wildcard record created as described in the Requirements section.
apiVersion: v1alpha1
kind: NvidiaPlatform
spec:
  platform:
    wildcardDomain: "*.my-cluster.my-domain.com"
    externalPort: 443
  ingress:
    enabled: true
  postgres:
    enabled: true
  certManager:
    enabled: true
  trustManager:
    enabled: true
  keycloak:
    databaseStorage:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1G
      storageClassName: local-path
      volumeMode: Filesystem
  prometheus:
    storage:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1G
      storageClassName: local-path
      volumeMode: Filesystem
  grafana:
    enabled: true
  elastic:
    enabled: true
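As a sketch, assuming you saved the template as config.yaml and your wildcard record is *.example-cluster.example.com (a placeholder), the example domain can be substituted in place:
# Replace the example domain with your own (placeholder shown)
sed -i 's/my-cluster.my-domain.com/example-cluster.example.com/g' config.yaml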
Note
If you installed local-path-provisioner, the storageClassName fields can be left as shown: local-path
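To confirm that the storage class exists before running the installer:
# Verify that the local-path storage class is registered with the cluster
kubectl get storageclass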
Make the installer executable via the following command:
chmod +x ./nvidia-cnpack-linux-x86_64
Run the following command on the instance to set up NVIDIA Cloud Native Service Add-on Pack:
./nvidia-cnpack-linux-x86_64 create -f config.yaml
Once the installation is complete, check that all the pods are healthy via the following command:
kubectl get pods -A
All pods should report a Running or Completed status.
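As an additional check, you can list only the pods that are not in the Running phase; pods from completed jobs may appear here with a Succeeded status, which is expected:
# List any pods that are not currently Running
kubectl get pods -A --field-selector=status.phase!=Running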
As part of the installation, the installer creates the nvidia-platform and nvidia-monitoring namespaces, which contain most of the components and information required for interacting with the deployed services.
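To inspect what was deployed into each namespace:
# List the resources created by the installer
kubectl get all -n nvidia-platform
kubectl get all -n nvidia-monitoring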
The default Keycloak instance is available at: https://auth.my-cluster.my-domain.com
Default admin credentials can be found within the nvidia-platform namespace, in a secret called keycloak-initial-admin via the following commands:
kubectl get secret keycloak-initial-admin -n nvidia-platform -o jsonpath='{.data.username}' | base64 -d
kubectl get secret keycloak-initial-admin -n nvidia-platform -o jsonpath='{.data.password}' | base64 -d
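For convenience, the same values can be captured into shell variables (the variable names below are arbitrary):
# Capture the Keycloak admin credentials for later use
KEYCLOAK_USER=$(kubectl get secret keycloak-initial-admin -n nvidia-platform -o jsonpath='{.data.username}' | base64 -d)
KEYCLOAK_PASSWORD=$(kubectl get secret keycloak-initial-admin -n nvidia-platform -o jsonpath='{.data.password}' | base64 -d)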
The default Grafana instance is available at: https://dashboards.my-cluster.my-domain.com
The default Grafana credentials can be found within the nvidia-monitoring namespace, in a secret called grafana-admin-credentials, via the following commands:
kubectl get secret grafana-admin-credentials -n nvidia-monitoring -o jsonpath='{.data.GF_SECURITY_ADMIN_USER}' | base64 -d
kubectl get secret grafana-admin-credentials -n nvidia-monitoring -o jsonpath='{.data.GF_SECURITY_ADMIN_PASSWORD}' | base64 -d
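Once DNS resolves, you can verify that Grafana is serving requests by querying its health endpoint; the -k flag skips TLS verification, which may be needed if the cluster uses self-signed certificates. The domain below is the example value; substitute your own:
# Query Grafana's health endpoint
curl -k https://dashboards.my-cluster.my-domain.com/api/health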
You can configure the components and services installed on the cluster as required for your use case. Specific examples can be found in the NVIDIA AI Workflow Guides.