Deploying Cloud Native Service Add-On Pack
NVIDIA Cloud Native Stack
This guide walks through the deployment and setup of the NVIDIA Cloud Native Service Add-On Pack on an upstream Kubernetes deployment, such as the NVIDIA Cloud Native Stack. This is the simplest deployment configuration: all provided components are installed directly on the cluster, with no integration with external services.
The NVIDIA Cloud Native Stack deployment using upstream Kubernetes should be used for evaluation and development purposes only. It is not designed for production use.
The following steps assume that the Requirements section has been met, and the NVIDIA Cloud Native Stack K8S cluster has already been set up from the steps in the previous section.
Ensure that a kubeconfig is available and set via the KUBECONFIG environment variable or your user’s default location (.kube/config).
Ensure that a FQDN and wildcard DNS entry are available and resolvable for the K8S cluster that was created.
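Both prerequisites can be sanity-checked from a shell before proceeding. The following is an illustrative sketch: the wildcard value is a placeholder for your own domain, and probe_host is a hypothetical helper, not part of the installer.

```shell
# Verify the active kubeconfig can reach the cluster
kubectl cluster-info || echo "Cluster not reachable - check KUBECONFIG"

# Derive a test hostname from the wildcard record and confirm it resolves;
# under a correctly configured wildcard entry, any label should resolve.
wildcard='*.my-cluster.my-domain.com'
probe_host() { echo "probe.${1#\*.}"; }   # strip the leading "*." prefix
nslookup "$(probe_host "$wildcard")" || echo "Wildcard DNS entry did not resolve"
```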
Download the NVIDIA Cloud Native Service Add-on Pack from the Enterprise Catalog onto the instance you have provisioned from here.
ngc registry resource download-version "nvaie/nvidia_cnpack:0.2.1"
Note: If you still need to install and set up the NGC CLI with your API key, please do so before downloading the resource. Instructions can be found here.
Navigate to the installer’s directory using the following command:
cd nvidia_cnpack_v0.2.1
Create a config file for the installation using the following template as a minimal config file. For full details on all the available configuration options, please reference the Advanced Usage section of the Appendix.
Note: Make sure to change the wildcardDomain field to match the DNS FQDN and wildcard record created as described in the Requirements section.

apiVersion: v1alpha1
kind: NvidiaPlatform
spec:
  platform:
    wildcardDomain: "*.my-cluster.my-domain.com"
    externalPort: 443
  ingress:
    enabled: true
  postgres:
    enabled: true
  certManager:
    enabled: true
  trustManager:
    enabled: true
  keycloak:
    databaseStorage:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1G
      storageClassName: local-path
      volumeMode: Filesystem
  prometheus:
    storage:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1G
      storageClassName: local-path
      volumeMode: Filesystem
  grafana:
    enabled: true
  elastic:
    enabled: true
Note: If you installed local-path-provisioner, the storageClassName can be left as shown: local-path
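Before running the installer, a quick sanity check can catch a config file that still carries the template's placeholder domain. This is an optional sketch, assuming the template above was saved as config.yaml in the current directory; check_placeholder is a hypothetical helper, not part of the installer.

```shell
# Hypothetical helper: warn if the config still contains the template's
# placeholder domain instead of your real wildcard domain
check_placeholder() {
    grep -q 'my-cluster\.my-domain\.com' "$1" && echo "warning: update wildcardDomain"
}

# Only check if the config file exists in the current directory
if [ -f config.yaml ]; then
    check_placeholder config.yaml
fi
```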
Make the installer executable via the following command:
chmod +x ./nvidia-cnpack-linux-x86_64
Run the following command on the instance to set up NVIDIA Cloud Native Service Add-on Pack:
./nvidia-cnpack-linux-x86_64 create -f config.yaml
Once the install is complete, check that all the pods are healthy via the following command:
kubectl get pods -A
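Rather than eyeballing the full pod list, you can let kubectl block until pods report Ready and then list any stragglers. This is a sketch: pods from completed Jobs never reach the Ready condition, so the wait is allowed to time out without aborting, and not_ready is a hypothetical awk filter over the standard "kubectl get pods -A" column layout (STATUS is the fourth column).

```shell
# Wait up to 5 minutes for all pods to report Ready; tolerate a timeout,
# since pods from completed Jobs never become Ready
kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=300s || true

# Hypothetical filter: show only pods whose STATUS (4th column of
# "kubectl get pods -A --no-headers") is neither Running nor Completed
not_ready() { awk '$4 != "Running" && $4 != "Completed"'; }
kubectl get pods -A --no-headers | not_ready
```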
The output should look similar to the screenshot below, with all pods in a Running or Completed state:
As a part of the installation, the installer will create nvidia-platform and nvidia-monitoring namespaces that contain most of the components and information required for interacting with the deployed services.
The default Keycloak instance URL is at: https://auth.my-cluster.my-domain.com
Default admin credentials can be found within the nvidia-platform namespace, in a secret called keycloak-initial-admin via the following commands:
kubectl get secret keycloak-initial-admin -n nvidia-platform -o jsonpath='{.data.username}' | base64 -d
kubectl get secret keycloak-initial-admin -n nvidia-platform -o jsonpath='{.data.password}' | base64 -d
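The two commands above share one pattern: read a key from the secret's .data map and base64-decode it (Kubernetes stores secret values base64-encoded). A small convenience function can make that reusable; secret_field is a hypothetical name for illustration, not something the add-on pack provides.

```shell
# Hypothetical helper: extract and decode one field of a Kubernetes secret
secret_field() {
    # $1 = namespace, $2 = secret name, $3 = data key
    kubectl get secret "$2" -n "$1" -o jsonpath="{.data.$3}" | base64 -d
}

secret_field nvidia-platform keycloak-initial-admin username
secret_field nvidia-platform keycloak-initial-admin password
```

The same helper works for the Grafana credentials described below, e.g. secret_field nvidia-monitoring grafana-admin-credentials GF_SECURITY_ADMIN_USER.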
The default Grafana instance URL is at: https://dashboards.my-cluster.my-domain.com
The default Grafana credentials can be found within the nvidia-monitoring namespace, in a secret called grafana-admin-credentials via the following commands:
kubectl get secret grafana-admin-credentials -n nvidia-monitoring -o jsonpath='{.data.GF_SECURITY_ADMIN_USER}' | base64 -d
kubectl get secret grafana-admin-credentials -n nvidia-monitoring -o jsonpath='{.data.GF_SECURITY_ADMIN_PASSWORD}' | base64 -d
You can configure the components and services installed on the cluster as required for your use case. Specific examples can be found in the NVIDIA AI Workflow Guides.