Deploying Cloud Native Service Add-On Pack
Amazon EKS
The following steps assume that the prerequisites in the Requirements section have been met and that an Amazon EKS cluster has already been set up by following the steps in the previous section.
Ensure that a kubeconfig is available, either set via the KUBECONFIG environment variable or placed in your user's default location (~/.kube/config).
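The kubeconfig requirement above can be checked before continuing. The following is a minimal sketch, not part of the installer: the function name find_kubeconfig is hypothetical, and it only reports the first existing file from KUBECONFIG (a colon-separated list) or the default location.

```shell
# Hypothetical helper (not part of the installer): print the first kubeconfig
# file that exists, looking at $KUBECONFIG (a colon-separated list) and
# falling back to $HOME/.kube/config when the variable is unset or empty.
find_kubeconfig() {
  local candidates path IFS
  candidates="${KUBECONFIG:-$HOME/.kube/config}"
  IFS=':'
  for path in $candidates; do
    if [ -n "$path" ] && [ -f "$path" ]; then
      printf '%s\n' "$path"
      return 0
    fi
  done
  echo "no kubeconfig found; checked: $candidates" >&2
  return 1
}

# Example (requires access to the cluster):
#   find_kubeconfig && kubectl cluster-info
```

If the function fails, set KUBECONFIG or copy the cluster's kubeconfig to the default location before running the installer.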
Ensure that an FQDN and a wildcard DNS entry are available and resolvable for the Kubernetes cluster that was created.
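One way to confirm that the wildcard record resolves is to look up an arbitrary host name under it. The sketch below is illustrative only: both function names and the default label cnpack-probe are hypothetical, and the lookup assumes that getent is available on the instance (glibc-based Linux).

```shell
# Hypothetical helpers (not part of the installer).

# Turn a wildcard record such as "*.example.com" into a concrete host name
# under it, e.g. "cnpack-probe.example.com", so the wildcard can be tested.
wildcard_probe_host() {
  local domain label
  domain=${1#\*.}                 # drop a leading "*." if present
  label=${2:-cnpack-probe}        # arbitrary label; any unused name works
  printf '%s.%s\n' "$label" "$domain"
}

# Succeed if the host name resolves via the system resolver.
# Assumes getent is installed.
resolves() {
  getent hosts "$1" >/dev/null
}

# Example (requires DNS access):
#   resolves "$(wildcard_probe_host '*.my-cluster.my-domain.com')" && echo "wildcard resolves"
```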
Download the NVIDIA Cloud Native Service Add-on Pack from the Enterprise Catalog onto the instance you have provisioned:
ngc registry resource download-version "nvaie/nvidia_cnpack:0.4.0"
Note: If you still need to install and set up the NGC CLI with your API key, do so before downloading the resource. Instructions can be found in the NGC CLI documentation.
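The steps in this section rely on the ngc and kubectl command-line tools being on the PATH. A small check such as the following can confirm that before proceeding; missing_tools is a hypothetical helper name, not something shipped with the add-on pack.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
# Prints nothing when every command is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example:
#   missing_tools ngc kubectl
```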
Navigate to the installer’s directory using the following command:
cd nvidia_cnpack_v*
Create a config file for the installation using the following template, which represents a minimal configuration. For full details on all available configuration options, refer to the Advanced Usage section of the Appendix.
Note: Make sure to change the wildcardDomain field to match the DNS FQDN and wildcard record created as described in the Requirements section.
cat > config.yaml <<EOF
apiVersion: v1alpha1
kind: NvidiaPlatform
spec:
  platform:
    wildcardDomain: "*.my-cluster.my-domain.com"
    externalPort: 443
    eks:
      region: us-west-2
  certManager:
    enabled: true
    awsPCA:
      enabled: true
      commonName: "<your common name used to enable AWS Private CA>"
      domainName: "<your commonName used to enable AWS Private CA>"
      arn: "<ARN of the AWS Private CA>"
  prometheus:
    enabled: true
    awsRemoteWrite:
      url: "<Remote write url for Amazon Managed Prometheus>"
      arn: "<IAM Role for Amazon managed Prometheus>"
  grafana:
    enabled: false
  keycloak:
    enabled: true
    databaseStorage:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1G
      storageClassName: gp2
      volumeMode: Filesystem
  postgres:
    enabled: true
  fluentbit:
    enabled: true
  elastic:
    enabled: true
  ingress:
    enabled: true
EOF
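Because the template contains angle-bracket placeholders and an example domain, it can help to check config.yaml before running the installer. The following is a rough text-based sketch under stated assumptions: check_config is a hypothetical helper, it does not parse YAML, and it only looks for quoted "<...>" placeholders and a wildcardDomain value that begins with "*.".

```shell
# Hypothetical pre-flight check (not part of the installer): report obvious
# problems in a config file and return non-zero if any are found.
check_config() {
  local file="$1" status=0
  # Placeholders such as "<ARN of the AWS Private CA>" must be replaced.
  # Matching lines are printed with their line numbers.
  if grep -n '"<[^>]*>"' "$file"; then
    echo "error: unfilled placeholders remain in $file" >&2
    status=1
  fi
  # wildcardDomain should be a wildcard record, i.e. its value starts with "*.".
  if ! grep -Eq '^[[:space:]]*wildcardDomain:[[:space:]]*"?\*\.' "$file"; then
    echo "error: wildcardDomain is missing or is not a wildcard (*.) entry" >&2
    status=1
  fi
  return "$status"
}

# Example:
#   check_config config.yaml && echo "config looks filled in"
```

A passing check only means the obvious template values were replaced; it does not validate the values themselves.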
Make the installer executable via the following command:
chmod +x ./nvidia-cnpack_Linux_x86_64
Run the following command on the instance to set up NVIDIA Cloud Native Service Add-on Pack:
./nvidia-cnpack_Linux_x86_64 create -f config.yaml
Once the installation is complete, check that all the pods are healthy via the following command:
kubectl get pods -A
The output should look similar to the following:
NAMESPACE           NAME                                                          READY   STATUS      RESTARTS       AGE
kube-system         aws-node-hcn49                                                1/1     Running     0              20d
kube-system         coredns-769569fd5d-8pfsr                                      1/1     Running     0              20d
kube-system         coredns-769569fd5d-cpf29                                      1/1     Running     0              20d
kube-system         ebs-csi-controller-7c5f746989-9kjrj                           6/6     Running     0              20d
kube-system         ebs-csi-controller-7c5f746989-fzzlw                           6/6     Running     0              20d
kube-system         ebs-csi-node-f9bqp                                            3/3     Running     0              20d
kube-system         kube-proxy-t8ttt                                              1/1     Running     0              20d
nvidia-monitoring   elastic-operator-0                                            1/1     Running     1 (14d ago)    14d
nvidia-monitoring   grafana-deployment-6fdf95b986-8d2sh                           1/1     Running     0              14d
nvidia-monitoring   nvidia-fluentbit-aws-for-fluent-bit-ljf7j                     1/1     Running     0              17d
nvidia-monitoring   nvidia-grafana-grafana-operator-66d597fcdb-q88k7              1/1     Running     0              17d
nvidia-monitoring   nvidia-prometheus-kube-pro-operator-87cbfd57d-mlm6j           1/1     Running     0              17d
nvidia-monitoring   prometheus-nvidia-prometheus-kube-pro-prometheus-0            2/2     Running     0              17d
nvidia-platform     nvidia-certmanager-cert-manager-754dbf54cd-wnfmd              1/1     Running     0              17d
nvidia-platform     nvidia-certmanager-cert-manager-cainjector-68b7b69c6f-nrfpf   1/1     Running     0              17d
nvidia-platform     nvidia-certmanager-cert-manager-webhook-557978b4fc-tsc69      1/1     Running     0              17d
nvidia-platform     nvidia-ingress-kubernetes-ingress-j4zgh                       1/1     Running     0              17d
nvidia-platform     nvidia-keycloak-0                                             1/1     Running     1 (17d ago)    17d
nvidia-platform     nvidia-keycloak-1                                             1/1     Running     0              17d
nvidia-platform     nvidia-keycloak-backup-hk5n-qnrgj                             0/1     Completed   0              17d
nvidia-platform     nvidia-keycloak-instance1-mrbl-0                              4/4     Running     0              17d
nvidia-platform     nvidia-keycloak-instance1-pt9t-0                              4/4     Running     0              17d
nvidia-platform     nvidia-keycloak-instance1-schr-0                              4/4     Running     0              17d
nvidia-platform     nvidia-keycloak-repo-host-0                                   2/2     Running     0              17d
nvidia-platform     nvidia-platform-aws-privateca-issuer-55b676666d-h6nlw         1/1     Running     0              17d
nvidia-platform     pgo-64cdcfff78-np8nb                                          1/1     Running     0              17d
nvidia-platform     pgo-upgrade-6776d6894-gjcn9                                   1/1     Running     0              17d
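On a cluster with many pods, scanning this listing by eye is error-prone. As a sketch, the output can be filtered so that only pods needing attention remain. The helper name unhealthy_pods is hypothetical; it assumes the column layout shown above (namespace, name, ready count, status) and treats a pod as healthy when it is Completed, or Running with all containers ready.

```shell
# Hypothetical helper: read `kubectl get pods -A --no-headers` output on stdin
# and print "namespace/name ready status" for each pod that is neither
# Completed nor Running with all of its containers ready.
unhealthy_pods() {
  awk '
    NF < 4 { next }                     # skip blank or malformed lines
    $4 == "Completed" { next }          # finished jobs are expected
    {
      split($3, ready, "/")             # READY column, e.g. "1/2"
      if ($4 != "Running" || ready[1] != ready[2])
        print $1 "/" $2 " " $3 " " $4
    }
  '
}

# Example (requires access to the cluster):
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Empty output suggests that every pod is in one of the two expected states.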
As part of the installation, the installer creates the nvidia-platform and nvidia-monitoring namespaces, which contain most of the components and information required for interacting with the deployed services.
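The presence of the two namespaces can be confirmed with a short check. This is only a sketch: missing_namespaces is a hypothetical helper that reads the output of kubectl get namespaces -o name, which lists entries in the form namespace/<name>.

```shell
# Hypothetical helper: read `kubectl get namespaces -o name` output on stdin
# (lines like "namespace/kube-system") and print each namespace the installer
# is expected to create that does not appear in the input.
missing_namespaces() {
  local existing ns
  existing=$(cat)
  for ns in nvidia-platform nvidia-monitoring; do
    printf '%s\n' "$existing" | grep -qxF "namespace/$ns" || printf '%s\n' "$ns"
  done
}

# Example (requires access to the cluster):
#   kubectl get namespaces -o name | missing_namespaces
```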