Deploying Cloud Native Service Add-On Pack

Amazon EKS

The following steps assume that the prerequisites in the Requirements section have been met and that an Amazon EKS cluster has already been set up by following the steps in the previous section.
Ensure that a kubeconfig is available and set via the KUBECONFIG environment variable or in your user’s default location (~/.kube/config).
Ensure that an FQDN and a wildcard DNS entry are available and resolvable for the Kubernetes cluster that was created.
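The kubeconfig and DNS prerequisites above can be spot-checked before running the installer. The following is a minimal sketch, not part of the NVIDIA tooling: the helper names `kubeconfig_path` and `wildcard_probe_host` and the `probe` host label are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the NVIDIA installer.

# Print the kubeconfig location in effect: $KUBECONFIG if it is set
# (it may be a colon-separated list), otherwise the default path.
kubeconfig_path() {
  printf '%s\n' "${KUBECONFIG:-$HOME/.kube/config}"
}

# Turn a wildcard record such as "*.my-cluster.my-domain.com" into a
# concrete host name under that domain that can be passed to a resolver.
wildcard_probe_host() {
  printf 'probe.%s\n' "${1#\*.}"
}
```

Assuming `kubectl` and `getent` are installed on the instance, `kubectl cluster-info` confirms that the cluster behind `$(kubeconfig_path)` is reachable, and `getent hosts "$(wildcard_probe_host '*.my-cluster.my-domain.com')"` confirms that the wildcard record resolves.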
Download the NVIDIA Cloud Native Service Add-on Pack from the Enterprise Catalog onto the instance you have provisioned:
```
ngc registry resource download-version "nvaie/nvidia_cnpack:0.4.0"
```
Note

If you still need to install and set up the NGC CLI with your API Key, please do so before downloading the resource. Instructions can be found here.
Navigate to the installer’s directory using the following command:
```
cd nvidia_cnpack_v*
```

Create a config file for the installation using the following template, which represents a minimal configuration. For full details on all available configuration options, see the Advanced Usage section of the Appendix.

Note

Make sure to change the wildcardDomain field to match the DNS FQDN and wildcard record created as described in the Requirements section.

```
cat > config.yaml <<EOF
apiVersion: v1alpha1
kind: NvidiaPlatform
spec:
  platform:
    wildcardDomain: "*.my-cluster.my-domain.com"
    externalPort: 443
    eks:
      region: us-west-2
  certManager:
    enabled: true
    awsPCA:
      enabled: true
      commonName: "<your common name used to enable AWS Private CA>"
      domainName: "<your commonName used to enable AWS Private CA>"
      arn: "<ARN of the AWS Private CA>"
  prometheus:
    enabled: true
    awsRemoteWrite:
      url: "<Remote write url for Amazon Managed Prometheus>"
      arn: "<IAM Role for Amazon managed Prometheus>"
  grafana:
    enabled: false
  keycloak:
    enabled: true
    databaseStorage:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1G
      storageClassName: gp2
      volumeMode: Filesystem
  postgres:
    enabled: true
  fluentbit:
    enabled: true
  elastic:
    enabled: true
  ingress:
    enabled: true

EOF
```
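Because the template contains several values that must be replaced, it can help to scan the generated file for anything left unfilled before running the installer. The following is a minimal sketch; the helper name `unfilled_placeholders` is hypothetical, and it only looks for quoted `<...>` placeholders and the sample domain `my-cluster.my-domain.com` used in the template above.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the NVIDIA installer.
# Print (with line numbers) every line of the given file that still holds
# a quoted "<...>" template placeholder or the sample wildcard domain.
# Exit status follows grep: 0 if something was found, 1 if the file is clean.
unfilled_placeholders() {
  grep -nE '"<[^>]*>"|my-cluster\.my-domain\.com' "$1"
}
```

Running `unfilled_placeholders config.yaml` prints the lines that still need editing; no output means no template values of those two kinds remain.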

Make the installer executable via the following command:
```
chmod +x ./nvidia-cnpack_Linux_x86_64
```
Run the following command on the instance to set up NVIDIA Cloud Native Service Add-on Pack:
```
./nvidia-cnpack_Linux_x86_64 create -f config.yaml
```

Once the installation is complete, check that all the pods are healthy via the following command:

```
kubectl get pods -A
```

The output should look similar to the following:

```
NAMESPACE           NAME                                                              READY   STATUS      RESTARTS      AGE
kube-system         aws-node-hcn49                                                    1/1     Running     0             20d
kube-system         coredns-769569fd5d-8pfsr                                          1/1     Running     0             20d
kube-system         coredns-769569fd5d-cpf29                                          1/1     Running     0             20d
kube-system         ebs-csi-controller-7c5f746989-9kjrj                               6/6     Running     0             20d
kube-system         ebs-csi-controller-7c5f746989-fzzlw                               6/6     Running     0             20d
kube-system         ebs-csi-node-f9bqp                                                3/3     Running     0             20d
kube-system         kube-proxy-t8ttt                                                  1/1     Running     0             20d
nvidia-monitoring   elastic-operator-0                                                1/1     Running     1 (14d ago)   14d
nvidia-monitoring   grafana-deployment-6fdf95b986-8d2sh                               1/1     Running     0             14d
nvidia-monitoring   nvidia-fluentbit-aws-for-fluent-bit-ljf7j                         1/1     Running     0             17d
nvidia-monitoring   nvidia-grafana-grafana-operator-66d597fcdb-q88k7                  1/1     Running     0             17d
nvidia-monitoring   nvidia-prometheus-kube-pro-operator-87cbfd57d-mlm6j               1/1     Running     0             17d
nvidia-monitoring   prometheus-nvidia-prometheus-kube-pro-prometheus-0                2/2     Running     0             17d
nvidia-platform     nvidia-certmanager-cert-manager-754dbf54cd-wnfmd                  1/1     Running     0             17d
nvidia-platform     nvidia-certmanager-cert-manager-cainjector-68b7b69c6f-nrfpf       1/1     Running     0             17d
nvidia-platform     nvidia-certmanager-cert-manager-webhook-557978b4fc-tsc69          1/1     Running     0             17d
nvidia-platform     nvidia-ingress-kubernetes-ingress-j4zgh                           1/1     Running     0             17d
nvidia-platform     nvidia-keycloak-0                                                 1/1     Running     1 (17d ago)   17d
nvidia-platform     nvidia-keycloak-1                                                 1/1     Running     0             17d
nvidia-platform     nvidia-keycloak-backup-hk5n-qnrgj                                 0/1     Completed   0             17d
nvidia-platform     nvidia-keycloak-instance1-mrbl-0                                  4/4     Running     0             17d
nvidia-platform     nvidia-keycloak-instance1-pt9t-0                                  4/4     Running     0             17d
nvidia-platform     nvidia-keycloak-instance1-schr-0                                  4/4     Running     0             17d
nvidia-platform     nvidia-keycloak-repo-host-0                                       2/2     Running     0             17d
nvidia-platform     nvidia-platform-aws-privateca-issuer-55b676666d-h6nlw             1/1     Running     0             17d
nvidia-platform     pgo-64cdcfff78-np8nb                                              1/1     Running     0             17d
nvidia-platform     pgo-upgrade-6776d6894-gjcn9                                       1/1     Running     0             17d
```

As a part of the installation, the installer will create nvidia-platform and nvidia-monitoring namespaces that contain most of the components and information required for interacting with the deployed services.
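On a cluster with many pods, scanning the full listing by eye is error-prone, so the health check can be narrowed to the pods that need attention. The following is a minimal sketch; the helper name `unhealthy_pods` is hypothetical, and it assumes the six-column layout that `kubectl get pods -A` prints (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the NVIDIA installer.
# Reads `kubectl get pods -A --no-headers` output on stdin and prints every
# pod that is neither Completed nor Running with all containers ready.
unhealthy_pods() {
  awk '{
    split($3, ready, "/")            # READY column, e.g. "2/2"
    if ($4 == "Completed") next      # finished jobs are expected
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2, $3, $4        # namespace/name READY STATUS
  }'
}
```

A typical invocation would be `kubectl get pods -A --no-headers | unhealthy_pods`; empty output means every pod, including those in the nvidia-platform and nvidia-monitoring namespaces, is either fully ready or completed.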