Deploying Cloud Native Service Add-On Pack

Amazon EKS

  1. The following steps assume that the prerequisites in the Requirements section have been met and that an Amazon EKS cluster has already been set up by following the steps in the previous section.

  2. Ensure that a kubeconfig is available, either set via the KUBECONFIG environment variable or placed in your user’s default location (~/.kube/config).

  3. Ensure that an FQDN and a wildcard DNS entry are available and resolvable for the Kubernetes cluster that was created.
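
The checks in steps 2 and 3 can be scripted before going further. The following is a minimal sketch, assuming bash. The helper names kubeconfig_path and probe_host and the cnpack-probe label are hypothetical and are not part of the installer; the example domain is the one used in the config template of this guide.

```shell
#!/usr/bin/env bash

# Print the first kubeconfig file kubectl would read: the first entry of
# $KUBECONFIG (a colon-separated list) if set, otherwise ~/.kube/config.
kubeconfig_path() {
  if [ -n "${KUBECONFIG:-}" ]; then
    printf '%s\n' "${KUBECONFIG%%:*}"
  else
    printf '%s\n' "${HOME}/.kube/config"
  fi
}

# Turn a wildcard domain such as "*.my-cluster.my-domain.com" into a concrete
# hostname that the wildcard DNS record should cover.
probe_host() {
  local wildcard="$1"
  local label="${2:-cnpack-probe}"   # hypothetical label; any unused name works
  printf '%s.%s\n' "$label" "${wildcard#\*.}"
}

# Example usage (needs cluster access and DNS, so left commented out):
#   test -r "$(kubeconfig_path)" && kubectl cluster-info
#   getent hosts "$(probe_host '*.my-cluster.my-domain.com')"
```

If the getent lookup returns no address, the wildcard record is probably missing or has not propagated yet.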

  4. Download the NVIDIA Cloud Native Service Add-on Pack from the Enterprise Catalog onto the instance you have provisioned:

    ngc registry resource download-version "nvaie/nvidia_cnpack:0.4.0"
    

    Note

    If you still need to install and set up the NGC CLI with your API key, do so before downloading the resource. Instructions can be found in the NGC CLI documentation.
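
A quick way to confirm that the command-line tools used in this guide are installed is to look them up on PATH. This is a small sketch, assuming bash; require_cmds is a hypothetical helper name, and it only checks that the binaries exist, not that the NGC API key is configured.

```shell
#!/usr/bin/env bash

# Report every required command that is missing from PATH.
# Returns 0 if all are present and 1 otherwise.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage before the download step:
#   require_cmds ngc kubectl || exit 1
```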

  5. Navigate to the installer’s directory using the following command:

    cd nvidia_cnpack_v*
    
  6. Create a config file for the installation using the following template, which represents a minimal configuration. For full details on all available configuration options, see the Advanced Usage section of the Appendix.

    Note

    Make sure to change the wildcardDomain field to match the DNS FQDN and wildcard record created as described in the Requirements section.

    cat > config.yaml <<EOF
    apiVersion: v1alpha1
    kind: NvidiaPlatform
    spec:
      platform:
        wildcardDomain: "*.my-cluster.my-domain.com"
        externalPort: 443
        eks:
          region: us-west-2
      certManager:
        enabled: true
        awsPCA:
          enabled: true
          commonName: "<your common name used to enable AWS Private CA>"
          domainName: "<your commonName used to enable AWS Private CA>"
          arn: "<ARN of the AWS Private CA>"
      prometheus:
        enabled: true
        awsRemoteWrite:
          url: "<Remote write url for Amazon Managed Prometheus>"
          arn: "<IAM Role for Amazon managed Prometheus>"
      grafana:
        enabled: false
      keycloak:
        enabled: true
        databaseStorage:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1G
          storageClassName: gp2
          volumeMode: Filesystem
      postgres:
        enabled: true
      fluentbit:
        enabled: true
      elastic:
        enabled: true
      ingress:
        enabled: true
    
    EOF
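
The template contains placeholder values in angle brackets (for example the AWS Private CA ARN and the remote write URL) that must be replaced with real values. The following sketch, assuming bash and a standard grep, flags any placeholders still left in the file. The check_placeholders name is hypothetical and this check is not something the installer itself provides.

```shell
#!/usr/bin/env bash

# List config lines that still hold template placeholders such as
# "<ARN of the AWS Private CA>".
# Returns 0 when the file is clean and 1 when placeholders remain.
check_placeholders() {
  local file="$1"
  if grep -nE '"<[^>]*>"' "$file"; then
    echo "Replace the placeholder values above before installing." >&2
    return 1
  fi
  return 0
}

# Example usage:
#   check_placeholders config.yaml || exit 1
```

Any line it prints, prefixed with its line number, still needs a real value.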
    
  7. Make the installer executable via the following command:

    chmod +x ./nvidia-cnpack_Linux_x86_64
    
  8. Run the following command on the instance to set up NVIDIA Cloud Native Service Add-on Pack:

    ./nvidia-cnpack_Linux_x86_64 create -f config.yaml
    
  9. Once the installation is complete, check that all the pods are healthy via the following command:

    kubectl get pods -A
    

    The output should look similar to the following:

    NAMESPACE           NAME                                                              READY   STATUS      RESTARTS      AGE
    kube-system         aws-node-hcn49                                                    1/1     Running     0             20d
    kube-system         coredns-769569fd5d-8pfsr                                          1/1     Running     0             20d
    kube-system         coredns-769569fd5d-cpf29                                          1/1     Running     0             20d
    kube-system         ebs-csi-controller-7c5f746989-9kjrj                               6/6     Running     0             20d
    kube-system         ebs-csi-controller-7c5f746989-fzzlw                               6/6     Running     0             20d
    kube-system         ebs-csi-node-f9bqp                                                3/3     Running     0             20d
    kube-system         kube-proxy-t8ttt                                                  1/1     Running     0             20d
    nvidia-monitoring   elastic-operator-0                                                1/1     Running     1 (14d ago)   14d
    nvidia-monitoring   grafana-deployment-6fdf95b986-8d2sh                               1/1     Running     0             14d
    nvidia-monitoring   nvidia-fluentbit-aws-for-fluent-bit-ljf7j                         1/1     Running     0             17d
    nvidia-monitoring   nvidia-grafana-grafana-operator-66d597fcdb-q88k7                  1/1     Running     0             17d
    nvidia-monitoring   nvidia-prometheus-kube-pro-operator-87cbfd57d-mlm6j               1/1     Running     0             17d
    nvidia-monitoring   prometheus-nvidia-prometheus-kube-pro-prometheus-0                2/2     Running     0             17d
    nvidia-platform     nvidia-certmanager-cert-manager-754dbf54cd-wnfmd                  1/1     Running     0             17d
    nvidia-platform     nvidia-certmanager-cert-manager-cainjector-68b7b69c6f-nrfpf       1/1     Running     0             17d
    nvidia-platform     nvidia-certmanager-cert-manager-webhook-557978b4fc-tsc69          1/1     Running     0             17d
    nvidia-platform     nvidia-ingress-kubernetes-ingress-j4zgh                           1/1     Running     0             17d
    nvidia-platform     nvidia-keycloak-0                                                 1/1     Running     1 (17d ago)   17d
    nvidia-platform     nvidia-keycloak-1                                                 1/1     Running     0             17d
    nvidia-platform     nvidia-keycloak-backup-hk5n-qnrgj                                 0/1     Completed   0             17d
    nvidia-platform     nvidia-keycloak-instance1-mrbl-0                                  4/4     Running     0             17d
    nvidia-platform     nvidia-keycloak-instance1-pt9t-0                                  4/4     Running     0             17d
    nvidia-platform     nvidia-keycloak-instance1-schr-0                                  4/4     Running     0             17d
    nvidia-platform     nvidia-keycloak-repo-host-0                                       2/2     Running     0             17d
    nvidia-platform     nvidia-platform-aws-privateca-issuer-55b676666d-h6nlw             1/1     Running     0             17d
    nvidia-platform     pgo-64cdcfff78-np8nb                                              1/1     Running     0             17d
    nvidia-platform     pgo-upgrade-6776d6894-gjcn9                                       1/1     Running     0             17d
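
On a cluster with many pods, it can be easier to print only the pods that need attention. The following is a minimal sketch, assuming bash and awk, that filters the pod listing down to entries that are neither Completed nor Running with all containers ready. The unhealthy_pods name is hypothetical, and the column positions assume the default output of kubectl get pods -A shown above.

```shell
#!/usr/bin/env bash

# Read `kubectl get pods -A --no-headers` output on stdin and print pods that
# are neither Completed nor Running with every container ready.
# Columns: $1 = NAMESPACE, $2 = NAME, $3 = READY (ready/total), $4 = STATUS.
unhealthy_pods() {
  awk '{
    split($3, ready, "/")
    if ($4 == "Completed") next
    if ($4 == "Running" && ready[1] == ready[2]) next
    print $1 "/" $2 " " $3 " " $4
  }'
}

# Example usage:
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Empty output means every pod passed the check.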
    
  10. As part of the installation, the installer creates the nvidia-platform and nvidia-monitoring namespaces, which contain most of the components and information required for interacting with the deployed services.
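
To look at what was deployed into the two namespaces, the pods and services in each can be listed in turn. This is a small sketch, assuming bash; show_cnpack_namespaces is a hypothetical helper, and its optional first argument (the command to run in place of kubectl) exists only so the loop can be dry-run with echo.

```shell
#!/usr/bin/env bash

# List pods and services in the two namespaces created by the installer.
# $1 optionally names the command to run instead of kubectl (e.g. echo).
show_cnpack_namespaces() {
  local kubectl_cmd="${1:-kubectl}" ns
  for ns in nvidia-platform nvidia-monitoring; do
    "$kubectl_cmd" get pods,svc -n "$ns" || return 1
  done
}

# Example usage:
#   show_cnpack_namespaces          # runs kubectl against the cluster
#   show_cnpack_namespaces echo     # prints the arguments without running kubectl
```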