Cloud Native Service Add-on Pack Deployment Guide

Deploying Cloud Native Service Add-on Pack

Amazon EKS

  1. The following steps assume that the prerequisites in the Requirements section have been met and that an Amazon EKS cluster has already been set up by following the steps in the previous section.

  2. Ensure that a kubeconfig is available and set via the KUBECONFIG environment variable or your user’s default location (.kube/config).

  3. Ensure that an FQDN and a wildcard DNS entry are available and resolvable for the Kubernetes cluster that was created.

  4. Download the NVIDIA Cloud Native Service Add-on Pack from the Enterprise Catalog onto the instance you have provisioned:

    ngc registry resource download-version "nvaie/nvidia_cnpack:0.4.0"

    Note

    If you still need to install and set up the NGC CLI with your API Key, please do so before downloading the resource. Instructions can be found here.


  5. Navigate to the installer’s directory using the following command:

    cd nvidia_cnpack_v*

  6. Create a config file for the installation using the following template, which represents a minimal configuration. For full details on all the available configuration options, see the Advanced Usage section of the Appendix.

    Note

    Make sure to change the wildcardDomain field to match the DNS FQDN and wildcard record created as described in the Requirements section.


    cat > config.yaml <<EOF
    apiVersion: v1alpha1
    kind: NvidiaPlatform
    spec:
      platform:
        wildcardDomain: "*.my-cluster.my-domain.com"
        externalPort: 443
        eks:
          region: us-west-2
      certManager:
        enabled: true
        awsPCA:
          enabled: true
          commonName: "<your common name used to enable AWS Private CA>"
          domainName: "<your commonName used to enable AWS Private CA>"
          arn: "<ARN of the AWS Private CA>"
      prometheus:
        enabled: true
        awsRemoteWrite:
          url: "<Remote write url for Amazon Managed Prometheus>"
          arn: "<IAM Role for Amazon managed Prometheus>"
      grafana:
        enabled: false
      keycloak:
        enabled: true
        databaseStorage:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1G
          storageClassName: gp2
          volumeMode: Filesystem
      postgres:
        enabled: true
      fluentbit:
        enabled: true
      elastic:
        enabled: true
      ingress:
        enabled: true
    EOF

  7. Make the installer executable via the following command:

    chmod +x ./nvidia-cnpack_Linux_x86_64

  8. Run the following command on the instance to set up NVIDIA Cloud Native Service Add-on Pack:

    ./nvidia-cnpack_Linux_x86_64 create -f config.yaml

  9. Once the installation is complete, check that all the pods are healthy via the following command:

    kubectl get pods -A

    The output should look similar to the following:

    NAMESPACE           NAME                                                          READY   STATUS      RESTARTS      AGE
    kube-system         aws-node-hcn49                                                1/1     Running     0             20d
    kube-system         coredns-769569fd5d-8pfsr                                      1/1     Running     0             20d
    kube-system         coredns-769569fd5d-cpf29                                      1/1     Running     0             20d
    kube-system         ebs-csi-controller-7c5f746989-9kjrj                           6/6     Running     0             20d
    kube-system         ebs-csi-controller-7c5f746989-fzzlw                           6/6     Running     0             20d
    kube-system         ebs-csi-node-f9bqp                                            3/3     Running     0             20d
    kube-system         kube-proxy-t8ttt                                              1/1     Running     0             20d
    nvidia-monitoring   elastic-operator-0                                            1/1     Running     1 (14d ago)   14d
    nvidia-monitoring   grafana-deployment-6fdf95b986-8d2sh                           1/1     Running     0             14d
    nvidia-monitoring   nvidia-fluentbit-aws-for-fluent-bit-ljf7j                     1/1     Running     0             17d
    nvidia-monitoring   nvidia-grafana-grafana-operator-66d597fcdb-q88k7              1/1     Running     0             17d
    nvidia-monitoring   nvidia-prometheus-kube-pro-operator-87cbfd57d-mlm6j           1/1     Running     0             17d
    nvidia-monitoring   prometheus-nvidia-prometheus-kube-pro-prometheus-0            2/2     Running     0             17d
    nvidia-platform     nvidia-certmanager-cert-manager-754dbf54cd-wnfmd              1/1     Running     0             17d
    nvidia-platform     nvidia-certmanager-cert-manager-cainjector-68b7b69c6f-nrfpf   1/1     Running     0             17d
    nvidia-platform     nvidia-certmanager-cert-manager-webhook-557978b4fc-tsc69      1/1     Running     0             17d
    nvidia-platform     nvidia-ingress-kubernetes-ingress-j4zgh                       1/1     Running     0             17d
    nvidia-platform     nvidia-keycloak-0                                             1/1     Running     1 (17d ago)   17d
    nvidia-platform     nvidia-keycloak-1                                             1/1     Running     0             17d
    nvidia-platform     nvidia-keycloak-backup-hk5n-qnrgj                             0/1     Completed   0             17d
    nvidia-platform     nvidia-keycloak-instance1-mrbl-0                              4/4     Running     0             17d
    nvidia-platform     nvidia-keycloak-instance1-pt9t-0                              4/4     Running     0             17d
    nvidia-platform     nvidia-keycloak-instance1-schr-0                              4/4     Running     0             17d
    nvidia-platform     nvidia-keycloak-repo-host-0                                   2/2     Running     0             17d
    nvidia-platform     nvidia-platform-aws-privateca-issuer-55b676666d-h6nlw         1/1     Running     0             17d
    nvidia-platform     pgo-64cdcfff78-np8nb                                          1/1     Running     0             17d
    nvidia-platform     pgo-upgrade-6776d6894-gjcn9                                   1/1     Running     0             17d

  10. As a part of the installation, the installer will create nvidia-platform and nvidia-monitoring namespaces that contain most of the components and information required for interacting with the deployed services.
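
The pod health check in step 9 can also be scripted instead of read by eye. The following is a minimal sketch and not part of the installer: the helper name unhealthy_pods is hypothetical, and it assumes the default column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). It reads that output on standard input and prints every pod in the nvidia-platform or nvidia-monitoring namespaces that is neither Completed nor Running with all of its containers ready.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not shipped with the installer).
# Reads `kubectl get pods -A` output on stdin and prints
# "<namespace>/<pod> <ready> <status>" for each pod in the two installer
# namespaces that is not healthy. Exits 1 if any such pod is found, else 0.
unhealthy_pods() {
  awk '
    BEGIN { bad = 0 }
    # Only look at the namespaces the installer creates; this also skips
    # the header row and pods in other namespaces such as kube-system.
    $1 == "nvidia-platform" || $1 == "nvidia-monitoring" {
      # READY has the form "<ready>/<total>".
      split($3, ready, "/")
      ok = ($4 == "Completed") || ($4 == "Running" && ready[1] == ready[2])
      if (!ok) {
        print $1 "/" $2 " " $3 " " $4
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

Against a live cluster, and assuming kubectl is already configured as described in step 2, the helper could be used as follows: kubectl get pods -A | unhealthy_pods && echo "all add-on pack pods are healthy". A non-zero exit status means at least one pod was listed and may need a closer look with kubectl describe pod.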

© Copyright 2022-2023, NVIDIA. Last updated on May 23, 2023.