To deploy NVIDIA Cloud Native Service Add-on Pack on Amazon EKS, the following requirements must be met:
This section of the guide focuses on deploying the add-on pack on an NVIDIA AI Enterprise-supported Amazon Web Services (AWS) EKS cluster. See the next section for more information on how to deploy and set up an example EKS cluster.
- GPU Operator
The NVIDIA GPU Operator must be deployed on the K8S cluster to make GPUs available for use within the cluster.
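The GPU Operator is typically installed with Helm. A minimal sketch follows; the repository URL and chart name match NVIDIA's public Helm repository, but the namespace and any chart values should be adjusted for your environment:

```shell
# Add the NVIDIA Helm repository and refresh the local index
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install the GPU Operator into its own namespace and wait for it to come up
helm install --wait gpu-operator \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator

# Verify that GPUs are advertised as allocatable resources on the nodes
kubectl describe nodes | grep -i "nvidia.com/gpu"
```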
This guide assumes that the cluster will be externally accessible through ports 22 for SSH, and 443 for ingress. Additional ports may be required for your specific use case.
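External reachability on these ports can be sanity-checked from a machine outside the cluster's network. The hostname below is a placeholder; substitute your cluster's address:

```shell
# Hypothetical cluster address; replace with your own
CLUSTER_HOST=my-cluster.my-domain.com

# Check that SSH (22) and HTTPS ingress (443) accept connections
for PORT in 22 443; do
  nc -zv -w 5 "$CLUSTER_HOST" "$PORT"
done
```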
The K8S cluster requires a fully qualified domain name (FQDN) with a wildcard DNS entry that is resolvable both inside and outside of the network where the cluster is located.
A wildcard DNS A record must be created for the cluster in addition to the DNS A record for the cluster itself. Reverse lookup PTR records should also exist for both entries when possible. An example wildcard FQDN may look like the following:
*.my-cluster.my-domain.com
Make a note of this FQDN for later use.
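The wildcard and base records can be verified with `dig` before continuing. The domain below is a placeholder; substitute your cluster's FQDN:

```shell
# Hypothetical FQDN; replace with your cluster's domain
FQDN=my-cluster.my-domain.com

# The base A record and an arbitrary wildcard subdomain should both resolve
dig +short "$FQDN"
dig +short "anything.$FQDN"

# Reverse (PTR) lookup for the resolved address, when configured
dig +short -x "$(dig +short "$FQDN" | head -n1)"
```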
An example of how to configure the domain and DNS for the cluster using Amazon Route 53 can be found in the Appendix.
A storage class must be available on the K8S cluster for the Cloud Native Service Add-on Pack to be configured to use. In this example, the GP2 storage class on Amazon EKS will be used. More information is provided in the next section.
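Available storage classes can be listed with `kubectl`, and gp2 can be marked as the cluster default if it is not already. This uses the standard Kubernetes default-class annotation; whether you want gp2 as the default depends on your setup:

```shell
# List available storage classes; on EKS, gp2 is created by default
kubectl get storageclass

# Optionally mark gp2 as the default storage class
kubectl patch storageclass gp2 \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```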
- NVIDIA AI Enterprise
Since NVIDIA AI Workflows are available on NVIDIA NGC for NVIDIA AI Enterprise software customers, you must have access to the following to pull down the resources required for the workflow:
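Pulling NGC resources requires authenticating to the NGC container registry. A sketch using Docker follows; the API key is a placeholder you generate from your NGC account, and the username is the literal string `$oauthtoken`:

```shell
# Hypothetical API key; generate your own under Setup > API Key at ngc.nvidia.com
export NGC_API_KEY=<your-ngc-api-key>

# Log in to the NGC container registry so workflow images can be pulled
docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"
```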