NVIDIA Cloud Native Stack
To deploy the NVIDIA Cloud Native Service Add-on Pack, the following requirements must be met:
- Kubernetes
This section of the guide focuses on deploying the add-on pack on an NVIDIA AI Enterprise-supported Cloud Native Stack instance. See the next section for more information on how to deploy and set up an example NVIDIA Cloud Native Stack instance.
- GPU Operator
NVIDIA GPU Operator should be deployed on the K8S cluster so that GPUs are available for use within the cluster. NVIDIA Cloud Native Stack deploys and sets up the GPU Operator as part of its installation.
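One way to confirm that GPUs are visible to the cluster is to look at each node's allocatable resources. The following is a minimal sketch, not part of the official installation steps. It assumes `kubectl` is configured for the cluster and that GPUs are advertised under the `nvidia.com/gpu` resource name; the helper name `gpu_counts` is hypothetical.

```python
import json


def gpu_counts(nodes_json: str) -> dict[str, int]:
    """Map each node name to its allocatable nvidia.com/gpu count.

    `nodes_json` is expected to be the output of:
        kubectl get nodes -o json
    Nodes that do not advertise the resource are reported with 0.
    """
    counts = {}
    for node in json.loads(nodes_json).get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        counts[name] = int(allocatable.get("nvidia.com/gpu", "0"))
    return counts


# Example usage (assumption: kubectl is on PATH and points at the cluster):
#   import subprocess
#   out = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
#                        check=True, capture_output=True, text=True).stdout
#   print(gpu_counts(out))
```

A count of 0 on a GPU node would suggest that the GPU Operator has not finished setting up that node.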
- Networking
Ports
This guide assumes that the cluster will be externally accessible through port 443 for ingress. Additional ports may be required for your specific use case.
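Reachability of the ingress port can be checked from a machine outside the cluster network with a plain TCP connection attempt. This is an illustrative sketch only: the function name `port_is_open` and the host name in the usage comment are assumptions, and a successful connection shows only that something is listening on the port, not that ingress is configured correctly.

```python
import socket


def port_is_open(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (the host name below is a placeholder):
#   print(port_is_open("my-cluster.my-domain.com", 443))
```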
DNS/Domain Name
The K8S cluster requires a fully qualified domain name (FQDN) with a wildcard DNS entry that is resolvable both inside and outside the network where the cluster is located.
A wildcard DNS A record must be created for the cluster in addition to the DNS A record for the cluster itself. Reverse lookup PTR records should also exist for both entries when possible. An example wildcard FQDN may look like the following:
*.my-cluster.my-domain.com.
Make a note of this FQDN for later use. An example of how to configure the domain and DNS for the cluster using Amazon Route 53 can be found in the Appendix.
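The DNS requirement above can be spot-checked by resolving both the cluster name and a random name beneath it, since only a wildcard record would answer for a name that was never created explicitly. The sketch below is an assumption-laden illustration: the helper names `resolve` and `check_dns` are hypothetical, and it tests resolution only from the machine it runs on, so it would need to be run both inside and outside the cluster network.

```python
import socket
import uuid


def resolve(hostname: str, resolver=socket.getaddrinfo) -> list[str]:
    """Return the sorted, de-duplicated addresses for hostname, or [] on failure."""
    try:
        infos = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_dns(fqdn: str, resolver=socket.getaddrinfo) -> dict[str, list[str]]:
    """Resolve the cluster name and a random name that only a wildcard can match."""
    base = fqdn.strip().removeprefix("*.").rstrip(".")
    probe = f"probe-{uuid.uuid4().hex[:8]}.{base}"
    return {base: resolve(base, resolver), probe: resolve(probe, resolver)}


# Example usage with the example FQDN from this guide:
#   for name, addresses in check_dns("*.my-cluster.my-domain.com.").items():
#       print(name, addresses or "DOES NOT RESOLVE")
```

An empty list for the probe name would indicate that the wildcard record is missing or not visible from that network.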
- Storage
A storage class must be available on the K8S cluster for the Cloud Native Service Add-on Pack to use. For this example, Local Path Provisioner is used; instructions are provided in the next section.
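To see which storage classes the cluster currently offers, the output of `kubectl get storageclass -o json` can be summarized as shown in this sketch. It is not part of the official procedure; the helper name `storage_classes` is hypothetical, and the default-class check relies on the standard Kubernetes annotation `storageclass.kubernetes.io/is-default-class`.

```python
import json

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def storage_classes(sc_json: str) -> list[tuple[str, str, bool]]:
    """Return (name, provisioner, is_default) for each storage class.

    `sc_json` is expected to be the output of:
        kubectl get storageclass -o json
    """
    classes = []
    for item in json.loads(sc_json).get("items", []):
        meta = item["metadata"]
        annotations = meta.get("annotations") or {}
        is_default = annotations.get(DEFAULT_ANNOTATION) == "true"
        classes.append((meta["name"], item.get("provisioner", ""), is_default))
    return classes


# Example usage (assumption: kubectl is on PATH and points at the cluster):
#   import subprocess
#   out = subprocess.run(["kubectl", "get", "storageclass", "-o", "json"],
#                        check=True, capture_output=True, text=True).stdout
#   for name, provisioner, is_default in storage_classes(out):
#       print(name, provisioner, "(default)" if is_default else "")
```

An empty result means that no storage class exists yet and one must be created before the add-on pack is configured.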
- NVIDIA AI Enterprise
NGC CLI
NVIDIA AI Workflows are available on NVIDIA NGC to NVIDIA AI Enterprise software customers. To pull the resources required for the workflow, you must have access to the following: