Requirements
To deploy NVIDIA Cloud Native Service Add-on Pack, the following requirements must be met:
- Kubernetes
This section of the guide focuses on deploying the add-on pack on an NVIDIA AI Enterprise-supported Cloud Native Stack instance. See the next section for more information on how to deploy and set up an example NVIDIA Cloud Native Stack instance.
Note
The NVIDIA Cloud Native Stack deployment using upstream Kubernetes should be used for evaluation and development purposes only. It is not designed for production use.
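A quick sanity check before proceeding, assuming kubectl is already configured for the cluster, is to confirm that every node reports Ready:

```
# Confirm the cluster is reachable and every node reports Ready
kubectl get nodes -o wide

# Report the client and server Kubernetes versions
kubectl version
```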
- GPU Operator
The NVIDIA GPU Operator must be deployed on the K8S cluster to make GPUs available for use within the cluster. NVIDIA Cloud Native Stack deploys and sets up the GPU Operator as part of the installation.
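A minimal verification sketch, assuming the GPU Operator was installed into its default gpu-operator namespace, is to confirm that the operator pods are healthy and that the nodes advertise the nvidia.com/gpu resource:

```
# All GPU Operator pods should be Running or Completed
kubectl get pods -n gpu-operator

# Each GPU node should list nvidia.com/gpu under Capacity and Allocatable
kubectl describe nodes | grep nvidia.com/gpu
```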
- Networking
Ports
This guide assumes that the cluster will be externally accessible through port 443 for ingress. Additional ports may be required for your specific use case.
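As a rough reachability check, using the example hostname from the DNS section below as a stand-in for your own cluster endpoint, confirm that port 443 accepts connections from outside the cluster network:

```
# Verify that the ingress port is reachable externally
nc -zv my-cluster.my-domain.com 443
```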
DNS/Domain Name
The K8S cluster requires a fully qualified domain name (FQDN) with a wildcard DNS entry that is resolvable both inside and outside of the network the cluster is located in.
A wildcard DNS A record must be created for the cluster in addition to the DNS A record for the cluster itself. Reverse lookup PTR records should also exist for both entries when possible. An example wildcard FQDN may look like the following:
*.my-cluster.my-domain.com.
Make a note of this FQDN for later use. An example of how to configure the domain and DNS for the cluster using Amazon Route 53 can be found in the Appendix.
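Before continuing, it can help to confirm that both the cluster record and the wildcard record resolve from outside the cluster. A sketch using the example FQDN above, with 203.0.113.10 standing in for your cluster or load balancer IP:

```
# The cluster record and any hostname under the wildcard should resolve to the same IP
dig +short my-cluster.my-domain.com
dig +short anything.my-cluster.my-domain.com

# Reverse (PTR) lookup for the cluster IP, where one has been configured
dig +short -x 203.0.113.10
```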
Note
If the DNS entries are only resolvable within a local network, such as within a corporate domain, and not directly resolvable by the cluster, a manual reverse lookup entry can be made in /etc/hosts on the cluster for the cluster IP to point to various required ingress hostnames as a workaround. An example hosts file is provided below.
```
127.0.0.1 system.domain.com
127.0.0.1 auth.system.domain.com
127.0.0.1 dashboards.system.domain.com
```
More ingress rules may be required depending on the workflow.
Note
If the cluster contains multiple nodes, a load balancer must be created to balance requests across the cluster nodes. The DNS entries should point to the load balancer, not to the cluster nodes.
- Storage
A storage class must be available on the K8S cluster so that the Cloud Native Service Add-on Pack can be configured to use it. For this example, Local Path Provisioner will be used; instructions are provided in the next section.
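A quick check that a usable storage class exists, and optionally that it is marked as the default, assuming Local Path Provisioner's standard local-path class name:

```
# List the available storage classes; one should be marked "(default)"
kubectl get storageclass

# Optionally mark local-path as the default storage class
kubectl patch storageclass local-path \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```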
- NVIDIA AI Enterprise
Since NVIDIA AI Workflows are available on NVIDIA NGC for NVIDIA AI Enterprise software customers, you must have access to the following to pull the resources required for the workflow:
NGC CLI
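As a sketch of the required access, assuming you have generated an NGC API key and exported it as NGC_API_KEY, configure the NGC CLI and log the container runtime in to the NGC registry so images can be pulled from nvcr.io:

```
# Configure the NGC CLI interactively with your API key, org, and team
ngc config set

# Log in to the NGC container registry; the username is the literal string $oauthtoken
docker login nvcr.io --username '$oauthtoken' --password-stdin <<< "$NGC_API_KEY"
```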
Note
NVIDIA AI Enterprise trial licenses are available for those who qualify.
Warning
NVIDIA AI Enterprise licensing is required for accessing AI Workflow resources.
Note
Cloud service providers may include licenses through on-demand NVIDIA AI Enterprise instances.