Deployment Strategies
To ensure operational efficiency, the AI Factory emphasizes automation. NVIDIA IT and the leading enterprises we have worked with have validated and use a combination of tools for software installation: Infrastructure as Code (IaC), Helm, and Ansible playbooks.
- IaC is employed for repeatable and consistent environment provisioning.
- Ansible playbooks automate installation of the specified software stack (NVIDIA AI Enterprise, Kubernetes Operators, Partner Integrations) onto pre-provisioned AI Factory hardware and operating systems; a minimal example follows this list.
- Helm charts deploy AI workloads (NIMs) and platform services.
- Certified Kubernetes Operators are used whenever possible.
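As an illustrative sketch of how these pieces fit together, the playbook below installs the NVIDIA GPU Operator through its Helm chart using the kubernetes.core collection. It assumes cluster credentials are already configured on the control host; the release name, namespace, and values are examples and not the exact playbooks referenced above.

```yaml
# Illustrative Ansible playbook: deploy the NVIDIA GPU Operator via Helm.
# Assumes the kubernetes.core collection and a valid kubeconfig are available;
# namespace and values are example settings.
- name: Install NVIDIA GPU Operator
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Deploy gpu-operator Helm chart
      kubernetes.core.helm:
        name: gpu-operator
        chart_ref: gpu-operator
        chart_repo_url: https://helm.ngc.nvidia.com/nvidia
        release_namespace: gpu-operator
        create_namespace: true
        values:
          driver:
            enabled: true
```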
Application and platform configurations are centralized using Helm values files and secure secret management practices, managed via IaC and JFrog Artifactory. Initial configuration parameters can be supplied to the provided Ansible playbooks.
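For illustration, a values file for a NIM deployment might pin images to an internal Artifactory registry and reference pre-created Kubernetes Secrets rather than embedding credentials. The chart keys, registry host, and secret names below are placeholders, not the schema of any specific chart.

```yaml
# Example Helm values file (values-production.yaml).
# Registry URL, secret names, and resource settings are illustrative.
image:
  repository: artifactory.example.com/nvidia/nim-llm
  tag: "1.0.0"
imagePullSecrets:
  - name: artifactory-pull-secret   # pre-created docker-registry Secret
env:
  - name: NGC_API_KEY
    valueFrom:
      secretKeyRef:
        name: ngc-api-secret        # managed outside the chart
        key: api-key
resources:
  limits:
    nvidia.com/gpu: 1
```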
Note
GitOps practices are strongly recommended for managing application configurations beyond the initial installation.
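As one possible shape for that workflow (the tool choice is not mandated here), a GitOps controller such as Argo CD could reconcile each environment's Helm values directly from Git. The repository URL, path, and namespaces below are placeholders.

```yaml
# Hypothetical Argo CD Application tracking Helm values in Git.
# Repo URL, path, and namespaces are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nim-llm-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/ai-factory/deployments.git
    targetRevision: main
    path: charts/nim-llm
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: nim-production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```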
Within OpenShift, Kubernetes Namespaces, Resource Quotas, and Network Policies provide logical separation between distinct environments (Development, Testing, Production). This deployment strategy also relies on standardized Kubernetes patterns, including Operators: the NVIDIA Kubernetes Operators (GPU Operator and NIM Operator) as well as Partner Operators.
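A minimal sketch of that separation, with illustrative names and quota values, pairs each environment's Namespace with a ResourceQuota and a default-deny ingress NetworkPolicy:

```yaml
# Illustrative per-environment isolation for a Development namespace.
# Names, GPU counts, and memory limits are placeholders.
apiVersion: v1
kind: Namespace
metadata:
  name: ai-factory-dev
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: ai-factory-dev
spec:
  hard:
    requests.nvidia.com/gpu: "2"
    limits.memory: 64Gi
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: ai-factory-dev
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```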