# Enterprise Cloud Native Platform

The Enterprise Cloud Native Platform, with Kubernetes at its core, provides agility, scalability, and resilience for an Enterprise AI Factory focused on developing and deploying sophisticated AI agents. Kubernetes embodies cloud-native principles by orchestrating containers (such as those from NVIDIA AI Enterprise), managing microservice-based agent architectures, and enabling dynamic automation. This includes automated deployment of new agent versions, scaling based on demand (important for both training and inference on NVIDIA-Certified Systems), self-healing to support high availability, and fine-grained resource management, particularly for GPUs.
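As a minimal sketch of how a GPU-backed agent workload might be declared, the Deployment below requests a GPU through the `nvidia.com/gpu` resource type exposed by the NVIDIA device plugin. The image, names, namespace, and probe endpoint are illustrative assumptions, not a prescribed configuration:

```yaml
# Illustrative Deployment for a microservice-based AI agent.
# The image, labels, and namespace are hypothetical placeholders;
# nvidia.com/gpu is the resource type exposed by the NVIDIA device plugin.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-inference        # hypothetical agent service
  namespace: ai-factory        # hypothetical namespace
spec:
  replicas: 2                  # Kubernetes self-heals back to this count
  selector:
    matchLabels:
      app: agent-inference
  template:
    metadata:
      labels:
        app: agent-inference
    spec:
      containers:
      - name: agent
        image: registry.example.com/agent-inference:v1  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1  # schedules the pod onto a GPU node
        readinessProbe:        # gates rolling updates on agent health
          httpGet:
            path: /healthz     # assumed health endpoint
            port: 8080
```

Declaring the GPU as a resource limit lets the scheduler pack agent pods onto GPU nodes, while the replica count and readiness probe give the self-healing and rolling-update behavior described above.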

These cloud-native capabilities translate directly into AI Factory outcomes. The ability to independently develop, update, and scale microservice-based agents, coupled with automated CI/CD pipelines managed via Kubernetes, allows for rapid iteration and frequent deployment. Kubernetes handles the significant and often bursty compute demands of training AI models and scales inference services for deployed agents based on real-time demand. This automation, together with efficient packing of workloads onto NVIDIA-Certified Systems, also reduces operational burden and optimizes costs, an important consideration in complex AI/ML environments, especially those involving GPUs.
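To make the demand-based scaling concrete, a HorizontalPodAutoscaler can grow and shrink a (hypothetical) agent inference Deployment as load changes. The names, replica bounds, and CPU threshold here are illustrative assumptions:

```yaml
# Illustrative autoscaling policy for an agent inference service.
# The Deployment name, namespace, and thresholds are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agent-inference-hpa    # hypothetical name
  namespace: ai-factory        # hypothetical namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: agent-inference      # the Deployment to scale
  minReplicas: 2               # floor for availability
  maxReplicas: 10              # ceiling for cost control
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # scale out above 70% average CPU
```

This sketch scales on CPU utilization; scaling on GPU-centric signals (for example, GPU utilization or inference queue depth) typically requires exporting custom metrics, such as via the NVIDIA DCGM exporter and a Prometheus metrics adapter.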

In this context, Kubernetes functions as the foundational platform for the AI Factory. It unifies the management of a complex stack—including NVIDIA Operators, AI software suites like NVIDIA AI Enterprise, storage, networking, and observability tools—onto a single platform. Enterprise Kubernetes distributions and validated architectures can further simplify this by providing secure, supported, and pre-integrated environments. This orchestration underpins the efficient building, deployment, and management of a diverse and evolving suite of AI agents on high-performance infrastructure.