NVIDIA Enterprise Reference Architectures

Build AI factories that scale. Turn your data center into a high-performance AI factory with NVIDIA Enterprise Reference Architectures.

This whitepaper introduces NVIDIA Enterprise Reference Architectures (Enterprise RAs), which provide recommendations for building AI factories for enterprise-class deployments ranging from 32 to 256 GPUs. These architectures aim to simplify the deployment of AI infrastructure, reduce complexity, and accelerate time to value.
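To express the 32-to-256-GPU range in terms of servers, the sketch below converts a GPU count into a node count. It assumes 8 GPUs per node. That is a common server configuration, but it is an assumption here and not a figure from this page, so pass a different `gpus_per_node` for other server designs. The function name `nodes_required` is hypothetical.

```python
def nodes_required(total_gpus: int, gpus_per_node: int = 8) -> int:
    """Number of compute nodes needed to host `total_gpus` GPUs.

    Hypothetical sizing helper: assumes every node carries the same number
    of GPUs (8 by default) and rounds up so that no GPU is left unplaced.
    """
    if total_gpus < 1 or gpus_per_node < 1:
        raise ValueError("GPU counts must be positive integers")
    # Integer ceiling division, avoiding floating-point rounding.
    return (total_gpus + gpus_per_node - 1) // gpus_per_node


if __name__ == "__main__":
    # Walk the documented 32-to-256-GPU range in powers of two.
    for gpus in (32, 64, 128, 256):
        print(f"{gpus} GPUs -> {nodes_required(gpus)} nodes at 8 GPUs/node")
```

Under the 8-GPU assumption, the range corresponds to roughly 4 to 32 compute nodes, before counting management, storage, or networking nodes.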
Provides a standardized framework for deploying vanilla (upstream) Kubernetes for AI inference and machine learning, optimized to scale AI workloads in enterprise environments.
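As a rough illustration of running an inference workload on such a cluster, the following Python sketch builds a minimal Pod manifest that requests GPUs through the `nvidia.com/gpu` extended resource. This is the resource name the NVIDIA device plugin advertises to Kubernetes. The sketch assumes the device plugin is already installed (for example, by the NVIDIA GPU Operator). The helper name `gpu_inference_pod`, the pod name, and the container image are hypothetical placeholders.

```python
import json


def gpu_inference_pod(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Pod manifest that requests NVIDIA GPUs.

    Hypothetical helper. Assumes the NVIDIA device plugin exposes GPUs
    as the extended resource `nvidia.com/gpu` on the cluster's nodes.
    """
    if gpus < 1:
        raise ValueError("gpus must be >= 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested via limits; the scheduler places the
                    # pod only on a node with this many unallocated GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder image name; substitute your own inference server image.
    manifest = gpu_inference_pod("inference-demo", "example.com/inference-server:latest")
    print(json.dumps(manifest, indent=2))
```

The JSON output can be saved to a file and applied with `kubectl apply -f`. Only limits are set because Kubernetes uses a GPU limit as the request when no request is given, and requires the two to be equal when both are specified.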
Presents the necessary components, including integrations from our ecosystem partners, automation tools, and deployment strategies. Enterprise partners can use this design to integrate accelerated computing, high-performance networking, and AI software when building single-tenant, enterprise-ready AI factories.
Provides an example build of the infrastructure stack, geared toward OEMs and NVIDIA partners who intend to build systems ready for single-tenant, production-grade AI workloads. While the hardware components of the stack can be modular, the software components remain consistent across workloads such as inference, fine-tuning, and retrieval-augmented generation (RAG).
Offers a standardized, production-ready reference for implementing observability in enterprise AI or HPC environments. Built on NVIDIA’s AI infrastructure and Kubernetes-native platforms, this version of the guide focuses on building advanced custom dashboards for AI factories, giving administrators and enterprise customers actionable insights into GPU, CPU, Kubernetes, and application metrics.
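Such dashboards are typically built by aggregating per-GPU metrics. The sketch below assumes GPU metrics are scraped in the Prometheus text exposition format, as NVIDIA's DCGM Exporter serves them, and averages the `DCGM_FI_DEV_GPU_UTIL` gauge per host using the `Hostname` label. The sample payload and the helper name `gpu_utilization_by_host` are illustrative. In practice, a Prometheus server would usually do this with a query such as `avg by (Hostname) (DCGM_FI_DEV_GPU_UTIL)` rather than custom code.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text format: name{labels} value [timestamp]
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
# label="value" pairs; values may contain backslash-escaped characters
_LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def gpu_utilization_by_host(exposition: str, metric: str = "DCGM_FI_DEV_GPU_UTIL") -> dict:
    """Average one per-GPU gauge across the GPUs of each host (hypothetical helper)."""
    samples = defaultdict(list)
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = _SAMPLE_LINE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(_LABEL_PAIR.findall(match.group("labels") or ""))
        samples[labels.get("Hostname", "unknown")].append(float(match.group("value")))
    return {host: sum(vals) / len(vals) for host, vals in samples.items()}


# Illustrative scrape output; real label sets are longer and trimmed here.
SAMPLE = """\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 80
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 60
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-b"} 10
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-b"} 1024
"""

if __name__ == "__main__":
    print(gpu_utilization_by_host(SAMPLE))  # {'node-a': 70.0, 'node-b': 10.0}
```

Grouping by host rather than by individual GPU keeps the summary compact for an overview panel. The same helper can be pointed at other gauges by passing a different `metric` name.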
Provides a clear, step-by-step procedure for installing NVIDIA Base Command Manager (BCM) on bare-metal cluster hardware. The instructions focus on concise, practical steps with minimal explanations so that a moderately experienced cluster administrator can get a cluster up and running in a standard configuration as quickly as possible.