Build AI factories that scale. Turn your data center into a high-performance AI factory with NVIDIA Enterprise Reference Architectures.
This whitepaper introduces NVIDIA Enterprise Reference Architectures (Enterprise RAs), which provide recommendations for building AI factories for enterprise-class deployments ranging from 32 to 256 GPUs. These architectures simplify the deployment of AI infrastructure, reduce complexity, and accelerate time to value.
The NVIDIA RTX PRO AI Factory supports a range of enterprise workloads, including agentic AI inference, physical and industrial AI, visual computing, and high-performance computing for data analytics and simulation. This document outlines the hardware components that define this scalable, modular architecture, including guidance on the scalable unit (SU) design and the specifics of the Ethernet fabric topologies.
The NVIDIA HGX AI Factory supports a range of enterprise workloads, including AI inference, AI training and fine-tuning, and large-scale GPU-accelerated data analytics. This document outlines the hardware components that define a scalable, modular architecture, including SU-based design guidance and the specifics of the underlying network fabric topologies used to interconnect the cluster.
This paper presents the necessary components, including integrations from our ecosystem partners, automation tools, and deployment strategies. Enterprise partners can use this design to integrate accelerated computing, high-performance networking, and AI software to build single-tenant, enterprise-ready AI factories.
This paper provides an example infrastructure stack build geared toward OEMs and NVIDIA partners who intend to build systems ready for single-tenant, production-grade AI workloads. While the hardware components of the infrastructure stack can be modular, the software components remain consistent across workloads such as inference, fine-tuning, and retrieval-augmented generation (RAG).
This paper guides enterprises on how to pack more inference models onto a given set of NVIDIA GPUs using NVIDIA Run:ai through intelligent scheduling, fractional GPUs, and dynamic resource management. We also explore the performance impact of using the Run:ai scheduler with fractional GPUs for NVIDIA NIM LLMs.
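To make the fractional-GPU idea concrete, here is a minimal sketch that submits a Kubernetes pod requesting a quarter of a GPU from the Run:ai scheduler, using the official Kubernetes Python client. The annotation key (`gpu-fraction`), scheduler name (`runai-scheduler`), and the NIM container image shown are assumptions based on Run:ai's documented Kubernetes integration; verify them against your installed Run:ai and NIM versions.

```python
# Minimal sketch: request a fractional GPU for a NIM container via Run:ai.
# The annotation key, scheduler name, and image below are assumptions;
# check them against your deployed Run:ai and NIM versions.
from kubernetes import client, config

config.load_kube_config()  # use the current kubeconfig context

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="nim-llm-fractional",
        annotations={"gpu-fraction": "0.25"},  # ask Run:ai for 1/4 of a GPU
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",  # hand placement to the Run:ai scheduler
        containers=[
            client.V1Container(
                name="nim-llm",
                # hypothetical NIM image name, for illustration only
                image="nvcr.io/nim/meta/llama-3.1-8b-instruct:latest",
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

With requests like this, the scheduler can co-locate several models on one physical GPU, which is the packing behavior the paper measures.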
In this paper, we look at the NVIDIA AI-Q Research Agent blueprint, an agentic system that generates detailed reports from both internal and external data. We walk through how to deploy and scale it, and provide sizing guidance.
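As a hedged illustration of the scaling step, the sketch below resizes a Kubernetes Deployment hosting the blueprint's services to a replica count chosen from sizing guidance. The deployment name and namespace are hypothetical placeholders, and the actual blueprint may instead be scaled through Helm values or a different resource layout.

```python
# Minimal sketch: scale out a Deployment hosting the blueprint's services.
# "aiq-research-agent" and "ai-blueprints" are hypothetical placeholders;
# the real blueprint may be scaled through Helm values instead.
from kubernetes import client, config

config.load_kube_config()

client.AppsV1Api().patch_namespaced_deployment_scale(
    name="aiq-research-agent",       # hypothetical deployment name
    namespace="ai-blueprints",       # hypothetical namespace
    body={"spec": {"replicas": 3}},  # target replica count from sizing guidance
)
```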