# NVIDIA Inference Reference Architecture

- Introduction
- Why Adopting This Architecture is Essential
- Measuring the Value Proposition
- NVIDIA Open Models Across Modalities
- Modular by Design
- Common Component Combinations
- Architecture Overview
  - Component Layers
    - Infrastructure Layer
    - Optimization Layer
    - Deployment Layer
    - Inference Serving Layer
    - Memory and Caching Layer
    - Performance Tooling
    - Container Registry
  - Data Flow Diagrams
    - GenAI/LLM Inference Flow
    - Traditional ML Inference Flow
    - Model Deployment Flow
  - Key Component Interactions
    - Disaggregated LLM Serving
    - Kubernetes Infrastructure Stack
  - Component Interaction Matrix
- Getting Started
  - Full Stack Deployment
  - Traditional ML Inference Only
  - GenAI/LLM Inference Only
  - Kubernetes Integration Only
- Example Workload: Large MoE LLM Inference
  - Architecture Overview (Dynamo)
  - Performance Characteristics
  - Technical Implementation Details
- Appendix