Abstract

As part of the NVIDIA DGX™ platform, NVIDIA DGX BasePOD™ provides on-premises infrastructure for artificial intelligence (AI) workloads. This infrastructure is an excellent fit for stable use cases and resource requirements.

However, demand can sometimes outstrip resource availability, or users might need access to different resources than those provided by their DGX infrastructure.

Managing a separate pool of resources to support changing requirements typically requires significant expertise in cloud management tools and interfaces. It also requires educating users on which system or environment to request, which often leads to suboptimal resource utilization and user confusion.

These scenarios are now addressed by the capabilities of NVIDIA Base Command™ Manager (BCM) software. Administrators can integrate on-demand public cloud resources directly with an on-premises DGX BasePOD private cloud environment and make the combined resources available transparently in a multi-cloud architecture.

This document describes how to extend DGX BasePOD with additional NVIDIA GPUs from Amazon Web Services (AWS) and manage the entire infrastructure from a consolidated user interface. Given the breadth of instances AWS offers for both general-purpose and accelerated computing with NVIDIA GPUs, AWS is a strong basis for cloud resource integration in BCM.

Providing consistent access to on-premises and public cloud resources through existing infrastructure drastically simplifies both the administrator and user experience, making it easy to use the right tool for any job.