NVIDIA Run:ai on DGX Cloud Documentation
Run:ai on DGX Cloud Manual
- 1. Product Overview
- 2. Cluster Onboarding Guide
- 2.1. Introduction
- 2.2. Admin Steps
- 2.2.1. Overview of Personas in Run:ai on DGX Cloud
- 2.2.2. Accessing the Run:ai on DGX Cloud Cluster
- 2.2.3. Setting up the NVIDIA Run:ai and Kubernetes CLI
- 2.2.4. Creating a Department in NVIDIA Run:ai
- 2.2.5. Creating a Project in NVIDIA Run:ai
- 2.2.6. Assigning User Roles in NVIDIA Run:ai
- 2.2.7. Creating an Environment
- 2.2.8. Creating a Data Source
- 2.3. User Steps
- 2.4. Conclusion
- 3. Cluster Administrator Guide
- 4. Cluster User Guide
- 5. CLI/API Setup Guide
- 6. Storage User Guide
Tutorials
- 1. Data Download Examples
- 2. Interactive Workload Examples
- 2.1. Interactive NeMo Workload Job
- 2.2. RAPIDS and Polars Workspace
- 2.3. Running Visual Studio Code Server
- 2.4. Building and Running an Interactive PyTorch Job with Visual Studio Code
- 2.4.1. Prerequisites and Requirements
- 2.4.2. Creating the Training Script
- 2.4.3. Creating and Using a Custom Docker Container
- 2.4.4. Running the Custom Container on Run:ai on DGX Cloud Using the NVIDIA Run:ai CLI
- 2.4.5. Connecting to the Running Workload Locally Using Visual Studio Code (via SSH)
- 2.4.6. Cleaning up the Environment
- 2.5. Using WandB with a Workspace
- 2.6. Using BioNeMo Framework for ESM-2nv Data Preprocessing and Model Training
- 3. Distributed Training Workload Examples
- 4. NeMo End-to-End Workflow Example
- 5. Inference Examples
- 6. MLOps Integration Examples