NVIDIA DGX Cloud Run:ai Documentation
- 1. Overview
- 2. Cluster Onboarding Guide
  - 2.1. Introduction
  - 2.2. Admin Steps
  - 2.3. User Steps
    - 2.3.1. Creating an Environment
    - 2.3.2. Creating a Data Source
    - 2.3.3. Running Sample Workloads
    - 2.3.4. Setting up the CLI
  - 2.4. (Optional) Private Access
  - 2.5. Conclusion
- 3. Cluster Administrator Guide
- 4. Cluster User Guide
- 5. Advanced Usage
  - 5.1. Accessing the Run:ai CLI
  - 5.2. Kubernetes Usage for Researchers
  - 5.3. Advanced Kubernetes Usage for Admins
  - 5.4. Security Restrictions for Kubernetes
  - 5.5. Checking Your Storage Utilization
  - 5.6. Retrieving the Kubernetes Configuration File via CLI
  - 5.7. Configuring the Ingress/Egress CIDRs for the Cluster
- 6. Workload Examples
  - 6.1. Interactive NeMo Workload Job
  - 6.2. RAPIDS and Polars Workspace
  - 6.3. Download Data From S3 in a Training Workload
  - 6.4. Using Your Data From Google Cloud Storage
  - 6.5. Using Your Data From Amazon S3
  - 6.6. Running Visual Studio Code Inside a Workload
  - 6.7. Using WandB with a Workspace
  - 6.8. Distributed PyTorch Training Job
  - 6.9. Using Zero Quota Projects for Lower Priority Workloads
  - 6.10. End-to-end NeMo Framework Workflow
  - 6.11. Using BioNeMo Framework for ESM-2nv Data Preprocessing and Model Training
- 7. Integration Examples
- 8. Troubleshooting
- 9. Limitations