NVIDIA Software References for AI Platform#
Run:ai#
AI teams compete for limited GPU resources. Training jobs run for hours or days, leaving GPUs idle between runs. Data scientists wait in queues while expensive GPUs sit underutilized. Organizations need a way to maximize GPU utilization while ensuring fair resource allocation across teams and projects.
Run:ai is an AI infrastructure orchestration platform designed to maximize utilization of computing resources, particularly GPUs, in AI development environments. It is installed as a K8s operator on an existing K8s cluster and operates as a layer on top of K8s, providing specialized features for managing AI workloads and research experiments.
The platform’s core functionality includes dynamic resource allocation, where GPU resources can be fractionally shared or aggregated based on workload demands. This allows organizations to optimize their hardware utilization by automatically redistributing compute resources between different teams and projects based on priority and demand. For example, when one team’s training job completes, those GPUs can be immediately reallocated to another team’s pending workload.
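To make the reallocation idea concrete, the following is a minimal toy sketch of priority-based GPU scheduling: when a job completes, its GPUs flow to the highest-priority pending workload. It is purely illustrative and does not reflect Run:ai's actual scheduler internals; all job and class names are invented.

```python
# Toy sketch of priority-based GPU reallocation (illustrative only --
# Run:ai's real scheduler is far more sophisticated).
import heapq

class GpuPool:
    def __init__(self, total_gpus):
        self.free = total_gpus
        self.pending = []  # heap ordered by priority (negated for max-heap)
        self.running = {}  # job name -> GPUs held

    def submit(self, name, gpus, priority):
        """Queue a job; it starts immediately if enough GPUs are free."""
        heapq.heappush(self.pending, (-priority, name, gpus))
        self._schedule()

    def complete(self, name):
        """A job finished: return its GPUs and reschedule pending work."""
        self.free += self.running.pop(name)
        self._schedule()

    def _schedule(self):
        # Start the highest-priority pending jobs that fit (no backfill).
        while self.pending and self.pending[0][2] <= self.free:
            _, name, gpus = heapq.heappop(self.pending)
            self.free -= gpus
            self.running[name] = gpus

pool = GpuPool(total_gpus=8)
pool.submit("team-a-train", gpus=8, priority=1)
pool.submit("team-b-train", gpus=8, priority=2)  # waits: no free GPUs
pool.complete("team-a-train")                    # GPUs flow to team B
print(sorted(pool.running))  # ['team-b-train']
```

The key design point the sketch captures is that reallocation is event-driven: every job completion triggers a scheduling pass rather than leaving freed GPUs idle until the next polling interval.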
Run:ai also provides advanced queuing and scheduling mechanisms that handle the complexity of managing multiple AI workloads across distributed infrastructure. It includes experiment-management features that help data scientists track and manage their training runs, along with tools for monitoring resource usage, job progress, and system performance, while maintaining isolation and resource guarantees between users and teams.
Run:ai integrates with common ML frameworks (PyTorch, TensorFlow) and development tools (JupyterLab, VS Code), supporting both interactive development sessions and production training jobs.
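Fractional GPU requests are expressed at submission time. A sketch using the `runai` CLI is shown below; the job name, project name, and training command are hypothetical examples.

```shell
# Submit a training workload that requests half a GPU
# (job name "train-job-1" and project "team-a" are hypothetical).
runai submit train-job-1 \
  --project team-a \
  --image nvcr.io/nvidia/pytorch:24.05-py3 \
  --gpu 0.5 \
  -- python train.py

# Inspect the job's status and resource allocation.
runai describe job train-job-1 --project team-a
```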
NVCF#
AI inference at scale presents unique challenges. Models must be deployed across distributed GPU infrastructure, handle variable request loads, and scale efficiently without over-provisioning expensive GPU resources. Developers need to focus on model logic, not infrastructure management.
NVIDIA Cloud Functions (NVCF) is a serverless inference platform that enables deployment of AI models as scalable API endpoints. NVCF abstracts GPU infrastructure management, allowing developers to deploy containerized models and receive auto-scaling, load balancing, and GPU orchestration automatically.
NVCF registers cluster backends that use K8s as the orchestration layer, so K8s must be installed on the compute nodes before they can be registered. Registration is handled by a cluster agent installed on the K8s cluster, which serves two purposes:
- Communicates with the NVCF API cloud service to register the cluster as an NVCF backend target.
- Interacts with K8s to deploy AI workloads onto the GPU cluster as directed by NVCF orchestration.
NVCF is not required for the deployments described in this document, but it is an NVIDIA solution that simplifies the orchestration of AI workloads on the cluster.
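Once a model is deployed as an NVCF function, clients invoke it over HTTPS. A sketch of such an invocation follows; the function ID and request body are hypothetical placeholders, and the API key is assumed to be available in an environment variable.

```shell
# Invoke a deployed NVCF function endpoint
# (<function-id> and the JSON body are hypothetical placeholders).
curl -X POST \
  "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/<function-id>" \
  -H "Authorization: Bearer $NVCF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'
```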
Slurm#
Slurm (Simple Linux Utility for Resource Management) is a widely adopted workload manager for HPC and large-scale AI training workloads. While not a cloud-native solution, Slurm remains a key technology for dedicated training jobs due to its mature job queuing, prioritization, and resource scheduling capabilities. It is the backbone of many supercomputers and research clusters.
For NCPs serving customers with large-scale training workloads, Slurm provides a proven, single-tenant deployment model that maximizes GPU utilization for long-running distributed training jobs.
NCPs can deploy either the open-source version of Slurm or NVIDIA’s BCM Slurm, which is optimized for NVIDIA GPU infrastructure. BCM Slurm includes:
- Pre-configured integration with NVIDIA GPUs and high-speed interconnects.
- Support for multi-node NVLink and InfiniBand fabrics.
- GPU-aware scheduling and resource allocation.
- Integration with the NVIDIA software stack (drivers, NCCL, cuDNN).
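As an illustration of how tenants typically interact with Slurm, a minimal batch script for a multi-node GPU training job might look like the following; the partition name, resource counts, and training script are hypothetical.

```shell
#!/bin/bash
# Sketch of a multi-node distributed training job under Slurm
# (partition, node counts, and train.py are hypothetical examples).
#SBATCH --job-name=llm-pretrain
#SBATCH --partition=gpu
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:8
#SBATCH --time=24:00:00

# One task per GPU; NCCL handles inter-GPU communication.
srun python train.py
```

The script would be submitted with `sbatch`, after which Slurm queues, prioritizes, and launches it across the allocated nodes.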
NeMo#
NVIDIA NeMo™ is a comprehensive software suite for building, monitoring, and optimizing AI agents across their entire lifecycle. Unlike point solutions that address only model training or inference, NeMo provides an integrated platform spanning data preparation through production optimization. Refer to the NeMo Framework documentation for more information.
NCPs can offer NeMo as part of an AI development platform, enabling tenants to:
- Prepare enterprise data – Clean, filter, and curate multimodal datasets from tenant sources
- Customize foundation models – Fine-tune and align models with domain-specific knowledge
- Build RAG pipelines – Ground AI responses in tenant knowledge bases and documents
- Enforce guardrails – Apply safety, compliance, and content policies to AI outputs
- Continuously improve – Evaluate agent performance and apply reinforcement learning
NeMo components are available as containers and can be deployed on Kubernetes or bare-metal infrastructure. NCPs can integrate NeMo into their AI platform offerings, providing tenants with self-service access to model customization and agent development capabilities.
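As one example of container-based deployment, the NeMo Framework training container can be pulled from NGC and run interactively on a GPU node; the tag shown is an example, so check NGC for current releases.

```shell
# Run the NeMo Framework container from NGC with all GPUs exposed
# (the 24.07 tag is an example; consult NGC for current releases).
docker run --gpus all -it --rm \
  nvcr.io/nvidia/nemo:24.07 \
  bash
```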