# NVIDIA Software References for AI Platform

## Run:ai

AI teams compete for limited GPU resources. Training jobs run for hours
or days, leaving GPUs idle between runs. Data scientists wait in queues
while expensive GPUs sit underutilized. Organizations need a way to
maximize GPU utilization while ensuring fair resource allocation across
teams and projects.

[Run:ai](https://run-ai-docs.nvidia.com/) is an AI infrastructure orchestration platform designed to
maximize utilization of computing resources, particularly GPUs, in AI
development environments. It operates as a layer on top of Kubernetes
(K8s), providing specialized features for managing AI workloads and
research experiments, and is [installed as a K8s operator](https://docs.run.ai/v2.20/home/components/) on an
existing K8s cluster.
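
After the operator is installed, a quick way to confirm the Run:ai
components are healthy is to list their pods. Below is a minimal sketch
using the official `kubernetes` Python client; the `runai` namespace is
an assumption based on the default installation, so adjust it to match
your deployment.

```python
# Sketch: list the Run:ai operator pods and their phases.
# Assumes kubeconfig access to the cluster and the default "runai"
# namespace; change the namespace if your installation uses another.
from kubernetes import client, config

config.load_kube_config()  # load the current kubeconfig context
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="runai").items:
    print(f"{pod.metadata.name}: {pod.status.phase}")
```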

The platform's core functionality includes dynamic resource allocation,
where GPU resources can be fractionally shared or aggregated based on
workload demands. This allows organizations to optimize their hardware
utilization by automatically redistributing compute resources between
different teams and projects based on priority and demand. For example,
when one team's training job completes, those GPUs can be immediately
reallocated to another team's pending workload.
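
The sketch below illustrates how fractional sharing surfaces at the pod
level. The `gpu-fraction` annotation and `runai-scheduler` scheduler
name follow Run:ai's documentation but should be verified against your
installed version; the namespace and container image are illustrative
placeholders.

```python
# Sketch: request half of one GPU for a pod via Run:ai's fractional-GPU
# annotation and hand scheduling to the Run:ai scheduler. The namespace
# and image are placeholders; verify annotation and scheduler names
# against your installed Run:ai version.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(
        name="frac-gpu-demo",
        annotations={"gpu-fraction": "0.5"},  # half of a single GPU
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",  # let Run:ai place the pod
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:24.05-py3",  # example image
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="team-a", body=pod)
```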

Run:ai also provides advanced queuing and scheduling mechanisms that
handle the complexity of managing multiple AI workloads across
distributed infrastructure. It includes features for experiment
management, helping data scientists track and manage their training
runs, and provides tools for monitoring resource usage, job progress,
and system performance.

Run:ai integrates with common ML frameworks (PyTorch, TensorFlow) and
development tools (JupyterLab, VS Code), supporting both interactive
development sessions and production training jobs while maintaining
isolation and resource guarantees between different users and teams.

## NVCF

AI inference at scale presents unique challenges. Models must be
deployed across distributed GPU infrastructure, handle variable request
loads, and scale efficiently without over-provisioning expensive GPU
resources. Developers need to focus on model logic, not infrastructure
management.

[NVIDIA Cloud Functions (NVCF)](https://docs.nvidia.com/cloud-functions/user-guide/latest/cloud-function/overview.html) is a serverless inference platform that
enables deployment of AI models as scalable API endpoints. NVCF
abstracts GPU infrastructure management, allowing developers to deploy
containerized models and receive auto-scaling, load balancing, and GPU
orchestration automatically.
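
As an illustration of the developer-facing side, a deployed function is
invoked over HTTPS. A minimal sketch using `requests`, where the
function ID is a placeholder and the JSON payload is whatever schema
your deployed container expects:

```python
# Sketch: invoke a deployed NVCF function. FUNCTION_ID is a placeholder,
# the API key comes from the NVCF console, and the payload schema is
# defined entirely by the container you deployed.
import os
import requests

FUNCTION_ID = "your-function-id"       # placeholder
API_KEY = os.environ["NVCF_API_KEY"]

resp = requests.post(
    f"https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/{FUNCTION_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"prompt": "Hello"},          # container-specific payload
    timeout=60,
)
resp.raise_for_status()
print(resp.status_code, resp.json())
```

Long-running invocations may return HTTP 202 with a request ID to poll;
refer to the NVCF user guide for the exact status-polling flow.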

NVCF registers cluster backends that use K8s as the orchestration
layer, so K8s must already be installed on the compute nodes.

NVCF uses a cluster agent installed onto that K8s cluster. The cluster
agent serves two purposes:

1. Communicates with the NVCF API cloud service to register the node as
   an NVCF backend target.
2. Interacts with K8s to deploy AI workloads that NVCF orchestration
   dispatches onto the GPU cluster.

This document does not require the use of NVCF, but NVCF is an NVIDIA
solution that simplifies the orchestration of AI workloads on the
cluster.

## Slurm

Slurm (Simple Linux Utility for Resource Management) is a widely adopted
workload manager for HPC and large-scale AI training workloads. While
not a cloud-native solution, Slurm remains a key technology for
dedicated training jobs due to its mature job queuing, prioritization,
and resource scheduling capabilities. It is the backbone of many
supercomputers and research clusters.

For NVIDIA Cloud Partners (NCPs) serving customers with large-scale
training workloads, Slurm provides a proven, single-tenant deployment
model that maximizes GPU utilization for long-running distributed
training jobs.

NCPs can deploy either the open-source version of Slurm or [NVIDIA's BCM
Slurm](https://docs.nvidia.com/mission-control/docs/systems-administration-guide/2.0.0/slurm-workload-management.html), which ships with Base Command Manager (BCM) and is optimized for
NVIDIA GPU infrastructure. BCM Slurm includes:

1. Pre-configured integration with NVIDIA GPUs and high-speed
   interconnects.
2. Support for multi-node NVLink and InfiniBand fabrics.
3. GPU-aware scheduling and resource allocation.
4. Integration with the NVIDIA software stack (drivers, NCCL, cuDNN).
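
To make the scheduling model concrete, here is a minimal sketch of how
a training task launched with `srun` discovers its place in the job
from the environment variables Slurm sets; the same variables are
available under both open-source Slurm and BCM Slurm.

```python
# Sketch: read the standard environment variables Slurm sets for each
# task launched with srun and derive a distributed-training topology.
import os

rank = int(os.environ["SLURM_PROCID"])         # global rank of this task
world_size = int(os.environ["SLURM_NTASKS"])   # total tasks in the job
local_rank = int(os.environ["SLURM_LOCALID"])  # task index on this node

print(f"job {os.environ['SLURM_JOB_ID']}: "
      f"rank {rank}/{world_size}, local rank {local_rank}")

# A framework such as PyTorch would typically bind this task to GPU
# `local_rank` and initialize its NCCL process group from these values.
```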

## NeMo

NVIDIA NeMo™ is a comprehensive software suite for building, monitoring,
and optimizing AI agents across their entire lifecycle. Unlike point
solutions that address only model training or inference, NeMo provides
an integrated platform spanning data preparation through production
optimization. Refer to the [NeMo Framework](https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html) documentation for more information.

NCPs can offer NeMo as part of an AI development platform, enabling
tenants to:

* Prepare enterprise data – Clean, filter, and curate multimodal
  datasets from tenant sources
* Customize foundation models – Fine-tune and align models with
  domain-specific knowledge
* Build RAG pipelines – Ground AI responses in tenant knowledge bases
  and documents
* Enforce guardrails – Apply safety, compliance, and content policies to
  AI outputs
* Continuously improve – Evaluate agent performance and apply
  reinforcement learning

NeMo components are available as containers and can be deployed on
Kubernetes or bare metal infrastructure. NCPs can integrate NeMo into
their AI platform offerings, providing tenants with self-service access
to model customization and agent development capabilities.
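
As one concrete example of the guardrails capability above, the
open-source NeMo Guardrails package exposes a small Python API. A
minimal sketch, assuming a local `./config` directory containing a
rails configuration (Colang flows plus a `config.yml` that points at an
LLM provider):

```python
# Sketch: wrap LLM calls with a rails configuration using the
# open-source nemoguardrails package. Assumes ./config holds Colang
# flows and a config.yml configured with an LLM provider.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

response = rails.generate(
    messages=[{"role": "user", "content": "Summarize our refund policy."}]
)
print(response["content"])
```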