Persistent Volumes#

The NeMo Platform uses persistent volume claims (PVCs) for jobs and files storage. These volumes must be mountable by multiple pods across nodes in read-write mode, which Kubernetes calls the ReadWriteMany (RWX) access mode. A ReadWriteMany-capable StorageClass is required for both jobs and files storage.

NVIDIA NIM microservices also scale, upgrade, and deploy more smoothly with an RWX-backed storage class.

The platform does not manage storage classes. You must install an appropriate storage provisioner (identified by a StorageClass) before installing the Helm chart.

Jobs and files storage#

The NeMo Platform chart creates a single shared PVC for jobs and files storage, configured under core.storage in values.yaml.

Note

As an alternative to PVC-based file storage, you can configure the Files service to use S3 object storage. See File Storage for S3 configuration options. When using S3 for files, the shared PVC is still required for jobs storage.

Option 1: Create a new PVC (default)#

  1. Confirm the StorageClass exists in your cluster:

    kubectl get storageclass
    
  2. Set core.storage in your values.yaml. Use a ReadWriteMany-capable StorageClass (e.g. NFS, CephFS):

    core:
      storage:
        storageClass: "nfs"
        accessModes:
          - ReadWriteMany
        size: 200Gi
    

    If storageClass is empty (default), the cluster’s default StorageClass is used.

Option 2: Use an existing PersistentVolume#

To use a pre-created PersistentVolume instead of having the chart create a PVC, set:

core:
  storage:
    existingPersistentVolumeName: "my-existing-pv-name"

When set, the chart does not create a new PVC; pods mount the named volume.
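
For reference, a pre-created RWX volume might look like the following NFS-backed sketch. The volume name matches the `existingPersistentVolumeName` value above; the server address, export path, and capacity are placeholders for your environment:

nfs`apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-existing-pv-name
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    # Placeholder NFS server and export; replace with your own.
    server: nfs.example.internal
    path: /exports/nemo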

NIM storage class#

For NIM deployments launched via the NeMo Platform, you can set the default StorageClass used by NIM PVCs via platform config. In values.yaml, under platformConfig:

platformConfig:
  models:
    controller:
      backends:
        nim_operator:
          config:
            default_storage_class: "nfs"

Replace "nfs" with your StorageClass name (e.g. oci-nfs, gp3). For NIM scaling and multi-node deployments, use a ReadWriteMany-capable StorageClass.

Refer to the platform configuration documentation for the full config reference.

Persistent volume options#

Use a ReadWriteMany-capable filesystem such as NFS or CephFS so the volume can be mounted read-write across nodes and pods.

For CSI drivers, see the Kubernetes CSI Drivers table. Choose a driver that supports Read/Write Multiple in the Supported Access Modes column and matches your environment.

AWS persistent volumes#

Amazon EFS#

Amazon Elastic File System (EFS) provides scalable, shared file storage for Kubernetes. To use EFS:

  1. Install the Amazon EFS CSI driver in your cluster.

  2. Create an EFS file system accessible from the cluster.

  3. Configure a StorageClass and use it in core.storage.storageClass (and optionally for NIM via platformConfig).
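
The steps above can be sketched as a dynamically provisioned StorageClass for the EFS CSI driver. The class name `efs-sc` and the file system ID are placeholders; the `efs-ap` provisioning mode creates an EFS access point per volume:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  # Replace with the ID of your EFS file system.
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"

You would then set `core.storage.storageClass: "efs-sc"` in values.yaml.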

Amazon FSx for Lustre#

For high-performance workloads:

  1. Install the FSx for Lustre CSI driver.

  2. Create an FSx for Lustre file system.

  3. Configure storage classes and reference them in core.storage.storageClass or NIM config as needed.
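
As a sketch, a StorageClass for the FSx for Lustre CSI driver might look like the following; the subnet ID, security group ID, and class name `fsx-sc` are placeholders, and `SCRATCH_2` is one of the available deployment types:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fsx-sc
provisioner: fsx.csi.aws.com
parameters:
  # Replace with a subnet and security group that can reach your cluster.
  subnetId: subnet-0123456789abcdef0
  securityGroupIds: sg-0123456789abcdef0
  deploymentType: SCRATCH_2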

Azure persistent volumes#

On AKS, the Azure Disk storage class (managed-csi) supports only ReadWriteOnce (RWO). Use Azure Files for the shared RWX volumes the NeMo Platform requires. PostgreSQL, however, must use managed-csi, because Azure Files does not support the POSIX file permissions PostgreSQL requires.

  1. Enable the Azure Files CSI driver if not already installed in your cluster.

  2. Set core.storage.storageClass to azurefile (or azurefile-csi on AKS 1.29+) and postgresql.persistence.storageClass to managed-csi.
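
Putting the two settings together, an AKS values.yaml fragment might look like this (size is a placeholder):

core:
  storage:
    storageClass: "azurefile-csi"
    accessModes:
      - ReadWriteMany
    size: 200Gi
postgresql:
  persistence:
    storageClass: "managed-csi"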

Oracle persistent volumes#

On OKE with OCI File Storage, see Setting Up Storage for Kubernetes Clusters to make NFS-backed persistent volumes available, then set core.storage.storageClass (and NIM config if needed) accordingly.
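
For example, if the OCI setup produces an NFS-backed StorageClass named `oci-nfs` (a placeholder; use the name from your own setup), the values.yaml fragment could look like:

core:
  storage:
    storageClass: "oci-nfs"
    accessModes:
      - ReadWriteMany
    size: 200Gi
platformConfig:
  models:
    controller:
      backends:
        nim_operator:
          config:
            default_storage_class: "oci-nfs"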