About Setup & Deployment

The administration section covers deploying NeMo Curator, managing its infrastructure, monitoring it, and scaling it. Use these resources to set up and maintain your NeMo Curator environment at any scale, from development workstations to production clusters.


Installation & Configuration

Installation Guide

Install NeMo Curator, including system requirements, package extras, and verification steps. Covers PyPI, source, and container installation methods.
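As a quick post-install check, a sketch like the following reports whether the package is present in the active environment, using only the standard library's `importlib.metadata`. It assumes the PyPI distribution name is `nemo-curator`; the helper name `installed_version` is hypothetical, and the Installation Guide's own verification steps take precedence.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "nemo-curator") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper; "nemo-curator" is assumed to be the PyPI distribution name.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nemo-curator is not installed in this environment")
    else:
        print(f"nemo-curator {version} is installed")
```

Checking distribution metadata rather than importing the package keeps the check fast and avoids pulling in optional GPU dependencies just to confirm installation.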

Configuration Guide

Configure deployment environments, storage access, credentials, and environment variables for operating NeMo Curator.
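Because storage credentials are often supplied through environment variables, it can help to fail early when one is missing. The sketch below is a generic pre-flight check, not part of NeMo Curator; the helper `missing_env_vars` is hypothetical, and the AWS variable names are only an example for S3-compatible storage. Substitute whatever variables your storage backend and the Configuration Guide actually require.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(names: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty, preserving the input order.

    Reads os.environ by default; pass a mapping to check something else.
    """
    source = os.environ if env is None else env
    return [name for name in names if not source.get(name)]


# Hypothetical example: credentials for S3-compatible object storage.
REQUIRED = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]

if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED)
    if missing:
        print(f"Missing environment variables: {', '.join(missing)}")
    else:
        print("All required environment variables are set")
```

Treating an empty string the same as an unset variable catches a common mistake in job scripts, where a variable is exported but never assigned a value.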


Deployment Options

Kubernetes Deployment

Deploy NeMo Curator on Kubernetes clusters using the Dask Operator, the GPU Operator, and persistent volume claim (PVC) storage. Covers setup, storage, cluster creation, module execution, and cleanup.

Running NeMo Curator on Kubernetes
Slurm Deployment

Run NeMo Curator on Slurm clusters with shared filesystems. Covers job scripts, Dask cluster setup, module execution, monitoring, and advanced Python-based job submission.
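To illustrate what Python-based job submission can look like, the sketch below renders a minimal Slurm batch script and optionally pipes it to `sbatch` on standard input. It is a generic illustration, not NeMo Curator's own submission API: the helpers `build_sbatch_script` and `submit` are hypothetical, the workload command (`my_pipeline.py`) is a placeholder, and starting a Dask scheduler and workers inside the allocation is not shown.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_sbatch_script(job_name: str, nodes: int, time_limit: str,
                        command: Sequence[str], output: str = "%x-%j.out") -> str:
    """Render a minimal Slurm batch script that runs one command via srun.

    In the default output pattern, %x expands to the job name and %j to the job ID.
    """
    lines: List[str] = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={output}",
        "",
        # srun launches the command inside the job's allocation;
        # shlex.join quotes arguments that contain spaces or shell metacharacters.
        "srun " + shlex.join(command),
    ]
    return "\n".join(lines) + "\n"


def submit(script: str) -> str:
    """Pipe the script to sbatch on stdin and return sbatch's stdout."""
    result = subprocess.run(["sbatch"], input=script, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical workload; only prints the script, does not submit it.
    print(build_sbatch_script("curator-demo", nodes=2, time_limit="01:00:00",
                              command=["python", "my_pipeline.py"]))
```

Keeping script rendering separate from submission makes the generated script easy to inspect or test on a machine without Slurm installed.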

Deploy NeMo Curator on Slurm

Integration Options

Spark

Integrate NeMo Curator with Apache Spark for distributed processing.

Reading and Writing Datasets with NeMo Curator and Apache Spark