> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/curator/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/curator/llms-full.txt.

> Comprehensive system, hardware, and software requirements for deploying NeMo Curator in production environments

# Production Deployment Requirements

This page details the comprehensive system, hardware, and software requirements for deploying NeMo Curator in production environments.

## System Requirements

* **Operating System**: Ubuntu 22.04/20.04 (recommended)
* **Python**: Python 3.10, 3.11, or 3.12
  * packaging >= 22.0

<Warning>
  **Python 3.10 support will be removed in NeMo Curator 26.06.** 26.04 is the last release to support Python 3.10. Standardize production environments on a newer supported Python version (3.11+) before upgrading to 26.06. See the [26.04 release notes](/about/release-notes) for details.
</Warning>

## Hardware Requirements

### CPU Requirements

* Multi-core CPU with sufficient cores for parallel processing
* **Memory**: Minimum 16GB RAM recommended for text processing
  * For large datasets: 32GB+ RAM recommended
  * Memory requirements scale with dataset size and number of workers

### GPU Requirements (Optional but Recommended)

* **GPU**: NVIDIA GPU with Volta™ architecture or higher
  * Compute capability 7.0+ required
  * **Memory**: Minimum 16GB VRAM for GPU-accelerated operations
  * For video processing: 21GB+ VRAM (reducible with optimization)
  * For large-scale deduplication: 32GB+ VRAM recommended
* **CUDA**: CUDA 12.0 or above with compatible drivers

## Software Dependencies

### Core Dependencies

* Python 3.10+ with required packages for distributed computing
* RAPIDS libraries (cuDF) for GPU-accelerated deduplication operations

### Container Support (Recommended)

* **Docker** or **Podman** for containerized deployment
* Access to NVIDIA NGC registry for official containers

## Network Requirements

* Reliable network connectivity between nodes
* High-bandwidth network for large dataset transfers
* InfiniBand recommended for multi-node GPU clusters

## Storage Requirements

* **Capacity**: Storage capacity should be 3-5x the size of input datasets
  * Input data storage
  * Intermediate processing files
  * Output data storage
* **Performance**: High-throughput storage system recommended
  * SSD storage preferred for frequently accessed data
  * Parallel filesystem for multi-node access

## Deployment-Specific Requirements

* Resource quotas configured for GPU and memory allocation

## Performance Considerations

### Memory Management

* Monitor memory usage across distributed workers
* Configure appropriate memory limits per worker
* Use memory-efficient data formats (e.g., Parquet)

### GPU Optimization

* Ensure CUDA drivers are compatible with RAPIDS versions
* Configure GPU memory pools (RMM) for optimal performance
* Monitor GPU utilization and memory usage

### Network Optimization

* Use high-bandwidth interconnects for multi-node deployments
* Configure appropriate network protocols (TCP vs UCX)
* Optimize data transfer patterns to minimize network overhead