Production Deployment Requirements
This page lists the system, hardware, and software requirements for deploying NeMo Curator in production environments.
System Requirements
- Operating System: Ubuntu 22.04/20.04 (recommended)
- Python: Python 3.10, 3.11, or 3.12
- packaging >= 22.0
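The version constraints above can be checked before installation with a short preflight script. The following is a minimal sketch using only the standard library; the function names (`python_supported`, `packaging_supported`, `version_tuple`) are hypothetical helpers and are not part of NeMo Curator.

```python
import sys

# Values taken from the System Requirements list above.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12)}
MIN_PACKAGING = (22, 0)


def python_supported(version_info=sys.version_info):
    """Return True if the interpreter's major.minor version is supported."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def version_tuple(version):
    """Parse leading numeric components, e.g. '24.1rc1' -> (24, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def packaging_supported(version):
    """Return True if a `packaging` version string is >= 22.0."""
    parsed = version_tuple(version)
    # Pad short versions such as '22' so they compare as (22, 0).
    parsed += (0,) * max(0, len(MIN_PACKAGING) - len(parsed))
    return parsed >= MIN_PACKAGING


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    print("Python supported:", python_supported())
    try:
        print("packaging supported:", packaging_supported(version("packaging")))
    except PackageNotFoundError:
        print("packaging is not installed")
```

The simplified parser ignores pre-release suffixes; where the `packaging` library is already installed, its own `Version` class is the more robust comparison.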
Hardware Requirements
CPU Requirements
- Multi-core CPU with sufficient cores for parallel processing
- Memory: At least 16GB RAM recommended for text processing
- For large datasets: 32GB+ RAM recommended
- Memory requirements scale with dataset size and number of workers
GPU Requirements (Optional but Recommended)
- GPU: NVIDIA GPU with Volta™ architecture or higher
- Compute capability 7.0+ required
- Memory: Minimum 16GB VRAM for GPU-accelerated operations
- For video processing: 21GB+ VRAM (reducible with optimization)
- For large-scale deduplication: 32GB+ VRAM recommended
- CUDA: Version 12.0 or later with a compatible NVIDIA driver
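One way to verify the compute capability and VRAM thresholds above is to parse the output of `nvidia-smi`. The sketch below assumes a driver recent enough to support the `compute_cap` query field; the helper names (`parse_gpus`, `meets_requirements`, `query_local_gpus`) are hypothetical. Note that `nvidia-smi` reports memory in MiB, and cards marketed as 16GB can report somewhat less than 16384 MiB, so the threshold is a parameter that may need loosening.

```python
import subprocess

MIN_COMPUTE_CAPABILITY = (7, 0)  # Volta or newer
MIN_VRAM_MIB = 16 * 1024         # 16GB expressed in MiB (strict; adjust as needed)

# Assumption: the installed driver supports the `compute_cap` query field.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,compute_cap",
    "--format=csv,noheader,nounits",
]


def parse_gpus(csv_text):
    """Parse 'name, memory_mib, compute_cap' lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so commas inside the GPU name are preserved.
        name, mem, cap = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({
            "name": name,
            "vram_mib": int(mem),
            "compute_cap": tuple(int(x) for x in cap.split(".")),
        })
    return gpus


def meets_requirements(gpu, min_vram_mib=MIN_VRAM_MIB):
    """Return True if a parsed GPU satisfies both thresholds."""
    return (gpu["compute_cap"] >= MIN_COMPUTE_CAPABILITY
            and gpu["vram_mib"] >= min_vram_mib)


def query_local_gpus():
    """Run nvidia-smi on this machine and return the parsed GPU list."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpus(result.stdout)
```

Calling `[meets_requirements(g) for g in query_local_gpus()]` on a GPU node returns one boolean per device. The video processing and large-scale deduplication thresholds can be checked by passing a larger `min_vram_mib`.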
Software Dependencies
Core Dependencies
- Python 3.10, 3.11, or 3.12 with the required packages for distributed computing
- RAPIDS libraries (cuDF) for GPU-accelerated deduplication operations
Container Support (Recommended)
- Docker or Podman for containerized deployment
- Access to NVIDIA NGC registry for official containers
Network Requirements
- Reliable network connectivity between nodes
- High-bandwidth network for large dataset transfers
- InfiniBand recommended for multi-node GPU clusters
Storage Requirements
- Capacity: Plan for 3-5x the size of the input datasets to cover:
- Input data storage
- Intermediate processing files
- Output data storage
- Performance: High-throughput storage system recommended
- SSD storage preferred for frequently accessed data
- Parallel filesystem for multi-node access
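The 3-5x capacity guideline translates directly into a check against free disk space. This is a minimal sketch using the standard library; the function names and the three status labels are illustrative choices, not part of any NeMo Curator tooling.

```python
import shutil


def required_storage_bytes(input_bytes, low=3, high=5):
    """Return the (minimum, comfortable) capacity for a given input size."""
    return input_bytes * low, input_bytes * high


def storage_status(input_bytes, free_bytes):
    """Classify free space against the 3-5x guideline."""
    low, high = required_storage_bytes(input_bytes)
    if free_bytes >= high:
        return "ok"
    if free_bytes >= low:
        return "tight"
    return "insufficient"


def check_path(path, input_bytes):
    """Check the filesystem that holds `path` for a dataset of `input_bytes`."""
    return storage_status(input_bytes, shutil.disk_usage(path).free)
```

For example, a 2 TiB input dataset needs between 6 TiB and 10 TiB to hold the input, intermediate files, and output together.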
Deployment-Specific Requirements
- Resource quotas configured for GPU and memory allocation
Performance Considerations
Memory Management
- Monitor memory usage across distributed workers
- Configure appropriate memory limits per worker
- Use memory-efficient data formats (e.g., Parquet)
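A per-worker memory limit can be derived from the node's total RAM and the worker count. The sketch below is plain arithmetic under one assumption: a fixed percentage of RAM is held back for the operating system and other processes. The 10% default and the function name are illustrative, not recommended values from NeMo Curator.

```python
def per_worker_memory_limit(total_bytes, n_workers, reserve_percent=10):
    """Split usable RAM evenly across workers, in bytes.

    reserve_percent is the share of RAM kept free for the OS and other
    processes (an assumed default; tune it for the deployment).
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    usable = total_bytes * (100 - reserve_percent) // 100
    return usable // n_workers
```

The resulting value can be passed to whatever per-worker memory setting the chosen distributed backend exposes.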
GPU Optimization
- Ensure the NVIDIA driver and CUDA version are compatible with the installed RAPIDS version
- Configure GPU memory pools (RMM) for optimal performance
- Monitor GPU utilization and memory usage
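As an illustration of configuring an RMM memory pool, the sketch below sizes a pool as a fraction of device memory and passes it to `rmm.reinitialize`. The 80% default fraction and the helper names are assumptions for illustration, and the size is rounded down to a multiple of 256 bytes because RMM expects pool sizes aligned that way. The `rmm` import is deferred so the sizing helper can be used on machines without RAPIDS installed.

```python
def pool_size_bytes(total_vram_bytes, fraction=0.8, alignment=256):
    """Return a pool size that is `fraction` of VRAM, rounded down to `alignment`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    size = int(total_vram_bytes * fraction)
    return size - (size % alignment)


def configure_rmm_pool(total_vram_bytes, fraction=0.8):
    """Enable the RMM pool allocator on the current device.

    Requires the RAPIDS `rmm` package and a CUDA-capable GPU.
    """
    import rmm  # deferred import: only needed on GPU nodes

    rmm.reinitialize(
        pool_allocator=True,
        initial_pool_size=pool_size_bytes(total_vram_bytes, fraction),
    )
```

Pre-allocating a pool avoids repeated calls to the CUDA allocator during processing; the right fraction depends on what else shares the GPU.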
Network Optimization
- Use high-bandwidth interconnects for multi-node deployments
- Configure appropriate network protocols (TCP vs UCX)
- Optimize data transfer patterns to minimize network overhead