Infrastructure References
This section provides technical reference documentation for the NeMo Curator infrastructure components shared across all modalities (text, image, video). For deployment and operational configuration, see the Admin Configuration Guide.
Infrastructure Components
Distributed Computing
Configure and manage distributed processing across multiple machines
Memory Management
Optimize memory usage when processing large datasets
GPU Acceleration
Leverage NVIDIA GPUs for faster data processing
Resumable Processing
Resume interrupted operations on large datasets
Container Environments
Environments and configurations available in NeMo Curator containers, including Slurm environment variables, build arguments, and video-specific environments.