Infrastructure References

This section provides technical reference documentation for the NeMo Curator infrastructure components shared across all modalities (text, image, video). For deployment and operational configuration, see the Admin Configuration Guide.


Infrastructure Components

Distributed Computing

Configure and manage distributed processing across multiple machines

Distributed Computing Reference

Memory Management

Optimize memory usage when processing large datasets

Memory Management Guide

GPU Acceleration

Leverage NVIDIA GPUs for faster data processing

GPU Processing Guide

Resumable Processing

Continue interrupted operations across large datasets

Resumable Processing

Container Environments

Available environments and configurations in NeMo Curator containers. Includes Slurm environment variables, build arguments, and video-specific environments.

Container Environments