Infrastructure References
This section provides technical reference documentation for the NeMo Curator infrastructure components shared across all modalities (text, image, video).
Infrastructure Components
Memory Management
Optimize memory usage when processing large datasets.
Tags: partitioning, batching, monitoring
GPU Acceleration
Leverage NVIDIA GPUs for faster data processing.
Tags: cuda, rmm, performance
Resumable Processing
Continue interrupted operations across large datasets.
Tags: checkpoints, recovery, batching
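As a rough illustration of the checkpoint-and-resume idea, the sketch below records each completed batch in a JSON file and skips recorded batches on the next run. It uses only the Python standard library and is not NeMo Curator's API: the function name `process_in_batches`, its parameters, and the checkpoint file format are all hypothetical, chosen for this example only.

```python
import json
from pathlib import Path


def process_in_batches(items, batch_size, checkpoint_path, process_batch):
    """Run process_batch over items in fixed-size batches, skipping batches
    already listed in the checkpoint file (hypothetical helper, not a
    NeMo Curator function)."""
    checkpoint = Path(checkpoint_path)

    # Load the IDs of batches completed by a previous run, if any.
    done = set()
    if checkpoint.exists():
        done = set(json.loads(checkpoint.read_text()))

    processed_now = []
    for start in range(0, len(items), batch_size):
        batch_id = start // batch_size
        if batch_id in done:
            continue  # finished in an earlier run

        process_batch(items[start:start + batch_size])

        # Record progress only after the batch succeeds, so an interrupted
        # batch is retried on the next run. Note that write_text is not an
        # atomic write; a production version would need to handle that.
        done.add(batch_id)
        checkpoint.write_text(json.dumps(sorted(done)))
        processed_now.append(batch_id)

    return processed_now
```

Calling the function again with the same `checkpoint_path` after a failure processes only the batches that were not recorded. The return value lists the batch IDs handled in the current run.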
Container Environments
Available environments and configurations in NeMo Curator containers, including build arguments and video-specific environments.
Tags: docker, conda, environments