
Infrastructure References


This section is the technical reference for the NeMo Curator infrastructure components that are shared across all modalities (text, image, video).


Infrastructure Components

Memory Management

Optimize memory usage when processing large datasets.
Tags: partitioning, batching, monitoring

GPU Acceleration

Leverage NVIDIA GPUs for faster data processing.
Tags: cuda, rmm, performance

Resumable Processing

Continue interrupted operations across large datasets.
Tags: checkpoints, recovery, batching
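
The idea behind checkpoint-based recovery can be sketched in plain Python. This is a generic illustration only and does not use NeMo Curator's API: the helper names `batched` and `process_resumably`, the JSON checkpoint file, and the per-batch granularity are all assumptions made for the example. The sketch records each finished batch index on disk, so a rerun skips work that already completed.

```python
import json
from pathlib import Path


def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def process_resumably(items, batch_size, checkpoint_path, process_batch):
    """Process `items` in batches, skipping batches listed in the checkpoint.

    Hypothetical helper for illustration; not part of NeMo Curator.
    """
    checkpoint_path = Path(checkpoint_path)
    done = set()
    if checkpoint_path.exists():
        # Indices of batches that finished in an earlier run.
        done = set(json.loads(checkpoint_path.read_text()))

    for index, batch in enumerate(batched(items, batch_size)):
        if index in done:
            continue  # already processed before the interruption
        process_batch(batch)
        done.add(index)
        # Persist progress after every batch, so an interruption
        # costs at most the batch that was in flight.
        checkpoint_path.write_text(json.dumps(sorted(done)))
    return done
```

Calling `process_resumably` a second time with the same checkpoint path and the same batch size resumes at the first unfinished batch. The sketch assumes the input order and batch size are stable between runs; if either changes, the stored indices no longer identify the same data.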

Container Environments

Available environments and configurations in NeMo Curator containers, including build arguments and video-specific environments.
Tags: docker, conda, environments


Copyright © 2026, NVIDIA Corporation.