API Reference

NeMo Curator’s API reference documents its modules, classes, and functions. Use it to look up how each component works and to integrate NeMo Curator into your data curation workflows.

Execution Backends

Ray-based execution backends

Adapters and executors for running pipelines at scale.

ray-data xenna

backends

Pipeline

Orchestrate end-to-end workflows

Build and run pipelines composed of processing stages.

pipeline

Processing Stages

Download, transform, and write data

Modular stages for download/extract, text models/classifiers, I/O, and utilities.

download text io modules

stages

Tasks

Core data structures

Document batches, file groups, and related interfaces passed between stages.

tasks

Utilities

Helper functions

File, performance, and operation utilities used across the pipeline.

utils
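
The sections above describe tasks passed between processing stages that a pipeline runs in order. As a rough mental model only, that relationship can be sketched in plain Python. Every name below (`ToyTask`, `ToyPipeline`, `strip_whitespace`, `drop_empty`) is hypothetical and is not part of the NeMo Curator API; consult the linked reference pages for the real classes and signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical stand-ins for illustration; these are NOT NeMo Curator classes.
@dataclass
class ToyTask:
    """A batch of documents handed from one stage to the next."""
    documents: List[str]


# In this sketch, a stage is any callable that maps one task to another.
Stage = Callable[[ToyTask], ToyTask]


@dataclass
class ToyPipeline:
    """Runs stages in order, feeding each stage's output to the next."""
    stages: List[Stage] = field(default_factory=list)

    def add_stage(self, stage: Stage) -> "ToyPipeline":
        self.stages.append(stage)
        return self  # return self so calls can be chained

    def run(self, task: ToyTask) -> ToyTask:
        for stage in self.stages:
            task = stage(task)
        return task


def strip_whitespace(task: ToyTask) -> ToyTask:
    """Transform stage: trim leading and trailing whitespace."""
    return ToyTask([doc.strip() for doc in task.documents])


def drop_empty(task: ToyTask) -> ToyTask:
    """Filter stage: remove documents that are empty after trimming."""
    return ToyTask([doc for doc in task.documents if doc])


pipeline = ToyPipeline().add_stage(strip_whitespace).add_stage(drop_empty)
result = pipeline.run(ToyTask(["  hello ", "   ", "world"]))
print(result.documents)  # ['hello', 'world']
```

The sketch runs everything in a single process. Distributing the same stages across workers is the role the execution backends section describes, and is not modeled here.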