nemo_curator.stages.client_partitioning
Module Contents
Classes
Functions
API
Dataclass
Bases: FilePartitioningStage
Stage that partitions input file paths from a client into FileGroupTasks.
This stage runs as a dedicated processing stage (not on the driver) and creates file groups based on the partitioning strategy.
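As a rough illustration of what creating file groups can look like, the sketch below splits a list of paths into fixed-size groups. The fixed-size strategy, the function name `group_files`, and the parameter `files_per_group` are assumptions made for this example only. The stage's actual partitioning strategy and the `FileGroupTask` structure are not described here, so plain lists stand in for them.

```python
def group_files(paths, files_per_group):
    """Split paths into consecutive groups of at most files_per_group items.

    Hypothetical helper: it illustrates one possible partitioning strategy
    and is not taken from the library.
    """
    if files_per_group < 1:
        raise ValueError("files_per_group must be at least 1")
    return [
        paths[start:start + files_per_group]
        for start in range(0, len(paths), files_per_group)
    ]


# Five paths in groups of two leave a final group holding a single path.
groups = group_files(["a.jsonl", "b.jsonl", "c.jsonl", "d.jsonl", "e.jsonl"], 2)
```

Each inner list plays the role of one file group that a downstream stage would process as a unit.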
_fs
_root
input_list_json_path
name
Return a sorted, de-duplicated list of paths relative to root.
Read a JSON list (via fsspec) and return its entries relative to root.
Validates that each entry is under root.
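The behavior described above (parse a JSON list, check that every entry sits under root, then return sorted, de-duplicated relative paths) can be sketched as follows. This is a minimal standard-library approximation, not the stage's implementation: it parses JSON text directly instead of opening a file through fsspec, it assumes POSIX-style paths, and the function names `to_relative_paths` and `read_input_list` are hypothetical.

```python
import json
import posixpath


def to_relative_paths(entries, root):
    """Return sorted, de-duplicated paths relative to root.

    Hypothetical helper. Raises ValueError if an entry is not under root.
    """
    prefix = posixpath.normpath(root).rstrip("/") + "/"
    relative = set()
    for entry in entries:
        normalized = posixpath.normpath(entry)
        if not normalized.startswith(prefix):
            raise ValueError(f"{entry!r} is not under root {root!r}")
        relative.add(normalized[len(prefix):])
    return sorted(relative)


def read_input_list(json_text, root):
    """Parse a JSON list of paths and return them relative to root.

    Hypothetical helper; the real stage is described as reading via fsspec.
    """
    entries = json.loads(json_text)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON list of paths")
    return to_relative_paths(entries, root)


# Two entries normalize to the same file, so the result has only two paths.
listing = json.dumps(["/data/b.jsonl", "/data/a.jsonl", "/data/sub/../b.jsonl"])
result = read_input_list(listing, "/data")
```

Collecting the relative paths in a set before sorting gives de-duplication and a deterministic order in one pass, which keeps the resulting groups stable across runs.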