nemo_curator.stages.interleaved.stages
Module Contents
Classes
API
BaseInterleavedAnnotatorStage
Dataclass, Abstract
Bases: ProcessingStage[InterleavedBatch, InterleavedBatch]
Base stage for row-wise interleaved annotation/filter transforms.
name
abstract
Apply annotation/filter logic and return transformed dataframe.
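The shape of such a base stage can be sketched with the standard library alone. This is a minimal illustration, not the NeMo Curator implementation: the class names `RowwiseStageSketch` and `AddLengthAnnotation`, the method names `transform` and `process`, and the use of a list of dicts in place of a dataframe are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List

# Stand-in for a dataframe: one dict per row (assumption for this sketch).
Rows = List[Dict[str, object]]


@dataclass
class RowwiseStageSketch(ABC):
    """Hypothetical abstract dataclass stage with a single row-wise hook."""

    name: str = "rowwise_stage_sketch"

    @abstractmethod
    def transform(self, rows: Rows) -> Rows:
        """Apply annotation/filter logic and return the transformed rows."""

    def process(self, rows: Rows) -> Rows:
        # The base class owns the entry point; subclasses only supply transform().
        return self.transform(rows)


@dataclass
class AddLengthAnnotation(RowwiseStageSketch):
    """Hypothetical concrete stage: annotate each row with its text length."""

    name: str = "add_length"

    def transform(self, rows: Rows) -> Rows:
        # Build new row dicts so the input rows are left unmodified.
        return [{**row, "text_length": len(str(row.get("text", "")))} for row in rows]
```

Because `transform` is abstract, instantiating `RowwiseStageSketch` directly raises `TypeError`; only subclasses that implement it can be constructed.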
BaseInterleavedFilterStage
Dataclass, Abstract
Bases: BaseInterleavedAnnotatorStage
Base stage for interleaved filtering based on a keep-mask.
drop_invalid_rows
name
staticmethod
abstract
Return content-specific boolean keep-mask aligned to dataframe index.
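One way to picture a keep-mask aligned to the index is sketched below. Rows are modeled as a dict keyed by index label rather than a real dataframe. The helper names `content_keep_mask` and `apply_keep_mask`, the `text` field, and the reading of `drop_invalid_rows` as "drop rows that could not be evaluated" are assumptions for illustration, not the documented behavior.

```python
from typing import Dict, Optional

# index label -> row; a stand-in for a dataframe (assumption for this sketch).
Table = Dict[int, Dict[str, Optional[str]]]


def content_keep_mask(table: Table, min_chars: int) -> Dict[int, Optional[bool]]:
    """Return a mask keyed by the same index labels as `table`.

    True means keep, False means drop, None means the row could not be evaluated.
    """
    mask: Dict[int, Optional[bool]] = {}
    for idx, row in table.items():
        text = row.get("text")
        mask[idx] = None if text is None else len(text) >= min_chars
    return mask


def apply_keep_mask(
    table: Table,
    mask: Dict[int, Optional[bool]],
    drop_invalid_rows: bool = True,
) -> Table:
    """Keep rows whose mask is True; rows marked None follow drop_invalid_rows."""
    kept: Table = {}
    for idx, row in table.items():
        verdict = mask[idx]
        if verdict is None:
            if not drop_invalid_rows:
                kept[idx] = row
        elif verdict:
            kept[idx] = row
    return kept
```

Keying the mask by index label instead of by position means the mask stays valid even if the rows were previously filtered or reordered.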
Yield (row_index, bytes) for masked rows after materialization.
Only the masked subset is materialized, avoiding redundant I/O for the full task.
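The lazy materialization described above can be approximated with a generator. This is a sketch under stated assumptions: `iter_masked_bytes`, the `sources` mapping, and the injected `load` callable are hypothetical names, and the real stage may materialize rows in batches rather than one at a time.

```python
from typing import Callable, Dict, Iterator, Tuple


def iter_masked_bytes(
    sources: Dict[int, str],
    mask: Dict[int, bool],
    load: Callable[[str], bytes],
) -> Iterator[Tuple[int, bytes]]:
    """Yield (row_index, bytes) only for rows where the mask is True.

    `load` is called solely for masked rows, so unmasked rows cost no I/O.
    """
    for row_index, source in sources.items():
        if mask.get(row_index, False):
            yield row_index, load(source)
```

Passing the loader in as a parameter keeps the sketch independent of any particular storage backend and makes it easy to verify that unmasked rows are never read.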
Dataclass
Bases: BaseInterleavedFilterStage
Filter interleaved image rows by aspect-ratio bounds (all image formats).
max_aspect_ratio
min_aspect_ratio
name
staticmethod
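An aspect-ratio keep-mask might look like the sketch below. It assumes the ratio is defined as width divided by height, that both bounds are inclusive, and that rows with unknown or non-positive dimensions are dropped; none of these details are confirmed by the reference above. The function name `aspect_ratio_keep_mask` is hypothetical, and image decoding is left out: the sketch takes already-known `(width, height)` pairs.

```python
from typing import List, Optional, Tuple


def aspect_ratio_keep_mask(
    sizes: List[Optional[Tuple[int, int]]],
    min_aspect_ratio: float,
    max_aspect_ratio: float,
) -> List[bool]:
    """Return one boolean per image: True if width/height lies within the bounds."""
    mask: List[bool] = []
    for size in sizes:
        # Unknown or degenerate dimensions cannot be evaluated; drop them here.
        if size is None or size[0] <= 0 or size[1] <= 0:
            mask.append(False)
            continue
        width, height = size
        ratio = width / height
        mask.append(min_aspect_ratio <= ratio <= max_aspect_ratio)
    return mask
```

With bounds of 0.5 and 2.0, a 400x400 image is kept, while a 1600x200 banner (ratio 8.0) is filtered out.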