nemo_curator.stages.image.io.image_reader
Module Contents
Classes
API
Dataclass
Bases: ProcessingStage[FileGroupTask, ImageBatch]
DALI-based reader that loads images from WebDataset tar shards.
Works with DALI GPU (CUDA) or DALI CPU; decodes on GPU if CUDA is available, otherwise falls back to CPU decoding.
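The GPU-or-CPU fallback described above can be sketched roughly as follows. This is a minimal illustration, not the stage's actual code: `pick_decode_device` is a hypothetical helper, and it assumes the choice is expressed through DALI's decoder device strings, where `"mixed"` requests GPU-accelerated decoding of host-side input and `"cpu"` keeps decoding on the host.

```python
def pick_decode_device(cuda_available: bool) -> str:
    """Return a DALI decoder device string for the given CUDA availability.

    Hypothetical helper: the real stage may detect CUDA and configure its
    pipeline differently.
    """
    # "mixed" = encoded bytes stay on the host, decoding runs on the GPU.
    # "cpu"   = decoding runs entirely on the host.
    return "mixed" if cuda_available else "cpu"


if __name__ == "__main__":
    for available in (True, False):
        print(f"CUDA available={available}: device={pick_decode_device(available)}")
```

Taking CUDA availability as an argument keeps the sketch free of any GPU library dependency; in practice the caller would supply the result of whatever CUDA check its environment provides.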
dali_batch_size
name
num_gpus_per_worker
num_threads
verbose
Yield lists of ImageObject per DALI run over one or more tar files.
Emit one ImageBatch per DALI run across all provided tar files.
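To make the idea of batching over one or more tar shards concrete, here is a small standard-library sketch. It does not use DALI and does not decode images: it only walks the shards and groups raw image members into fixed-size batches. The function name `iter_image_batches`, the `IMAGE_EXTENSIONS` tuple, and the `(name, bytes)` tuple layout are assumptions made for illustration and are not part of the documented API.

```python
import tarfile
from typing import Iterable, Iterator, List, Tuple

# Assumption: which member suffixes count as images in a shard.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def iter_image_batches(
    tar_paths: Iterable[str], batch_size: int
) -> Iterator[List[Tuple[str, bytes]]]:
    """Yield lists of (member_name, raw_bytes) across all given tar shards.

    Batches may span shard boundaries; the final batch may be smaller than
    batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    batch: List[Tuple[str, bytes]] = []
    for path in tar_paths:
        with tarfile.open(path, "r") as tar:
            for member in tar:
                # Skip directories and non-image members (e.g. captions, JSON).
                if not member.isfile():
                    continue
                if not member.name.lower().endswith(IMAGE_EXTENSIONS):
                    continue
                fileobj = tar.extractfile(member)
                if fileobj is None:
                    continue
                batch.append((member.name, fileobj.read()))
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    # Flush whatever is left after the last shard.
    if batch:
        yield batch
```

A caller would iterate it as `for batch in iter_image_batches(["shard-000.tar"], batch_size=32): ...`, where the shard path is a placeholder. The batch size here plays a role loosely analogous to `dali_batch_size`, although the real stage delegates reading and decoding to a DALI pipeline.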
Ray stage specification for this stage.