nemo_curator.stages.interleaved.io.reader
Module Contents
Classes
API
Dataclass
Bases: CompositeStage[_EmptyTask, InterleavedBatch]
Composite stage for reading WebDataset shards.
blocksize
fields
file_extensions
file_paths
files_per_partition
image_extensions
image_member_field
images_field
json_extensions
materialize_on_read
max_batch_bytes
name
per_image_fields
per_text_fields
read_kwargs
sample_id_field
source_id_field
texts_field
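
Example (illustrative sketch, not the NeMo Curator implementation): a WebDataset shard is a tar archive in which members that share a basename belong to the same sample. The standard-library code below shows how such a shard could be grouped into samples, with members routed by extension in the spirit of image_extensions and json_extensions. The function names, the default extension tuples, and the "texts" key in the demo JSON are assumptions made for illustration. The stage's actual defaults and behavior are not stated on this page.

```python
import io
import json
import tarfile
from collections import defaultdict

# Hypothetical defaults; the stage's real defaults are not documented here.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")
JSON_EXTENSIONS = (".json",)


def group_shard_members(shard_bytes,
                        image_extensions=IMAGE_EXTENSIONS,
                        json_extensions=JSON_EXTENSIONS):
    """Group tar members by sample key (path up to the first dot of the basename)."""
    samples = defaultdict(lambda: {"json": None, "images": []})
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            dirname, _, basename = member.name.rpartition("/")
            stem, dot, ext = basename.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            suffix = (dot + ext).lower()
            payload = tar.extractfile(member).read()
            if suffix.endswith(tuple(json_extensions)):
                samples[key]["json"] = json.loads(payload)
            elif suffix.endswith(tuple(image_extensions)):
                # Record the member name and byte size rather than decoding the image.
                samples[key]["images"].append((member.name, len(payload)))
    return dict(samples)


def build_demo_shard():
    """Build a tiny in-memory shard with two samples for demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [
            ("sample0.json", json.dumps({"texts": ["hello"]}).encode()),
            ("sample0.jpg", b"\xff\xd8fake"),
            ("sample1.json", json.dumps({"texts": ["world"]}).encode()),
        ]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    for key, sample in group_shard_members(build_demo_shard()).items():
        print(key, sample)
```

The sketch keeps only the member name and byte size for each image instead of decoding it. That choice is loosely analogous to deferring image materialization, but whether materialize_on_read works this way is an assumption.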