nemo_curator.stages.image.io.image_writer
Module Contents
Classes
API
Dataclass
Bases: ProcessingStage[ImageBatch, FileGroupTask]
Write images to tar files and corresponding metadata to a Parquet file.
- Images are packed into tar archives with at most images_per_tar entries each.
- Metadata for all written images in the batch is stored in a single Parquet file.
- Tar filenames are unique across actors via an actor-scoped prefix.
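The packing rule in the list above can be illustrated with a minimal sketch. It is not the stage's actual implementation: the helper name `chunk_for_tars` is hypothetical, and only the `images_per_tar` limit comes from the description.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_for_tars(items: Sequence[T], images_per_tar: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `images_per_tar` items.

    Hypothetical helper: each yielded group would become one tar archive.
    """
    if images_per_tar <= 0:
        raise ValueError("images_per_tar must be a positive integer")
    for start in range(0, len(items), images_per_tar):
        yield list(items[start:start + images_per_tar])


# Five images with images_per_tar=2 give three groups; the last one is partial.
groups = list(chunk_for_tars(["a", "b", "c", "d", "e"], images_per_tar=2))
```

Under this sketch the final archive of a batch may hold fewer than images_per_tar entries, which is consistent with the "at most" wording.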
deterministic_name
images_per_tar
name
output_dir
remove_image_data
verbose
Encode image array to JPEG bytes; always returns (bytes, ".jpg").
Write metadata rows to a Parquet file for a specific tar and return its path.
The Parquet file shares the same base name as the tar file: {base_name}.parquet.
Write a tar file containing the given (member_name, bytes) entries, using the provided base name.
Returns the tar path.
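As a rough illustration of writing (member_name, bytes) entries to a tar archive, the following sketch uses only the Python standard library. The function name `write_tar_sketch` and its signature are assumptions made for this example and may differ from the stage's real method.

```python
import io
import os
import tarfile
import tempfile
from typing import List, Tuple


def write_tar_sketch(output_dir: str, base_name: str,
                     members: List[Tuple[str, bytes]]) -> str:
    """Write in-memory payloads into {output_dir}/{base_name}.tar and return the path.

    Hypothetical stand-in for the method described above.
    """
    tar_path = os.path.join(output_dir, f"{base_name}.tar")
    with tarfile.open(tar_path, "w") as tar:
        for member_name, payload in members:
            info = tarfile.TarInfo(name=member_name)
            info.size = len(payload)  # size must be set before addfile()
            tar.addfile(info, io.BytesIO(payload))
    return tar_path


# Usage: write two small placeholder payloads and read the member names back.
with tempfile.TemporaryDirectory() as tmp:
    path = write_tar_sketch(tmp, "images-000000",
                            [("a.jpg", b"\xff\xd8"), ("b.jpg", b"\xff\xd9")])
    with tarfile.open(path) as tar:
        written_names = tar.getnames()
```

Building each entry from a `TarInfo` plus a `BytesIO` avoids writing temporary image files to disk before archiving.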
Construct a base name for tar files within this actor.
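The exact naming scheme is not given here, so the sketch below is only one plausible way to build an actor-scoped base name. The function `make_base_name`, the `images-` prefix, the `actor_id` argument, and the hashing choice are all assumptions; only the idea of a deterministic option (suggested by the deterministic_name field) and of an actor-scoped prefix come from the description.

```python
import hashlib
import uuid
from typing import Sequence


def make_base_name(actor_id: str, image_ids: Sequence[str], deterministic: bool) -> str:
    """Return a base name that embeds an actor identifier (hypothetical scheme).

    deterministic=True: the suffix is a hash of the image IDs, so the same
    input yields the same name. deterministic=False: the suffix is random.
    """
    if deterministic:
        joined = "\n".join(image_ids).encode("utf-8")
        suffix = hashlib.sha256(joined).hexdigest()[:16]
    else:
        suffix = uuid.uuid4().hex[:16]
    return f"images-{actor_id}-{suffix}"


# The same actor and image IDs give the same name in deterministic mode.
name_a = make_base_name("actor0", ["img1", "img2"], deterministic=True)
name_b = make_base_name("actor0", ["img1", "img2"], deterministic=True)
# A different actor gets a different name for the same images.
name_c = make_base_name("actor1", ["img1", "img2"], deterministic=True)
```

With a scheme like this, the tar file and its metadata file would share the base name, for example `{base_name}.tar` and `{base_name}.parquet`.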