nemo_curator.tasks.image


Module Contents

Classes

Name          Description
ImageBatch    Task for processing batches of images.
ImageObject   Represents a single image with metadata.

API

class nemo_curator.tasks.image.ImageBatch(
task_id: str,
dataset_name: str,
data: list[nemo_curator.tasks.image.ImageObject] = list(),
_stage_perf: list[nemo_curator.utils.performance_utils.StagePerfStats] = list(),
_metadata: dict[str, typing.Any] = dict()
)
Dataclass

Bases: Task

Task for processing batches of images. Images are stored as a list of ImageObject instances, each containing the path to the image and associated metadata.

data
list[ImageObject] = field(default_factory=list)
num_items
int

Number of images in this batch (equal to len(data)).

nemo_curator.tasks.image.ImageBatch.validate() -> bool

Validate the task data. Returns True if the batch passes validation.
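A minimal sketch of how downstream code might consume a batch: the helper below walks ImageBatch.data and keeps the ImageObject entries whose aesthetic_score and nsfw_score pass given thresholds. select_images is hypothetical (not a NeMo Curator API), and both the threshold values and the choice to drop images whose scores are still None are assumptions. SimpleNamespace stand-ins replace real ImageObject instances so the sketch runs without NeMo Curator installed.

```python
from types import SimpleNamespace
from typing import Any


def select_images(batch: Any, min_aesthetic: float, max_nsfw: float) -> list[Any]:
    """Return the images in batch.data that pass both score thresholds.

    Hypothetical helper: accepts an ImageBatch, or any object whose ``data``
    attribute is a list of ImageObject-like items. Images whose
    aesthetic_score or nsfw_score is None are treated as not passing.
    """
    kept = []
    for image in batch.data:
        if image.aesthetic_score is None or image.nsfw_score is None:
            continue  # scores not computed yet for this image, so skip it
        if image.aesthetic_score >= min_aesthetic and image.nsfw_score <= max_nsfw:
            kept.append(image)
    return kept


# Stand-in objects (illustrative values) in place of real ImageObject instances.
demo_batch = SimpleNamespace(
    data=[
        SimpleNamespace(image_id="a", aesthetic_score=0.9, nsfw_score=0.1),
        SimpleNamespace(image_id="b", aesthetic_score=0.3, nsfw_score=0.0),
        SimpleNamespace(image_id="c", aesthetic_score=0.8, nsfw_score=0.7),
        SimpleNamespace(image_id="d", aesthetic_score=None, nsfw_score=None),
    ]
)
print([img.image_id for img in select_images(demo_batch, 0.5, 0.5)])  # ['a']
```

With NeMo Curator installed, the same function can be called on a real ImageBatch. It returns a plain list and leaves the batch itself unmodified, so the task's own fields are not touched.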

class nemo_curator.tasks.image.ImageObject(
image_path: str = '',
image_id: str = '',
metadata: dict[str, typing.Any] = dict(),
image_data: numpy.ndarray | None = None,
embedding: numpy.ndarray | None = None,
aesthetic_score: float | None = None,
nsfw_score: float | None = None
)
Dataclass

Represents a single image with metadata.

aesthetic_score
float | None = None
embedding
ndarray | None = None
image_data
ndarray | None = None
image_id
str = ''
image_path
str = ''
metadata
dict[str, Any] = field(default_factory=dict)
nsfw_score
float | None = None
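To show how the ImageObject fields fit together, here is a sketch of serializing one image into a JSON-friendly record, for example before writing results to a manifest. The to_record helper is hypothetical (not a NeMo Curator API): it drops the raw image_data array, converts embedding to a plain list with ndarray.tolist(), and copies metadata into a nested dict. The stand-in object and its field values, including the path, are illustrative.

```python
from types import SimpleNamespace
from typing import Any

import numpy as np


def to_record(image: Any) -> dict[str, Any]:
    """Flatten an ImageObject-like item into a JSON-friendly dict.

    Hypothetical helper: image_data (the pixel array) is intentionally
    omitted, the embedding becomes a plain Python list, and metadata is
    shallow-copied into its own nested dict.
    """
    return {
        "image_id": image.image_id,
        "image_path": image.image_path,
        "aesthetic_score": image.aesthetic_score,
        "nsfw_score": image.nsfw_score,
        # ndarray is not JSON-serializable, so convert it (or keep None).
        "embedding": None if image.embedding is None else image.embedding.tolist(),
        "metadata": dict(image.metadata),
    }


# Stand-in with the same attribute names as ImageObject (illustrative values).
demo = SimpleNamespace(
    image_id="img-0001",
    image_path="/data/images/img-0001.jpg",
    metadata={"source": "demo"},
    image_data=np.zeros((4, 4, 3), dtype=np.uint8),
    embedding=np.array([0.5, 0.25], dtype=np.float32),
    aesthetic_score=None,
    nsfw_score=0.02,
)
print(to_record(demo))
```

Dropping image_data keeps each record small; if pixel data is needed later, it can be re-read from image_path.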