nemo_curator.stages.synthetic.nemo_data_designer.data_designer


Module Contents

Classes

| Name | Description |
| --- | --- |
| DataDesignerStage | Data Designer stage. |

Data

__all__

API

class nemo_curator.stages.synthetic.nemo_data_designer.data_designer.DataDesignerStage(
config_builder: data_designer.config.DataDesignerConfigBuilder | None = None,
data_designer_config_file: str | None = None,
model_providers: list | None = None,
verbose: bool = False
)
Dataclass

Bases: ProcessingStage[DocumentBatch, DocumentBatch]

Data Designer stage.

This class provides a Data Designer stage. To request GPUs, use: DataDesignerStage(…).with_(resources=Resources(gpus=X)).
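
A minimal sketch of that GPU request. It assumes `Resources` can be imported from `nemo_curator.stages.resources` (this page does not state the import path) and that a config file alone is enough to construct the stage. The helper name `make_gpu_stage` and the file name `ndd_config.yaml` are hypothetical.

```python
def make_gpu_stage(config_file: str, gpus: float = 1.0):
    """Build a DataDesignerStage and attach a GPU request to it."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")

    # Imports are deferred so this module loads even without NeMo Curator.
    # Assumption: Resources lives in nemo_curator.stages.resources.
    from nemo_curator.stages.resources import Resources
    from nemo_curator.stages.synthetic.nemo_data_designer.data_designer import (
        DataDesignerStage,
    )

    stage = DataDesignerStage(data_designer_config_file=config_file)
    # with_(resources=...) is the call given in the class description.
    return stage.with_(resources=Resources(gpus=gpus))
```

Usage would look like `stage = make_gpu_stage("ndd_config.yaml", gpus=1)`; the returned object can then be added to a pipeline like any other stage.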

When verbose is False (the default), NeMo Data Designer (NDD) log output such as "Preview generation in progress" and "Preview complete!" is suppressed. Set verbose=True to see full NDD logging.
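
The effect of the verbose flag can be approximated with the standard `logging` module. The sketch below is not the stage's actual implementation: it assumes NDD logs through a logger named `data_designer`, which this page does not confirm, and the helper `set_ndd_verbosity` is hypothetical.

```python
import logging


def set_ndd_verbosity(verbose: bool, logger_name: str = "data_designer") -> logging.Logger:
    """Raise or lower the threshold of the (assumed) NDD logger."""
    logger = logging.getLogger(logger_name)
    # verbose=True lets everything through; otherwise only warnings and above.
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    return logger


quiet = set_ndd_verbosity(verbose=False)
# INFO-level progress messages are now below the logger's threshold.
quiet.info("Preview generation in progress")
```

Setting the level on a named logger keeps the change local to that library, so other loggers in the process keep their own verbosity.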

model_providers (optional): a list of data_designer.config.models.ModelProvider instances for custom or test endpoints (e.g. a mock LLM server). If None, the default DataDesigner providers are used.
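
A hedged sketch of pointing the stage at a mock LLM server. The `ModelProvider` field names used here (`name`, `endpoint`, `provider_type`, `api_key`) are assumptions and should be checked against the installed `data_designer` version; the helper name, provider name, and key value are hypothetical.

```python
def make_stage_with_mock_provider(config_file: str, endpoint: str):
    """Build a DataDesignerStage that talks to a custom endpoint."""
    if not endpoint.startswith(("http://", "https://")):
        raise ValueError("endpoint must be an http(s) URL")

    # Imports are deferred so this module loads without the libraries installed.
    from data_designer.config.models import ModelProvider
    from nemo_curator.stages.synthetic.nemo_data_designer.data_designer import (
        DataDesignerStage,
    )

    # Assumption: ModelProvider accepts these keyword arguments.
    provider = ModelProvider(
        name="mock-llm",
        endpoint=endpoint,
        provider_type="openai",
        api_key="not-a-real-key",
    )
    return DataDesignerStage(
        data_designer_config_file=config_file,
        model_providers=[provider],
    )
```

Leaving `model_providers` unset keeps the default DataDesigner providers, so a helper like this is only useful for tests or self-hosted endpoints.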

config_builder: DataDesignerConfigBuilder | None = None
data_designer: DataDesigner = field(init=False)
data_designer_config_file: str | None = None
model_providers: list | None = None
verbose: bool = False
nemo_curator.stages.synthetic.nemo_data_designer.data_designer.DataDesignerStage.__post_init__() -> None
nemo_curator.stages.synthetic.nemo_data_designer.data_designer.DataDesignerStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.synthetic.nemo_data_designer.data_designer.DataDesignerStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.synthetic.nemo_data_designer.data_designer.DataDesignerStage.process(
batch: nemo_curator.tasks.DocumentBatch
) -> nemo_curator.tasks.DocumentBatch
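
The page does not say what the two lists returned by inputs() and outputs() contain. Assuming the usual ProcessingStage convention (first list: required task attributes, second list: required data columns), a pre-flight check against a batch's columns might look like the sketch below. The helper `missing_columns` and the column names are hypothetical.

```python
def missing_columns(
    declared: tuple[list[str], list[str]], available: list[str]
) -> list[str]:
    """Return declared data columns that are absent from `available`."""
    # Assumption: declared is (task_attributes, data_columns).
    _attributes, columns = declared
    present = set(available)
    return [name for name in columns if name not in present]


# Illustrative declaration, not the stage's real inputs().
declared = (["data"], ["text", "language"])
print(missing_columns(declared, ["id", "text"]))  # prints ['language']
```

With a real stage, `declared` would come from `stage.inputs()` and `available` from the column names of the batch's underlying data.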
nemo_curator.stages.synthetic.nemo_data_designer.data_designer.__all__ = ['DataDesignerStage']