nemo_curator.stages.synthetic.nemotron_cc.nemo_data_designer.base
NDD-backed base stage for NemotronCC synthetic data generation.
This module re-implements the BaseSyntheticStage interface on top of DataDesignerStage (NeMo Data Designer) instead of using LLMClient/AsyncLLMClient directly. Child stages (WikipediaParaphrasingStage, DistillStage, etc.) can inherit from this class with the same field-based API (system_prompt, prompt, input_field, output_field) and gain NDD execution automatically.
API
Bases: DataDesignerStage
Base class for NemotronCC synthetic stages backed by NeMo Data Designer.
Parameters
system_prompt : str | None
    Optional system prompt prepended to every LLM call.
prompt : str | None
    User prompt template. Must contain {document} which will be replaced by the value of input_field at runtime.
input_field : str | None
    Column name in the input DataFrame whose value is substituted into the prompt template.
output_field : str | None
    Column name where the LLM response is stored in the output DataFrame.
model_alias : str | None
    NDD model alias that maps to a ModelConfig entry.
model_configs : list | None
    List of data_designer.config.ModelConfig objects. If not provided, NDD will use its default model configuration.
model_providers : list | None
    Optional list of data_designer.config.models.ModelProvider for custom endpoints. Forwarded to DataDesignerStage.
verbose : bool
    When False (default), suppress NDD log output.
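To make the relationship between prompt, input_field, and output_field concrete, here is a minimal sketch that does not use NDD at all. It assumes the {document} placeholder is filled by plain string substitution, which this page does not confirm. `render_prompt` and `fake_llm` are hypothetical helpers, and the rows are plain dicts standing in for DataFrame rows.

```python
def render_prompt(prompt: str, row: dict, input_field: str) -> str:
    """Fill the {document} placeholder with the row's input_field value."""
    if "{document}" not in prompt:
        raise ValueError("prompt must contain the {document} placeholder")
    # str.replace (not str.format) leaves any other braces in the
    # template or in the document text untouched.
    return prompt.replace("{document}", str(row[input_field]))


def fake_llm(system_prompt, user_prompt: str) -> str:
    # Hypothetical stand-in for the model call; it just upper-cases the prompt.
    return user_prompt.upper()


rows = [{"text": "The sky is blue."}]
prompt = "Paraphrase the following text:\n{document}"

for row in rows:
    user_prompt = render_prompt(prompt, row, input_field="text")
    # The response is written under the name given by output_field.
    row["paraphrase"] = fake_llm(None, user_prompt)
```

Here "text" plays the role of input_field and "paraphrase" the role of output_field.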
Auto-build a DataDesignerConfigBuilder from stage fields. Skipped when config_builder or data_designer_config_file is already provided (advanced usage).
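The skip rule can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `StageSketch` is a hypothetical stand-in class, a plain dict stands in for DataDesignerConfigBuilder, and performing the check in `__post_init__` is a guess about where it happens.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class StageSketch:
    """Hypothetical stand-in for the stage; not the real class."""

    prompt: str | None = None
    output_field: str | None = None
    model_alias: str | None = None
    config_builder: Any = None
    data_designer_config_file: str | None = None

    def __post_init__(self) -> None:
        # Advanced usage: an explicit builder or config file wins,
        # so nothing is auto-built.
        if self.config_builder is not None or self.data_designer_config_file is not None:
            return
        self.config_builder = self._auto_build()

    def _auto_build(self) -> dict:
        # A plain dict stands in for DataDesignerConfigBuilder here.
        return {
            "column": self.output_field,
            "prompt": self.prompt,
            "model_alias": self.model_alias,
        }


auto = StageSketch(prompt="Rewrite: {document}", output_field="out", model_alias="m")
manual = StageSketch(config_builder="custom-builder")
from_file = StageSketch(data_designer_config_file="config.yaml")
```

In this sketch only `auto` receives a generated configuration; `manual` keeps the object it was given, and `from_file` leaves the builder unset.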
Process the input sample to create the LLM prompt.
Called per-row before NDD generation. Child classes can override this to customise prompt formatting.
Process a single response from the LLM.
Called per-row after NDD generation. Child classes can override this to customise response parsing.
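A child stage that overrides both per-row hooks might look like the sketch below. The method names `process_prompt` and `process_response`, the `run_row` driver, and the `BaseSketch` class are all hypothetical stand-ins, since this page does not show the real signatures; `generate` represents whatever performs the NDD generation step.

```python
class BaseSketch:
    """Hypothetical stand-in for the NDD-backed base stage."""

    prompt = "Rewrite:\n{document}"
    input_field = "text"

    def process_prompt(self, sample: dict) -> str:
        # Default pre-generation hook: fill the template from the row.
        return self.prompt.replace("{document}", str(sample[self.input_field]))

    def process_response(self, response: str) -> str:
        # Default post-generation hook: return the response unchanged.
        return response

    def run_row(self, sample: dict, generate) -> str:
        # Hook order for one row: prompt hook, generation, response hook.
        return self.process_response(generate(self.process_prompt(sample)))


class TrimmedStage(BaseSketch):
    """Child stage customising both hooks."""

    max_chars = 20

    def process_prompt(self, sample: dict) -> str:
        # Truncate long documents before the prompt is built.
        trimmed = {**sample, self.input_field: sample[self.input_field][: self.max_chars]}
        return super().process_prompt(trimmed)

    def process_response(self, response: str) -> str:
        # Drop a leading "Answer:" label and surrounding whitespace.
        text = response.strip()
        if text.startswith("Answer:"):
            text = text[len("Answer:"):]
        return text.strip()


stage = TrimmedStage()
result = stage.run_row({"text": "x" * 50}, lambda p: "  Answer: " + p)
```

The lambda echoes the prompt back with an "Answer:" prefix, so `result` shows both the truncation applied before generation and the cleanup applied after it.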