nemo_microservices.data_designer.config.config_builder#
Module Contents#
Classes#
BuilderConfig | Configuration container for Data Designer builder.
DataDesignerConfigBuilder | Config builder for Data Designer configurations.
Data#
API#
- class nemo_microservices.data_designer.config.config_builder.BuilderConfig(/, **data: Any)#
Bases: nemo_microservices.data_designer.config.base.ExportableConfigBase
Configuration container for Data Designer builder.
This class holds the main Data Designer configuration along with optional datastore settings needed for seed dataset operations.
Attributes:
  data_designer: The main Data Designer configuration containing columns, constraints, profilers, and other settings.
  datastore_settings: Optional datastore settings for accessing external datasets.
Initialization
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- data_designer: nemo_microservices.data_designer.config.data_designer_config.DataDesignerConfig#
None
- datastore_settings: nemo_microservices.data_designer.config.datastore.DatastoreSettings | None#
None
- class nemo_microservices.data_designer.config.config_builder.DataDesignerConfigBuilder(
- model_configs: list[nemo_microservices.data_designer.config.models.ModelConfig] | str | pathlib.Path | None = None,
Config builder for Data Designer configurations.
This class provides a high-level interface for building Data Designer configurations.
Initialization
Initialize a new DataDesignerConfigBuilder instance.
Args:
  model_configs: Optional model configurations. Can be:
    - A list of ModelConfig objects
    - A string or Path to a model configuration file
    - None to use default model configurations
- add_column(
- column_config: nemo_microservices.data_designer.config.columns.ColumnConfigT | None = None,
- *,
- name: str | None = None,
- column_type: nemo_microservices.data_designer.config.columns.DataDesignerColumnType | None = None,
- **kwargs,
Add a Data Designer column configuration to the current Data Designer configuration.
If no column config object is provided, you must provide the name, column_type, and any additional keyword arguments that are required by the column config constructor.
Args:
  column_config: Data Designer column config object to add.
  name: Name of the column to add. This is only used if column_config is not provided.
  column_type: Column type to add. This is only used if column_config is not provided.
  **kwargs: Additional keyword arguments to pass to the column constructor.
Returns: The current Data Designer config builder instance.
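The two calling styles described for add_column can be sketched with a small stand-in. This is not the library's implementation: `MiniBuilder`, `SamplerColumn`, `LLMTextColumn`, the `COLUMN_TYPES` registry, and the string column types `"sampler"` and `"llm-text"` are hypothetical names, used only to show how a builder might accept either a ready-made config object or a name plus column type plus keyword arguments, and return itself so that calls can be chained.

```python
from dataclasses import dataclass, field


@dataclass
class SamplerColumn:  # hypothetical column config
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class LLMTextColumn:  # hypothetical column config
    name: str
    prompt: str = ""


# Hypothetical registry mapping a column type to its config constructor.
COLUMN_TYPES = {"sampler": SamplerColumn, "llm-text": LLMTextColumn}


class MiniBuilder:
    def __init__(self):
        self.columns = {}

    def add_column(self, column_config=None, *, name=None, column_type=None, **kwargs):
        if column_config is None:
            # No config object: name and column_type are required, and
            # kwargs are forwarded to the matching constructor.
            if name is None or column_type is None:
                raise ValueError("provide column_config, or both name and column_type")
            column_config = COLUMN_TYPES[column_type](name=name, **kwargs)
        # When column_config is given, name and column_type are not used.
        self.columns[column_config.name] = column_config
        return self  # returning the builder allows chaining


builder = (
    MiniBuilder()
    .add_column(SamplerColumn(name="age", params={"low": 18, "high": 65}))
    .add_column(name="bio", column_type="llm-text", prompt="Write a short bio.")
)
print(list(builder.columns))
```

Returning the builder from each call is what makes the chained form possible; the real column config classes and their constructor arguments are documented in the columns module and may differ from this sketch.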
- add_constraint(
- constraint: nemo_microservices.data_designer.config.sampler_constraints.ColumnConstraintT | None = None,
- *,
- constraint_type: nemo_microservices.data_designer.config.sampler_constraints.ConstraintType | None = None,
- **kwargs,
Add a constraint to the current Data Designer configuration.
Currently, constraints are only supported for numerical samplers.
You can either provide a constraint object directly, or provide a constraint type and additional keyword arguments to construct the constraint object. Valid constraint types are:
  - "scalar_inequality": Constraint between a column and a scalar value.
  - "column_inequality": Constraint between two columns.
Args:
  constraint: Constraint object to add.
  constraint_type: Constraint type to add. Ignored when constraint is provided.
  **kwargs: Additional keyword arguments to pass to the constraint constructor.
Returns: The current Data Designer config builder instance.
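To make the difference between the two constraint types concrete, here is a minimal sketch using stand-in classes. The class names, the field names (`target_column`, `op`, `rhs`), the operator keys, and the `make_constraint` helper are all assumptions made for illustration; the library's real constraint constructors may take different arguments.

```python
import operator
from dataclasses import dataclass

# Comparison operators keyed by a short name (names chosen for this sketch).
OPS = {"lt": operator.lt, "le": operator.le, "gt": operator.gt, "ge": operator.ge}


@dataclass
class ScalarInequality:  # hypothetical: a column compared with a fixed number
    target_column: str
    op: str
    rhs: float

    def holds(self, row):
        return OPS[self.op](row[self.target_column], self.rhs)


@dataclass
class ColumnInequality:  # hypothetical: a column compared with another column
    target_column: str
    op: str
    rhs: str

    def holds(self, row):
        return OPS[self.op](row[self.target_column], row[self.rhs])


CONSTRAINT_TYPES = {
    "scalar_inequality": ScalarInequality,
    "column_inequality": ColumnInequality,
}


def make_constraint(constraint=None, *, constraint_type=None, **kwargs):
    """Return the given constraint, or build one from its type and kwargs."""
    if constraint is not None:
        return constraint  # constraint_type is ignored in this case
    return CONSTRAINT_TYPES[constraint_type](**kwargs)


constraints = [
    make_constraint(constraint_type="scalar_inequality", target_column="age", op="ge", rhs=18),
    make_constraint(ColumnInequality(target_column="start_year", op="lt", rhs="end_year")),
]

row = {"age": 30, "start_year": 2019, "end_year": 2021}
print(all(c.holds(row) for c in constraints))
```

A scalar inequality only needs the target column's value, while a column inequality needs both columns to be present in the same row, which is why the two are modeled separately here.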
- add_model_config(
- model_config: nemo_microservices.data_designer.config.models.ModelConfig,
Add a model configuration to the current Data Designer configuration.
Args: model_config: The model configuration to add.
- add_profiler(
- profiler_config: nemo_microservices.data_designer.config.analysis.column_profilers.ColumnProfilerConfigT,
Add a profiler to the current Data Designer configuration.
Args: profiler_config: The profiler configuration object to add.
Returns: The current Data Designer config builder instance.
Raises: BuilderConfigurationError: If the profiler configuration is of an invalid type.
- property allowed_references: list[str]#
Get all referenceable variables allowed in prompt templates and expressions.
This includes all column names and their side effect columns that can be referenced in prompt templates and expressions within the configuration.
Returns: A list of variable names that can be referenced in templates and expressions.
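As a rough illustration of what such a list enables, the sketch below collects column names together with their side effect columns and then checks a template against them. Everything here is a stand-in: the `Column` class, the `side_effect_columns` field, the `answer__trace` name, and the double-brace placeholder syntax are assumptions, not details confirmed by this reference.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Column:  # hypothetical stand-in for a column config
    name: str
    side_effect_columns: list = field(default_factory=list)


def allowed_references(columns):
    """Collect each column name followed by its side effect columns."""
    refs = []
    for col in columns:
        refs.append(col.name)
        refs.extend(col.side_effect_columns)
    return refs


columns = [
    Column("topic"),
    Column("answer", side_effect_columns=["answer__trace"]),
]
refs = allowed_references(columns)

# Find placeholders in a template that are not in the allowed list.
template = "Summarize {{ answer }} about {{ topic }} for {{ audience }}."
used = re.findall(r"\{\{\s*(\w+)\s*\}\}", template)
unknown = [name for name in used if name not in refs]
print(refs, unknown)
```

A check like this is one way a caller could catch a misspelled or missing column reference before generation starts.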
- build(
- *,
- skip_validation: bool = False,
- raise_exceptions: bool = False,
Build a DataDesignerConfig instance based on the current builder configuration.
Args:
  skip_validation: Whether to skip validation of the configuration.
  raise_exceptions: Whether to raise an exception if the configuration is invalid.
Returns: The current Data Designer config object.
- delete_column(column_name: str) typing_extensions.Self#
Delete the column with the given name.
Args: column_name: Name of the column to delete.
Returns: The current Data Designer config builder instance.
Raises: BuilderConfigurationError: If trying to delete a seed dataset column.
- delete_constraints(target_column: str) typing_extensions.Self#
Delete all constraints for the given target column.
Args: target_column: Name of the column to remove constraints for.
Returns: The current Data Designer config builder instance.
- delete_model_config(alias: str) typing_extensions.Self#
Delete a model configuration from the current Data Designer configuration by alias.
Args: alias: The alias of the model configuration to delete.
- classmethod from_config(
- config: dict | str | pathlib.Path | nemo_microservices.data_designer.config.config_builder.BuilderConfig,
Create a DataDesignerConfigBuilder from an existing configuration.
Args:
  config: Configuration source. Can be:
    - A dictionary containing the configuration
    - A string or Path to a YAML/JSON configuration file
    - A BuilderConfig object
Returns: A new instance populated with the configuration from the provided source.
Raises:
  ValueError: If the config format is invalid.
  ValidationError: If the builder config loaded from the config is invalid.
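Accepting several source types usually comes down to normalizing them into one shape before validation. The sketch below shows that idea with the standard library only. `load_config_source` is a hypothetical helper, not part of the library; it handles dictionaries and JSON files, and leaves out YAML files and BuilderConfig objects, which the real classmethod also accepts.

```python
import json
import tempfile
from pathlib import Path


def load_config_source(config):
    """Normalize a dict, or a str/Path to a JSON file, into a plain dict."""
    if isinstance(config, dict):
        return dict(config)
    if isinstance(config, (str, Path)):
        path = Path(config)
        if path.suffix != ".json":
            # YAML handling is omitted to keep this sketch stdlib-only.
            raise ValueError(f"unsupported config format: {path.suffix!r}")
        return json.loads(path.read_text())
    raise ValueError(f"unsupported config source: {type(config).__name__}")


source = {"columns": [{"name": "age"}]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps(source))
    from_file = load_config_source(path)       # Path input
    from_str = load_config_source(str(path))   # str input

from_dict = load_config_source(source)         # dict input
print(from_dict == from_file == from_str)
```

All three inputs end up as the same dictionary, so whatever validation follows only has to deal with one representation.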
- get_builder_config() nemo_microservices.data_designer.config.config_builder.BuilderConfig#
Get the builder config for the current Data Designer configuration.
Returns: The builder config.
- get_column_config(
- name: str,
Get a column configuration by name.
Args: name: Name of the column to retrieve the config for.
Returns: The column configuration object.
Raises: KeyError: If no column with the given name exists.
- get_column_configs() list[nemo_microservices.data_designer.config.columns.ColumnConfigT]#
Get all column configurations.
Returns: A list of all column configuration objects.
- get_columns_excluding_type( ) list[nemo_microservices.data_designer.config.columns.ColumnConfigT]#
Get all column configurations excluding the specified type.
Args: column_type: The type of columns to exclude.
Returns: A list of column configurations that do not match the specified type.
- get_columns_of_type( ) list[nemo_microservices.data_designer.config.columns.ColumnConfigT]#
Get all column configurations of the specified type.
Args: column_type: The type of columns to filter by.
Returns: A list of column configurations matching the specified type.
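The two type-based getters are complementary filters over the same column list. A minimal sketch, assuming a stand-in `TypedColumn` class with a string `column_type` field (the library uses its own column config classes and a DataDesignerColumnType value instead):

```python
from dataclasses import dataclass


@dataclass
class TypedColumn:  # hypothetical stand-in for a column config
    name: str
    column_type: str


def columns_of_type(columns, column_type):
    """Keep only columns whose type matches."""
    return [c for c in columns if c.column_type == column_type]


def columns_excluding_type(columns, column_type):
    """Keep only columns whose type does not match."""
    return [c for c in columns if c.column_type != column_type]


columns = [
    TypedColumn("age", "sampler"),
    TypedColumn("bio", "llm-text"),
    TypedColumn("city", "sampler"),
]

samplers = columns_of_type(columns, "sampler")
others = columns_excluding_type(columns, "sampler")
print([c.name for c in samplers], [c.name for c in others])
```

Together the two results partition the column list: every column appears in exactly one of them, and the original order is preserved.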
- get_constraints(
- target_column: str,
Get all constraints for the given target column.
Args: target_column: Name of the column to get constraints for.
Returns: A list of constraint objects targeting the specified column.
- get_llm_gen_columns() list[nemo_microservices.data_designer.config.columns.ColumnConfigT]#
Get all LLM-generated column configurations.
Returns: A list of column configurations that use LLM generation.
- get_profilers() list[nemo_microservices.data_designer.config.analysis.column_profilers.ColumnProfilerConfigT]#
Get all profilers.
Returns: A list of profiler configuration objects.
- get_seed_config() nemo_microservices.data_designer.config.seed.SeedConfig | None#
Get the seed config for the current Data Designer configuration.
Returns: The seed config if configured, None otherwise.
- get_seed_datastore_settings() nemo_microservices.data_designer.config.datastore.DatastoreSettings | None#
Get the most recent datastore settings for the current Data Designer configuration.
Returns: The datastore settings if configured, None otherwise.
- property info: nemo_microservices.data_designer.config.utils.info.DataDesignerInfo#
Get the DataDesignerInfo object for this builder.
Returns: An object containing metadata about the configuration.
- property model_configs: list[nemo_microservices.data_designer.config.models.ModelConfig]#
Get the model configurations for this builder.
Returns: A list of ModelConfig objects used for data generation.
- num_columns_of_type( ) int#
Get the count of columns of the specified type.
Args: column_type: The type of columns to count.
Returns: The number of columns matching the specified type.
- set_seed_datastore_settings(
- datastore_settings: nemo_microservices.data_designer.config.datastore.DatastoreSettings | None,
Set the datastore settings for the seed dataset.
Args: datastore_settings: The datastore settings to use for the seed dataset.
- validate(*, raise_exceptions: bool = False) typing_extensions.Self#
Validate the current Data Designer configuration.
Args: raise_exceptions: Whether to raise an exception if the configuration is invalid.
Returns: The current Data Designer config builder instance.
Raises: InvalidConfigError: If the configuration is invalid and raise_exceptions is True.
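The effect of the raise_exceptions flag can be sketched as follows. This is a stand-in, not the library's validator: `ValidatingBuilder`, `InvalidSketchConfigError`, and the single uniqueness rule are hypothetical, and reporting problems through `logging` when the flag is off is an assumption about the non-raising path.

```python
import logging

logger = logging.getLogger("builder_sketch")


class InvalidSketchConfigError(Exception):
    """Hypothetical error type for this sketch."""


class ValidatingBuilder:
    def __init__(self, column_names):
        self.column_names = list(column_names)

    def _violations(self):
        # One toy rule: column names must be unique.
        seen, problems = set(), []
        for name in self.column_names:
            if name in seen:
                problems.append(f"duplicate column name: {name!r}")
            seen.add(name)
        return problems

    def validate(self, *, raise_exceptions=False):
        problems = self._violations()
        if problems and raise_exceptions:
            raise InvalidSketchConfigError("; ".join(problems))
        for problem in problems:
            logger.warning(problem)  # report without interrupting the chain
        return self


ok = ValidatingBuilder(["a", "b"]).validate(raise_exceptions=True)
lenient = ValidatingBuilder(["a", "a"]).validate()  # warns, still returns the builder

try:
    ValidatingBuilder(["a", "a"]).validate(raise_exceptions=True)
    raised = False
except InvalidSketchConfigError:
    raised = True
print(raised)
```

With the flag off, the call is safe to use in a chain during interactive work; with it on, an invalid configuration stops execution at the point of validation.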
- with_seed_dataset(
- dataset_reference: nemo_microservices.data_designer.config.seed.SeedDatasetReference,
- *,
- sampling_strategy: nemo_microservices.data_designer.config.seed.SamplingStrategy = SamplingStrategy.ORDERED,
Add a seed dataset to the current Data Designer configuration.
This method sets the seed dataset for the configuration and automatically creates SeedDatasetColumnConfig objects for each column found in the dataset. The column names are fetched from the dataset source (Hugging Face Hub or NeMo Microservices Datastore).
Args:
  dataset_reference: Seed dataset reference for fetching from the datastore.
  sampling_strategy: The sampling strategy to use when generating data from the seed dataset. Defaults to ORDERED sampling.
Returns: The current Data Designer config builder instance.
- write_config(
- path: str | pathlib.Path,
- indent: int | None = 2,
- **kwargs,
Write the current configuration to a file.
Args:
  path: Path to the file to write the configuration to.
  indent: Indentation level for the output file (default: 2).
  **kwargs: Additional keyword arguments passed to the serialization methods used.
Raises: BuilderConfigurationError: If the file format is unsupported.
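Choosing a serializer from the file extension and rejecting unknown extensions can be sketched with the standard library. The function below is a stand-in that takes the configuration as a plain dict and supports only JSON; `UnsupportedFormatError` is a hypothetical substitute for the library's own error, and which formats the real method accepts is not stated in this entry.

```python
import json
import tempfile
from pathlib import Path


class UnsupportedFormatError(Exception):
    """Hypothetical stand-in for the error raised on unknown formats."""


def write_config(config, path, indent=2, **kwargs):
    path = Path(path)
    if path.suffix == ".json":
        # indent and any extra kwargs go straight to the serializer.
        path.write_text(json.dumps(config, indent=indent, **kwargs))
    else:
        raise UnsupportedFormatError(f"unsupported file format: {path.suffix!r}")


config = {"columns": [{"name": "age", "column_type": "sampler"}]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.json"
    write_config(config, target, sort_keys=True)
    round_trip = json.loads(target.read_text())
    try:
        write_config(config, Path(tmp) / "config.txt")
        rejected = False
    except UnsupportedFormatError:
        rejected = True

print(round_trip == config, rejected)
```

Forwarding `**kwargs` to the serializer, as done with `sort_keys=True` here, mirrors the documented behavior of passing extra keyword arguments through to the serialization methods.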
- nemo_microservices.data_designer.config.config_builder.logger#
‘getLogger(…)’