NeMo Safe Synthesizer Configuration

This page describes the configuration system for synthetic data generation in NVIDIA NeMo Safe Synthesizer, covering the data preparation, holdout, PII replacement, training, generation, and evaluation parameters.

Configuration Overview

The NeMo Safe Synthesizer configuration system uses a hierarchical structure with six main parameter categories that control all aspects of synthetic data generation.

The default parameters generally work well for getting started. In particular, you should rarely need to change the Holdout and Evaluation parameters. Data Preparation parameters are needed only if your data is event-driven, and PII Replacement parameters generally need adjusting only if your data contains unusual PII or difficult-to-detect entities.

If you do adjust parameters, most of the changes should be in the Training parameters section, because it offers the most opportunity for performance optimization.

The Generation parameters section is also commonly used, but generally only to adjust the number of output records you want to generate.
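The "keep the defaults, override only what you need" pattern described above can be sketched as a small merge over the six parameter categories. This is an illustration only: the section keys, the `num_records` and `enabled` parameter names, and the `merge_config` helper are hypothetical placeholders, not the product's actual schema or API.

```python
import copy

# Hypothetical section names mirroring the six categories on this page.
SECTIONS = (
    "data_preparation",
    "holdout",
    "pii_replacement",
    "training",
    "generation",
    "evaluation",
)


def merge_config(defaults, overrides):
    """Overlay per-section overrides on a copy of the defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown sections: {sorted(unknown)}")
    merged = copy.deepcopy(defaults)  # leave the defaults untouched
    for section, params in overrides.items():
        merged[section].update(params)
    return merged


# Placeholder defaults; the parameter names are assumptions for illustration.
defaults = {name: {} for name in SECTIONS}
defaults["generation"]["num_records"] = 1000
defaults["evaluation"]["enabled"] = True

# Typical adjustment: change only the number of output records.
config = merge_config(defaults, {"generation": {"num_records": 5000}})
```

Every section other than `generation` keeps its default values, which matches the advice that most sections rarely need changes.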

Data Preparation

These parameters are generally needed only if you have event-driven data and must apply grouping and ordering rules as a pre-processing step.
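To make "grouping and ordering" concrete, here is a minimal standalone sketch of that kind of pre-processing on an event log. It does not use NeMo Safe Synthesizer itself; the `group_and_order` helper, the field names (`user_id`, `ts`, `event`), and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict
from operator import itemgetter


def group_and_order(records, group_key, order_key):
    """Bucket records by group_key, then sort each bucket by order_key."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    return {
        key: sorted(rows, key=itemgetter(order_key))
        for key, rows in groups.items()
    }


# Hypothetical event log: two users with interleaved, unordered events.
events = [
    {"user_id": "u2", "ts": 3, "event": "logout"},
    {"user_id": "u1", "ts": 2, "event": "purchase"},
    {"user_id": "u1", "ts": 1, "event": "login"},
    {"user_id": "u2", "ts": 1, "event": "login"},
]

grouped = group_and_order(events, group_key="user_id", order_key="ts")
```

After this step, each user's events sit together in chronological order, which is the shape event-driven data typically needs before it is modeled as sequences.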

Data Preparation Configuration

PII Replacement

The default entity selection and automatic classification are usually sufficient for PII detection. In some cases, however, you may want to provide your own entity list, declare the entity type for specific columns directly, or tune the balance between precision and recall.
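The precision versus recall trade-off can be illustrated with a generic detector that flags a value as PII when its confidence score meets a threshold. This is a conceptual sketch, not the product's detection logic: the `precision_recall` helper, the scores, and the labels are made-up illustrative values.

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall when flagging scores >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical detector confidences and ground truth (True = real PII).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False]

strict = precision_recall(scores, labels, threshold=0.7)   # fewer false positives
lenient = precision_recall(scores, labels, threshold=0.3)  # fewer missed entities
```

With these sample values, the stricter threshold flags nothing incorrectly but misses one real entity, while the lenient threshold catches every real entity at the cost of one false positive.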

PII Replacement Configuration

Training

While your job should generally be able to run with the default training hyperparameters, these parameters are the most likely to require tuning to improve performance. Additionally, this is where you can apply differential privacy to achieve the maximum level of privacy if needed.

Training Configuration

Generation

You can adjust the number of records you want to generate with the Generation parameters.

Generation Configuration

Evaluation

Evaluation is enabled by default, and the default parameters are nearly always sufficient.

Evaluate Configuration