Quality and Privacy Evaluation#
Learn how to assess the quality and privacy of your synthetic data using NeMo Safe Synthesizer's comprehensive evaluation capabilities.
Overview#
Evaluation is a critical component of NeMo Safe Synthesizer that helps you understand both the utility and privacy of your synthetic data. The evaluation step is enabled by default and provides comprehensive reports comparing your original and synthetic datasets across multiple dimensions.
Evaluation Components#
Synthetic Quality Score#
Column Correlation Stability: Compare the correlation between every pair of columns in the original data to the same pair in the synthetic data
Deep Structure Stability: Use Principal Component Analysis to reduce the dimensionality when comparing the original and synthetic data
Column Distribution Stability: Compare the distribution for each column in the original data to the matching column in the synthetic data
Text Structure Similarity: Calculate the sentence, word, and character counts across the two datasets
Text Semantic Similarity: Assess whether the semantic meaning of the text is preserved after synthesis
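To make two of the quality components above more concrete, the following is a minimal sketch of the general idea behind comparing pairwise correlations and per-column distributions. It is not NeMo Safe Synthesizer's scoring implementation: the function names (`correlation_gap`, `js_divergence`), the use of Pearson correlation, and the use of Jensen-Shannon divergence are all assumptions chosen for illustration, and the data is represented as plain dictionaries of column lists.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / (norm_x * norm_y)


def correlation_gap(original, synthetic):
    """Mean absolute difference in correlation over every pair of columns.

    Hypothetical helper: `original` and `synthetic` map column names to
    lists of numbers. 0.0 means the pairwise correlations match exactly.
    """
    gaps = [
        abs(pearson(original[a], original[b]) - pearson(synthetic[a], synthetic[b]))
        for a, b in combinations(sorted(original), 2)
    ]
    return sum(gaps) / len(gaps)


def js_divergence(original_values, synthetic_values):
    """Jensen-Shannon divergence (base 2) between two categorical columns.

    Hypothetical helper: 0.0 means identical distributions, 1.0 means the
    two columns share no values at all.
    """
    p, q = Counter(original_values), Counter(synthetic_values)
    n_p, n_q = len(original_values), len(synthetic_values)
    divergence = 0.0
    for key in set(p) | set(q):
        p_k, q_k = p[key] / n_p, q[key] / n_q
        m_k = (p_k + q_k) / 2
        if p_k > 0:
            divergence += 0.5 * p_k * math.log2(p_k / m_k)
        if q_k > 0:
            divergence += 0.5 * q_k * math.log2(q_k / m_k)
    return divergence


original = {"age": [25, 32, 47, 51], "income": [30, 42, 60, 65]}
synthetic = {"age": [27, 30, 45, 53], "income": [33, 40, 58, 66]}
print("correlation gap:", correlation_gap(original, synthetic))
print("distribution divergence:", js_divergence(["a", "a", "b"], ["a", "b", "b"]))
```

In this sketch, lower values indicate that the synthetic data tracks the original more closely. How the product converts such comparisons into a score is described in the report documentation, not here.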
Data Privacy Score#
Membership Inference Protection: Test whether attackers can determine if specific records were in the training data
Attribute Inference Protection: Assess whether sensitive attributes can be inferred by an attacker when other attributes are known
PII Replay: Evaluate the frequency with which sensitive values from the original data show up in the synthetic version
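As an illustration of the PII Replay idea above, the sketch below computes the share of distinct sensitive values in an original column that reappear in the synthetic column. This is an assumption about one reasonable way to measure replay, not the product's actual method: `pii_replay_rate` and `normalize` are hypothetical names, and the case-insensitive, whitespace-trimmed matching is a simplification.

```python
def normalize(value):
    """Normalize a value so trivial formatting differences still match."""
    return str(value).strip().lower()


def pii_replay_rate(original_values, synthetic_values):
    """Fraction of distinct original values that also occur in the synthetic data.

    Hypothetical helper: None entries are ignored. Returns 0.0 when the
    original column has no values to replay.
    """
    original_unique = {normalize(v) for v in original_values if v is not None}
    if not original_unique:
        return 0.0
    synthetic_unique = {normalize(v) for v in synthetic_values if v is not None}
    replayed = original_unique & synthetic_unique
    return len(replayed) / len(original_unique)


original_emails = ["alice@example.com", "bob@example.com", "Carol@example.com ", None]
synthetic_emails = ["carol@example.com", "dave@example.com"]
print("replay rate:", pii_replay_rate(original_emails, synthetic_emails))
```

A rate near zero suggests that few original sensitive values leaked into the synthetic output. Some replay can be expected for values with a small set of possible entries, such as common first names.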
You can learn more about each of the metrics on the Evaluation Report page.