About Generating Private Synthetic Data#

NVIDIA NeMo Safe Synthesizer enables you to create private versions of sensitive tabular datasets. The resulting data is entirely synthetic, with no one-to-one mapping to your original records. NeMo Safe Synthesizer is purpose-built for privacy compliance and data protection while preserving data utility for downstream AI tasks.

Deploy with Docker

Tutorials

Important

NVIDIA NeMo Safe Synthesizer is an early access release. It has limited support, and its API may change in future releases.

You can deploy the NeMo Safe Synthesizer Early Access release with Docker Compose or, on Kubernetes, with the NeMo Microservices Helm Chart.

NeMo Safe Synthesizer allows you to generate synthetic data that maintains the statistical properties of your original dataset without exposing sensitive information about individual records.

NeMo Safe Synthesizer works best when you already have the data you need, but that data is private or sensitive. It interpolates from the existing data to generate a private, synthetic version in which new records have no one-to-one mapping to original records. If you do not have any data, or you want to extrapolate from a very small set of examples, refer to About Designing Synthetic Data From Scratch or Seeds. NeMo Data Designer supports creating synthetic data from scratch or from a small seed dataset for AI training and development use cases.
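As an informal illustration of the two goals described above (preserving statistical properties while avoiding a one-to-one mapping to original records), the following minimal sketch compares a toy original table with a toy synthetic table. The function names `exact_copy_rate` and `mean_gap` and all data values are hypothetical and made up for this example. They are not part of NeMo Safe Synthesizer, and these checks are far simpler than the evaluation the service performs.

```python
from statistics import mean


def exact_copy_rate(original, synthetic):
    """Fraction of synthetic rows that are identical to some original row."""
    original_rows = {tuple(sorted(row.items())) for row in original}
    copies = sum(tuple(sorted(row.items())) in original_rows for row in synthetic)
    return copies / len(synthetic)


def mean_gap(original, synthetic, column):
    """Absolute difference between the column means of the two tables."""
    return abs(mean(r[column] for r in original) - mean(r[column] for r in synthetic))


# Toy data, invented for illustration only.
original = [
    {"age": 34, "income": 52000},
    {"age": 45, "income": 61000},
    {"age": 29, "income": 48000},
]
synthetic = [
    {"age": 36, "income": 53500},
    {"age": 43, "income": 60000},
    {"age": 30, "income": 47000},
]

print("exact copies:", exact_copy_rate(original, synthetic))
print("age mean gap:", mean_gap(original, synthetic, "age"))
print("income mean gap:", mean_gap(original, synthetic, "income"))
```

A low copy rate and small gaps in summary statistics are only a first sanity check. The absence of exact copies does not by itself show that a dataset is private, which is why a dedicated quality and privacy evaluation is still needed.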

NeMo Safe Synthesizer Job#

A complete NeMo Safe Synthesizer job consists of the following steps:

1. Upload Data: Add your tabular data to NeMo Data Store.
2. Prepare Data:
   - Configure PII Replacement: Set up detection and replacement of sensitive information. Do this before the Synthesis step so that the model cannot learn the most sensitive information, such as names and addresses.
   - Configure Training Data Organization: Configure the grouping, ordering, and holdout that are applied before training.
3. Configure Synthesis:
   - Training: Select the model and set training parameters, including differential privacy.
   - Generation: Specify synthetic data generation parameters.
   - Evaluation: Adjust quality and privacy assessment metrics as needed. Evaluation is enabled by default.
4. Execute and Review:
   - Run and Monitor Job: Execute the job and track its progress.
   - Download Results: Retrieve the synthetic data and evaluation reports.
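The Prepare Data step above can be pictured with a small standalone sketch. It uses only the Python standard library and does not call the NeMo Safe Synthesizer API. The rule list `PII_RULES`, the functions `replace_pii` and `organize`, the column names, and the choice to hold out whole groups are all assumptions made for illustration. The actual service is configured through its own parameters, described in the configuration reference.

```python
import random
import re
from collections import defaultdict

# Hypothetical replacement rules: (pattern, placeholder). Illustrative only.
PII_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "<PHONE>"),
]


def replace_pii(value):
    """Apply every rule to string values; return other types unchanged."""
    if not isinstance(value, str):
        return value
    for pattern, placeholder in PII_RULES:
        value = pattern.sub(placeholder, value)
    return value


def organize(rows, group_by, order_by, holdout_fraction, seed=0):
    """Group rows, order each group, and set aside whole groups as holdout."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)
    for members in groups.values():
        members.sort(key=lambda r: r[order_by])
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)  # seeded for reproducibility
    n_holdout = max(1, round(len(keys) * holdout_fraction))
    holdout_keys = set(keys[:n_holdout])
    train = [r for k in keys if k not in holdout_keys for r in groups[k]]
    holdout = [r for k in keys if k in holdout_keys for r in groups[k]]
    return train, holdout


# Toy records, invented for illustration only.
rows = [
    {"customer": "a", "day": 2, "note": "call 555-123-4567"},
    {"customer": "a", "day": 1, "note": "email jane@example.com"},
    {"customer": "b", "day": 1, "note": "no contact details"},
    {"customer": "c", "day": 3, "note": "reach bob@example.org"},
    {"customer": "c", "day": 1, "note": "ok"},
]

# Replace PII first, so later stages never see the raw values.
clean = [{k: replace_pii(v) for k, v in row.items()} for row in rows]
train, holdout = organize(clean, "customer", "day", holdout_fraction=0.34)
print(len(train), "training rows,", len(holdout), "holdout rows")
```

Running replacement before any grouping or splitting mirrors the recommendation in the list: whatever is trained on later has already had the matched values removed. Regular expressions catch only well-formatted values, so this sketch should not be read as a substitute for the service's PII detection.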

Installation Options#

Try out this early access microservice by using Docker Compose or by deploying the NeMo Microservices Helm Chart.

Docker Compose

Deploy the NeMo Safe Synthesizer microservice using Docker. This is the easiest option for local testing.

standalone

Deploy NeMo Safe Synthesizer with Docker

Helm Chart

Deploy the NeMo Microservices Helm Chart, which includes NeMo Safe Synthesizer.

helm-chart

Deploy NeMo Safe Synthesizer Using Parent Helm Chart

References#

Explore advanced configuration management, API details, and learning resources.

Synthesize Data

Generate private synthetic data using LLM-based fine-tuning with optional differential privacy.

synthetic-data fine-tuning differential-privacy

Synthesize Data

Replace PII

Detect and replace personally identifiable information in your datasets using configurable transformation rules.

privacy compliance pii-detection

PII Replacement

Evaluate Dataset Quality & Privacy

Assess the quality and privacy of your synthetic data with comprehensive evaluation reports.

evaluation privacy-metrics quality-assessment

Quality and Privacy Evaluation

Configuration

Learn how to adjust parameters in the NeMo Safe Synthesizer configuration to customize your job.

configuration parameters customization

NeMo Safe Synthesizer Configuration

Python SDK

Use the Python SDK for NeMo Safe Synthesizer jobs with REST API access and comprehensive configuration options.

python-sdk api-client automation

NeMo Safe Synthesizer Python SDK

API Reference

Complete REST API documentation for NeMo Safe Synthesizer jobs, configuration, and results management.

rest-api endpoints

Safe Synthesizer API