Getting Started with NeMo Safe Synthesizer

Use NeMo Safe Synthesizer to generate private synthetic versions of sensitive tabular datasets.

Prerequisites

Before using NeMo Safe Synthesizer, complete the NeMo Platform Quickstart to install the CLI/SDK and deploy the platform.

NeMo Safe Synthesizer has the following additional requirements:

An NVIDIA GPU on the host machine with at least 80 GB of VRAM (check with nvidia-smi). This is separate from any GPU inside a NIM container: Safe Synthesizer training runs directly on the host.
Sufficient disk space for generated datasets (50 GB or more recommended).
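
The two requirements above can be checked from a shell before deployment. The following is a minimal sketch, not an official check: the at_least helper is a hypothetical name, the 80000 MiB threshold is an assumed approximation of 80 GB, and the disk check assumes that generated datasets land on the filesystem of the current directory.

```shell
# Hypothetical helper: succeed when the first integer is at least the second.
at_least() {
  [ "$1" -ge "$2" ]
}

# Report total memory per GPU in MiB (nvidia-smi ships with the NVIDIA driver).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    while read -r mib; do
      # Assumption: 80 GB is approximated as 80000 MiB.
      if at_least "$mib" 80000; then
        echo "GPU memory ${mib} MiB: OK"
      else
        echo "GPU memory ${mib} MiB: below the 80 GB requirement"
      fi
    done
else
  echo "nvidia-smi not found; check the NVIDIA driver installation" >&2
fi

# Report free space on the current filesystem in GiB (POSIX df, 1 KiB blocks).
# Assumption: datasets are written to the filesystem of the current directory.
free_gib=$(( $(df -Pk . | awk 'NR == 2 { print $4 }') / 1024 / 1024 ))
if at_least "$free_gib" 50; then
  echo "Free disk ${free_gib} GiB: OK"
else
  echo "Free disk ${free_gib} GiB: below the recommended 50 GB"
fi
```

Run the script from the directory where you plan to store generated datasets so that the disk check covers the right filesystem.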

For general platform troubleshooting (port conflicts, health checks, and so on), refer to the main quickstart guide.

Note

The platform pre-configures a system/nvidia-build model provider during startup. This provider routes inference requests to models hosted on build.nvidia.com through the API base URL https://integrate.api.nvidia.com. It uses the NGC API key with Public API Endpoints permissions that you provided during deployment; the platform saves this key automatically as the built-in system/ngc-api-key secret.

You can verify this provider exists by running nmp inference providers list --workspace system.

The tutorials in these docs use this provider for inference, but you can create and use your own provider instead.

Using the CLI

Interact with NeMo Safe Synthesizer using the nmp CLI:

# List jobs
nmp safe-synthesizer jobs list

# Create a job from a config file
nmp safe-synthesizer jobs create --input-file config.json

# Create a job with inline JSON
nmp safe-synthesizer jobs create --input-data '{"spec": {...}}'
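
When creating a job from a config file, it can help to catch JSON syntax errors locally before submitting. The sketch below wraps the jobs create command shown above in a hypothetical submit_job function. It assumes python3 is available on the host and checks only that the file is well-formed JSON, not that it matches the job schema.

```shell
# Hypothetical helper: validate a config file as JSON, then submit it.
# Only the `nmp safe-synthesizer jobs create --input-file` call comes from
# the commands above; the function name and return codes are illustrative.
submit_job() {
  config_file="$1"

  if [ ! -f "$config_file" ]; then
    echo "config file not found: $config_file" >&2
    return 1
  fi

  # python3 -m json.tool exits non-zero when the file is not valid JSON.
  if ! python3 -m json.tool "$config_file" >/dev/null 2>&1; then
    echo "not valid JSON: $config_file" >&2
    return 2
  fi

  nmp safe-synthesizer jobs create --input-file "$config_file"
}

# Usage: submit_job config.json
```

The function returns before calling nmp when the file is missing or malformed, so a failed check never creates a job.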

Next Steps

Run one of the tutorials to create your first synthetic dataset:

Safe Synthesizer 101 Tutorial - A beginner-friendly introduction
Differential Privacy Tutorial - Generate differentially-private synthetic data