Safe Synthesizer Jobs
NeMo Safe Synthesizer jobs orchestrate the complete pipeline from data preparation through synthesis to evaluation. Understanding the job lifecycle and configuration options is essential for effective use of the platform.
Job Lifecycle
A NeMo Safe Synthesizer job progresses through several states:
Job States
- created: Job has been created but not yet started
- pending: Job is queued and waiting for resources (GPU)
- active: Job is processing your data
- completed: Job finished successfully - results are ready
- error: Job encountered an error - check logs for details
- cancelled: Job was manually cancelled
- cancelling: Job is in the process of being cancelled
- paused: Job execution has been paused
- pausing: Job is in the process of being paused
- resuming: Job is resuming from a paused state
Job Phases
A complete job typically includes these phases:
- Data Preparation
- Data validation and preprocessing
- Column type inference
- Grouping and ordering (if configured)
- Train/test split for holdout evaluation
- PII Replacement (optional)
- PII detection using configured methods
- Entity classification
- Value transformation
- Synthesis
- Training: Fine-tune LLM on prepared data
- Apply differential privacy (if enabled)
- Generation: Generate synthetic records
- Evaluation
- Calculate SQS metrics
- Calculate DPS metrics
- Generate evaluation report
Job Configuration
Jobs are configured through a hierarchical configuration structure:
Top-Level Configuration
Configuration Sections
- data_prep: Grouping, ordering, holdout configuration
- replace_pii: PII detection and replacement rules
- training: Model selection and training parameters
- generation: Synthetic data generation settings
- privacy: Differential privacy parameters
- evaluation: Quality and privacy assessment options
Job Management
Creating Jobs
Create jobs using:
- Python SDK: Recommended approach with
SafeSynthesizerJobBuilder - REST API: Direct HTTP requests for integration
- CLI: Command-line interface for scripting
Monitoring Jobs
Track job progress through:
- Status checks: Poll job state
- Logs: View real-time execution logs
- Events: Subscribe to job state changes when supported by the deployment
Retrieving Results
When the job completes, access:
- Synthetic data: Generated CSV files
- Evaluation report: HTML report with scores and visualizations
- Metadata: Job summary and configuration
- Adapter: LoRA adapter from the training step (when synthesis ran)
- Logs: Complete execution history
Reusing a Trained Adapter
For platform jobs, set pretrained_model_job in the job spec to a completed job that has an adapter result in Files. Reuse is generation-only (no retraining). Use either pretrained_model_job or config.training.pretrained_model, not both.
For host-local development (nemo safe-synthesizer run-local), set config.training.pretrained_model to a local adapter or work directory from an earlier run. See Local and Subprocess Execution.
Job Builder API
The SafeSynthesizerJobBuilder provides a high-level interface for common workflows:
The builder does the following:
- Uploads data to filesets automatically
- Provides smart defaults
- Validates configuration
- Returns a
SafeSynthesizerJobinstance for job interaction
Best Practices
Resource Planning
- Larger datasets and models require more GPU memory
- Training time scales with data size and model complexity
- Plan for 15-120 minutes for typical jobs
Configuration
- Start with default settings
- Enable PII replacement for sensitive data
- Use differential privacy for maximum privacy guarantees
- Adjust generation parameters based on evaluation results
Monitoring
- Check status periodically during execution
- Review logs if jobs fail or take longer than expected
- Use evaluation reports to iterate on configuration
Error Handling
- Common errors: insufficient GPU memory, invalid data format, configuration errors
- Check logs for detailed error messages
- Reduce model size or data size if resource errors occur
Troubleshooting
This section covers common issues and how to diagnose them.
Viewing Job Logs
Logs are essential for diagnosing job failures. Access them through:
Python SDK:
Docker Compose Deployments
When running NeMo Safe Synthesizer using Docker Compose, view container logs directly:
To check the health status of containers:
Kubernetes Deployments
For Kubernetes deployments, use kubectl to access logs:
To check pod status and events:
Common Issues and Solutions
Job Stuck in “Pending” State
Symptoms: Job remains in pending state for an extended period.
Possible Causes:
- No GPU resources available
- Resource quota exceeded
- Scheduling constraints not met
Solutions:
- Check available GPU resources in your cluster
- Verify resource quotas and limits
- Review pod events for scheduling failures
Out of Memory (OOM) Errors
Symptoms: Job fails with memory-related errors during training.
Possible Causes:
- Dataset too large for available GPU memory
- Batch size too high
- Model too large for available resources
Solutions:
- Reduce
batch_sizein training parameters - Use a smaller subset of data for initial testing
- Increase
gradient_accumulation_stepsto maintain effective batch size with lower memory
Invalid Data Format Errors
Symptoms: Job fails during data preparation phase.
Possible Causes:
- CSV file has encoding issues
- Missing or malformed columns
- Unsupported data types
Solutions:
- Ensure CSV is UTF-8 encoded
- Validate column names do not contain special characters
- Check for null values or inconsistent data types
Generation Quality Issues
Symptoms: Generated synthetic data has poor quality or many invalid records.
Possible Causes:
- Insufficient training (low
num_input_records_to_sample) - Temperature too high or too low
- Data has complex patterns that need more training
Solutions:
- Increase
num_input_records_to_samplefor more training - Adjust
temperature(try 0.7-1.0 range) - Enable
use_structured_generationfor better format adherence - Review evaluation report for specific quality issues
Related Topics
- Local and Subprocess Execution:
run-local, adapter reuse, and plugin tests - safe-synthesizer-101: Get started with NeMo Safe Synthesizer jobs
- index: More hands-on tutorials
- reference: Full parameter reference