NeMo Data Designer Microservice Deployment Guide#

The NeMo Data Designer microservice adds synthetic data generation to the NeMo platform. NeMo Data Designer is in Early Access and supports deployment in “Quickstart” mode using Docker Compose. Docker Compose deploys all the components needed to run Data Designer and exposes the Data Designer API on localhost.

Data Designer generates realistic synthetic datasets with configurable column types and constraints, using large language models to produce the data.

Key Features#

  • Synthetic Data Generation: Create realistic datasets with various data types

  • Column-based Configuration: Define custom column types, constraints, and relationships

  • LLM Integration: Leverage language models for intelligent data generation

  • Batch Processing: Support for large-scale dataset generation via batch jobs

  • API-driven: RESTful API for programmatic access and integration

Prerequisites#

Before deploying Data Designer, ensure you have:

  • Docker and Docker Compose installed

  • An NGC API key for accessing the NVIDIA container registry

  • Access to LLM endpoints (local NIM or NVIDIA API)

  • Sufficient storage for generated artifacts
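
The first two prerequisites can be checked with a short script before you start a deployment. The following is a minimal sketch, not part of the Data Designer tooling: the function name `missing_prerequisites` is hypothetical, and the environment variable name `NGC_API_KEY` is an assumption, so substitute whatever variable your deployment instructions actually use. It does not check LLM endpoint access or available storage.

```python
import os
import shutil
import subprocess


def missing_prerequisites(env=None, which=shutil.which, run=subprocess.run):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []

    # The Docker CLI must be on PATH.
    if which("docker") is None:
        problems.append("docker not found on PATH")
    else:
        # `docker compose version` exits non-zero when the Compose plugin is missing.
        result = run(["docker", "compose", "version"], capture_output=True, text=True)
        if result.returncode != 0:
            problems.append("docker compose plugin not available")

    # ASSUMPTION: NGC_API_KEY is an illustrative variable name, not one confirmed by this guide.
    if not env.get("NGC_API_KEY"):
        problems.append("NGC_API_KEY is not set")

    return problems
```

Calling `missing_prerequisites()` with no arguments checks the current machine. The `which` and `run` parameters exist only so the checks can be exercised without Docker installed.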


Deployment Options#

Data Designer currently supports the following deployment option:

Docker Compose (Recommended)

Deploy the Data Designer microservice using Docker Compose for local development and testing.

Deploy NeMo Data Designer Using Docker Compose

Configuration and Management#

Customize and manage your Data Designer deployment:

Troubleshooting

Resolve common issues and debug deployment problems.

Data Designer Troubleshooting