Deploy NeMo Data Designer with Docker#
Run the microservice on your local machine using Docker Compose for experimentation.
Prerequisites#
Before following this deployment guide, ensure that you have:
Sufficient disk space for generated artifacts (recommended: 20GB).
At least 8GB of available RAM.
Docker and Docker Compose installed on your system.
An NGC API Key with access to the NGC Catalog and Model endpoints on build.nvidia.com. Create an NGC API key following the instructions at Generating NGC API Keys. Specify the NGC Catalog and Public API Endpoints permissions when you generate the key.
The NGC CLI installed. Refer to Getting Started with the NGC CLI for details on setup.
A tool such as curl to make HTTP requests.
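The prerequisites above can be checked from a script before starting. The following is a minimal sketch, not part of the official tooling: it assumes the command-line tools are named docker, ngc, and curl on your PATH, and it uses the recommended 20GB figure from the list. It does not check RAM or whether the Docker Compose plugin is installed.

```python
import shutil

# Tool names are assumptions based on the commands used in this guide.
REQUIRED_TOOLS = ["docker", "ngc", "curl"]
RECOMMENDED_FREE_GB = 20  # recommended disk space from the prerequisites


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


def free_disk_gb(path="."):
    """Free space on the filesystem containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def prerequisite_report(path="."):
    """Collect human-readable warnings; an empty list means no problems found."""
    warnings = [f"{name} not found on PATH" for name in missing_tools(REQUIRED_TOOLS)]
    free = free_disk_gb(path)
    if free < RECOMMENDED_FREE_GB:
        warnings.append(f"only {free:.1f} GB free, {RECOMMENDED_FREE_GB} GB recommended")
    return warnings
```

Calling prerequisite_report() from the directory where you plan to download the quickstart resources returns a list of warnings to address before continuing.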
Launch Data Designer#
Authenticate with NGC#
Export the NGC API Key into your shell environment using the following command:
export NGC_CLI_API_KEY=<your-ngc-api-key>
Log in to the NVIDIA NGC container registry (following instructions at Getting Started with the NGC CLI):
echo $NGC_CLI_API_KEY | docker login nvcr.io -u '$oauthtoken' --password-stdin
Download the Data Designer Docker Compose Resources#
ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.12"
cd nemo-microservices-quickstart_v25.12
Set Platform Environment Variables#
export NEMO_MICROSERVICES_IMAGE_REGISTRY="nvcr.io/nvidia/nemo-microservices"
export NEMO_MICROSERVICES_IMAGE_TAG="25.12"
export NIM_API_KEY=<build.nvidia.com-api-key> # This is the API key for build.nvidia.com created as part of the prerequisites
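A missing variable tends to surface later as a confusing registry or model error, so it can help to confirm the exports first. The sketch below only checks that the four variables named in this guide are set and non-empty; whether the Compose file needs all of them is an assumption, and the function name is illustrative.

```python
import os

# Variable names taken from the export commands in this guide.
REQUIRED_VARS = (
    "NGC_CLI_API_KEY",
    "NEMO_MICROSERVICES_IMAGE_REGISTRY",
    "NEMO_MICROSERVICES_IMAGE_TAG",
    "NIM_API_KEY",
)


def unset_variables(env=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are missing or empty in `env`."""
    return [name for name in required if not env.get(name)]
```

Running unset_variables() in a Python process started from the same shell returns an empty list when every variable has been exported.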
Start the Service#
docker compose --profile data-designer up --detach --quiet-pull --wait
Verify Deployment#
Running the docker compose command starts the platform with the Data Designer API running at http://localhost:8080.
Let's check that all services are running properly:
docker ps
All containers whose names begin with nemo-microservices- should show "Up" with a healthy status.
Note
Services may take several minutes to become fully available.
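The same check can be scripted. The sketch below parses the output of docker ps with a --format template of container name, tab, status, and lists every container with the nemo-microservices- prefix that is not both "Up" and "(healthy)". It assumes each container defines a health check, so a container without one is also reported.

```python
import subprocess

PREFIX = "nemo-microservices-"


def unhealthy_containers(ps_output, prefix=PREFIX):
    """Return (name, status) pairs for prefixed containers not Up and healthy.

    `ps_output` is expected to hold one "<name>\t<status>" pair per line.
    """
    problems = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if not name.startswith(prefix):
            continue
        if not (status.startswith("Up") and "(healthy)" in status):
            problems.append((name, status))
    return problems


def docker_ps():
    """Run docker ps for all containers and return name/status lines."""
    result = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{.Names}}\t{{.Status}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling unhealthy_containers(docker_ps()) in a loop with a short sleep gives a simple way to wait until the list is empty.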
Test the API#
Run a quick test to confirm Data Designer is working:
curl -X POST -H "Content-type: application/json" localhost:8080/v1/data-designer/preview -d @- <<EOF
{
  "config": {
    "model_configs": [],
    "columns": [
      {
        "name": "school_subject",
        "sampler_type": "category",
        "params": {
          "values": [
            "math",
            "science",
            "history",
            "art"
          ]
        }
      }
    ]
  }
}
EOF
If Data Designer is working, you should see output similar to the following:
{"message":"Starting preview job","message_type":"log","extra":{"level":"debug"}}
{"message":"🩺 Running health checks for models...","message_type":"log","extra":{"level":"info"}}
{"message":"⏳ Processing batch 1 of 1","message_type":"log","extra":{"level":"info"}}
{"message":"🎲 Preparing samplers to generate 10 records across 1 columns","message_type":"log","extra":{"level":"info"}}
{"message":"Model usage summary:\n{}","message_type":"log","extra":{"level":"info"}}
{"message":"[{\"school_subject\": \"history\"}, {\"school_subject\": \"history\"}, {\"school_subject\": \"math\"}, {\"school_subject\": \"art\"}, {\"school_subject\": \"art\"}, {\"school_subject\": \"history\"}, {\"school_subject\": \"science\"}, {\"school_subject\": \"science\"}, {\"school_subject\": \"math\"}, {\"school_subject\": \"history\"}]","message_type":"dataset","extra":null}
{"message":"Measuring dataset column statistics:","message_type":"log","extra":{"level":"info"}}
{"message":" |-- 🎲 column: 'school_subject'","message_type":"log","extra":{"level":"info"}}
{"message":"{\"num_records\": 10, \"target_num_records\": 10, \"column_statistics\": [{\"column_name\": \"school_subject\", \"num_records\": 10, \"num_null\": 0, \"num_unique\": 4, \"pyarrow_dtype\": \"string\", \"simple_dtype\": \"string\", \"column_type\": \"sampler\", \"sampler_type\": \"category\", \"distribution_type\": \"categorical\", \"distribution\": {\"most_common_value\": \"history\", \"least_common_value\": \"science\", \"histogram\": {\"categories\": [\"history\", \"math\", \"art\", \"science\"], \"counts\": [4, 2, 2, 2]}}}], \"side_effect_column_names\": [], \"column_profiles\": null}","message_type":"analysis","extra":null}
{"message":"Preview job ended","message_type":"log","extra":{"level":"debug"}}
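The response is a stream of JSON objects, one per line, and the generated rows arrive as a JSON-encoded string inside the object whose message_type is "dataset". A minimal sketch for pulling the records out of such a stream follows; the function name is illustrative, and the field layout is inferred from the sample output above rather than from a published schema.

```python
import json


def extract_records(stream_lines):
    """Collect generated rows from a preview response, one JSON object per line."""
    records = []
    for line in stream_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # The "dataset" message carries the rows as a JSON string, so decode twice.
        if event.get("message_type") == "dataset":
            records.extend(json.loads(event["message"]))
    return records
```

For example, after saving the curl output to a file, passing the open file object to extract_records returns a list of dictionaries such as {"school_subject": "history"}.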
Create your First Dataset#
Run one of our intro tutorials to create your first dataset.
Stop the Service (Optional)#
When you are done generating data, you can stop the service. The -v flag also removes the named volumes declared in the Compose file, so any data stored in them is deleted:
docker compose --profile data-designer down -v
Available Services#
Once running, these services are accessible:
Data Designer API: http://localhost:8080
Data Store API: http://localhost:3000
Key Data Designer endpoints:
Data preview: POST /v1/data-designer/preview (refer to Preview Data Generation)
Batch jobs: POST /v1/data-designer/jobs (refer to Create Data Generation Job)
List jobs: GET /v1/data-designer/jobs (refer to List Data Generation Jobs)
List job results: GET /v1/data-designer/jobs/{job_id}/results (refer to Get Job Results)
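The preview endpoint can also be called without curl. The sketch below uses only the Python standard library to build the same POST request as the curl test earlier in this guide; the helper names are illustrative, and it assumes the service is reachable at http://localhost:8080 with no authentication.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"


def preview_request(config, base_url=BASE_URL):
    """Build (but do not send) a POST request to the preview endpoint."""
    body = json.dumps({"config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/data-designer/preview",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_preview(config, base_url=BASE_URL):
    """Send the request and yield each response line as decoded text."""
    with urllib.request.urlopen(preview_request(config, base_url)) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\n")
```

Iterating over stream_preview(config) with the school_subject configuration from the API test prints one JSON message per line while the preview job runs.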
Initialize the SDK Client#
Once you have Data Designer running, you can use the NeMo Microservices SDK to interact with the service.
Install the nemo-microservices SDK:
pip install "nemo-microservices[data-designer]"
Import the necessary Data Designer functionality:
from nemo_microservices.data_designer.essentials import *
Initialize the NeMo Microservices Client:
data_designer_client = NeMoDataDesignerClient(
    base_url="http://localhost:8080",
)
Note
If you are hosting Data Designer as a service that uses Bearer token authentication, you can provide your Bearer token via the default_headers parameter in NeMoDataDesignerClient.
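As a sketch of the Bearer token case described in the note: the code below assumes default_headers accepts a dictionary mapping header names to values and that the service expects the standard Authorization: Bearer scheme. The helper names and the example URL are hypothetical.

```python
def bearer_headers(token):
    """Build an Authorization header for the standard Bearer scheme."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}


def make_client(base_url, token):
    """Create a client that sends the Bearer token with every request.

    Assumes `default_headers` takes a plain dict, as described in the note.
    """
    # Imported here so bearer_headers() works without the SDK installed.
    from nemo_microservices.data_designer.essentials import NeMoDataDesignerClient

    return NeMoDataDesignerClient(
        base_url=base_url,
        default_headers=bearer_headers(token),
    )
```

A call such as make_client("https://data-designer.example.com", token) would then replace the plain constructor call, with the token read from an environment variable rather than hard-coded.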
Customization Options#
The Docker Compose deployment can be customized through the platform_config defined in the root docker-compose.yaml file to adjust various Data Designer settings beyond the basic model provider configuration.
Configuration Settings Reference#
| Field | Description | Required | Default |
|---|---|---|---|
| model_provider_registry | Provider details for model configurations | Yes | |
| default_model_configs | Model configurations to use when unspecified by client | No | |
| seed_dataset_source_registry | Details for seed dataset source | No | |
| preview_num_records.max | Maximum number of records that can be requested for preview requests | No | |
| preview_num_records.default | Number of records to return for preview requests when unspecified by client | No | |
Example Fully-Customized Configuration#
Refer to the following YAML snippet with all available customization options:
configs:
  platform_config:
    content: |
      # Other NMP settings
      data_designer:
        model_provider_registry:
          default: "nvidiabuild"
          providers:
            - name: "nvidiabuild"
              endpoint: "https://integrate.api.nvidia.com/v1"
              api_key: "NIM_API_KEY"
        default_model_configs:
          - alias: "text"
            provider: "nvidiabuild"
            model: "meta/llama-3.3-70b-instruct"
            inference_parameters:
              temperature: 0.7
              top_p: 0.9
              max_tokens: 1024
        seed_dataset_source_registry:
          sources:
            - endpoint: "https://my-private-hf-hub.com"
              token: "MY_HF_HUB_TOKEN"
        preview_num_records:
          max: 20
          default: 8
Note
For detailed model provider and model configuration options, refer to the Configure Models guide.
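Before restarting the service with an edited configuration, a quick cross-reference check can catch typos. The sketch below works on the data_designer section once it has been loaded into a Python dictionary. The three rules it applies (the default provider exists, every model configuration names a known provider, and the preview default does not exceed the maximum) are assumptions about what a consistent configuration looks like, not the service's documented validation.

```python
def config_problems(data_designer):
    """Return a list of consistency problems found in a data_designer section."""
    problems = []

    registry = data_designer.get("model_provider_registry", {})
    provider_names = {p["name"] for p in registry.get("providers", [])}

    default_provider = registry.get("default")
    if default_provider is not None and default_provider not in provider_names:
        problems.append(f"default provider {default_provider!r} is not defined")

    for model_config in data_designer.get("default_model_configs", []):
        provider = model_config.get("provider")
        if provider not in provider_names:
            alias = model_config.get("alias")
            problems.append(f"model config {alias!r} uses unknown provider {provider!r}")

    limits = data_designer.get("preview_num_records", {})
    if "max" in limits and "default" in limits and limits["default"] > limits["max"]:
        problems.append("preview_num_records.default exceeds preview_num_records.max")

    return problems
```

For the example configuration above, config_problems returns an empty list; renaming the provider in only one place would produce a message naming the mismatch.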
Troubleshooting#
Refer to the Data Designer Troubleshooting Guide for information on how to check service health, logs, and resolve common issues.