Deploy NeMo Data Designer with Docker
Run the microservice on your local machine using Docker Compose for experimentation.
Prerequisites
Install Docker and Docker Compose
Create an NGC API key for accessing the Data Designer container from the NGC Catalog
Install the NGC CLI
Ensure you have at least 8 GB of available RAM and 20 GB of available disk space in your development environment
Obtain a build.nvidia.com API key (a free trial is available)
Make sure you have curl or a similar CLI tool for making HTTP requests
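The disk-space and tooling prerequisites can be checked with the Python standard library before you start. This is a minimal sketch, not part of the quickstart: it assumes the executables are named docker, ngc, and curl on your PATH, and it skips the RAM check because the standard library has no portable way to read available memory. The function name is hypothetical.

```python
import shutil

MIN_FREE_DISK_GB = 20  # from the prerequisites list
# Assumption: these are the executable names; swap curl for your HTTP tool.
REQUIRED_TOOLS = ("docker", "ngc", "curl")


def check_prerequisites(path=".", min_free_gb=MIN_FREE_DISK_GB, tools=REQUIRED_TOOLS):
    """Report free disk space at `path` and which tools are missing from PATH."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    missing = [tool for tool in tools if shutil.which(tool) is None]
    return {
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
        "missing_tools": missing,
    }


if __name__ == "__main__":
    print(check_prerequisites())
```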
Launch Data Designer
Authenticate with NGC
Export the NGC API Key into your shell environment using the following command:
export NGC_CLI_API_KEY=<your-ngc-api-key>
Log in to the NVIDIA NGC container registry (see Getting Started with the NGC CLI for details):
echo "$NGC_CLI_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin
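If you script this step, a Python equivalent could look like the following sketch. It assumes docker is on your PATH and reads the key from the NGC_CLI_API_KEY variable exported above; like the shell command, it passes the key on stdin rather than as a command-line argument. The function names are hypothetical.

```python
import os
import subprocess

REGISTRY = "nvcr.io"
USERNAME = "$oauthtoken"  # literal username expected by nvcr.io, not a shell variable


def docker_login_command(registry=REGISTRY):
    """Build the docker login argv; the password is supplied on stdin."""
    return ["docker", "login", registry, "-u", USERNAME, "--password-stdin"]


def docker_login(api_key=None):
    """Log in to nvcr.io, reading the key from NGC_CLI_API_KEY by default."""
    key = api_key or os.environ.get("NGC_CLI_API_KEY")
    if not key:
        raise RuntimeError("NGC_CLI_API_KEY is not set")
    # Sending the key via stdin keeps it out of the process argument list.
    subprocess.run(docker_login_command(), input=key, text=True, check=True)
```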
Download the Data Designer Docker Compose Resources
ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.10"
cd nemo-microservices-quickstart_v25.10
Set Platform Environment Variables
export NEMO_MICROSERVICES_IMAGE_REGISTRY="nvcr.io/nvidia/nemo-microservices"
export NEMO_MICROSERVICES_IMAGE_TAG="25.10"
export NIM_API_KEY=<build.nvidia.com-api-key> # This is the API key for build.nvidia.com created as part of the prerequisites
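Before starting the stack, you might confirm that the three variables above are set, since a missing value typically only surfaces later as a failed image pull or model call. A minimal sketch; the helper name is hypothetical, and it only checks that each value is non-empty, not that it is valid.

```python
import os

# The variables exported in the step above.
REQUIRED_VARS = (
    "NEMO_MICROSERVICES_IMAGE_REGISTRY",
    "NEMO_MICROSERVICES_IMAGE_TAG",
    "NIM_API_KEY",
)


def missing_env_vars(env=None, names=REQUIRED_VARS):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```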
Start the Service
docker compose --profile data-designer up
Verify Deployment
Running the docker compose command starts the platform with the Data Designer API available at http://localhost:8080. Because the command runs in the foreground, open a second terminal to check that all services are running properly:
docker ps
All containers whose names start with nemo-microservices- should show Up with a healthy status.
Note
Services may take several minutes to become fully available.
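To wait for the services instead of rerunning docker ps by hand, you could poll it from Python. This sketch assumes docker is on your PATH and uses docker ps --format with the {{.Names}} and {{.Status}} template fields. Like plain docker ps, it lists only running containers, so a container that has exited disappears from the list rather than being flagged. Helper names are hypothetical.

```python
import subprocess
import time

PREFIX = "nemo-microservices-"
PS_FORMAT = "{{.Names}}\t{{.Status}}"  # tab-separated name and status


def container_ready(status):
    """True if a `docker ps` STATUS value shows a running, healthy container."""
    if not status.startswith("Up"):
        return False
    # Docker appends "(healthy)", "(unhealthy)" or "(health: starting)" when a
    # healthcheck is defined, and "(Paused)" for paused containers.
    if "(" in status:
        return "(healthy)" in status
    return True  # running, no healthcheck defined


def pending_containers(ps_output, prefix=PREFIX):
    """Return (matching names, names not ready yet) from `docker ps` output."""
    found, pending = [], []
    for line in ps_output.splitlines():
        name, _, status = line.partition("\t")
        if not name.startswith(prefix):
            continue
        found.append(name)
        if not container_ready(status.strip()):
            pending.append(name)
    return found, pending


def wait_for_containers(timeout_s=600, interval_s=15):
    """Poll `docker ps` until every matching container is ready, or time out."""
    deadline = time.monotonic() + timeout_s
    while True:
        output = subprocess.run(
            ["docker", "ps", "--format", PS_FORMAT],
            capture_output=True, text=True, check=True,
        ).stdout
        found, pending = pending_containers(output)
        if found and not pending:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Containers not ready: {pending or 'none found'}")
        time.sleep(interval_s)
```

Requiring at least one matching container avoids reporting success before Docker Compose has created anything.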
Test the API
Run a quick test to confirm Data Designer is working:
curl -X POST -H "Content-type: application/json" localhost:8080/v1/data-designer/preview -d @- <<EOF
{
  "config": {
    "model_configs": [],
    "columns": [
      {
        "name": "school_subject",
        "sampler_type": "category",
        "params": {
          "values": [
            "math",
            "science",
            "history",
            "art"
          ]
        }
      }
    ]
  }
}
EOF
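The same preview request can be sent from Python using only the standard library. This sketch assumes the server is reachable at http://localhost:8080 and that the response arrives as one JSON object per line, as the sample output for this step shows; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"


def category_preview_payload(column, values):
    """Same request body as the curl example: one category sampler column."""
    return {
        "config": {
            "model_configs": [],
            "columns": [
                {
                    "name": column,
                    "sampler_type": "category",
                    "params": {"values": list(values)},
                }
            ],
        }
    }


def post_preview(payload, base_url=BASE_URL, timeout=300):
    """POST to the preview endpoint and yield each non-empty response line."""
    request = urllib.request.Request(
        f"{base_url}/v1/data-designer/preview",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw_line in response:  # HTTP responses iterate line by line
            line = raw_line.decode("utf-8").strip()
            if line:
                yield line


# Usage, with the service running:
# payload = category_preview_payload("school_subject", ["math", "science", "history", "art"])
# for line in post_preview(payload):
#     print(line)
```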
If Data Designer is working, you should see output similar to the following:
{"message":"Starting preview job","message_type":"log","extra":{"level":"debug"}}
{"message":"🩺 Running health checks for models...","message_type":"log","extra":{"level":"info"}}
{"message":"⏳ Processing batch 1 of 1","message_type":"log","extra":{"level":"info"}}
{"message":"🎲 Preparing samplers to generate 10 records across 1 columns","message_type":"log","extra":{"level":"info"}}
{"message":"Model usage summary:\n{}","message_type":"log","extra":{"level":"info"}}
{"message":"[{\"school_subject\": \"history\"}, {\"school_subject\": \"history\"}, {\"school_subject\": \"math\"}, {\"school_subject\": \"art\"}, {\"school_subject\": \"art\"}, {\"school_subject\": \"history\"}, {\"school_subject\": \"science\"}, {\"school_subject\": \"science\"}, {\"school_subject\": \"math\"}, {\"school_subject\": \"history\"}]","message_type":"dataset","extra":null}
{"message":"Measuring dataset column statistics:","message_type":"log","extra":{"level":"info"}}
{"message":" |-- 🎲 column: 'school_subject'","message_type":"log","extra":{"level":"info"}}
{"message":"{\"num_records\": 10, \"target_num_records\": 10, \"column_statistics\": [{\"column_name\": \"school_subject\", \"num_records\": 10, \"num_null\": 0, \"num_unique\": 4, \"pyarrow_dtype\": \"string\", \"simple_dtype\": \"string\", \"column_type\": \"sampler\", \"sampler_type\": \"category\", \"distribution_type\": \"categorical\", \"distribution\": {\"most_common_value\": \"history\", \"least_common_value\": \"science\", \"histogram\": {\"categories\": [\"history\", \"math\", \"art\", \"science\"], \"counts\": [4, 2, 2, 2]}}}], \"side_effect_column_names\": [], \"column_profiles\": null}","message_type":"analysis","extra":null}
{"message":"Preview job ended","message_type":"log","extra":{"level":"debug"}}
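Because each output line is a JSON object with a message_type field, the generated records and the column statistics can be extracted programmatically. A minimal sketch, assuming the dataset and analysis messages carry JSON-encoded strings exactly as in the sample above; the function names are hypothetical.

```python
import json
from collections import Counter


def parse_preview_stream(lines):
    """Split preview output lines into records, analysis, and log messages."""
    records, analysis, logs = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("message_type")
        if kind == "dataset":
            # The message is itself a JSON-encoded list of records.
            records.extend(json.loads(event["message"]))
        elif kind == "analysis":
            # The message is a JSON-encoded statistics object.
            analysis = json.loads(event["message"])
        else:
            logs.append(event["message"])
    return records, analysis, logs


def value_counts(records, column):
    """Count how often each value appears in one column of the records."""
    return Counter(record[column] for record in records)
```

For example, if you redirect the curl output to a file, passing the open file to parse_preview_stream returns the ten school_subject records, and value_counts tallies them the same way the histogram in the analysis message does.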
Create Your First Dataset
Run one of the introductory tutorials in NeMo Data Designer Tutorials to create your first dataset.
Stop the Service (Optional)
When you are done generating data, you can stop the service:
docker compose --profile data-designer down -v
The -v flag also removes the volumes created for the stack, so any data stored in them is deleted.
Available Services
Once running, these services are accessible:
Data Designer API: http://localhost:8080
Data Store API: http://localhost:3000
Key Data Designer endpoints:
Data preview: POST /v1/data-designer/preview (refer to Preview Data Generation)
Batch jobs: POST /v1/data-designer/jobs (refer to Create Data Generation Job)
List jobs: GET /v1/data-designer/jobs (refer to List Data Generation Jobs)
List job results: GET /v1/data-designer/jobs/{job_id}/results (refer to Get Job Results)
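If you call these endpoints directly rather than through the SDK, a few URL helpers keep the paths in one place. This sketch covers only the paths listed above; it makes no assumptions about the response bodies and returns the decoded JSON as-is. Helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8080"
JOBS_PATH = "/v1/data-designer/jobs"


def jobs_url(base_url=BASE_URL):
    """URL used both to create a job (POST) and to list jobs (GET)."""
    return base_url.rstrip("/") + JOBS_PATH


def job_results_url(job_id, base_url=BASE_URL):
    """URL for GET /v1/data-designer/jobs/{job_id}/results."""
    # Percent-encode the ID so characters like "/" cannot alter the path.
    return f"{jobs_url(base_url)}/{quote(str(job_id), safe='')}/results"


def get_json(url, timeout=30):
    """GET a URL and decode the JSON body; the response schema is not assumed."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage, with the service running:
# jobs = get_json(jobs_url())
```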
Initialize the SDK Client
Once you have Data Designer running, you can use the NeMo Microservices SDK to interact with the service.
Install the nemo-microservices SDK:
pip install "nemo-microservices[data-designer]"
Import the necessary Data Designer functionality:
from nemo_microservices.data_designer.essentials import *
Initialize the NeMo Microservices Client:
data_designer_client = NeMoDataDesignerClient(
    base_url="http://localhost:8080",
)
Note
If you are hosting Data Designer as a service that uses Bearer token authentication, you can provide your Bearer token via the default_headers parameter of NeMoDataDesignerClient.
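A sketch that combines the client initialization with the Bearer token option. It assumes default_headers accepts a mapping of header names to values and that your service expects a standard Authorization: Bearer header; the DATA_DESIGNER_TOKEN environment variable name and the helper names are hypothetical. The SDK import is deferred so the argument-building helper works even where the SDK is not installed.

```python
import os

BASE_URL = "http://localhost:8080"


def client_kwargs(base_url=BASE_URL, token=None):
    """Keyword arguments for NeMoDataDesignerClient, with a Bearer header if a token is given."""
    kwargs = {"base_url": base_url}
    if token:
        kwargs["default_headers"] = {"Authorization": f"Bearer {token}"}
    return kwargs


def make_client(base_url=BASE_URL, token=None):
    """Create the SDK client; falls back to a hypothetical DATA_DESIGNER_TOKEN variable."""
    from nemo_microservices.data_designer.essentials import NeMoDataDesignerClient

    if token is None:
        token = os.environ.get("DATA_DESIGNER_TOKEN")  # hypothetical variable name
    return NeMoDataDesignerClient(**client_kwargs(base_url, token))
```

For the local Docker Compose deployment no token is needed, so make_client() without arguments is equivalent to the initialization shown above.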
Customization Options
The Docker Compose deployment can be customized through the platform_config defined in the root docker-compose.yaml file to adjust various Data Designer settings beyond the basic model provider configuration.
Configuration Settings Reference
Field | Description | Required | Default |
---|---|---|---|
model_provider_registry | Provider details for model configurations | Yes | |
default_model_configs | Model configurations to use when the client does not specify any | No | |
seed_dataset_source_registry | Details for the seed dataset source | No | |
preview_num_records.max | Maximum number of records a preview request can ask for | No | |
preview_num_records.default | Number of records a preview request returns when the client does not specify a count | No | |
Example Fully Customized Configuration
Refer to the following YAML snippet with all available customization options:
configs:
  platform_config:
    content: |
      # Other NMP settings
      data_designer:
        model_provider_registry:
          default: "nvidiabuild"
          providers:
            - name: "nvidiabuild"
              endpoint: "https://integrate.api.nvidia.com/v1"
              api_key: "NIM_API_KEY"
        default_model_configs:
          - alias: "text"
            provider: "nvidiabuild"
            model: "meta/llama-3.3-70b-instruct"
            inference_parameters:
              temperature: 0.7
              top_p: 0.9
              max_tokens: 1024
        seed_dataset_source_registry:
          sources:
            - endpoint: "https://my-private-hf-hub.com"
              token: "MY_HF_HUB_TOKEN"
        preview_num_records:
          max: 20
          default: 8
Note
For detailed model provider and model configuration options, refer to the Configure Models guide.
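When you edit platform_config, a quick consistency check on the data_designer section can catch mismatched names before you restart the stack. This is a sketch under assumptions: its rules (the default provider and every model config's provider must name a registered provider, and the preview default must not exceed the maximum) are inferred from the example above, not taken from a published schema. To stay dependency-free, the settings are a Python dict transcribed from the YAML example; in practice you might load the YAML with a parser such as PyYAML. The function name is hypothetical.

```python
EXAMPLE_SETTINGS = {
    "model_provider_registry": {
        "default": "nvidiabuild",
        "providers": [
            {"name": "nvidiabuild", "endpoint": "https://integrate.api.nvidia.com/v1", "api_key": "NIM_API_KEY"},
        ],
    },
    "default_model_configs": [
        {"alias": "text", "provider": "nvidiabuild", "model": "meta/llama-3.3-70b-instruct"},
    ],
    "preview_num_records": {"max": 20, "default": 8},
}


def check_data_designer_settings(settings):
    """Return a list of consistency problems in a data_designer settings mapping."""
    problems = []
    registry = settings.get("model_provider_registry") or {}
    providers = {p.get("name") for p in registry.get("providers", [])}
    default = registry.get("default")
    if not providers:
        problems.append("model_provider_registry.providers is empty")
    if default is not None and default not in providers:
        problems.append(f"default provider {default!r} is not a registered provider")
    for config in settings.get("default_model_configs", []):
        # Assumption: a model config without a provider falls back to the default.
        provider = config.get("provider", default)
        if provider not in providers:
            problems.append(f"model config {config.get('alias')!r} uses unknown provider {provider!r}")
    preview = settings.get("preview_num_records") or {}
    if "max" in preview and "default" in preview and preview["default"] > preview["max"]:
        problems.append("preview_num_records.default exceeds preview_num_records.max")
    return problems
```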
Troubleshooting
Refer to the Data Designer Troubleshooting Guide for information on how to check service health, logs, and resolve common issues.