Deploy NeMo Data Designer with Docker#

Run the microservice on your local machine using Docker Compose for experimentation.

Prerequisites#

- Install Docker and Docker Compose:
  - Mac
  - Windows
  - Linux
- Create an NGC API Key for accessing the Data Designer container from the NGC Catalog.
- Install the NGC CLI.
- Ensure you have at least 8 GB of available RAM and 20 GB of available disk space in your development environment.
- Get a build.nvidia.com API key (free trial available).
- Make sure you have curl or a similar CLI tool for making HTTP requests.
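Some of these prerequisites can be checked from a script before you continue. The following is a minimal sketch that uses only the Python standard library. The function name `check_prerequisites` is hypothetical, the executable names `docker`, `ngc`, and `curl` are assumed to match your installation, and the sketch checks neither available RAM nor the Docker Compose plugin.

```python
import shutil


def check_prerequisites(path=".", min_free_gb=20, tools=("docker", "ngc", "curl")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Each tool must be resolvable on PATH (executable names are assumptions).
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Compare free disk space on the filesystem holding `path` with the 20 GB guideline.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.1f} GB free disk space, {min_free_gb} GB recommended"
        )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

An empty result only means the tools were found and disk space looks sufficient; it does not confirm that the Docker daemon is running.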

Launch Data Designer#

Authenticate with NGC#

Export the NGC API Key into your shell environment using the following command:

export NGC_CLI_API_KEY=<your-ngc-api-key>

Log in to the NVIDIA NGC container registry (following instructions at Getting Started with the NGC CLI):

echo $NGC_CLI_API_KEY | docker login nvcr.io -u '$oauthtoken' --password-stdin

Download the Data Designer Docker Compose Resources#

ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.11"
cd nemo-microservices-quickstart_v25.11

Set Platform Environment Variables#

export NEMO_MICROSERVICES_IMAGE_REGISTRY="nvcr.io/nvidia/nemo-microservices"
export NEMO_MICROSERVICES_IMAGE_TAG="25.11"
export NIM_API_KEY=<build.nvidia.com-api-key> # This is the API key for build.nvidia.com created as part of the prerequisites
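Because Docker Compose reads these variables from the shell that launches it, it can help to confirm they are all set before starting the service. A minimal sketch follows; the helper name `missing_env_vars` is hypothetical, and the variable names are the ones exported in the steps above.

```python
import os

# Variables exported in the authentication and platform setup steps.
REQUIRED_VARS = (
    "NGC_CLI_API_KEY",
    "NEMO_MICROSERVICES_IMAGE_REGISTRY",
    "NEMO_MICROSERVICES_IMAGE_TAG",
    "NIM_API_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```

Run it in the same shell session in which you exported the variables; a new terminal does not inherit them.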

Start the Service#

docker compose --profile data-designer up

Verify Deployment#

Running the docker compose command starts the platform, with the Data Designer API available at http://localhost:8080.

Check that all services are running properly:

docker ps

All containers whose names are prefixed with nemo-microservices- should show “Up” with a healthy status.
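The same check can be scripted. The sketch below assumes you call `docker ps` with the format string `{{.Names}}\t{{.Status}}` and treats a container as ready only when its status starts with "Up" and contains "(healthy)". The helper names are hypothetical, and a container that defines no health check would be reported as not ready under this assumption.

```python
import subprocess


def docker_ps():
    """Return one 'name<TAB>status' line per running container."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def not_ready(ps_output, prefix="nemo-microservices-"):
    """List (name, status) pairs for prefixed containers that are not Up and healthy."""
    pending = []
    for line in ps_output.splitlines():
        name, _, status = line.partition("\t")
        if not name.startswith(prefix):
            continue  # Ignore containers that belong to other projects.
        if not (status.startswith("Up") and "(healthy)" in status):
            pending.append((name, status))
    return pending
```

Calling `not_ready(docker_ps())` returns an empty list once every matching container reports a healthy status.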

Note

Services may take several minutes to become fully available.
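Since startup can take a while, a script can poll the API port instead of retrying by hand. This is a minimal sketch with a hypothetical helper name and arbitrary timeout defaults. It only confirms that something accepts TCP connections on the port, which is weaker than the healthy status reported by docker ps.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout_s=300.0, interval_s=5.0):
    """Return True once a TCP connection succeeds, or False after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a listener is bound to the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

The function always makes at least one attempt, so passing `timeout_s=0` performs a single immediate check.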

Test the API#

Run a quick test to confirm Data Designer is working:

curl -X POST -H "Content-type: application/json" localhost:8080/v1/data-designer/preview -d @- <<EOF
{
    "config": {
        "model_configs": [],
        "columns":[
            {
                "name":"school_subject",
                "sampler_type":"category",
                "params":{
                    "values":[
                        "math",
                        "science",
                        "history",
                        "art"
                    ]
                }
            }
        ]
    }
}
EOF

If Data Designer is working, you should see output similar to the following:

{"message":"Starting preview job","message_type":"log","extra":{"level":"debug"}}
{"message":"🩺 Running health checks for models...","message_type":"log","extra":{"level":"info"}}
{"message":"⏳ Processing batch 1 of 1","message_type":"log","extra":{"level":"info"}}
{"message":"🎲 Preparing samplers to generate 10 records across 1 columns","message_type":"log","extra":{"level":"info"}}
{"message":"📊 Model usage summary:\n{}","message_type":"log","extra":{"level":"info"}}
{"message":"[{\"school_subject\": \"history\"}, {\"school_subject\": \"history\"}, {\"school_subject\": \"math\"}, {\"school_subject\": \"art\"}, {\"school_subject\": \"art\"}, {\"school_subject\": \"history\"}, {\"school_subject\": \"science\"}, {\"school_subject\": \"science\"}, {\"school_subject\": \"math\"}, {\"school_subject\": \"history\"}]","message_type":"dataset","extra":null}
{"message":"📐 Measuring dataset column statistics:","message_type":"log","extra":{"level":"info"}}
{"message":"  |-- 🎲 column: 'school_subject'","message_type":"log","extra":{"level":"info"}}
{"message":"{\"num_records\": 10, \"target_num_records\": 10, \"column_statistics\": [{\"column_name\": \"school_subject\", \"num_records\": 10, \"num_null\": 0, \"num_unique\": 4, \"pyarrow_dtype\": \"string\", \"simple_dtype\": \"string\", \"column_type\": \"sampler\", \"sampler_type\": \"category\", \"distribution_type\": \"categorical\", \"distribution\": {\"most_common_value\": \"history\", \"least_common_value\": \"science\", \"histogram\": {\"categories\": [\"history\", \"math\", \"art\", \"science\"], \"counts\": [4, 2, 2, 2]}}}], \"side_effect_column_names\": [], \"column_profiles\": null}","message_type":"analysis","extra":null}
{"message":"Preview job ended","message_type":"log","extra":{"level":"debug"}}
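The sample output above is a stream of JSON objects, one per line, in which the dataset and analysis events carry their payload as a JSON-encoded string in the message field. The following sketch separates the three kinds of events. The function name is hypothetical, and the event layout is inferred from the sample output only, not from a published schema.

```python
import json


def parse_preview_stream(lines):
    """Split preview output into (records, analysis, log_messages)."""
    records, analysis, logs = [], None, []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # Skip blank lines between events.
        event = json.loads(raw)
        kind = event.get("message_type")
        if kind == "dataset":
            # The message is itself a JSON-encoded list of records.
            records.extend(json.loads(event["message"]))
        elif kind == "analysis":
            analysis = json.loads(event["message"])
        else:
            logs.append(event["message"])
    return records, analysis, logs
```

The function accepts any iterable of lines, so it can be fed the curl output saved to a file or the lines of an HTTP response object.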

Create your First Dataset#

Run one of our intro tutorials to create your first dataset.

Stop the Service (Optional)#

Once you are done generating data, you can stop the service. The -v flag also removes the volumes created by the deployment, so omit it if you want to keep stored data:

docker compose --profile data-designer down -v

Available Services#

Once running, these services are accessible:

Data Designer API: http://localhost:8080
Data Store API: http://localhost:3000

Key Data Designer endpoints:

Data preview: POST /v1/data-designer/preview (refer to Preview Data Generation)
Batch jobs: POST /v1/data-designer/jobs (refer to Create Data Generation Job)
List jobs: GET /v1/data-designer/jobs (refer to List Data Generation Jobs)
List job results: GET /v1/data-designer/jobs/{job_id}/results (refer to Get Job Results)
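As an illustration of how these paths combine with the base URL, the sketch below builds a preview request and the job URLs with the Python standard library. The helper names are hypothetical, the preview body mirrors the curl test shown earlier, and no request body is sketched for creating batch jobs because its schema is not covered here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"


def preview_request(config, base_url=BASE_URL):
    """Build (but do not send) a POST request for the preview endpoint."""
    body = json.dumps({"config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/data-designer/preview",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_jobs_url(base_url=BASE_URL):
    """URL for listing jobs with GET."""
    return f"{base_url}/v1/data-designer/jobs"


def job_results_url(job_id, base_url=BASE_URL):
    """URL for fetching the results of one job with GET."""
    # Percent-encode the ID so unusual characters cannot alter the path.
    return f"{base_url}/v1/data-designer/jobs/{urllib.parse.quote(job_id, safe='')}/results"
```

Passing the object returned by `preview_request` to `urllib.request.urlopen` sends the request; the response can then be read line by line.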

Initialize the SDK Client#

Once you have Data Designer running, you can use the NeMo Microservices SDK to interact with the service.

Install the nemo-microservices SDK:

pip install "nemo-microservices[data-designer]"

Import the necessary Data Designer functionality:

from nemo_microservices.data_designer.essentials import *

Initialize the NeMo Microservices Client:

data_designer_client = NeMoDataDesignerClient(
    base_url="http://localhost:8080",
)

Note

If you are hosting Data Designer as a service that uses Bearer token authentication, you can provide your Bearer token via the default_headers parameter in NeMoDataDesignerClient.
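A minimal sketch of that setup follows. It assumes the service expects a standard `Authorization: Bearer <token>` header; the helper names `bearer_headers` and `make_client` are hypothetical, and the SDK import is deferred so that the header helper can be used without the SDK installed.

```python
def bearer_headers(token):
    """Build the Authorization header for a Bearer token."""
    token = token.strip()
    if not token:
        raise ValueError("empty bearer token")
    return {"Authorization": f"Bearer {token}"}


def make_client(base_url, token):
    """Create a client that sends the Bearer token with every request."""
    # Imported lazily: requires `pip install "nemo-microservices[data-designer]"`.
    from nemo_microservices.data_designer.essentials import NeMoDataDesignerClient

    return NeMoDataDesignerClient(
        base_url=base_url,
        default_headers=bearer_headers(token),
    )
```

Read the token from an environment variable or a secrets manager rather than hard-coding it in source files.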

Customization Options#

The Docker Compose deployment can be customized through the platform_config defined in the root docker-compose.yaml file to adjust various Data Designer settings beyond the basic model provider configuration.

Configuration Settings Reference#

Field	Description	Required	Default
`model_provider_registry`	Provider details for model configurations	Yes	`None`
`default_model_configs`	Model configurations to use when unspecified by client	No	`None`
`seed_dataset_source_registry`	Details for seed dataset source	No	`None`
`preview_num_records.max`	Maximum number of records that can be requested for preview requests	No	`10`
`preview_num_records.default`	Number of records to return for preview requests when unspecified by client	No	`10`

Example Fully-Customized Configuration#

The following YAML snippet shows all available customization options:

configs:
  platform_config:
    content: |
      # Other NMP settings

      data_designer:
        model_provider_registry:
          default: "nvidiabuild"
          providers:
            - name: "nvidiabuild"
              endpoint: "https://integrate.api.nvidia.com/v1"
              api_key: "NIM_API_KEY"
        default_model_configs:
          - alias: "text"
            provider: "nvidiabuild"
            model: "meta/llama-3.3-70b-instruct"
            inference_parameters:
              temperature: 0.7
              top_p: 0.9
              max_tokens: 1024
        seed_dataset_source_registry:
          sources:
            - endpoint: "https://my-private-hf-hub.com"
              token: "MY_HF_HUB_TOKEN"
        preview_num_records:
          max: 20
          default: 8
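To make the interaction between `preview_num_records.max` and `preview_num_records.default` concrete, the sketch below models one plausible reading of the two settings. It is not the service's implementation: the function name is hypothetical, and whether the service rejects or clamps a request above the maximum is an assumption (this sketch rejects it).

```python
def resolve_preview_records(requested=None, default=10, maximum=10):
    """Return the number of preview records to generate under the assumed rules."""
    if default > maximum:
        raise ValueError("default must not exceed max")
    if requested is None:
        # The client did not specify a count, so the configured default applies.
        return default
    if requested < 1 or requested > maximum:
        # Assumption: out-of-range requests are rejected rather than clamped.
        raise ValueError(f"requested must be between 1 and {maximum}")
    return requested
```

With the example configuration above (max of 20, default of 8), an unspecified request resolves to 8 records, and a request for 15 is accepted.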

Note

For detailed model provider and model configuration options, refer to the Configure Models guide.

Troubleshooting#

Refer to the Data Designer Troubleshooting Guide for information on how to check service health, logs, and resolve common issues.