Deploy NeMo Data Designer with Docker#

Run the microservice on your local machine using Docker Compose for experimentation.

Prerequisites#

Before following this deployment guide, ensure that you have:

  • Sufficient disk space for generated artifacts (recommended: 20GB).

  • At least 8GB of available RAM.

  • Docker and Docker Compose installed on your system.

  • An NGC API Key with access to the NGC Catalog and Model endpoints on build.nvidia.com. Create an NGC API key following the instructions at Generating NGC API Keys. Specify the NGC Catalog and Public API Endpoints permissions when you generate the key.

  • The NGC CLI installed. Refer to Getting Started with the NGC CLI for details on setup.

  • A tool such as curl to make HTTP requests.
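
The disk-space and tooling prerequisites above can be spot-checked with a short script. This is a minimal sketch using only the Python standard library; the `preflight` helper and its report keys are hypothetical names chosen for illustration. It does not check available RAM or the Docker Compose plugin, so verify those separately.

```python
import shutil

REQUIRED_TOOLS = ("docker", "ngc", "curl")
RECOMMENDED_FREE_GB = 20  # recommended free disk space from the prerequisites list


def preflight(path: str = ".") -> dict:
    """Report free disk space at `path` and which required tools are missing from PATH."""
    free_gb = shutil.disk_usage(path).free / 1e9
    missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    return {
        "free_gb": round(free_gb, 1),
        "enough_disk": free_gb >= RECOMMENDED_FREE_GB,
        "missing_tools": missing,
    }


if __name__ == "__main__":
    print(preflight())
```

Run it from the directory where you plan to download the quickstart resources, since free space is measured on the filesystem that holds `path`.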


Launch Data Designer#

Authenticate with NGC#

Export the NGC API Key into your shell environment using the following command:

export NGC_CLI_API_KEY=<your-ngc-api-key>

Log in to the NVIDIA NGC container registry (following instructions at Getting Started with the NGC CLI):

echo $NGC_CLI_API_KEY | docker login nvcr.io -u '$oauthtoken' --password-stdin

Download the Data Designer Docker Compose Resources#

ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.12"
cd nemo-microservices-quickstart_v25.12

Set Platform Environment Variables#

export NEMO_MICROSERVICES_IMAGE_REGISTRY="nvcr.io/nvidia/nemo-microservices"
export NEMO_MICROSERVICES_IMAGE_TAG="25.12"
export NIM_API_KEY=<build.nvidia.com-api-key> # This is the API key for build.nvidia.com created as part of the prerequisites
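
Before starting the service, it can help to confirm that every variable exported so far is actually present in the current shell. The following is a small sketch, assuming the four variable names used in this guide; `missing_vars` is a hypothetical helper, not part of any NVIDIA tooling.

```python
import os

REQUIRED_VARS = (
    "NGC_CLI_API_KEY",
    "NEMO_MICROSERVICES_IMAGE_REGISTRY",
    "NEMO_MICROSERVICES_IMAGE_TAG",
    "NIM_API_KEY",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    print("All variables set" if not missing else f"Missing: {', '.join(missing)}")
```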

Start the Service#

docker compose --profile data-designer up --detach --quiet-pull --wait

Verify Deployment#

Running the docker compose command starts the platform, with the Data Designer API available at http://localhost:8080.

Let’s check that all services are running properly:

docker ps

All containers with names prefixed nemo-microservices- should show “Up” with healthy status.

Note

Services may take several minutes to become fully available.
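
The container check can also be scripted. The sketch below assumes that `docker ps` reports a healthy container with a status such as `Up 2 minutes (healthy)` and treats anything else as not ready, including containers that define no health check. The `unhealthy` and `docker_ps` helpers are hypothetical names.

```python
import subprocess

PREFIX = "nemo-microservices-"


def unhealthy(ps_output: str, prefix: str = PREFIX) -> list[str]:
    """Return names of prefixed containers whose status is not 'Up ... (healthy)'."""
    bad = []
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, status = line.split("\t", 1)
        if not name.startswith(prefix):
            continue  # ignore containers from other projects
        if not (status.startswith("Up") and "(healthy)" in status):
            bad.append(name)
    return bad


def docker_ps() -> str:
    """Run `docker ps` and return one 'name<TAB>status' line per container."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `unhealthy(docker_ps())` in a loop with a short sleep gives a simple wait-until-ready check; an empty list means every matching container reports healthy.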

Test the API#

Run a quick test to confirm Data Designer is working:

curl -X POST -H "Content-type: application/json" localhost:8080/v1/data-designer/preview -d @- <<EOF
{
    "config": {
        "model_configs": [],
        "columns":[
            {
                "name":"school_subject",
                "sampler_type":"category",
                "params":{
                    "values":[
                        "math",
                        "science",
                        "history",
                        "art"
                    ]
                }
            }
        ]
    }
}
EOF

If Data Designer is working, you should see output similar to the following:

{"message":"Starting preview job","message_type":"log","extra":{"level":"debug"}}
{"message":"🩺 Running health checks for models...","message_type":"log","extra":{"level":"info"}}
{"message":"⏳ Processing batch 1 of 1","message_type":"log","extra":{"level":"info"}}
{"message":"🎲 Preparing samplers to generate 10 records across 1 columns","message_type":"log","extra":{"level":"info"}}
{"message":"📊 Model usage summary:\n{}","message_type":"log","extra":{"level":"info"}}
{"message":"[{\"school_subject\": \"history\"}, {\"school_subject\": \"history\"}, {\"school_subject\": \"math\"}, {\"school_subject\": \"art\"}, {\"school_subject\": \"art\"}, {\"school_subject\": \"history\"}, {\"school_subject\": \"science\"}, {\"school_subject\": \"science\"}, {\"school_subject\": \"math\"}, {\"school_subject\": \"history\"}]","message_type":"dataset","extra":null}
{"message":"📐 Measuring dataset column statistics:","message_type":"log","extra":{"level":"info"}}
{"message":"  |-- 🎲 column: 'school_subject'","message_type":"log","extra":{"level":"info"}}
{"message":"{\"num_records\": 10, \"target_num_records\": 10, \"column_statistics\": [{\"column_name\": \"school_subject\", \"num_records\": 10, \"num_null\": 0, \"num_unique\": 4, \"pyarrow_dtype\": \"string\", \"simple_dtype\": \"string\", \"column_type\": \"sampler\", \"sampler_type\": \"category\", \"distribution_type\": \"categorical\", \"distribution\": {\"most_common_value\": \"history\", \"least_common_value\": \"science\", \"histogram\": {\"categories\": [\"history\", \"math\", \"art\", \"science\"], \"counts\": [4, 2, 2, 2]}}}], \"side_effect_column_names\": [], \"column_profiles\": null}","message_type":"analysis","extra":null}
{"message":"Preview job ended","message_type":"log","extra":{"level":"debug"}}
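
The same preview request can be sent from Python, and the generated rows pulled out of the response. This sketch assumes the response is a stream of newline-delimited JSON objects, as the sample output suggests, and that the `dataset` message carries the rows as a JSON-encoded string. `extract_records` and `run_preview` are hypothetical helper names; only the standard library is used.

```python
import json
import urllib.request

PREVIEW_URL = "http://localhost:8080/v1/data-designer/preview"


def extract_records(lines):
    """Return the rows from the first 'dataset' message in a stream of JSON lines."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        if event.get("message_type") == "dataset":
            # The dataset payload is itself a JSON-encoded string.
            return json.loads(event["message"])
    return []


def run_preview(config: dict, url: str = PREVIEW_URL):
    """POST a preview request and return the generated rows (needs the running service)."""
    body = json.dumps({"config": config}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return extract_records(response)
```

With the service running, passing the `config` object from the curl example to `run_preview` should return a list of dictionaries, one per generated record.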

Create your First Dataset#

Run one of our intro tutorials to create your first dataset.

Stop the Service (Optional)#

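When you are done generating data, you can optionally stop the service. The -v flag also removes the volumes created by the Compose project, so omit it if you want to keep stored data: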

docker compose --profile data-designer down -v

Available Services#

Once running, these services are accessible:

  • Data Designer API: http://localhost:8080

  • Data Store API: http://localhost:3000

Key Data Designer endpoints:

Initialize the SDK Client#

Once you have Data Designer running, you can use the NeMo Microservices SDK to interact with the service.

  1. Install the nemo-microservices SDK:

    pip install "nemo-microservices[data-designer]"
    
  2. Import the necessary Data Designer functionality:

    from nemo_microservices.data_designer.essentials import *
    
  3. Initialize the NeMo Microservices Client:

    data_designer_client = NeMoDataDesignerClient(
        base_url="http://localhost:8080",
    )
    

Note

If you are hosting Data Designer as a service that uses Bearer token authentication, you can provide your Bearer token via the default_headers parameter in NeMoDataDesignerClient.
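
As an illustration of the note above, the following sketch builds an Authorization header and passes it through default_headers. The `bearer_headers` and `make_client` helpers and the `DATA_DESIGNER_TOKEN` environment variable are hypothetical; only `NeMoDataDesignerClient`, `base_url`, and `default_headers` come from this guide.

```python
import os


def bearer_headers(token: str) -> dict[str, str]:
    """Build the HTTP header for Bearer token authentication."""
    return {"Authorization": f"Bearer {token}"}


def make_client(base_url: str, token: str):
    """Create a Data Designer client that sends the Bearer token on each request."""
    # Imported lazily so the helper above works without the SDK installed.
    from nemo_microservices.data_designer.essentials import NeMoDataDesignerClient

    return NeMoDataDesignerClient(
        base_url=base_url,
        default_headers=bearer_headers(token),
    )


if __name__ == "__main__":
    # DATA_DESIGNER_TOKEN is a hypothetical variable name used for this example.
    token = os.environ.get("DATA_DESIGNER_TOKEN", "")
    if token:
        client = make_client("http://localhost:8080", token)
```

Reading the token from the environment keeps it out of source files and shell history.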


Customization Options#

To adjust Data Designer settings beyond the basic model provider configuration, edit the platform_config defined in the root docker-compose.yaml file.

Configuration Settings Reference#

| Field | Description | Required | Default |
|---|---|---|---|
| model_provider_registry | Provider details for model configurations | Yes | None |
| default_model_configs | Model configurations to use when unspecified by client | No | None |
| seed_dataset_source_registry | Details for seed dataset source | No | None |
| preview_num_records.max | Maximum number of records that can be requested for preview requests | No | 10 |
| preview_num_records.default | Number of records to return for preview requests when unspecified by client | No | 10 |

Example Fully-Customized Configuration#

The following YAML snippet shows all available customization options:

configs:
  platform_config:
    content: |
      # Other NMP settings

      data_designer:
        model_provider_registry:
          default: "nvidiabuild"
          providers:
            - name: "nvidiabuild"
              endpoint: "https://integrate.api.nvidia.com/v1"
              api_key: "NIM_API_KEY"
        default_model_configs:
          - alias: "text"
            provider: "nvidiabuild"
            model: "meta/llama-3.3-70b-instruct"
            inference_parameters:
              temperature: 0.7
              top_p: 0.9
              max_tokens: 1024
        seed_dataset_source_registry:
          sources:
            - endpoint: "https://my-private-hf-hub.com"
              token: "MY_HF_HUB_TOKEN"
        preview_num_records:
          max: 20
          default: 8

Note

For detailed model provider and model configuration options, refer to the Configure Models guide.
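
A few consistency checks can catch mistakes in the data_designer settings before the service is restarted. The sketch below works on the settings as a plain Python mapping (for example, after loading the YAML with a parser of your choice). The rules it applies are assumptions drawn from the reference table and the example, not documented validation behavior: the registry is required, provider references should resolve to a listed provider, and the preview default should not exceed the preview maximum. `check_data_designer_config` is a hypothetical helper.

```python
def check_data_designer_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a `data_designer` config mapping."""
    problems = []
    registry = cfg.get("model_provider_registry")
    if not registry:
        problems.append("model_provider_registry is required")
    else:
        names = [p.get("name") for p in registry.get("providers", [])]
        default = registry.get("default")
        if default is not None and default not in names:
            problems.append(f"default provider {default!r} is not in providers")
        for model in cfg.get("default_model_configs") or []:
            if model.get("provider") not in names:
                problems.append(f"model alias {model.get('alias')!r} uses unknown provider")
    preview = cfg.get("preview_num_records") or {}
    # Both fields default to 10 according to the reference table.
    limit, default_n = preview.get("max", 10), preview.get("default", 10)
    if default_n > limit:
        problems.append("preview_num_records.default exceeds preview_num_records.max")
    return problems


example = {
    "model_provider_registry": {
        "default": "nvidiabuild",
        "providers": [{"name": "nvidiabuild", "endpoint": "https://integrate.api.nvidia.com/v1"}],
    },
    "default_model_configs": [{"alias": "text", "provider": "nvidiabuild"}],
    "preview_num_records": {"max": 20, "default": 8},
}
print(check_data_designer_config(example))
```

An empty list means none of these assumed rules were violated; it does not guarantee that the service accepts the configuration.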

Troubleshooting#

Refer to the Data Designer Troubleshooting Guide for information on how to check service health, logs, and resolve common issues.