NeMo Data Designer Quickstart Using Docker Compose#

Run the microservice on your local machine using Docker Compose for experimentation.

Prerequisites#

  • Install Docker and Docker Compose

  • Create an NGC API Key for accessing the Data Designer container from NGC Catalog

  • Install the NGC CLI

  • Ensure you have at least 8 GB of available RAM and 20 GB of available disk space in your development environment

  • Obtain a build.nvidia.com API key (free trial available)

  • Make sure you have curl or a similar CLI tool to make HTTP requests
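
The tool prerequisites above can be checked before you continue. The following is a minimal sketch, not part of the official quickstart: `check_tools` is a hypothetical helper that only tests whether each named command is on your `PATH`. It does not check RAM, disk space, or API keys.

```shell
#!/bin/sh
# Preflight sketch: report which required command-line tools are missing.
# Prints one "missing: NAME" line per absent command and returns non-zero
# if any command is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call for this quickstart (not run automatically):
#   check_tools docker ngc curl && docker compose version
```

Running `docker compose version` afterwards confirms that the Compose plugin is installed, because `command -v docker` alone cannot detect it.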


Launch Data Designer#

Authenticate with NGC#

Log in to the NVIDIA NGC container registry (replace $NGC_CLI_API_KEY with your actual NGC API key):

echo "$NGC_CLI_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin

Download the Data Designer Docker Compose Resources#

ngc registry resource download-version "nvidia/nemo-microservices/nemo-microservices-quickstart:25.09"
cd nemo-microservices-quickstart_v25.09

Set Platform Environment Variables#

export NEMO_MICROSERVICES_IMAGE_REGISTRY="nvcr.io/nvidia/nemo-microservices"
export NEMO_MICROSERVICES_IMAGE_TAG="25.09"
export NIM_API_KEY="<build.nvidia.com-api-key>"  # Replace with the build.nvidia.com API key from the prerequisites
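
Before starting the service, it can help to confirm that these variables are actually set in the current shell. The sketch below assumes a hypothetical helper named `require_env`; it only checks that each variable is non-empty and does not verify that it was exported or that the key is valid.

```shell
#!/bin/sh
# Sketch: fail early if any named variable is unset or empty.
# Prints one "not set: NAME" line to stderr per missing variable.
require_env() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call (not run automatically):
#   require_env NEMO_MICROSERVICES_IMAGE_REGISTRY NEMO_MICROSERVICES_IMAGE_TAG NIM_API_KEY
```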

Start the Service#

docker compose --profile data-designer up

Verify Deployment#

Running the docker compose command starts the platform, with the Data Designer API available at http://localhost:8080.

Let’s check that all services are running properly.

docker ps

All containers whose names start with nemo-microservices- should show “Up” with a healthy status.

Note

Services may take several minutes to become fully available.
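
Because startup can take a while, a small retry loop saves you from re-running checks by hand. The following sketch assumes a hypothetical helper named `wait_until` that retries any command until it succeeds or a timeout elapses. The example probe uses the list-jobs endpoint; that it returns a successful status once the service is ready is an assumption, not something this guide states.

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or a timeout (in seconds) elapses.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 once the timeout is reached.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Example call (not run automatically). Assumption: the list-jobs endpoint
# answers with a 2xx status once Data Designer is ready.
#   wait_until 300 5 curl -fsS http://localhost:8080/v1beta1/data-designer/jobs
```

The probe command is passed in as arguments so the same loop can wrap a different readiness check if this endpoint turns out to be unsuitable.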

Test the API#

Run a quick test to confirm Data Designer is working:

curl -X POST -H "Content-type: application/json" localhost:8080/v1beta1/data-designer/preview -d @- <<EOF
{
    "config": {
        "model_configs": [],
        "columns":[
            {
                "name":"school_subject",
                "type":"category",
                "params":{
                    "values":[
                        "math",
                        "science",
                        "history",
                        "art"
                    ]
                }
            }
        ]
    }
}
EOF

If Data Designer is working, you should see output similar to the following:

{"step": "", "ts": "2025-09-04T03:39:04.362490", "type": "workflow_state_change", "stream": "logs", "payload": {"state": "started"}}
{"step": "using-samplers-to-generate-1-columns", "ts": "2025-09-04T03:39:04.362905", "type": "step_state_change", "stream": "logs", "payload": {"state": "started"}}
{"step": "using-samplers-to-generate-1-columns", "ts": "2025-09-04T03:39:04.364968", "type": "log_line", "stream": "logs", "payload": {"level": "info", "msg": "\ud83c\udfb2 Using numerical samplers to generate 10 records across 1 columns"}}
{"step": "using-samplers-to-generate-1-columns", "ts": "2025-09-04T03:39:04.776972", "type": "dataset", "stream": "step_outputs", "payload": {"dataset": [{"school_subject": "art"}, {"school_subject": "math"}, {"school_subject": "history"}, {"school_subject": "science"}, {"school_subject": "science"}, {"school_subject": "history"}, {"school_subject": "history"}, {"school_subject": "math"}, {"school_subject": "art"}, {"school_subject": "math"}]}}
{"step": "using-samplers-to-generate-1-columns", "ts": "2025-09-04T03:39:04.777660", "type": "step_state_change", "stream": "logs", "payload": {"state": "completed"}}
{"step": "", "ts": "2025-09-04T03:39:04.778110", "type": "workflow_state_change", "stream": "logs", "payload": {"state": "completed", "execution_summary": {"io_token_counts": {"input_tokens": 0, "output_tokens": 44}, "billing_summary": {"input_tokens": 0, "output_tokens": 44, "total_tokens": 44, "credits_used": 4e-05}}}}
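
The response is a stream with one JSON object per line, so simple line filters are enough for a quick check. The sketch below assumes you saved the response to a file, and that the key/value spacing matches the sample output exactly. `preview_completed` and `dataset_lines` are hypothetical helper names, and a proper JSON parser would be more robust than `grep`.

```shell
#!/bin/sh
# Sketch: inspect a saved preview response (one JSON object per line).
# Both helpers read the response from standard input and match on the exact
# key/value spacing shown in the sample output, which is an assumption.

# Succeeds if the workflow (not just a single step) reported "completed".
preview_completed() {
    grep -F '"type": "workflow_state_change"' | grep -F '"state": "completed"' >/dev/null
}

# Prints only the line(s) that carry generated records.
dataset_lines() {
    grep -F '"type": "dataset"'
}

# Example calls (not run automatically); preview.jsonl is a placeholder file name:
#   preview_completed < preview.jsonl && dataset_lines < preview.jsonl
```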

Create your First Dataset#

Run one of our intro tutorials to create your first dataset: NeMo Data Designer Tutorials.

Stop the Service (Optional)#

When you are done generating data, you can stop the service:

docker compose --profile data-designer down

Available Services#

Once running, these services are accessible:

  • Data Designer API: http://localhost:8080

  • Data Store API: http://localhost:3000

Key Data Designer endpoints:

  • Data preview: POST /v1beta1/data-designer/preview

  • Batch jobs: POST /v1beta1/data-designer/jobs

  • List jobs: GET /v1beta1/data-designer/jobs

  • Job status: GET /v1beta1/data-designer/jobs/{job_id}
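
The read-only job endpoints above can be called with curl. The sketch below only builds the URLs and shows the calls in comments. `DD_BASE_URL`, `jobs_url`, `job_status_url`, and `JOB_ID` are hypothetical names introduced for illustration. The request body for creating a batch job is not covered in this guide, so it is not shown.

```shell
#!/bin/sh
# Sketch: build job endpoint URLs from the Data Designer API base address.
# DD_BASE_URL is a hypothetical variable; it defaults to the local address
# used in this quickstart.
DD_BASE_URL="${DD_BASE_URL:-http://localhost:8080}"

# URL for listing jobs.
jobs_url() {
    printf '%s/v1beta1/data-designer/jobs\n' "$DD_BASE_URL"
}

# URL for the status of one job; $1 is the job ID.
job_status_url() {
    printf '%s/%s\n' "$(jobs_url)" "$1"
}

# Example calls (not run automatically); JOB_ID is a placeholder:
#   curl -fsS "$(jobs_url)"
#   curl -fsS "$(job_status_url "$JOB_ID")"
```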


Customization Options#

The Docker Compose deployment can be customized through the platform_config defined in the root docker-compose.yaml file to adjust various Data Designer settings beyond the basic model provider configuration.

Configuration Settings Reference#

| Field                       | Description                                                                  | Required | Default                  |
|-----------------------------|------------------------------------------------------------------------------|----------|--------------------------|
| model_provider_registry     | Provider details for model configurations                                    | Yes      | None                     |
| data_store_endpoint         | Source for seed datasets                                                     | No       | "https://huggingface.co" |
| data_store_token            | API key for authenticating with data_store_endpoint                          | No       | None                     |
| preview_num_records.max     | Maximum number of records that can be requested for preview requests         | No       | 10                       |
| preview_num_records.default | Number of records to return for preview requests when unspecified by client  | No       | 10                       |
| default_model_configs       | Model configurations to use when unspecified by client                       | No       | None                     |

Example Fully-Customized Configuration#

Refer to the following YAML snippet with all available customization options:

configs:
  platform_config:
    content: |
      # Other NMP settings

      data_designer:
        data_store_endpoint: "https://my-private-hf-hub.com"
        data_store_token: "DATA_STORE_TOKEN"
        preview_num_records:
          max: 20
          default: 8
        model_provider_registry:
          default: "nvidiabuild"
          providers:
            - name: "nvidiabuild"
              endpoint: "https://integrate.api.nvidia.com/v1"
              api_key: "NIM_API_KEY"
        default_model_configs:
          - alias: "text"
            provider: "nvidiabuild"
            model: "meta/llama-3.3-70b-instruct"
            inference_parameters:
              temperature: 0.7
              top_p: 0.9
              max_tokens: 1024

Note

For detailed model provider and model configuration options, refer to the Configure Models guide.

Troubleshooting#

Refer to the Data Designer Troubleshooting Guide for information on how to check service health, logs, and resolve common issues.