Deploy NeMo Data Designer Using Docker Compose#

You can deploy the NeMo Data Designer microservice using Docker Compose for local development, testing, and quickstart scenarios. This deployment method provides a simple way to get Data Designer running quickly without complex Kubernetes configurations.

Prerequisites#

  • Docker and Docker Compose installed on your system

  • An NGC API key for accessing the NGC Catalog

  • At least 8GB of available RAM (for complete stack including PostgreSQL and MinIO)

  • Sufficient disk space for generated artifacts, database, and object storage (recommended: 20GB+)

  • Access to LLM endpoints (NVIDIA API, local NIM, or other compatible endpoints)

Authenticate with NGC#

Before pulling container images, log in to the NVIDIA NGC container registry:

echo $NGC_CLI_API_KEY | docker login nvcr.io -u '$oauthtoken' --password-stdin

Set the NGC_CLI_API_KEY environment variable to your NGC API key before running this command. The username must remain the literal string $oauthtoken; the single quotes prevent the shell from expanding it.
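Since an empty variable would silently pipe a blank password into docker login, a small guard can be run first. This is a convenience sketch, not part of the official instructions:

```shell
# Guard for the login step: fail with a clear message when the key variable
# is missing instead of sending an empty password to the registry.
require_ngc_key() {
  [ -n "${NGC_CLI_API_KEY:-}" ] || {
    echo "NGC_CLI_API_KEY is not set; export it first" >&2
    return 1
  }
}

# Usage:
# require_ngc_key && echo "$NGC_CLI_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin
```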


Deployment#

  1. Download the Docker Compose configuration from NGC:

    ngc registry resource download-version "nvidia/nemo-microservices/nemo-data-designer-docker-compose:25.08"
    cd nemo-data-designer-docker-compose_v25.08
    
  2. Set up environment variables:

    export NEMO_MICROSERVICES_IMAGE_REGISTRY="nvcr.io/nvidia/nemo-microservices"
    export NEMO_MICROSERVICES_IMAGE_TAG="25.08"
    

    Note: The Data Designer service will automatically configure additional environment variables:

    • NEMO_MICROSERVICES_DATA_DESIGNER_ARTIFACTS_ROOT=/artifacts_root

    • NEMO_MICROSERVICES_DATA_DESIGNER_DATA_STORE_ENDPOINT=http://datastore:3000/v1/hf

    • NEMO_MICROSERVICES_DATA_DESIGNER_DATA_STORE_TOKEN (optional)

  3. Start Data Designer:

    docker compose -f docker-compose.ea.yaml up -d
    

    This will start a complete backend stack with the following services:

    • data-designer: The main Data Designer service (accessible on port 8000)

    • data-designer-volume-permissions: Initializes proper permissions for artifacts storage. Runs once before data-designer and exits.

    • datastore: Backend storage service built on Gitea for dataset management

    • datastore-volume-permissions: Initializes proper permissions for datastore storage. Runs once before datastore and exits.

    • postgres: PostgreSQL database for datastore metadata and configuration. A dependency of datastore.

    • minio: Object storage service for large files and artifacts. A dependency of datastore.
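If you need Data Designer to target a different datastore endpoint or pass a token, the variables noted in step 2 can in principle be set before bringing the stack up. Whether docker-compose.ea.yaml forwards host-side values into the data-designer container is an assumption to verify in the compose file; the token value below is purely illustrative:

```shell
# Assumed overrides: these only take effect if docker-compose.ea.yaml passes
# the host environment through to the data-designer container -- check the
# compose file before relying on them. The token value is hypothetical.
export NEMO_MICROSERVICES_DATA_DESIGNER_DATA_STORE_ENDPOINT="http://datastore:3000/v1/hf"
export NEMO_MICROSERVICES_DATA_DESIGNER_DATA_STORE_TOKEN="example-token"
```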


Verify Deployment#

After starting the services, verify everything is working:

  1. Check service status:

    docker ps
    

The output should show four containers running: data-designer, datastore, postgres, and minio. The two volume-permissions containers run once and exit, so they do not appear in docker ps.

  2. Test the health endpoint:

    curl localhost:8000/health
    

The response should have status 200 and body {"status": "healthy"}.
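The containers can take a little while to become ready after starting, so the health check above may fail at first. A minimal retry loop, assuming the default port mapping from the compose file:

```shell
# Poll a health endpoint until it responds, or give up after a fixed number
# of attempts. Defaults match this deployment's Data Designer endpoint.
wait_healthy() {
  url="${1:-http://localhost:8000/health}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "healthy: $url"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# wait_healthy http://localhost:8000/health
```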

Service Endpoints#

After starting Data Designer, the following services will be accessible:

Primary Services#

  • Data Designer API: http://localhost:8000

    • Health check: GET /health

    • Data preview: POST /v1beta1/data-designer/preview

    • Batch jobs: POST /v1beta1/data-designer/jobs

    • List jobs: GET /v1beta1/data-designer/jobs

    • Job status: GET /v1beta1/data-designer/jobs/{job_id}

    • Job logs: GET /v1beta1/data-designer/jobs/{job_id}/logs

    • Job results: GET /v1beta1/data-designer/jobs/{job_id}/results

    • Download result: GET /v1beta1/data-designer/jobs/{job_id}/results/{result_id}/download

  • Data Store API: http://localhost:3000

    • Health check: GET /v1/health

    • Repository management for datasets and artifacts

Backend Services (for troubleshooting)#

The remaining services are internal dependencies of the datastore:

  • MinIO object storage: http://localhost:9000 (health check: GET /minio/health/ready)

  • PostgreSQL: not exposed on the host; check it from inside the stack with pg_isready (see Troubleshooting)

Quick API Test#

Test the service with a simple preview request (generates basic categorical data):

curl --json @- localhost:8000/v1beta1/data-designer/preview <<EOF
{
    "config": {
        "model_configs": [],
        "columns":[
            {
                "name":"school_subject",
                "type":"category",
                "params":{
                    "values":[
                        "math",
                        "science",
                        "history",
                        "art"
                    ]
                }
            }
        ]
    }
}
EOF
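The same column configuration can be submitted as a batch job. A minimal sketch, assuming the jobs endpoint accepts the same config body as the preview endpoint shown above; {job_id} is whatever ID the submit call returns:

```shell
# Write the preview payload to a file and confirm it is valid JSON before
# submitting it as a batch job.
cat > payload.json <<'EOF'
{
    "config": {
        "model_configs": [],
        "columns": [
            {
                "name": "school_subject",
                "type": "category",
                "params": {
                    "values": ["math", "science", "history", "art"]
                }
            }
        ]
    }
}
EOF
python3 -m json.tool payload.json >/dev/null && echo "payload OK"

# With the stack running, submit the job and poll it:
# curl --json @payload.json localhost:8000/v1beta1/data-designer/jobs
# curl localhost:8000/v1beta1/data-designer/jobs/{job_id}
```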

Backend Architecture#

The Data Designer deployment includes a complete backend stack:

Data Flow#

  1. Data Designer processes requests and generates synthetic data

  2. Datastore manages dataset repositories and metadata via Gitea

  3. PostgreSQL stores datastore configuration and repository metadata

  4. MinIO provides object storage for large files and artifacts

Storage Volumes#

  • artifacts_root: Stores generated synthetic datasets

  • datastore_storage: Stores datastore application data

  • postgres_storage: PostgreSQL database files

  • minio_storage: Object storage data
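The volumes above can be inspected with the Docker CLI. Docker Compose usually prefixes volume names with the project name (by default, the directory name), so the exact names on your host may differ; a sketch with a placeholder prefix:

```shell
# List the named volumes created for the stack (names may carry a project
# prefix depending on your Compose configuration).
docker volume ls

# Show a volume's host mountpoint, e.g. for the artifacts volume; replace
# <project> with your actual project name.
docker volume inspect <project>_artifacts_root
```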

Networking#

All services communicate over a dedicated Docker bridge network named nmp.


Troubleshooting#

Check All Services#

docker ps

All services should show as "Up" or "healthy".

Service Health Checks#

# Data Designer
curl localhost:8000/health

# Datastore
curl localhost:3000/v1/health

# PostgreSQL
docker compose -f docker-compose.ea.yaml exec postgres pg_isready -d ndsdb -U ndsuser

# MinIO
curl localhost:9000/minio/health/ready

View Service Logs#

# Data Designer logs
docker compose -f docker-compose.ea.yaml logs data-designer

# Datastore logs
docker compose -f docker-compose.ea.yaml logs datastore

# Database logs
docker compose -f docker-compose.ea.yaml logs postgres

Stop the Service#

To stop Data Designer:

docker compose -f docker-compose.ea.yaml down
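To also remove the named volumes (generated artifacts, the database, and object storage), add the -v flag. This permanently deletes the stored data:

```shell
# Stop the stack and delete its named volumes. Destructive: all generated
# datasets and datastore state are removed.
docker compose -f docker-compose.ea.yaml down -v
```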