Migrating from Standalone Library#

If you’re already using the standalone DataDesigner library, this guide shows you how to migrate to the NMP service.

Key Insight

Your configuration code stays the same. All config_builder code (e.g. columns, constraints) works identically. Only the execution interface changes, and some features are not supported (see below).

Migration Summary#

What changes:

  • Execution interface (imports and client initialization)

  • Model provider setup (reference by name instead of direct configuration)

  • Seed data sources (use Filesets or HuggingFace instead of local files)

What stays the same:

  • All column configurations (samplers, LLM columns, expressions, etc.)

  • Constraints and validation logic

  • Jinja2 templating and prompt syntax

  • Method names: preview(), create()

Why migrate: Get distributed execution, job monitoring, centralized secrets, and team collaboration.

Quick Overview#

Standalone Library

from data_designer.interface import DataDesigner
import data_designer.config as dd

# Build config
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)

# Execute locally
data_designer = DataDesigner(artifact_path="./artifacts")
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)

NMP Service

import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd

# Build config (identical)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)

# Execute on NMP
data_designer = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
).data_designer
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)

Step-by-Step Migration#

Step 1: Install the NMP SDK#

Replace or supplement your standalone library installation:

# Remove standalone library (optional)
pip uninstall data-designer

# Install NMP SDK with Data Designer support
pip install nemo-platform[data-designer]

The [data-designer] extra includes the data_designer.config package, so you can still build configurations the same way.

Note

The nemo-platform[data-designer] package pins to a specific version of the Data Designer library that matches the service version in your NMP deployment. This ensures compatibility between your configuration code and the service.

Step 2: Update Imports#

Change your execution imports:

# Before
from data_designer.interface import DataDesigner

# After
from nemo_platform import NeMoPlatform

Keep these imports unchanged:

import data_designer.config as dd  # Still works!

Step 3: Update Model Configurations#

In the standalone library, you pass ModelProvider objects to the DataDesigner constructor. In the NMP service, model providers are created and registered with the Models service. (In both contexts, model providers are referenced by name in each ModelConfig.)

# Before (standalone library)
from data_designer.interface import DataDesigner
import data_designer.config as dd

# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="your-api-key"
    )
]

# Model configs reference providers by name
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References the provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Pass providers to DataDesigner constructor
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)

# After (NMP service)
import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd

sdk = NeMoPlatform(base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), workspace="default")

# Create a secret for your API key
sdk.secrets.create(name="nvidia-build-api-key", data="your-api-key")

# Create a model provider
sdk.inference.providers.create(
    name="nvidia-build",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-build-api-key",
)

# Model configs reference NMP model providers
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",  # Use the `served_model_name` from the provider
        provider="default/nvidia-build",  # workspace/provider-name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# No need to pass providers - they're managed by Inference Gateway

Key changes:

  • Model providers are configured once with the Models service

  • No direct API keys in code - managed by Secrets service
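
The provider string in the ModelConfig above has the form workspace/provider-name. As a purely illustrative sketch (the helper below is hypothetical, not part of the NMP SDK), the two parts decompose like this:

```python
# Hypothetical helper for illustration only -- not an NMP SDK API.
def parse_provider_ref(ref: str) -> tuple[str, str]:
    """Split a 'workspace/provider-name' reference into its two parts."""
    workspace, sep, provider = ref.partition("/")
    if not (workspace and sep and provider):
        raise ValueError(f"expected 'workspace/provider-name', got {ref!r}")
    return workspace, provider

# The provider reference used in the ModelConfig above
print(parse_provider_ref("default/nvidia-build"))  # ('default', 'nvidia-build')
```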

Step 4: Update Client Initialization#

Replace the DataDesigner client with the NMP DataDesignerResource, obtained from a NeMoPlatform instance:

# Before (standalone library)
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=[...]  # Optional provider list
)

# After (NMP service)
import os
from nemo_platform import NeMoPlatform

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
data_designer = sdk.data_designer

Step 5: Update Seed Data Sources (If Used)#

If you use seed datasets, you need to migrate local sources to remote ones:

Local Files → Filesets#

# Before (standalone library)
from data_designer.config import LocalFileSeedSource

seed_source = LocalFileSeedSource(path="./data/seed.csv")
config_builder.with_seed_dataset(seed_source)

# After (NMP service)
# 1. Upload file to Fileset
sdk.files.filesets.create(name="my-seed-data")
sdk.files.upload(
    fileset="my-seed-data",
    local_path="./data/seed.csv",
    remote_path="seed.csv"
)

# 2. Reference Fileset in config
from nemo_platform.data_designer import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.csv")
config_builder.with_seed_dataset(seed_source)
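
The fileset reference packs three pieces (workspace, fileset name, and file path) into one string. The helper below is hypothetical and for illustration only, not an SDK function:

```python
# Hypothetical helper for illustration only -- not an NMP SDK API.
def parse_fileset_ref(ref: str) -> tuple[str, str, str]:
    """Split 'workspace/fileset#path' into (workspace, fileset, file path)."""
    location, sep, file_path = ref.partition("#")
    workspace, slash, fileset = location.partition("/")
    if not (workspace and slash and fileset and sep and file_path):
        raise ValueError(f"expected 'workspace/fileset#path', got {ref!r}")
    return workspace, fileset, file_path

print(parse_fileset_ref("default/my-seed-data#seed.csv"))
# ('default', 'my-seed-data', 'seed.csv')
```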

DataFrames → Filesets#

# Before (standalone library)
from data_designer.config import DataFrameSeedSource
import pandas as pd

df = pd.read_csv("data.csv")
seed_source = DataFrameSeedSource(dataframe=df)
config_builder.with_seed_dataset(seed_source)

# After (NMP service)
# 1. Save DataFrame to a temporary Parquet file, then upload to a Fileset
import os
import tempfile
import pandas as pd

df = pd.read_csv("data.csv")

with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as tmp:
    df.to_parquet(tmp.name)

sdk.files.filesets.create(name="my-seed-data")
sdk.files.upload(
    fileset="my-seed-data",
    local_path=tmp.name,
    remote_path="seed.parquet"
)
os.remove(tmp.name)  # Clean up the temporary file

# 2. Reference Fileset in config
from nemo_platform.data_designer import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.parquet")
config_builder.with_seed_dataset(seed_source)

HuggingFace (Mostly Unchanged)#

# Public datasets work the same in both!
from data_designer.config import HuggingFaceSeedSource

# Public dataset
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet"
)

# Private dataset - update token to reference NMP secret
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet",
    token="default/hf-token"  # Reference to NMP secret
)

config_builder.with_seed_dataset(seed_source)
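
The HuggingFace path acts as a glob over files in the dataset repository. The sketch below only illustrates which files such a pattern selects; the repository listing is made up, and this is not how the library resolves paths internally:

```python
from fnmatch import fnmatch

pattern = "datasets/username/dataset/data/*.parquet"

# Hypothetical repository listing, for illustration only
repo_files = [
    "datasets/username/dataset/data/train-00000.parquet",
    "datasets/username/dataset/data/train-00001.parquet",
    "datasets/username/dataset/README.md",
]

matches = [f for f in repo_files if fnmatch(f, pattern)]
print(matches)  # the two .parquet shards match; README.md does not
```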

Step 6: Update Execution Calls#

The method names stay the same, but the client is different:

# Before (standalone library)
# data_designer is a `DataDesigner` object from the OSS library
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000, dataset_name="my-dataset")

# After (NMP service)
# data_designer is a `DataDesignerResource` from the NMP SDK
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)

Note: The dataset_name parameter is not available in the service client. Job names are auto-generated.

Step 7: Update Result Access#

Result access is similar but with some differences:

# Preview results (identical)
preview.dataset  # pandas DataFrame
preview.analysis.to_report()  # Analysis report
preview.display_sample_record()  # Display sample

# Create results
# Before (standalone library)
results = data_designer.create(config_builder, num_records=1000)
dataset = results.dataset  # Available immediately
analysis = results.analysis

# After (NMP service)
job = data_designer.create(config_builder, num_records=1000)
job.wait_until_done()  # Must wait for job completion
results = job.download_artifacts()  # Download results from artifact storage
dataset = results.load_dataset()
analysis = results.load_analysis()

What Stays the Same#

All configuration code remains identical. You can copy your existing config_builder code directly without any changes.

Configuration APIs#

| API | Status | Notes |
| --- | --- | --- |
| DataDesignerConfigBuilder(model_configs) | ✅ Identical | Constructor signature unchanged |
| config_builder.add_column(...) | ✅ Identical | All column types supported |
| config_builder.add_constraint(...) | ✅ Identical | All constraint types supported |
| config_builder.with_seed_dataset(...) | ✅ Identical | Method signature unchanged (seed sources differ) |
| config_builder.build() | ✅ Identical | Returns same DataDesignerConfig object |

Column Types#

All column types work identically:

  • SamplerColumnConfig - All sampler types and parameters

  • LLMTextColumnConfig - Text generation with prompts

  • LLMCodeColumnConfig - Code generation

  • LLMStructuredColumnConfig - JSON generation with schemas

  • LLMJudgeColumnConfig - Quality scoring

  • ExpressionColumnConfig - Jinja2 transformations

  • EmbeddingColumnConfig - Vector embeddings

  • ValidationColumnConfig - Code and HTTP validation

  • SeedDatasetColumnConfig - Automatically added with seed data

Other Features#

  • Jinja2 templating in prompts - Reference other columns with {{ column_name }}

  • Constraints - All constraint types (scalar, column inequalities)

  • Inference parameters - Temperature, top_p, max_tokens, etc.

  • Sampler parameters - All distributions and configurations

  • Column dependencies - Automatic resolution based on references
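
For intuition, automatic dependency resolution can be pictured as extracting the {{ column_name }} references from each prompt and ordering columns so that each one is generated after the columns it references. The following is a standalone sketch of that idea, not the library's actual implementation:

```python
import re
from graphlib import TopologicalSorter

# Prompts for three columns; the sampler column has no prompt
prompts = {
    "category": "",
    "description": "Describe a {{ category }} product.",
    "review": "Write a short review of: {{ description }}",
}

# Each column depends on the columns its prompt references
deps = {
    name: set(re.findall(r"{{\s*(\w+)\s*}}", prompt))
    for name, prompt in prompts.items()
}

# Generate predecessors before the columns that reference them
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['category', 'description', 'review']
```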

What Changes#

Required Changes#

These changes are mandatory for migration:

| Component | Standalone Library | NMP Service | Migration Step |
| --- | --- | --- | --- |
| Import | from data_designer.interface import DataDesigner | from nemo_platform import NeMoPlatform | Step 2 |
| Client | DataDesigner(artifact_path="...") | NeMoPlatform(base_url="...", workspace="...").data_designer | Step 4 |
| Model Providers | Direct ModelProvider objects | Entities created in Models service | Step 3 |
| Local Seed Files | LocalFileSeedSource | FilesetFileSeedSource (upload first) | Step 5 |
| DataFrame Seeds | DataFrameSeedSource | FilesetFileSeedSource (upload first) | Step 5 |

Behavioral Changes#

These differences affect how you interact with results:

| Feature | Standalone Library | NMP Service |
| --- | --- | --- |
| Execution | Synchronous (blocks until complete) | Asynchronous jobs (returns immediately) |
| Result Access | results.dataset (immediate) | job.load_dataset() (after completion) |
| Artifact Storage | Local filesystem | NMP artifact storage |
| Job Tracking | No tracking | Full job status and monitoring |

Note: You can use wait_until_done=True with create() for synchronous behavior similar to the standalone library.
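
Conceptually, an asynchronous job client just polls a status endpoint until the job reaches a terminal state. The sketch below mimics that loop with a stand-in status callable; the names and states are assumptions for illustration, not the NMP SDK's actual internals:

```python
import time

def wait_for(get_status, poll_interval=0.01, timeout=5.0):
    """Poll a status callable until it reports a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed", "cancelled"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal state in time")

# Simulate a job that completes after a few polls
states = iter(["pending", "running", "running", "completed"])
final = wait_for(lambda: next(states))
print(final)  # completed
```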

Unsupported Features#

The following standalone library features are not available in the NMP service:

| Feature | Status | Workaround |
| --- | --- | --- |
| LocalFileSeedSource | ❌ Not supported | Upload to a Fileset, use FilesetFileSeedSource |
| DataFrameSeedSource | ❌ Not supported | Save to a file, upload to a Fileset |
| Custom Python function validators | ❌ Not supported | Use code validators or HTTP validators |
| Local model providers | ❌ Not supported | Use remote inference endpoints via Inference Gateway |
| MCP tools | ❌ Not supported | None; support may be added in a future NMP version |

Complete Migration Example#

Here’s a full before/after example:

Standalone Library

from data_designer.interface import DataDesigner
import data_designer.config as dd

# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="nvapi-xxx"
    )
]

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)

# Access results
dataset = results.dataset
analysis = results.analysis

NMP Service

import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

sdk.secrets.create(
    name="nvidia-api-key",
    data="nvapi-xxx",
    description="NVIDIA API key"
)

sdk.inference.providers.create(
    name="nvidia-build",
    description="NVIDIA Build API",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-api-key"
)

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/nvidia-build",  # Reference NMP provider
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration (identical!)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
data_designer = sdk.data_designer
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)

# Access results
job.wait_until_done()
dataset = job.load_dataset()
analysis = job.load_analysis()

Benefits of Migration#

Migrating to the NMP service provides:

  • Scalability: Distributed execution for large datasets

  • Monitoring: Job tracking and status updates

  • Artifact Management: Centralized storage and versioning

  • Team Collaboration: Shared workspaces and resources

  • Security: Centralized secret management

  • Infrastructure: Managed inference and compute resources

Getting Help#