Migrating from Standalone Library#

If you’re already using the standalone DataDesigner library, this guide shows you how to migrate to the NMP service.

Key Insight

Your configuration code stays the same. All config_builder code (e.g. columns, constraints) works identically. Only the execution interface changes, and some features are not supported (see below).

Migration Summary#

What changes:

  • Execution interface (imports and client initialization)

  • Model provider setup (reference by name instead of direct configuration)

  • Seed data sources (use Filesets or HuggingFace instead of local files)

What stays the same:

  • All column configurations (samplers, LLM columns, expressions, etc.)

  • Constraints and validation logic

  • Jinja2 templating and prompt syntax

  • Method names: preview(), create()

Why migrate: Get distributed execution, job monitoring, centralized secrets, and team collaboration.

Quick Overview#

Standalone Library

from data_designer.interface import DataDesigner
import data_designer.config as dd

# Build config
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)

# Execute locally
data_designer = DataDesigner(artifact_path="./artifacts")
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)

NMP Service

import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd

# Build config (identical)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)

# Execute on NMP
data_designer = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
).data_designer
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)

Step-by-Step Migration#

Step 1: Install the NMP SDK#

Replace or supplement your standalone library installation:

# Remove standalone library (optional)
pip uninstall data-designer

# Install NMP SDK with Data Designer support
pip install nemo-platform[data-designer]

The [data-designer] extra includes the data_designer.config package, so you can still build configurations the same way.

Note

The nemo-platform[data-designer] package pins to a specific version of the Data Designer library that matches the service version in your NMP deployment. This ensures compatibility between your configuration code and the service.

Step 2: Update Imports#

Change your execution imports:

# Before
from data_designer.interface import DataDesigner

# After
from nemo_platform import NeMoPlatform

Keep these imports unchanged:

import data_designer.config as dd  # Still works!

Step 3: Update Model Configurations#

In the standalone library, you pass ModelProvider objects to the DataDesigner constructor. In the NMP service, model providers are created and registered with the Models service. (In both contexts, model providers are referenced by name in each ModelConfig.)

# Before (standalone library)
from data_designer.interface import DataDesigner
import data_designer.config as dd

# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="your-api-key"
    )
]

# Model configs reference providers by name
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References the provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Pass providers to DataDesigner constructor
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)

# After (NMP service)
import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd

sdk = NeMoPlatform(base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"), workspace="default")

# Create a secret for your API key
sdk.secrets.create(name="nvidia-build-api-key", data="your-api-key")

# Create a model provider
sdk.inference.providers.create(
    name="nvidia-build",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-build-api-key",
)

# Model configs reference NMP model providers
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",  # Use the `served_model_name` from the provider
        provider="default/nvidia-build",  # workspace/provider-name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# No need to pass providers - they're managed by Inference Gateway

Key changes:

  • Model providers are configured once with the Models service

  • No direct API keys in code - managed by Secrets service
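
The provider string in the ModelConfig above has the form workspace/provider-name. As a purely illustrative sketch (the helper below is hypothetical, not part of the NMP SDK), the two parts decompose like this:

```python
# Hypothetical helper for illustration only -- not an NMP SDK API.
def parse_provider_ref(ref: str) -> tuple[str, str]:
    """Split a 'workspace/provider-name' reference into its two parts."""
    workspace, sep, provider = ref.partition("/")
    if not (workspace and sep and provider):
        raise ValueError(f"expected 'workspace/provider-name', got {ref!r}")
    return workspace, provider

# The provider reference used in the ModelConfig above
print(parse_provider_ref("default/nvidia-build"))  # ('default', 'nvidia-build')
```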

Step 4: Update Client Initialization#

Replace the DataDesigner client with the NMP DataDesignerResource, obtained from a NeMoPlatform instance:

# Before (standalone library)
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=[...]  # Optional provider list
)

# After (NMP service)
import os
from nemo_platform import NeMoPlatform

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
data_designer = sdk.data_designer

Step 5: Update Seed Data Sources (If Used)#

If you use seed datasets, you need to migrate local sources to remote ones:

Local Files → Filesets#

# Before (standalone library)
from data_designer.config import LocalFileSeedSource

seed_source = LocalFileSeedSource(path="./data/seed.csv")
config_builder.with_seed_dataset(seed_source)

# After (NMP service)
# 1. Upload file to Fileset
sdk.files.filesets.create(name="my-seed-data")
sdk.files.upload(
    fileset="my-seed-data",
    local_path="./data/seed.csv",
    remote_path="seed.csv"
)

# 2. Reference Fileset in config
from nemo_platform.data_designer import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.csv")
config_builder.with_seed_dataset(seed_source)
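
The fileset reference packs three pieces (workspace, fileset name, and file path) into one string. The helper below is hypothetical and for illustration only, not an SDK function:

```python
# Hypothetical helper for illustration only -- not an NMP SDK API.
def parse_fileset_ref(ref: str) -> tuple[str, str, str]:
    """Split 'workspace/fileset#path' into (workspace, fileset, file path)."""
    location, sep, file_path = ref.partition("#")
    workspace, slash, fileset = location.partition("/")
    if not (workspace and slash and fileset and sep and file_path):
        raise ValueError(f"expected 'workspace/fileset#path', got {ref!r}")
    return workspace, fileset, file_path

print(parse_fileset_ref("default/my-seed-data#seed.csv"))
# ('default', 'my-seed-data', 'seed.csv')
```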

DataFrames → Filesets#

# Before (standalone library)
from data_designer.config import DataFrameSeedSource
import pandas as pd

df = pd.read_csv("data.csv")
seed_source = DataFrameSeedSource(dataframe=df)
config_builder.with_seed_dataset(seed_source)

# After (NMP service)
# 1. Save DataFrame to a temporary Parquet file, then upload to a Fileset
import os
import tempfile
import pandas as pd

df = pd.read_csv("data.csv")

with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as tmp:
    df.to_parquet(tmp.name)

sdk.files.filesets.create(name="my-seed-data")
sdk.files.upload(
    fileset="my-seed-data",
    local_path=tmp.name,
    remote_path="seed.parquet"
)
os.remove(tmp.name)  # Clean up the temporary file

# 2. Reference Fileset in config
from nemo_platform.data_designer import FilesetFileSeedSource

seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.parquet")
config_builder.with_seed_dataset(seed_source)

HuggingFace (Mostly Unchanged)#

# Public datasets work the same in both!
from data_designer.config import HuggingFaceSeedSource

# Public dataset
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet"
)

# Private dataset - update token to reference NMP secret
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet",
    token="default/hf-token"  # Reference to NMP secret
)

config_builder.with_seed_dataset(seed_source)
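
The HuggingFace path acts as a glob over files in the dataset repository. The sketch below only illustrates which files such a pattern selects; the repository listing is made up, and this is not how the library resolves paths internally:

```python
from fnmatch import fnmatch

pattern = "datasets/username/dataset/data/*.parquet"

# Hypothetical repository listing, for illustration only
repo_files = [
    "datasets/username/dataset/data/train-00000.parquet",
    "datasets/username/dataset/data/train-00001.parquet",
    "datasets/username/dataset/README.md",
]

matches = [f for f in repo_files if fnmatch(f, pattern)]
print(matches)  # the two .parquet shards match; README.md does not
```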

Step 6: Update Execution Calls#

The method names stay the same, but the client is different:

# Before (standalone library)
# data_designer is a `DataDesigner` object from the OSS library
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000, dataset_name="my-dataset")

# After (NMP service)
# data_designer is a `DataDesignerResource` from the NMP SDK
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)

Note: The dataset_name parameter is not available in the service client. Job names are auto-generated.

Step 7: Update Result Access#

Result access is similar but with some differences:

# Preview results (identical)
preview.dataset  # pandas DataFrame
preview.analysis.to_report()  # Analysis report
preview.display_sample_record()  # Display sample

# Create results
# Before (standalone library)
results = data_designer.create(config_builder, num_records=1000)
dataset = results.dataset  # Available immediately
analysis = results.analysis

# After (NMP service)
job = data_designer.create(config_builder, num_records=1000)
job.wait_until_done()  # Must wait for job completion
results = job.download_artifacts()  # Download results from artifact storage
dataset = results.load_dataset()
analysis = results.load_analysis()

What Stays the Same#

All configuration code remains identical. You can copy your existing config_builder code directly without any changes.

Configuration APIs#

| API | Status | Notes |
| --- | --- | --- |
| DataDesignerConfigBuilder(model_configs) | ✅ Identical | Constructor signature unchanged |
| config_builder.add_column(...) | ✅ Identical | All column types supported |
| config_builder.add_constraint(...) | ✅ Identical | All constraint types supported |
| config_builder.with_seed_dataset(...) | ✅ Identical | Method signature unchanged (seed sources differ) |
| config_builder.build() | ✅ Identical | Returns same DataDesignerConfig object |

Column Types#

All column types work identically:

  • SamplerColumnConfig - All sampler types and parameters

  • LLMTextColumnConfig - Text generation with prompts

  • LLMCodeColumnConfig - Code generation

  • LLMStructuredColumnConfig - JSON generation with schemas

  • LLMJudgeColumnConfig - Quality scoring

  • ExpressionColumnConfig - Jinja2 transformations

  • EmbeddingColumnConfig - Vector embeddings

  • ValidationColumnConfig - Code and HTTP validation

  • SeedDatasetColumnConfig - Automatically added with seed data

Other Features#

  • Jinja2 templating in prompts - Reference other columns with {{ column_name }}

  • Constraints - All constraint types (scalar, column inequalities)

  • Inference parameters - Temperature, top_p, max_tokens, etc.

  • Sampler parameters - All distributions and configurations

  • Column dependencies - Automatic resolution based on references
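
For intuition, automatic dependency resolution can be pictured as extracting the {{ column_name }} references from each prompt and ordering columns so that each one is generated after the columns it references. The following is a standalone sketch of that idea, not the library's actual implementation:

```python
import re
from graphlib import TopologicalSorter

# Prompts for three columns; the sampler column has no prompt
prompts = {
    "category": "",
    "description": "Describe a {{ category }} product.",
    "review": "Write a short review of: {{ description }}",
}

# Each column depends on the columns its prompt references
deps = {
    name: set(re.findall(r"{{\s*(\w+)\s*}}", prompt))
    for name, prompt in prompts.items()
}

# Generate predecessors before the columns that reference them
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['category', 'description', 'review']
```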

What Changes#

Required Changes#

These changes are mandatory for migration:

| Component | Standalone Library | NMP Service | Migration Step |
| --- | --- | --- | --- |
| Import | from data_designer.interface import DataDesigner | from nemo_platform import NeMoPlatform | Step 2 |
| Client | DataDesigner(artifact_path="...") | NeMoPlatform(base_url="...", workspace="...").data_designer | Step 4 |
| Model Providers | Direct ModelProvider objects | Entities created in Models service | Step 3 |
| Local Seed Files | LocalFileSeedSource | FilesetFileSeedSource (upload first) | Step 5 |
| DataFrame Seeds | DataFrameSeedSource | FilesetFileSeedSource (upload first) | Step 5 |

Behavioral Changes#

These differences affect how you interact with results:

| Feature | Standalone Library | NMP Service |
| --- | --- | --- |
| Execution | Synchronous (blocks until complete) | Asynchronous jobs (returns immediately) |
| Result Access | results.dataset (immediate) | job.load_dataset() (after completion) |
| Artifact Storage | Local filesystem | NMP artifact storage |
| Job Tracking | No tracking | Full job status and monitoring |

Note: You can use wait_until_done=True with create() for synchronous behavior similar to the standalone library.
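
Conceptually, an asynchronous job client just polls a status endpoint until the job reaches a terminal state. The sketch below mimics that loop with a stand-in status callable; the names and states are assumptions for illustration, not the NMP SDK's actual internals:

```python
import time

def wait_for(get_status, poll_interval=0.01, timeout=5.0):
    """Poll a status callable until it reports a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed", "cancelled"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal state in time")

# Simulate a job that completes after a few polls
states = iter(["pending", "running", "running", "completed"])
final = wait_for(lambda: next(states))
print(final)  # completed
```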

Unsupported Features#

The following standalone library features are not available in the NMP service:

| Feature | Status | Workaround |
| --- | --- | --- |
| LocalFileSeedSource | ❌ Not supported | Upload to a Fileset, use FilesetFileSeedSource |
| DataFrameSeedSource | ❌ Not supported | Save to a file, upload to a Fileset |
| Custom Python function validators | ❌ Not supported | Use code validators or HTTP validators |
| Local model providers | ❌ Not supported | Use remote inference endpoints via Inference Gateway |
| MCP tools | ❌ Not supported | None; support may be added in a future NMP version |

Complete Migration Example#

Here’s a full before/after example:

Standalone Library

from data_designer.interface import DataDesigner
import data_designer.config as dd

# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="nvapi-xxx"
    )
]

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers
)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)

# Access results
dataset = results.dataset
analysis = results.analysis

NMP Service

import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

sdk.secrets.create(
    name="nvidia-api-key",
    data="nvapi-xxx",
    description="NVIDIA API key"
)

sdk.inference.providers.create(
    name="nvidia-build",
    description="NVIDIA Build API",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-api-key"
)

# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/nvidia-build",  # Reference NMP provider
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Build configuration (identical!)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)

# Execute
data_designer = sdk.data_designer
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)

# Access results
job.wait_until_done()
dataset = job.load_dataset()
analysis = job.load_analysis()

Benefits of Migration#

Migrating to the NMP service provides:

  • Scalability: Distributed execution for large datasets

  • Monitoring: Job tracking and status updates

  • Artifact Management: Centralized storage and versioning

  • Team Collaboration: Shared workspaces and resources

  • Security: Centralized secret management

  • Infrastructure: Managed inference and compute resources

Getting Help#