Migrating from Standalone Library#
If you’re already using the standalone DataDesigner library, this guide shows you how to migrate to the NMP service.
Key Insight
Your configuration code stays the same. All config_builder code (e.g. columns, constraints) works identically. Only the execution interface changes, and some features are not supported (see below).
Migration Summary#
What changes:
Execution interface (imports and client initialization)
Model provider setup (reference by name instead of direct configuration)
Seed data sources (use Filesets or HuggingFace instead of local files)
What stays the same:
All column configurations (samplers, LLM columns, expressions, etc.)
Constraints and validation logic
Jinja2 templating and prompt syntax
Method names:
preview(), create()
Why migrate: Get distributed execution, job monitoring, centralized secrets, and team collaboration.
Quick Overview#
Standalone Library
from data_designer.interface import DataDesigner
import data_designer.config as dd
# Build config
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)
# Execute locally
data_designer = DataDesigner(artifact_path="./artifacts")
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)
NMP Service
import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd
# Build config (identical)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(...)
# Execute on NMP
sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
data_designer = sdk.data_designer
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)
Step-by-Step Migration#
Step 1: Install the NMP SDK#
Replace or supplement your standalone library installation:
# Remove standalone library (optional)
pip uninstall data-designer
# Install NMP SDK with Data Designer support
pip install nemo-platform[data-designer]
The [data-designer] extra includes the data_designer.config package, so you can still build configurations the same way.
Note
The nemo-platform[data-designer] package pins to a specific version of the Data Designer library that matches the service version in your NMP deployment. This ensures compatibility between your configuration code and the service.
Step 2: Update Imports#
Change your execution imports:
# Before
from data_designer.interface import DataDesigner
# After
from nemo_platform import NeMoPlatform
Keep these imports unchanged:
import data_designer.config as dd # Still works!
Step 3: Update Model Configurations#
In the standalone library, you pass ModelProvider objects to the DataDesigner constructor.
In the NMP service, model providers are created and registered with the Models service.
(In both contexts, model providers are referenced by name in each ModelConfig.)
# Before (standalone library)
from data_designer.interface import DataDesigner
import data_designer.config as dd
# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="your-api-key",
    )
]
# Model configs reference providers by name
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References the provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]
# Pass providers to DataDesigner constructor
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers,
)
# After (NMP service)
import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd
sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
# Create a secret for your API key
sdk.secrets.create(name="nvidia-build-api-key", data="your-api-key")
# Create a model provider
sdk.inference.providers.create(
    name="nvidia-build",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-build-api-key",
)
# Model configs reference NMP model providers
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",  # Use the `served_model_name` from the provider
        provider="default/nvidia-build",  # workspace/provider-name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]
# No need to pass providers - they're managed by Inference Gateway
Key changes:
Model providers are configured once with the Models service
No direct API keys in code - managed by Secrets service
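To keep keys out of source code entirely, read them from the environment at the point where you create the secret. A minimal sketch, assuming an environment variable named `NVIDIA_API_KEY` (the variable name is just an example):

```python
import os


def api_key_from_env(var_name: str = "NVIDIA_API_KEY") -> str:
    # Read the API key from the environment so it never
    # appears as a literal in committed code.
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the secret")
    return key
```

You would then pass the result as the `data` argument when creating the secret, instead of a hardcoded string.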
Step 4: Update Client Initialization#
Replace the standalone DataDesigner client with the NMP DataDesignerResource, accessed through a NeMoPlatform instance:
# Before (standalone library)
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=[...],  # Optional provider list
)
# After (NMP service)
import os
from nemo_platform import NeMoPlatform
sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
data_designer = sdk.data_designer
Step 5: Update Seed Data Sources (If Used)#
If you use seed datasets, you need to migrate local sources to remote ones:
Local Files → Filesets#
# Before (standalone library)
from data_designer.config import LocalFileSeedSource
seed_source = LocalFileSeedSource(path="./data/seed.csv")
config_builder.with_seed_dataset(seed_source)
# After (NMP service)
# 1. Upload file to Fileset
sdk.files.filesets.create(name="my-seed-data")
sdk.files.upload(
    fileset="my-seed-data",
    local_path="./data/seed.csv",
    remote_path="seed.csv",
)
# 2. Reference Fileset in config
from nemo_platform.data_designer import FilesetFileSeedSource
seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.csv")
config_builder.with_seed_dataset(seed_source)
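The Fileset reference packs three pieces into one string: workspace, Fileset name, and file path, in the form `workspace/fileset#path`. A small helper (purely illustrative, not part of the SDK) makes the format explicit:

```python
def parse_fileset_ref(ref: str) -> tuple[str, str, str]:
    # Split "workspace/fileset#path" into its three components.
    location, _, file_path = ref.partition("#")
    workspace, _, fileset = location.partition("/")
    if not (workspace and fileset and file_path):
        raise ValueError(f"expected 'workspace/fileset#path', got {ref!r}")
    return workspace, fileset, file_path
```

For example, `"default/my-seed-data#seed.csv"` names the file `seed.csv` in the Fileset `my-seed-data` of the `default` workspace.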
DataFrames → Filesets#
# Before (standalone library)
from data_designer.config import DataFrameSeedSource
import pandas as pd
df = pd.read_csv("data.csv")
seed_source = DataFrameSeedSource(dataframe=df)
config_builder.with_seed_dataset(seed_source)
# After (NMP service)
# 1. Save DataFrame and upload to Fileset
import tempfile
import pandas as pd
df = pd.read_csv("data.csv")
with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as tmp:
    df.to_parquet(tmp.name)
sdk.files.filesets.create(name="my-seed-data")
sdk.files.upload(
    fileset="my-seed-data",
    local_path=tmp.name,
    remote_path="seed.parquet",
)
# 2. Reference Fileset in config
from nemo_platform.data_designer import FilesetFileSeedSource
seed_source = FilesetFileSeedSource(path="default/my-seed-data#seed.parquet")
config_builder.with_seed_dataset(seed_source)
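The pattern above, serializing in-memory data to a temporary file and then uploading it, can be sketched generically with the standard library (csv here instead of parquet, with the upload step left to the SDK call shown above):

```python
import csv
import tempfile
from pathlib import Path


def stage_rows_for_upload(rows: list[dict], filename: str = "seed.csv") -> Path:
    # Write rows to a file in a fresh temp directory; the file
    # persists until you remove it, so it can be uploaded afterwards.
    tmpdir = Path(tempfile.mkdtemp())
    path = tmpdir / filename
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

You would pass `str(path)` as the `local_path` argument of the upload call, then delete the temporary file once the upload succeeds.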
HuggingFace (Mostly Unchanged)#
# Public datasets work the same in both!
from data_designer.config import HuggingFaceSeedSource
# Public dataset
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet"
)
# Private dataset - update token to reference NMP secret
seed_source = HuggingFaceSeedSource(
    path="datasets/username/dataset/data/*.parquet",
    token="default/hf-token",  # Reference to NMP secret
)
config_builder.with_seed_dataset(seed_source)
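The path here is a glob pattern over files in the dataset repository. Assuming standard glob semantics, you can check what a pattern selects with `fnmatch` (the file names below are hypothetical, for illustration only):

```python
from fnmatch import fnmatch

pattern = "datasets/username/dataset/data/*.parquet"

# Hypothetical repository contents.
files = [
    "datasets/username/dataset/data/train-00000.parquet",
    "datasets/username/dataset/data/train-00001.parquet",
    "datasets/username/dataset/README.md",
]

# Keep only the files the pattern selects.
matched = [f for f in files if fnmatch(f, pattern)]
```

Only the two parquet files under `data/` match; the README is excluded.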
Step 6: Update Execution Calls#
The method names stay the same, but the client is different:
# Before (standalone library)
# data_designer is a `DataDesigner` object from the OSS library
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000, dataset_name="my-dataset")
# After (NMP service)
# data_designer is a `DataDesignerResource` from the NMP SDK
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)
Note: The dataset_name parameter is not available in the service client. Job names are auto-generated.
Step 7: Update Result Access#
Result access is similar but with some differences:
# Preview results (identical)
preview.dataset # pandas DataFrame
preview.analysis.to_report() # Analysis report
preview.display_sample_record() # Display sample
# Create results
# Before (standalone library)
results = data_designer.create(config_builder, num_records=1000)
dataset = results.dataset # Available immediately
analysis = results.analysis
# After (NMP service)
job = data_designer.create(config_builder, num_records=1000)
job.wait_until_done() # Must wait for job completion
results = job.download_artifacts() # Download results from artifact storage
dataset = results.load_dataset()
analysis = results.load_analysis()
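Conceptually, waiting on a job is a polling loop over its status. A toy version with a fake job object, assumed behavior only (the real SDK's signatures and status values may differ):

```python
import time


class FakeJob:
    # Stand-in for an asynchronous job; completes after a few polls.
    def __init__(self, ticks_until_done: int = 3):
        self._ticks = ticks_until_done

    def status(self) -> str:
        self._ticks -= 1
        return "completed" if self._ticks <= 0 else "running"


def wait_until_done(job, poll_interval: float = 0.01, timeout: float = 5.0) -> str:
    # Poll the job's status until it reaches a terminal state
    # or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = job.status()
        if state in ("completed", "failed"):
            return state
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")
```

This is why result access is deferred in the service client: the dataset only exists in artifact storage once the job reaches a terminal state.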
What Stays the Same#
All configuration code remains identical. You can copy your existing config_builder code directly without any changes.
Configuration APIs#
| API | Status | Notes |
|---|---|---|
| DataDesignerConfigBuilder | ✅ Identical | Constructor signature unchanged |
| add_column() | ✅ Identical | All column types supported |
| Constraint methods | ✅ Identical | All constraint types supported |
| with_seed_dataset() | ✅ Identical | Method signature unchanged (seed sources differ) |
| Config build method | ✅ Identical | Returns the same configuration object |
Column Types#
All column types work identically:
✅ SamplerColumnConfig - All sampler types and parameters
✅ LLMTextColumnConfig - Text generation with prompts
✅ LLMCodeColumnConfig - Code generation
✅ LLMStructuredColumnConfig - JSON generation with schemas
✅ LLMJudgeColumnConfig - Quality scoring
✅ ExpressionColumnConfig - Jinja2 transformations
✅ EmbeddingColumnConfig - Vector embeddings
✅ ValidationColumnConfig - Code and HTTP validation
✅ SeedDatasetColumnConfig - Automatically added with seed data
Other Features#
✅ Jinja2 templating in prompts - Reference other columns with {{ column_name }}
✅ Constraints - All constraint types (scalar, column inequalities)
✅ Inference parameters - Temperature, top_p, max_tokens, etc.
✅ Sampler parameters - All distributions and configurations
✅ Column dependencies - Automatic resolution based on references
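The {{ column_name }} references in prompts are standard Jinja2 variable substitution: each prompt is rendered against the values already generated for that row. A minimal regex-based stand-in shows the mechanics (real rendering uses Jinja2 and supports far more than bare variables):

```python
import re


def render_prompt(template: str, record: dict) -> str:
    # Replace each {{ name }} with the corresponding value
    # from the record, mimicking Jinja2 variable substitution.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(record[m.group(1)]),
        template,
    )
```

So a prompt like "Describe a {{ category }} product." is filled in per row from the sampled `category` column, which is also how column dependencies are inferred.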
What Changes#
Required Changes#
These changes are mandatory for migration:
| Component | Standalone Library | NMP Service | Migration Step |
|---|---|---|---|
| Import | from data_designer.interface import DataDesigner | from nemo_platform import NeMoPlatform | Step 2 |
| Client | DataDesigner(artifact_path=...) | sdk.data_designer | Step 4 |
| Model Providers | Direct ModelProvider objects | Entities created in Models service | Step 3 |
| Local Seed Files | LocalFileSeedSource | FilesetFileSeedSource | Step 5 |
| DataFrame Seeds | DataFrameSeedSource | Upload to Fileset, then FilesetFileSeedSource | Step 5 |
Behavioral Changes#
These differences affect how you interact with results:
| Feature | Standalone Library | NMP Service |
|---|---|---|
| Execution | Synchronous (blocks until complete) | Asynchronous jobs (returns immediately) |
| Result Access | Available immediately on the results object | Wait for the job, then download artifacts |
| Artifact Storage | Local filesystem | NMP artifact storage |
| Job Tracking | No tracking | Full job status and monitoring |
Note: You can use wait_until_done=True with create() for synchronous behavior similar to the standalone library.
Unsupported Features#
The following standalone library features are not available in the NMP service:
| Feature | Status | Workaround |
|---|---|---|
| LocalFileSeedSource | ❌ Not supported | Upload to Fileset, use FilesetFileSeedSource |
| DataFrameSeedSource | ❌ Not supported | Save to file, upload to Fileset |
| Custom Python function validators | ❌ Not supported | Use code validators or HTTP validators |
| Local model providers | ❌ Not supported | Use remote inference endpoints via Inference Gateway |
| MCP tools | ❌ Not supported | No workaround; support may be added in a future NMP version |
Complete Migration Example#
Here’s a full before/after example.

Before (standalone library):

from data_designer.interface import DataDesigner
import data_designer.config as dd
# Define model providers
model_providers = [
    dd.ModelProvider(
        name="nvidia-build",
        endpoint="https://integrate.api.nvidia.com",
        api_key="nvapi-xxx",
    )
]
# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="nvidia-build",  # References provider name
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]
# Build configuration
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)
# Execute
data_designer = DataDesigner(
    artifact_path="./artifacts",
    model_providers=model_providers,
)
preview = data_designer.preview(config_builder, num_records=10)
results = data_designer.create(config_builder, num_records=1000)
# Access results
dataset = results.dataset
analysis = results.analysis
After (NMP service):

import os
from nemo_platform import NeMoPlatform
import data_designer.config as dd

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
sdk.secrets.create(
    name="nvidia-api-key",
    data="nvapi-xxx",
    description="NVIDIA API key",
)
sdk.inference.providers.create(
    name="nvidia-build",
    description="NVIDIA Build API",
    host_url="https://integrate.api.nvidia.com",
    api_key_secret_name="nvidia-api-key",
)
# Model configuration
model_configs = [
    dd.ModelConfig(
        alias="text",
        model="nvidia/nemotron-3-nano-30b-a3b",
        provider="default/nvidia-build",  # Reference NMP provider
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]
# Build configuration (identical!)
config_builder = dd.DataDesignerConfigBuilder(model_configs)
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(values=["A", "B", "C"]),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="description",
        prompt="Describe a {{ category }} product.",
        model_alias="text",
    )
)
# Execute
data_designer = sdk.data_designer
preview = data_designer.preview(config_builder, num_records=10)
job = data_designer.create(config_builder, num_records=1000)
# Access results
job.wait_until_done()
results = job.download_artifacts()
dataset = results.load_dataset()
analysis = results.load_analysis()
Benefits of Migration#
Migrating to the NMP service provides:
Scalability: Distributed execution for large datasets
Monitoring: Job tracking and status updates
Artifact Management: Centralized storage and versioning
Team Collaboration: Shared workspaces and resources
Security: Centralized secret management
Infrastructure: Managed inference and compute resources
Getting Help#
Quick Start: See the quickstart guide for setup instructions
Tutorials: Follow the tutorials for hands-on examples
Library Docs: Refer to the open-source library documentation for configuration details