Download this tutorial as a Jupyter notebook
# Quick Start
This guide walks you through setting up inference and running your first Data Designer job.
## Prerequisites

- Access to an NMP deployment
- An API key for a model provider (e.g., NVIDIA Build, OpenAI)
## Step 1: Install the SDK

Install the NMP SDK with Data Designer support:

```bash
pip install "nemo-platform[data-designer]"
```

The `[data-designer]` extra includes the `data_designer.config` package for building configurations.
## Step 2: Initialize the SDK

```python
import os

from nemo_platform import NeMoPlatform

base_url = os.environ.get("NMP_BASE_URL", "http://localhost:8080")
sdk = NeMoPlatform(base_url=base_url, workspace="default")
```
## Step 3: Configure Inference

Data Designer routes all inference through the Inference Gateway service, so you need a model provider configured.

> **Note**
>
> The platform pre-configures a `system/nvidia-build` model provider during startup. This provider routes inference requests to models hosted on build.nvidia.com using the API base URL https://integrate.api.nvidia.com and the NGC API key with Public API Endpoints permissions provided during deployment (automatically saved as the built-in `system/ngc-api-key` secret).
>
> You can verify this provider exists by running `nmp inference providers list --workspace system`.
>
> The tutorials in these docs use this provider for inference, but you can alternatively create your own and use it instead.
## Step 4: Build a Configuration

Use the `data_designer.config` package to define your dataset:

```python
import data_designer.config as dd

# Define the model configuration
model_configs = [
    dd.ModelConfig(
        provider="system/nvidia-build",
        model="nvidia/nemotron-3-nano-30b-a3b",  # Use the `served_model_name` from the provider
        alias="text",
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Create the config builder
config_builder = dd.DataDesignerConfigBuilder(model_configs)

# Add columns
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["Electronics", "Clothing", "Books"],
        ),
    )
)
config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="product_name",
        prompt="Generate a creative product name for a {{ category }} product.",
        model_alias="text",
    )
)
```
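The sampler and the prompt template above work together: each record first receives a sampled category, which is then substituted into the Jinja-style `{{ category }}` placeholder before the prompt is sent to the model. Here is a minimal stand-in sketch of that per-record flow, using Python's `random` module and naive string substitution in place of the library's own sampler and Jinja rendering:

```python
import random

# Stand-ins for the sampler values and prompt template defined in the config above
CATEGORIES = ["Electronics", "Clothing", "Books"]
PROMPT_TEMPLATE = "Generate a creative product name for a {{ category }} product."


def render_record(seed: int) -> dict:
    """Sample a category and render the prompt, roughly as done per record."""
    rng = random.Random(seed)
    category = rng.choice(CATEGORIES)
    # Naive substitution; the library uses full Jinja template rendering
    prompt = PROMPT_TEMPLATE.replace("{{ category }}", category)
    return {"category": category, "prompt": prompt}


record = render_record(seed=0)
print(record["prompt"])
```

The `render_record` helper is purely illustrative; in practice the sampling and templating happen inside the Data Designer service when the job runs.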
## Step 5: Preview Your Dataset

Use the `preview` method for fast iteration:

```python
data_designer = sdk.data_designer

preview = data_designer.preview(config_builder)

# View sample records
preview.display_sample_record()

# Access the dataset as a DataFrame
df = preview.dataset
print(df.head())

# View the analysis report
preview.analysis.to_report()
```
**More about preview results**

The `PreviewResults` object returned by `sdk.data_designer.preview` stores all of its fields in memory; nothing is persisted to disk by default. Use standard Python methods to save any preview data you want to keep. For example, the dataset is a regular pandas DataFrame and can be written to disk with methods such as `to_csv` or `to_parquet`.
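Since the preview dataset is a plain pandas DataFrame, persisting it uses ordinary pandas writers. A minimal sketch, using a small stand-in DataFrame in place of `preview.dataset` (the column names mirror the config above):

```python
import pandas as pd

# Stand-in for preview.dataset, which is a regular pandas DataFrame
df = pd.DataFrame(
    {
        "category": ["Electronics", "Books"],
        "product_name": ["VoltCube", "Paper Trails"],
    }
)

# Persist the preview data; any standard pandas writer works
df.to_csv("preview_sample.csv", index=False)

# Round-trip to confirm nothing was lost
restored = pd.read_csv("preview_sample.csv")
print(restored.equals(df))  # True
```

`to_parquet` works the same way and preserves dtypes more faithfully, though it requires a Parquet engine such as `pyarrow` to be installed.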
## Step 6: Generate Full Dataset

When you are satisfied with the preview, create a full dataset:

```python
# Defaulting to 30 records for demo speed. Happy with the output? Scale it up!
job = data_designer.create(config_builder, num_records=30)

# Wait for completion
job.wait_until_done()

# Download and view the results
results = job.download_artifacts()
dataset = results.load_dataset()
analysis = results.load_analysis()

print(dataset.head())
analysis.to_report()
```
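`wait_until_done` blocks until the job reaches a terminal state. Helpers like this typically poll the job status endpoint in a loop with a sleep between checks; the sketch below shows that general pattern with a stand-in status function (this is not the SDK's actual implementation):

```python
import time


def wait_until_done(get_status, poll_interval=0.01, timeout=5.0):
    """Poll get_status() until it reports a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed", "cancelled"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")


# Simulate a job that completes after a few polls
statuses = iter(["pending", "running", "running", "completed"])
print(wait_until_done(lambda: next(statuses)))  # completed
```

A timeout like the one above is a useful safeguard in scripts, so a stuck job fails loudly instead of blocking forever.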
**More about job results**

The Data Designer library writes several artifacts to disk when running a full generation job, including the final dataset as Parquet. When a Data Designer job runs on NMP, the entire working directory of artifacts produced by the library is saved as a job result. The `download_artifacts` method downloads this artifacts directory (stored in NMP as a `.tar.gz` archive), unarchives it, and returns a `DataDesignerJobResults` object that can be used to load results into memory as DataFrames or other objects for programmatic inspection. By default, `download_artifacts` saves the artifacts to a relative local directory named after the job; you can pass an alternative path to `download_artifacts` instead.
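The archive handling described above can be sketched with the standard library alone: pack an artifacts directory into a `.tar.gz`, then extract it into a directory named after the job, roughly mirroring what `download_artifacts` does after fetching the archive. All paths and file names here are illustrative, not the SDK's actual layout:

```python
import pathlib
import tarfile
import tempfile

workdir = pathlib.Path(tempfile.mkdtemp())

# Pretend this is the artifacts directory a generation job produced
artifacts = workdir / "artifacts"
artifacts.mkdir()
(artifacts / "dataset.parquet").write_bytes(b"fake parquet bytes")

# NMP stores the artifacts directory as a .tar.gz job result
archive = workdir / "job-1234.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(artifacts, arcname="artifacts")

# download_artifacts-style step: unpack into a directory named after the job
dest = workdir / "job-1234"
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(dest)

print((dest / "artifacts" / "dataset.parquet").exists())  # True
```

On top of a directory like `dest`, the `DataDesignerJobResults` helpers can then load individual files (for example the Parquet dataset) into memory on demand.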
## Next Steps

- **Learn more**: Explore the tutorials for detailed examples
- **Column types**: See the library documentation for all available column types
- **Advanced features**: Learn about processors and validators