Quick Start#

This guide walks you through setting up inference and running your first Data Designer job.

Prerequisites#

  • Access to an NMP deployment

  • An API key for a model provider (e.g., NVIDIA Build, OpenAI)

Step 1: Install the SDK#

Install the NMP SDK with Data Designer support:

pip install "nemo-platform[data-designer]"

The [data-designer] extra includes the data_designer.config package for building configurations.

Step 2: Initialize the SDK#

import os
from nemo_platform import NeMoPlatform

base_url = os.environ.get("NMP_BASE_URL", "http://localhost:8080")
sdk = NeMoPlatform(base_url=base_url, workspace="default")

Step 3: Configure Inference#

Data Designer routes all inference through the Inference Gateway service, so you need a configured model provider before generating data.

Note

The platform pre-configures a system/nvidia-build model provider during startup. This provider routes inference requests to models hosted on build.nvidia.com using the API base URL https://integrate.api.nvidia.com. It authenticates with the NGC API key (with Public API Endpoints permissions) supplied during deployment, which is automatically saved as the built-in system/ngc-api-key secret.

You can verify this provider exists by running nmp inference providers list --workspace system.

The tutorials in these docs use this provider for inference, but you can create and use your own instead.

Step 4: Build a Configuration#

Use the data_designer.config package to define your dataset:

import data_designer.config as dd

# Define model configuration
model_configs = [
    dd.ModelConfig(
        provider="system/nvidia-build",
        model="nvidia/nemotron-3-nano-30b-a3b",  # Use the `served_model_name` from the provider
        alias="text",
        inference_parameters=dd.ChatCompletionInferenceParams(
            temperature=1.0,
            top_p=1.0,
        ),
    )
]

# Create config builder
config_builder = dd.DataDesignerConfigBuilder(model_configs)

# Add columns
config_builder.add_column(
    dd.SamplerColumnConfig(
        name="category",
        sampler_type=dd.SamplerType.CATEGORY,
        params=dd.CategorySamplerParams(
            values=["Electronics", "Clothing", "Books"]
        ),
    )
)

config_builder.add_column(
    dd.LLMTextColumnConfig(
        name="product_name",
        prompt="Generate a creative product name for a {{ category }} product.",
        model_alias="text",
    )
)
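The `{{ category }}` reference in the `product_name` prompt is a template placeholder: for each record, Data Designer substitutes the value sampled into the `category` column before calling the model. Purely as an illustration of that per-record substitution (the `replace` call below is a stand-in, not Data Designer's actual template engine):

```python
# Illustration only: Data Designer resolves {{ category }} internally.
# A simple replace() stands in for its real template engine here.
prompt_template = "Generate a creative product name for a {{ category }} product."

for category in ["Electronics", "Clothing", "Books"]:
    rendered = prompt_template.replace("{{ category }}", category)
    print(rendered)
```

Because the placeholder names a column, every column referenced in a prompt must exist in the configuration before the LLM column that uses it.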

Step 5: Preview Your Dataset#

Use the preview method for fast iteration:

data_designer = sdk.data_designer

preview = data_designer.preview(config_builder)

# View sample records
preview.display_sample_record()

# Access as DataFrame
df = preview.dataset
print(df.head())

# View analysis
preview.analysis.to_report()

More about preview results

The PreviewResults object returned by sdk.data_designer.preview stores all its fields in memory; nothing is persisted to disk by default. Use standard Python methods to save any preview data you want to keep around longer term. For example, the dataset is a regular Pandas DataFrame and can be saved to disk via methods like to_csv or to_parquet.
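For example, since the preview dataset is a regular pandas DataFrame, persisting it looks like any other pandas workflow. A minimal sketch, using a stand-in DataFrame in place of `preview.dataset` (file names are arbitrary):

```python
import pandas as pd

# Stand-in for preview.dataset, which is a plain pandas DataFrame.
df = pd.DataFrame({"category": ["Books"], "product_name": ["Example"]})

df.to_csv("preview_sample.csv", index=False)
# df.to_parquet("preview_sample.parquet")  # requires pyarrow or fastparquet
```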

Step 6: Generate Full Dataset#

When satisfied with the preview, create a full dataset:

# Defaulting to 30 for demo speed purposes. Happy with the output? Scale it up!
job = data_designer.create(config_builder, num_records=30)

# Wait for completion
job.wait_until_done()

# Download and view results
results = job.download_artifacts()
dataset = results.load_dataset()
analysis = results.load_analysis()

print(dataset.head())
analysis.to_report()

More about job results

The Data Designer library writes several artifacts to disk when running a full generation job, including the final dataset as parquet. When a Data Designer job runs on NMP, the entire working directory of artifacts produced by the library is saved as a job result. The download_artifacts method downloads this artifacts directory (stored in NMP as a .tar.gz archive), unarchives it, and returns a DataDesignerJobResults object that can be used to load results into memory as DataFrames or other objects for programmatic inspection.

By default, download_artifacts saves the artifacts to a relative local directory named after the job; pass an alternative path to download_artifacts to save them elsewhere.
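Conceptually, the unarchive step is ordinary .tar.gz handling, as in Python's stdlib tarfile module. A rough sketch of that step only, not the SDK's actual code (all paths below are made up, and a dummy file stands in for the downloaded archive):

```python
import tarfile
import tempfile
from pathlib import Path

# Made-up paths; a dummy file stands in for the downloaded .tar.gz archive.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "job-artifacts.tar.gz"

# Build the stand-in archive containing one fake artifact.
(workdir / "dataset.parquet").write_bytes(b"placeholder")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(workdir / "dataset.parquet", arcname="dataset.parquet")

# Extract into a directory named after the job, mirroring the SDK default.
out_dir = workdir / "my-job-artifacts"
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(out_dir)

print(sorted(p.name for p in out_dir.iterdir()))  # ['dataset.parquet']
```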

Next Steps#