Column Types

Data Designer supports various column types that determine how data is generated. This guide explains the different column types available and how to use them.

Prerequisites

Before using any column types, ensure you have the Data Designer client and configuration builder set up:

import os
from nemo_microservices import NeMoMicroservices
from nemo_microservices.beta.data_designer import DataDesignerClient, DataDesignerConfigBuilder
from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P

data_designer_client = DataDesignerClient(
    client=NeMoMicroservices(base_url=os.environ["NEMO_MICROSERVICES_BASE_URL"])
)

config_builder = DataDesignerConfigBuilder(model_configs="path/to/your/model_configs.yaml")
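The setup above reads the base URL from `NEMO_MICROSERVICES_BASE_URL`, and `os.environ[...]` raises a bare `KeyError` when that variable is unset. A minimal sketch of a friendlier check, using only the standard library, might look like the following. The helper name `require_base_url` is hypothetical and is not part of the Data Designer client.

```python
import os


def require_base_url(env=None, var="NEMO_MICROSERVICES_BASE_URL"):
    """Return the base URL from the environment, or fail with a clear message.

    Hypothetical helper for illustration; not part of nemo_microservices.
    """
    # Allow a plain dict to be passed in, which makes the check easy to test.
    env = os.environ if env is None else env
    value = env.get(var, "").strip()
    if not value:
        raise RuntimeError(
            f"{var} is not set; export it before creating the Data Designer client"
        )
    return value
```

The result could then be passed as `base_url=require_base_url()` when constructing `NeMoMicroservices`.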

How to Define Columns

Data Designer offers two approaches to defining columns:

  1. Simplified API: Direct parameter passing with string type names

  2. Typed API: More verbose but provides better type checking and IDE support

Use the Simplified API when:

  - You prefer concise, readable code
  - You’re working on quick prototypes or simple designs
  - You don’t need IDE autocompletion for parameters

Use the Typed API when:

  - You want code completion and type checking in your IDE
  - You’re working on complex designs where type safety helps prevent errors
  - You need clarity about available parameters and their types
  - You’re collaborating with a team and want more self-documenting code

Both approaches offer the same functionality – choose the style that works best for your needs.

The simplified approach is concise and easy to use:

# Simplified API approach
config_builder.add_column(
    name="product_category",
    type="category",
    params={"values": ["Electronics", "Clothing", "Home Goods"]}
)
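Because the simplified API takes plain strings and dictionaries, column definitions can be kept as data and added in a loop. The sketch below assumes only the `add_column(name=..., type=..., params=...)` form shown above. The `column_specs` list, the `customer_segment` column, and the `add_all` helper are hypothetical, and `RecordingBuilder` is a stand-in so the snippet runs without a server; in practice you would pass the real `config_builder`.

```python
# Hypothetical column definitions kept as plain data.
column_specs = [
    {
        "name": "product_category",
        "type": "category",
        "params": {"values": ["Electronics", "Clothing", "Home Goods"]},
    },
    {
        "name": "customer_segment",
        "type": "category",
        "params": {"values": ["Consumer", "Business"]},
    },
]


class RecordingBuilder:
    """Stand-in for the config builder; records calls instead of building a config."""

    def __init__(self):
        self.columns = []

    def add_column(self, name, type, params):
        self.columns.append({"name": name, "type": type, "params": params})


def add_all(builder, specs):
    """Add every spec to the builder, rejecting duplicate column names early."""
    seen = set()
    for spec in specs:
        if spec["name"] in seen:
            raise ValueError(f"duplicate column name: {spec['name']}")
        seen.add(spec["name"])
        builder.add_column(**spec)
    return builder


builder = add_all(RecordingBuilder(), column_specs)
print([column["name"] for column in builder.columns])
```

Keeping the specs as data makes it easy to review or generate them, at the cost of the editor support that the typed API offers.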

The typed API provides better code completion and type checking:

from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P

# Typed API approach
config_builder.add_column(
    C.SamplerColumn(
        name="product_category",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(values=["Electronics", "Clothing", "Home Goods"])
    )
)
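To illustrate why typed definitions tend to surface mistakes earlier, here is a standalone sketch that does not use the Data Designer classes. `SamplerKind` and `CategoryParams` are hypothetical stand-ins for `P.SamplerType` and `P.CategorySamplerParams`; the real classes may validate differently.

```python
from dataclasses import dataclass
from enum import Enum


class SamplerKind(Enum):
    """Hypothetical stand-in for an enum of sampler type names."""

    CATEGORY = "category"


@dataclass(frozen=True)
class CategoryParams:
    """Hypothetical stand-in for typed category parameters."""

    values: list

    def __post_init__(self):
        if not self.values:
            raise ValueError("values must contain at least one category")


# With plain strings, a typo is just another string and goes unnoticed here.
stringly_typed = {"type": "catgory", "params": {"values": ["Electronics"]}}

# With an enum, the same typo fails immediately at attribute lookup.
typo_error = None
try:
    SamplerKind.CATGORY
except AttributeError as exc:
    typo_error = exc

# Typed parameters can reject bad input at construction time.
params = CategoryParams(values=["Electronics", "Clothing", "Home Goods"])
```

In this sketch the misspelled string would only be caught later, if at all, while the enum and the dataclass fail at the line where the mistake is made.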

Column Types

Sampling-Based Columns

Generate data by sampling statistical distributions, categories, dates, and person entities.

Expression Columns

Compute values using Jinja expressions that reference other columns.

LLM-Based Columns

Generate text, code, and structured data using large language models.

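As a rough conceptual illustration of how the first two column kinds relate, the sketch below samples values and then computes an expression from them. It does not use Data Designer and makes no claim about its internals: `sample_category` and `render_expression` are hypothetical helpers, the `quantity` column is invented for the example, and the placeholder handling covers only bare `{{ name }}` variables, a tiny subset of Jinja. LLM-based columns are left out because they require a model.

```python
import random
import re

rng = random.Random(7)  # fixed seed so the sketch is repeatable


def sample_category(values):
    """Sampling-based column: draw one value uniformly from a list."""
    return rng.choice(values)


def render_expression(template, row):
    """Expression column: fill {{ name }} placeholders from other columns.

    Handles only bare variable names, not full Jinja syntax.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(row[match.group(1)]),
        template,
    )


rows = []
for _ in range(3):
    row = {
        "product_category": sample_category(["Electronics", "Clothing", "Home Goods"]),
        "quantity": rng.randint(1, 5),
    }
    # The expression is evaluated after the columns it references exist.
    row["summary"] = render_expression("{{ quantity }} x {{ product_category }}", row)
    rows.append(row)

for row in rows:
    print(row["summary"])
```

The point of the sketch is the ordering: sampled columns come first, and an expression column can only reference values that already exist in the row.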

For end-to-end examples, we recommend following along with the Tutorial Notebooks.