Column Types
Data Designer supports various column types that determine how data is generated. This guide explains the different column types available and how to use them.
Prerequisites
Before using any column types, ensure you have the Data Designer client and configuration builder set up:
```python
import os

from nemo_microservices import NeMoMicroservices
from nemo_microservices.beta.data_designer import DataDesignerClient, DataDesignerConfigBuilder

# C (columns) and P (params) hold the typed classes used by the typed API
from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P

# Connect to the NeMo Microservices deployment given by NEMO_MICROSERVICES_BASE_URL
data_designer_client = DataDesignerClient(
    client=NeMoMicroservices(base_url=os.environ["NEMO_MICROSERVICES_BASE_URL"])
)

# Replace the placeholder path with your own model configuration file
config_builder = DataDesignerConfigBuilder(model_configs="path/to/your/model_configs.yaml")
```
How to Define Columns
Data Designer offers two approaches to defining columns:
- Simplified API: pass parameters directly, using string type names
- Typed API: more verbose, but provides better type checking and IDE support
Choose between them based on your situation:

| Use the Simplified API when | Use the Typed API when |
|---|---|
| You prefer concise, readable code | You want code completion and type checking in your IDE |
| You’re working on quick prototypes or simple designs | You’re working on complex designs where type safety helps prevent errors |
| You don’t need IDE autocompletion for parameters | You need clarity about available parameters and their types |
| | You’re collaborating with a team and want more self-documenting code |
Both approaches offer the same functionality, so choose the style that best fits your needs.
The simplified approach is concise and easy to use:

```python
# Simplified API approach
config_builder.add_column(
    name="product_category",
    type="category",
    params={"values": ["Electronics", "Clothing", "Home Goods"]},
)
```
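Conceptually, a `category` sampler column like the one above fills each record with a value drawn from the listed `values`. The stdlib-only sketch below imitates that behavior locally, assuming uniform sampling. It is an illustration, not Data Designer's implementation, and the helper `sample_category_column` is hypothetical.

```python
import random
from collections import Counter


def sample_category_column(values, num_records, seed=None):
    """Hypothetical stand-in for a category sampler: draw uniformly from `values`."""
    rng = random.Random(seed)  # local RNG so a fixed seed gives reproducible output
    return [rng.choice(values) for _ in range(num_records)]


records = sample_category_column(
    ["Electronics", "Clothing", "Home Goods"], num_records=1000, seed=42
)
print(Counter(records))  # each category appears in roughly a third of the records
```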
The typed API provides better code completion and type checking:

```python
from nemo_microservices.beta.data_designer.config import columns as C
from nemo_microservices.beta.data_designer.config import params as P

# Typed API approach
config_builder.add_column(
    C.SamplerColumn(
        name="product_category",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(values=["Electronics", "Clothing", "Home Goods"]),
    )
)
```
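One way to picture why the two styles are interchangeable: the simplified call spells the column with string-keyed arguments, while the typed call builds the same fields as objects that an IDE or type checker can inspect. The sketch below models that idea with hypothetical dataclasses (`SamplerType`, `CategoryParams`, `ColumnSpec`) and a hypothetical `from_simplified` converter. They only mirror the shape of the example above and are not the `nemo_microservices` classes.

```python
from dataclasses import dataclass, field
from enum import Enum


class SamplerType(str, Enum):
    """Hypothetical enum playing the role of P.SamplerType."""
    CATEGORY = "category"


@dataclass
class CategoryParams:
    """Hypothetical typed parameters; field names are visible to tooling."""
    values: list[str] = field(default_factory=list)


@dataclass
class ColumnSpec:
    """Hypothetical typed column definition."""
    name: str
    type: SamplerType
    params: CategoryParams


def from_simplified(name: str, column_type: str, params: dict) -> ColumnSpec:
    """Convert string-typed arguments into the typed form (illustrative only)."""
    return ColumnSpec(name=name, type=SamplerType(column_type), params=CategoryParams(**params))


simplified = from_simplified(
    name="product_category",
    column_type="category",
    params={"values": ["Electronics", "Clothing", "Home Goods"]},
)
typed = ColumnSpec(
    name="product_category",
    type=SamplerType.CATEGORY,
    params=CategoryParams(values=["Electronics", "Clothing", "Home Goods"]),
)
print(simplified == typed)  # True: both styles describe the same column
```

In this model, a misspelled key such as `{"valuse": [...]}` only fails at runtime, when `CategoryParams(**params)` raises `TypeError`. With the typed constructor, an editor or type checker can flag the same mistake before the code runs.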
Column Types
Column types include:
- Sampler columns: generate data by sampling statistical distributions, categories, dates, and person entities.
- Expression columns: compute values using Jinja expressions that reference other columns.
- LLM-generated columns: generate text, code, and structured data using large language models.
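These column types typically build on each other. Sampled values feed expressions, and both can feed LLM prompts. The sketch below walks one record through that order using only the standard library. It is a simplified model, not Data Designer's engine. `render` handles only bare `{{ column }}` placeholders rather than full Jinja, and `fake_llm` is a hypothetical stub standing in for a model call.

```python
import random
import re

# Matches bare {{ column_name }} placeholders (a tiny subset of Jinja syntax)
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, record: dict) -> str:
    """Replace {{ column }} placeholders with values from the record (not full Jinja)."""
    return PLACEHOLDER.sub(lambda m: str(record[m.group(1)]), template)


def fake_llm(prompt: str) -> str:
    """Hypothetical stub for an LLM call; echoes the prompt so the data flow is visible."""
    return f"<generated text for: {prompt}>"


def generate_record(rng: random.Random) -> dict:
    record = {}
    # 1. Sampler columns: draw from a category list and a uniform distribution.
    record["product_category"] = rng.choice(["Electronics", "Clothing", "Home Goods"])
    record["price"] = round(rng.uniform(5, 500), 2)
    # 2. Expression column: computed only from columns that already exist.
    record["label"] = render("{{ product_category }} item at ${{ price }}", record)
    # 3. LLM column: the prompt may reference any earlier column.
    record["description"] = fake_llm(render("Write a short ad for a {{ label }}.", record))
    return record


print(generate_record(random.Random(0)))
```

The ordering matters. A column can reference only columns that were generated before it, which is why the expression runs after the samplers and the prompt runs last.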
For end-to-end examples, we recommend following along with the Tutorial Notebooks.