Create Data Generation Job
Prerequisites
Before you can create a data generation job, make sure that you have:
Obtained the base URL of your NeMo Data Designer service
Prepared your data generation configuration including:
Model configurations - Configure model aliases and inference parameters
Column schemas - Define your data column types and parameters
Model constraints - Optional validation rules for your data
Set the NEMO_MICROSERVICES_BASE_URL environment variable to your NeMo Data Designer service endpoint:
export NEMO_MICROSERVICES_BASE_URL="https://your-data-designer-service-url"
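Because the client code reads this variable at startup, it can help to validate it before making any requests. The following is a minimal sketch that uses only the Python standard library. The helper name `resolve_base_url` is hypothetical and not part of the NeMo Data Designer SDK, and stripping the trailing slash is an assumption about what the service expects.

```python
import os
from urllib.parse import urlparse


def resolve_base_url(env=None, var_name="NEMO_MICROSERVICES_BASE_URL"):
    """Return the service base URL from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the SDK.
    """
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    if not value:
        raise RuntimeError(f"{var_name} is not set; export it before running.")

    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"{var_name} must be an http(s) URL, got: {value!r}")

    # Assumption: the service does not need a trailing slash on the base URL.
    return value.rstrip("/")


if __name__ == "__main__":
    try:
        print(resolve_base_url())
    except (RuntimeError, ValueError) as err:
        print(f"Configuration problem: {err}")
```

The returned string could then be passed as `base_url` when the client is constructed, so a missing or malformed value fails early with a readable message instead of a `KeyError`.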
To Create a Data Generation Job
Use the NeMoDataDesignerClient to create and monitor a job:
import os
from nemo_microservices.data_designer.essentials import (
    CategorySamplerParams,
    DataDesignerConfigBuilder,
    InferenceParameters,
    LLMTextColumnConfig,
    ModelConfig,
    NeMoDataDesignerClient,
    SamplerColumnConfig,
    SamplerType,
)
# Create a configuration builder with your model
config_builder = DataDesignerConfigBuilder(
    model_configs=[
        ModelConfig(
            alias="main-model",
            model="meta/llama-3.3-70b-instruct",
            inference_parameters=InferenceParameters(
                temperature=0.90,
                top_p=0.99,
                max_tokens=2048,
            ),
        ),
    ]
)
# Add columns to define your data structure
config_builder.add_column(
    SamplerColumnConfig(
        name="language",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["English", "French"]
        ),
    )
)
config_builder.add_column(
    LLMTextColumnConfig(
        name="story",
        prompt="Write one sentence about synthetic data in {{ language }} language",
        # Must match the alias defined in ModelConfig above
        model_alias="main-model",
    )
)
# Initialize client
data_designer_client = NeMoDataDesignerClient(
    base_url=os.environ["NEMO_MICROSERVICES_BASE_URL"]
)
# Create job with automatic waiting and result loading
job_result = data_designer_client.create(
    config_builder,
    num_records=100,
    wait_until_done=True,  # Waits for completion automatically
)
# Access results as pandas DataFrame
df = job_result.load_dataset()
print(f"Job completed! Generated {len(df)} records.")
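Once the dataset is loaded, a quick sanity check can confirm that the sampler values are distributed as expected and that no generated text is empty. The sketch below is illustrative only: the helper names are hypothetical, and it operates on a plain list of dictionaries so it runs without a live service. With a real job you could obtain the same shape from the DataFrame with `df.to_dict("records")`.

```python
from collections import Counter


def summarize_column(records, column):
    """Count how often each value of `column` appears across the records."""
    return Counter(record[column] for record in records)


def find_empty(records, column):
    """Return the indices of records whose `column` is missing or blank."""
    return [
        index
        for index, record in enumerate(records)
        if not str(record.get(column) or "").strip()
    ]


# Stand-in rows that mirror the "language" and "story" columns defined above.
sample_records = [
    {"language": "English", "story": "Synthetic data can reduce privacy risk."},
    {"language": "French", "story": "Un exemple de phrase."},
    {"language": "English", "story": ""},
]

language_counts = summarize_column(sample_records, "language")
empty_story_rows = find_empty(sample_records, "story")

print(dict(language_counts))
print(empty_story_rows)
```

Rows flagged by `find_empty` are candidates for regeneration or filtering before the dataset is used downstream.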