Model configurations define the specific models you use for synthetic data generation and their associated inference parameters. Each ModelConfig represents a named model that can be referenced throughout your data generation workflows.
A ModelConfig specifies which LLM to use and how it should behave during generation. When you create column configurations (such as LLMText, LLMCode, or LLMStructured), you reference a model by its alias. Data Designer then uses the matching model configuration to determine which model to call and with which inference parameters.
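The alias lookup described above can be illustrated with a minimal, self-contained sketch. The classes `ModelConfigSketch` and `ColumnSketch`, the helper `resolve_model`, the field names, and the model identifiers are all hypothetical stand-ins written for this illustration; they are not Data Designer's actual classes or signatures.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for illustration only; NOT Data Designer's real classes.
@dataclass
class ModelConfigSketch:
    alias: str   # name that column configurations refer to
    model: str   # identifier of the model to call (made-up values below)
    inference_parameters: dict = field(default_factory=dict)


@dataclass
class ColumnSketch:
    name: str
    model_alias: str  # points at a ModelConfigSketch by its alias


def resolve_model(column, model_configs):
    """Return the model config whose alias matches the column's model_alias."""
    by_alias = {cfg.alias: cfg for cfg in model_configs}
    if column.model_alias not in by_alias:
        raise KeyError(f"No model config with alias {column.model_alias!r}")
    return by_alias[column.model_alias]


configs = [
    ModelConfigSketch("text-model", "example/text-model", {"temperature": 0.7}),
    ModelConfigSketch("code-model", "example/code-model", {"temperature": 0.2}),
]
column = ColumnSketch(name="review_text", model_alias="text-model")

resolved = resolve_model(column, configs)
print(resolved.model, resolved.inference_parameters)
```

Because columns hold only the alias, several columns can share one model configuration, and changing the configuration in one place affects every column that references it.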
The ModelConfig class has the following fields:
Experiment with max_tokens for Task-Specific Model Configurations
The number of tokens required to generate a single data entry can vary significantly with use case. For example, reasoning models often need more tokens to “think through” problems before generating a response. Note that max_tokens specifies the maximum number of output tokens to generate in the response, so set this value based on the expected length of the generated content.
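As a rough sketch of task-specific configurations, the example below defines two configurations with different output budgets. `ModelConfigSketch`, `fits_budget`, the model identifiers, and the token values are assumptions chosen for illustration, not Data Designer's API or recommended settings.

```python
from dataclasses import dataclass


# Hypothetical stand-in for illustration only; NOT Data Designer's real class.
@dataclass
class ModelConfigSketch:
    alias: str
    model: str
    max_tokens: int  # upper bound on output tokens per response


# Illustrative budgets: short answers need little room, while reasoning
# models need extra tokens to work through a problem before answering.
short_answers = ModelConfigSketch("short-answers", "example/model", max_tokens=256)
reasoning = ModelConfigSketch("reasoning", "example/reasoning-model", max_tokens=4096)


def fits_budget(config, expected_output_tokens):
    """Check whether an expected output length fits within the config's budget."""
    return expected_output_tokens <= config.max_tokens


print(fits_budget(short_answers, 1500))  # False
print(fits_budget(reasoning, 1500))      # True
```

A budget that is too small truncates responses, so it helps to estimate the expected output length per task and adjust after inspecting a few generated samples.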
By default, Data Designer runs a health check for each model before starting data generation to ensure the model is accessible and configured correctly. You can skip this health check for specific models by setting skip_health_check=True:
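The effect of the flag can be sketched as follows. Only the name `skip_health_check` comes from this guide; `ModelConfigSketch`, `run_health_checks`, the model identifiers, and the no-op check are hypothetical and do not represent how Data Designer implements its health checks.

```python
from dataclasses import dataclass


# Hypothetical stand-in for illustration only; NOT Data Designer's real class.
@dataclass
class ModelConfigSketch:
    alias: str
    model: str
    skip_health_check: bool = False  # health checks run unless explicitly skipped


def run_health_checks(model_configs, check):
    """Call check(config) for every model not flagged to skip.

    Returns the aliases that were actually checked.
    """
    checked = []
    for cfg in model_configs:
        if cfg.skip_health_check:
            continue  # problems with this model surface only during generation
        check(cfg)
        checked.append(cfg.alias)
    return checked


configs = [
    ModelConfigSketch("default-model", "example/model"),
    ModelConfigSketch("offline-model", "example/offline-model", skip_health_check=True),
]

# A no-op check stands in for a real connectivity test.
checked = run_health_checks(configs, check=lambda cfg: None)
print(checked)  # ['default-model']
```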
When to Skip Health Checks
Skipping health checks can be useful when:
Note that skipping health checks means errors will only be discovered during actual data generation.