Inference parameters control how models generate responses during synthetic data generation. Data Designer provides three types of inference parameters: ChatCompletionInferenceParams for text/code/structured generation, EmbeddingInferenceParams for embedding generation, and ImageInferenceParams for image generation.
When you create a ModelConfig, you can specify inference parameters to adjust model behavior. These parameters control aspects like randomness (temperature), diversity (top_p), context size (max_tokens), and more. Data Designer supports both static values and dynamic distribution-based sampling for certain parameters.
The ChatCompletionInferenceParams class controls how models generate text completions (for text, code, and structured data generation). It provides fine-grained control over generation behavior and supports both static values and dynamic distribution-based sampling.
Default Values
If temperature, top_p, or max_tokens are not provided, the model provider’s default values will be used. Different providers and models may have different defaults.
Controlling Reasoning Effort for Reasoning Models
For reasoning models like Nemotron 3 Super (nvidia/nemotron-3-super-120b-a12b) and GPT-OSS (gpt-oss-20b, gpt-oss-120b), you can control the reasoning effort using the extra_body parameter:
Temperature:
0.0-0.3: Highly deterministic, focused outputs (ideal for structured/reasoning tasks)0.4-0.7: Balanced creativity and coherence (general purpose)0.8-1.0: Creative, diverse outputs (ideal for creative writing)1.0+: Highly random and experimentalTop P:
0.1-0.5: Very focused, only most likely tokens0.6-0.9: Balanced diversity0.95-1.0: Maximum diversity, including less likely tokensAdjusting Temperature and Top P Together When tuning both parameters simultaneously, consider these combinations:
0.0-0.3) + moderate-to-high top_p (0.8-0.95)
0.5-0.7) + high top_p (0.9-0.95)
0.8-1.0) + high top_p (0.95-1.0)
Avoid: Setting both very low (overly restrictive) or adjusting both dramatically at once. When experimenting, adjust one parameter at a time to understand its individual effect.
For temperature and top_p in ChatCompletionInferenceParams, you can specify distributions instead of fixed values. This allows Data Designer to sample different values for each generation request, introducing controlled variability into your synthetic data.
Samples values uniformly between a low and high bound:
Samples from a discrete set of values with optional weights:
The max_parallel_requests parameter controls how many concurrent API calls Data Designer makes to a specific model. This directly impacts throughput and should be tuned to match your inference server’s capacity.
Performance Tuning For recommended values by deployment type (NVIDIA API Catalog, vLLM, OpenAI, NIMs) and detailed optimization strategies, see the Architecture & Performance guide.
The EmbeddingInferenceParams class controls how models generate embeddings. This is used when working with embedding models for tasks like semantic search or similarity analysis.
The ImageInferenceParams class is used for image generation models, including both diffusion models (DALL·E, Stable Diffusion, Imagen) and autoregressive models (Gemini image, GPT image). Unlike text models, image-specific options are passed entirely via extra_body, since they vary significantly between providers.