Copyright © 2026, NVIDIA Corporation.

Inference Parameters

Inference parameters control how models generate responses during synthetic data generation. Data Designer provides three types of inference parameters: ChatCompletionInferenceParams for text/code/structured generation, EmbeddingInferenceParams for embedding generation, and ImageInferenceParams for image generation.

Overview

When you create a ModelConfig, you can specify inference parameters to adjust model behavior. These parameters control aspects like randomness (temperature), diversity (top_p), response length (max_tokens), and more. Data Designer supports both static values and dynamic distribution-based sampling for certain parameters.

Chat Completion Inference Parameters

The ChatCompletionInferenceParams class controls how models generate text completions (for text, code, and structured data generation). It provides fine-grained control over generation behavior and supports both static values and dynamic distribution-based sampling.

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| temperature | float or Distribution | No | Controls randomness in generation (0.0 to 2.0). Higher values = more creative/random |
| top_p | float or Distribution | No | Nucleus sampling parameter (0.0 to 1.0). Controls diversity by filtering low-probability tokens |
| max_tokens | int | No | Maximum number of tokens to generate in the response (≥ 1) |
| max_parallel_requests | int | No | Maximum concurrent API requests to this model (default: 4, ≥ 1). See Concurrency Control below. |
| timeout | int | No | API request timeout in seconds (≥ 1) |
| extra_body | dict[str, Any] | No | Additional parameters to include in the API request body |

Default Values: If temperature, top_p, or max_tokens is not provided, the model provider’s default value is used. Different providers and models may have different defaults.

Controlling Reasoning Effort for Reasoning Models: For reasoning models like Nemotron 3 Super (nvidia/nemotron-3-super-120b-a12b) and GPT-OSS (gpt-oss-20b, gpt-oss-120b), you can control the reasoning effort using the extra_body parameter:

import data_designer.config as dd

# High reasoning effort (more thorough, slower)
inference_parameters = dd.ChatCompletionInferenceParams(
    extra_body={"reasoning_effort": "high"}
)

# Medium reasoning effort (balanced)
inference_parameters = dd.ChatCompletionInferenceParams(
    extra_body={"reasoning_effort": "medium"}
)

# Low reasoning effort (faster, less thorough)
inference_parameters = dd.ChatCompletionInferenceParams(
    extra_body={"reasoning_effort": "low"}
)
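
The two notes above (provider defaults for unset fields, and pass-through of extra_body) can be pictured with a small standalone sketch. It does not use Data Designer and is not its implementation: `build_request_body` and the model name `example-model` are hypothetical, and the merge order is an assumption made for illustration.

```python
from typing import Any, Optional


def build_request_body(
    model: str,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    extra_body: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a chat-completion request body."""
    body: dict[str, Any] = {"model": model}
    optional = {"temperature": temperature, "top_p": top_p, "max_tokens": max_tokens}
    # Leave unset fields out entirely so the provider applies its own defaults.
    body.update({key: value for key, value in optional.items() if value is not None})
    # Provider-specific options (assumed here to be merged last) ride along as-is.
    body.update(extra_body or {})
    return body


defaults_only = build_request_body("example-model")
tuned = build_request_body(
    "example-model", temperature=0.2, extra_body={"reasoning_effort": "high"}
)
print(defaults_only)
print(tuned)
```

In this sketch, the first request carries only the model name, so every sampling setting is left to the provider, while the second sends an explicit temperature plus the reasoning-effort option.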

Temperature and Top P Guidelines

  • Temperature:
    • 0.0-0.3: Highly deterministic, focused outputs (ideal for structured/reasoning tasks)
    • 0.4-0.7: Balanced creativity and coherence (general purpose)
    • 0.8-1.0: Creative, diverse outputs (ideal for creative writing)
    • Above 1.0: Highly random and experimental
  • Top P:
    • 0.1-0.5: Very focused, only most likely tokens
    • 0.6-0.9: Balanced diversity
    • 0.95-1.0: Maximum diversity, including less likely tokens
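
To make these ranges concrete, here is a minimal standalone sketch of what the two parameters conventionally do to a next-token distribution. It follows the textbook definitions of temperature scaling and nucleus (top-p) filtering, not the internals of any particular provider; the logits are made-up numbers and both function names are hypothetical.

```python
import math


def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits divided by the temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: list[int] = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    # Renormalize the survivors; everything else gets zero probability.
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
sharp = apply_temperature(logits, 0.3)  # low temperature: mass piles onto the top token
flat = apply_temperature(logits, 1.5)   # high temperature: mass spreads out
focused = nucleus_filter(apply_temperature(logits, 1.0), top_p=0.5)
print(sharp, flat, focused)
```

With these numbers the top token alone already exceeds a cumulative probability of 0.5, so the nucleus filter keeps only that token. This is why a low top_p behaves as "only most likely tokens".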

Adjusting Temperature and Top P Together: When tuning both parameters simultaneously, consider these combinations:

  • For deterministic/structured outputs: Low temperature (0.0-0.3) + moderate-to-high top_p (0.8-0.95)
    • The low temperature ensures focus, while top_p allows some token diversity
  • For balanced generation: Moderate temperature (0.5-0.7) + high top_p (0.9-0.95)
    • This is a good starting point for most use cases
  • For creative outputs: Higher temperature (0.8-1.0) + high top_p (0.95-1.0)
    • Both parameters work together to maximize diversity

Avoid: Setting both very low (overly restrictive) or adjusting both dramatically at once. When experimenting, adjust one parameter at a time to understand its individual effect.

Distribution-Based Inference Parameters

For temperature and top_p in ChatCompletionInferenceParams, you can specify distributions instead of fixed values. This allows Data Designer to sample different values for each generation request, introducing controlled variability into your synthetic data.

Uniform Distribution

Samples values uniformly between a low and high bound:

import data_designer.config as dd

inference_params = dd.ChatCompletionInferenceParams(
    temperature=dd.UniformDistribution(
        params=dd.UniformDistributionParams(low=0.7, high=1.0)
    ),
)
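
As a rough mental model of per-request sampling, the standalone sketch below draws one temperature per request from the same bounds using Python's standard library. It only illustrates the idea; how Data Designer samples internally is not shown here, and the seed is an assumption added so the sketch is reproducible.

```python
import random

rng = random.Random(0)  # seeded only so the sketch is reproducible
low, high = 0.7, 1.0

# One independent draw per generation request.
temperatures = [rng.uniform(low, high) for _ in range(5)]
print(temperatures)
```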

Manual Distribution

Samples from a discrete set of values with optional weights:

import data_designer.config as dd

# Equal probability for each value
inference_params = dd.ChatCompletionInferenceParams(
    temperature=dd.ManualDistribution(
        params=dd.ManualDistributionParams(values=[0.5, 0.7, 0.9])
    ),
)

# Weighted probabilities (normalized automatically)
inference_params = dd.ChatCompletionInferenceParams(
    top_p=dd.ManualDistribution(
        params=dd.ManualDistributionParams(
            values=[0.8, 0.9, 0.95],
            weights=[0.2, 0.5, 0.3]  # 20%, 50%, 30% probability
        )
    ),
)
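
"Normalized automatically" means that only the ratios between weights matter. The standalone sketch below shows that idea with the standard library: unnormalized weights 2, 5, 3 give the same probabilities as 0.2, 0.5, 0.3. It illustrates weighted discrete sampling in general and is not Data Designer's own sampling code.

```python
import random
from collections import Counter

values = [0.8, 0.9, 0.95]
weights = [2, 5, 3]  # unnormalized; same proportions as 0.2, 0.5, 0.3

# Normalizing divides each weight by the total.
total = sum(weights)
probabilities = [w / total for w in weights]

rng = random.Random(0)  # seeded only so the sketch is reproducible
draws = rng.choices(values, weights=weights, k=10_000)
counts = Counter(draws)
print(probabilities, counts)
```

Over many draws, the observed frequencies should approach the normalized probabilities, with 0.9 sampled most often.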

Concurrency Control

The max_parallel_requests parameter controls how many concurrent API calls Data Designer makes to a specific model. This directly impacts throughput and should be tuned to match your inference server’s capacity.

Performance Tuning: For recommended values by deployment type (NVIDIA API Catalog, vLLM, OpenAI, NIMs) and detailed optimization strategies, see the Architecture & Performance guide.
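
A cap on concurrent requests is commonly implemented with a semaphore. The standalone sketch below shows that general pattern with asyncio; it is an assumption about how such a limit can work, not a description of Data Designer's scheduler. `run_requests` and `fake_request` are hypothetical names, and the sleep stands in for a real model call.

```python
import asyncio

MAX_PARALLEL_REQUESTS = 4  # mirrors the documented default


async def run_requests(n_requests: int, limit: int) -> int:
    """Run n fake requests and return the peak number in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `limit` requests are active
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the model API call
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(n_requests)))
    return peak


peak = asyncio.run(run_requests(n_requests=20, limit=MAX_PARALLEL_REQUESTS))
print(f"peak concurrent requests: {peak}")
```

Raising the limit shortens total wall-clock time only as long as the inference server can actually serve that many requests at once.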

Embedding Inference Parameters

The EmbeddingInferenceParams class controls how models generate embeddings. This is used when working with embedding models for tasks like semantic search or similarity analysis.

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| encoding_format | Literal["float", "base64"] | No | Format of the embedding encoding (default: "float") |
| dimensions | int | No | Number of dimensions for the embedding |
| max_parallel_requests | int | No | Maximum concurrent API requests (default: 4, ≥ 1) |
| timeout | int | No | API request timeout in seconds (≥ 1) |
| extra_body | dict[str, Any] | No | Additional parameters to include in the API request body |
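
If you request the "base64" encoding and read raw API responses yourself, the payload has to be decoded back into floats. The sketch below assumes the convention used by OpenAI-compatible endpoints, where the string is a base64-encoded array of little-endian float32 values; verify this against your provider before relying on it. `decode_base64_embedding` is a hypothetical helper, and the sample vector is made up.

```python
import base64
import struct


def decode_base64_embedding(payload: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(payload)
    count = len(raw) // 4  # four bytes per float32
    return list(struct.unpack(f"<{count}f", raw))


# Build a sample payload the same way, then round-trip it.
vector = [0.5, -1.0, 0.25]  # values chosen to be exactly representable in float32
payload = base64.b64encode(struct.pack("<3f", *vector)).decode("ascii")
decoded = decode_base64_embedding(payload)
print(decoded)
```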

Image Inference Parameters

The ImageInferenceParams class is used for image generation models, including both diffusion models (DALL·E, Stable Diffusion, Imagen) and autoregressive models (Gemini image, GPT image). Unlike text models, image-specific options are passed entirely via extra_body, since they vary significantly between providers.

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| max_parallel_requests | int | No | Maximum concurrent API requests (default: 4, ≥ 1) |
| timeout | int | No | API request timeout in seconds (≥ 1) |
| extra_body | dict[str, Any] | No | Model-specific image options (size, quality, aspect ratio, etc.) |

Examples

import data_designer.config as dd

# Autoregressive model (chat completions API, supports image context)
dd.ModelConfig(
    alias="image-model",
    model="black-forest-labs/flux.2-pro",
    provider="openrouter",
    inference_parameters=dd.ImageInferenceParams(
        extra_body={"height": 512, "width": 512}
    ),
)

# Diffusion model (e.g., DALL·E, Stable Diffusion)
dd.ModelConfig(
    alias="dalle",
    model="dall-e-3",
    provider="openai",
    inference_parameters=dd.ImageInferenceParams(
        extra_body={"size": "1024x1024", "quality": "hd"}
    ),
)
See Also

  • Default Model Settings: Pre-configured model settings included with Data Designer
  • Custom Model Settings: Learn how to create custom providers and model configurations
  • Model Configurations: Learn about configuring model settings
  • Model Providers: Learn about configuring model providers
  • Architecture & Performance: Understanding separation of concerns and optimizing concurrency