Copyright © 2026, NVIDIA Corporation.

NeMo Data Designer

data_designer.config.seed


Module Contents

Classes

Name | Description
SamplingStrategy | String enum of row sampling strategies: ORDERED or SHUFFLE.
IndexRange | Selects a specific range of row indices, defined by start and end.
PartitionBlock | Selects one partition of the dataset, defined by a zero-based index and num_partitions.
SeedConfig | Configuration for sampling data from a seed dataset.

API

class data_designer.config.seed.SamplingStrategy

Bases: str, enum.Enum

ORDERED = "ordered"
SHUFFLE = "shuffle"
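
Because SamplingStrategy lists str among its bases, its members can be treated as plain strings. The sketch below defines a stand-in enum to show that behavior; it is not the library class, and the quoted lowercase values are an assumption based on the member values rendered above.

```python
from enum import Enum


class SamplingStrategy(str, Enum):
    """Stand-in for illustration only; member values are assumed."""

    ORDERED = "ordered"
    SHUFFLE = "shuffle"


# Look a member up by value, e.g. when the strategy comes from a config file.
strategy = SamplingStrategy("shuffle")

# Members are str instances, so they compare equal to their raw values.
is_plain_string_equal = strategy == "shuffle"
member_names = [member.name for member in SamplingStrategy]
```

Mixing in str is a common way to keep an enum readable when it is serialized to JSON or YAML.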
class data_designer.config.seed.IndexRange(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

start: int = Field(...)
end: int = Field(...)
_validate_index_range() -> typing_extensions.Self
size: int
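
The page does not show the field constraints or the validator body, so the following is only a sketch of how an index range of this shape could behave. It assumes both bounds are inclusive (consistent with the SeedConfig example where start=100, end=199 covers rows 100-199), that start must not exceed end, and that size counts the rows in the range. IndexRangeSketch is a hypothetical stand-in built on dataclasses, not the library's model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRangeSketch:
    """Hypothetical stand-in for IndexRange with inclusive bounds."""

    start: int
    end: int

    def __post_init__(self) -> None:
        # Assumed validation rules; the real validator is not shown on this page.
        if self.start < 0 or self.end < 0:
            raise ValueError("start and end must be non-negative")
        if self.start > self.end:
            raise ValueError("start must be less than or equal to end")

    @property
    def size(self) -> int:
        # Both bounds are inclusive, hence the +1.
        return self.end - self.start + 1


rows_100_to_199 = IndexRangeSketch(start=100, end=199)
```

Under these assumptions, rows_100_to_199.size is 100, matching the "rows 100-199" wording used in the examples.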
class data_designer.config.seed.PartitionBlock(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

index: int = Field(...)
num_partitions: int = Field(...)
_validate_partition_block() -> typing_extensions.Self
to_index_range(dataset_size: int) -> data_designer.config.seed.IndexRange
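
To make the mapping from a partition to row indices concrete, here is a minimal sketch of one way to_index_range could work. The arithmetic is an assumption: equal blocks via integer division, inclusive bounds, and any remainder rows folded into the last partition. The function name partition_to_index_range and its error messages are hypothetical; the library's actual behavior for uneven splits is not documented on this page.

```python
def partition_to_index_range(index: int, num_partitions: int, dataset_size: int) -> tuple[int, int]:
    """Return an inclusive (start, end) row range for a zero-based partition."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if not 0 <= index < num_partitions:
        raise ValueError("index must satisfy 0 <= index < num_partitions")
    if dataset_size < num_partitions:
        raise ValueError("dataset_size must be at least num_partitions")

    block = dataset_size // num_partitions  # rows per partition (assumed)
    start = index * block
    if index == num_partitions - 1:
        end = dataset_size - 1  # assumed: last partition absorbs the remainder
    else:
        end = start + block - 1
    return start, end


third_of_five = partition_to_index_range(index=2, num_partitions=5, dataset_size=1000)
last_uneven = partition_to_index_range(index=4, num_partitions=5, dataset_size=1003)
```

With 1,000 rows, partition 2 of 5 covers rows 400-599, which is 20% of the dataset. With 1,003 rows, this sketch gives the last partition rows 800-1002.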
class data_designer.config.seed.SeedConfig(
    /,
    **data: typing.Any
)

Bases: data_designer.config.base.ConfigBase

Configuration for sampling data from a seed dataset.

Parameters:

source

A SeedSource defining where the seed data is located.

sampling_strategy

Strategy for sampling rows from the dataset.

  • ORDERED: Read rows sequentially in their original order.
  • SHUFFLE: Randomly shuffle rows before sampling. When used with selection_strategy, shuffling occurs within the selected range/partition.

selection_strategy

Optional strategy to select a subset of the dataset.

  • IndexRange: Select a specific range of indices (e.g., rows 100-200).
  • PartitionBlock: Select a partition by splitting the dataset into N equal parts. Partition indices are zero-based (index=0 is the first partition, index=1 is the second, etc.).
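
The interaction between the two strategies can be sketched as "select first, then order". The helper below is hypothetical (plan_row_order and its seed argument are not part of the library) and assumes an inclusive index range; it only illustrates that shuffling rearranges the selected rows and never pulls in rows from outside the selection.

```python
import random
from typing import Optional


def plan_row_order(start: int, end: int, shuffle: bool, seed: Optional[int] = None) -> list[int]:
    """Return the row indices to read from an inclusive [start, end] selection."""
    rows = list(range(start, end + 1))  # selection is applied first
    if shuffle:
        random.Random(seed).shuffle(rows)  # shuffle only the selected rows
    return rows


ordered = plan_row_order(100, 199, shuffle=False)
shuffled = plan_row_order(100, 199, shuffle=True, seed=0)
```

Both lists contain exactly the same 100 row indices; only their order differs.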

Examples:

    from data_designer.config.seed import IndexRange, PartitionBlock, SamplingStrategy, SeedConfig
    # LocalFileSeedSource must also be imported; its module is not shown on this page.

    # Read rows sequentially from start to end:
    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.ORDERED,
    )

    # Read rows in random order:
    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.SHUFFLE,
    )

    # Read a specific index range (rows 100-199):
    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.ORDERED,
        selection_strategy=IndexRange(start=100, end=199),
    )

    # Read random rows from a specific index range (shuffles within rows 100-199):
    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.SHUFFLE,
        selection_strategy=IndexRange(start=100, end=199),
    )

    # Read from partition 2 (3rd partition, zero-based) of 5 partitions (20% of dataset):
    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.ORDERED,
        selection_strategy=PartitionBlock(index=2, num_partitions=5),
    )

    # Read shuffled rows from partition 0 of 10 partitions (shuffles within the partition):
    SeedConfig(
        source=LocalFileSeedSource(path="my_data.parquet"),
        sampling_strategy=SamplingStrategy.SHUFFLE,
        selection_strategy=PartitionBlock(index=0, num_partitions=10),
    )

Initialization:

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

source: data_designer.config.seed_source_types.SeedSourceT
sampling_strategy: data_designer.config.seed.SamplingStrategy
selection_strategy: data_designer.config.seed.IndexRange | data_designer.config.seed.PartitionBlock | None