# data\_designer.config.seed

## Module Contents

### Classes

| Name                                                           | Description                                                                 |
| -------------------------------------------------------------- | --------------------------------------------------------------------------- |
| [`SamplingStrategy`](#data_designerconfigseedsamplingstrategy) | Enumeration of strategies for sampling rows: `ORDERED` or `SHUFFLE`.        |
| [`IndexRange`](#data_designerconfigseedindexrange)             | Selects a range of row indices (`start` to `end`) from the seed dataset.    |
| [`PartitionBlock`](#data_designerconfigseedpartitionblock)     | Selects one of `num_partitions` equal parts of the seed dataset by `index`. |
| [`SeedConfig`](#data_designerconfigseedseedconfig)             | Configuration for sampling data from a seed dataset.                        |

### API

```python
class data_designer.config.seed.SamplingStrategy
```

**Bases**: `str`, `enum.Enum`

```python
ORDERED = "ordered"
```

```python
SHUFFLE = "shuffle"
```
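
Because `SamplingStrategy` derives from both `str` and `enum.Enum`, its members can be built from their string values and compared with plain strings. The following is a minimal sketch using a stand-in class that mirrors the two documented members; it is not the library class itself.

```python
from enum import Enum


class SamplingStrategy(str, Enum):
    """Stand-in mirroring the documented members; not the library class itself."""

    ORDERED = "ordered"
    SHUFFLE = "shuffle"


# Members can be looked up from their string values, e.g. when read from a config file.
strategy = SamplingStrategy("shuffle")

# Because the enum subclasses str, members compare equal to plain strings.
is_shuffle = strategy == "shuffle"
```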

```python
class data_designer.config.seed.IndexRange(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.ConfigBase`

```python
start: int = Field(...)
```

```python
end: int = Field(...)
```

```python
_validate_index_range() -> typing_extensions.Self
```

```python
size: int
```
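
As an illustration of how `start`, `end`, and `size` might relate, here is a minimal sketch using a hypothetical `IndexRangeSketch` dataclass. It assumes both bounds are inclusive, which matches the `SeedConfig` example that describes `IndexRange(start=100, end=199)` as "rows 100-199". The validation rules shown are assumptions and are not taken from the library source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRangeSketch:
    """Hypothetical stand-in for IndexRange; bounds are assumed inclusive."""

    start: int
    end: int

    def __post_init__(self) -> None:
        # Assumed validation rules: non-negative bounds and start <= end.
        if self.start < 0 or self.end < 0:
            raise ValueError("start and end must be non-negative")
        if self.start > self.end:
            raise ValueError("start must be less than or equal to end")

    @property
    def size(self) -> int:
        # With inclusive bounds, rows 100-199 contain 100 rows.
        return self.end - self.start + 1


rows = IndexRangeSketch(start=100, end=199)
print(rows.size)  # 100
```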

```python
class data_designer.config.seed.PartitionBlock(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.ConfigBase`

```python
index: int = Field(...)
```

```python
num_partitions: int = Field(...)
```

```python
_validate_partition_block() -> typing_extensions.Self
```

```python
to_index_range(dataset_size: int) -> data_designer.config.seed.IndexRange
```
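
One plausible way to map a partition to row indices is sketched below with a hypothetical helper, `partition_to_index_range`. This page does not say how rows are distributed when `dataset_size` is not divisible by `num_partitions`. The sketch assumes floor division with the last partition absorbing the remainder, and inclusive bounds. The actual `to_index_range` behavior may differ.

```python
def partition_to_index_range(index: int, num_partitions: int, dataset_size: int) -> tuple[int, int]:
    """Hypothetical helper: map a zero-based partition to inclusive (start, end) row indices."""
    if num_partitions < 1 or not 0 <= index < num_partitions:
        raise ValueError("index must satisfy 0 <= index < num_partitions")
    if dataset_size < num_partitions:
        raise ValueError("dataset_size must be at least num_partitions")

    partition_size = dataset_size // num_partitions
    start = index * partition_size

    # Assumption: the last partition also takes any leftover rows.
    if index == num_partitions - 1:
        end = dataset_size - 1
    else:
        end = start + partition_size - 1
    return start, end


# Partition 2 (the third partition) of 5 over 1000 rows covers rows 400-599.
print(partition_to_index_range(index=2, num_partitions=5, dataset_size=1000))  # (400, 599)
```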

```python
class data_designer.config.seed.SeedConfig(
    /,
    **data: typing.Any
)
```

**Bases**: `data_designer.config.base.ConfigBase`

Configuration for sampling data from a seed dataset.

**Parameters:**

* `source`: A `SeedSource` defining where the seed data is located.
* `sampling_strategy`: Strategy for how to sample rows from the dataset.
  * `ORDERED`: Read rows sequentially in their original order.
  * `SHUFFLE`: Randomly shuffle rows before sampling. When used with
    `selection_strategy`, shuffling occurs within the selected range or partition.
* `selection_strategy`: Optional strategy to select a subset of the dataset.
  * `IndexRange`: Select a specific range of indices (e.g., rows 100-200).
  * `PartitionBlock`: Select a partition by splitting the dataset into N equal parts.
    Partition indices are zero-based (`index=0` is the first partition, `index=1` is
    the second, etc.).

**Attributes:**

* `source`: A `SeedSource` defining where the seed data is located.
* `sampling_strategy`: Strategy for how to sample rows from the dataset.
  * `ORDERED`: Read rows sequentially in their original order.
  * `SHUFFLE`: Randomly shuffle rows before sampling. When used with
    `selection_strategy`, shuffling occurs within the selected range or partition.
* `selection_strategy`: Optional strategy to select a subset of the dataset.
  * `IndexRange`: Select a specific range of indices (e.g., rows 100-200).
  * `PartitionBlock`: Select a partition by splitting the dataset into N equal parts.
    Partition indices are zero-based (`index=0` is the first partition, `index=1` is
    the second, etc.).

**Examples:**

```python
from data_designer.config.seed import IndexRange, PartitionBlock, SamplingStrategy, SeedConfig

# LocalFileSeedSource must also be imported; its module path is not shown on this page.

# Read rows sequentially from start to end.
SeedConfig(
    source=LocalFileSeedSource(path="my_data.parquet"),
    sampling_strategy=SamplingStrategy.ORDERED,
)

# Read rows in random order.
SeedConfig(
    source=LocalFileSeedSource(path="my_data.parquet"),
    sampling_strategy=SamplingStrategy.SHUFFLE,
)

# Read a specific index range (rows 100-199).
SeedConfig(
    source=LocalFileSeedSource(path="my_data.parquet"),
    sampling_strategy=SamplingStrategy.ORDERED,
    selection_strategy=IndexRange(start=100, end=199),
)

# Read random rows from a specific index range (shuffles within rows 100-199).
SeedConfig(
    source=LocalFileSeedSource(path="my_data.parquet"),
    sampling_strategy=SamplingStrategy.SHUFFLE,
    selection_strategy=IndexRange(start=100, end=199),
)

# Read from partition 2 (3rd partition, zero-based) of 5 partitions (20% of dataset).
SeedConfig(
    source=LocalFileSeedSource(path="my_data.parquet"),
    sampling_strategy=SamplingStrategy.ORDERED,
    selection_strategy=PartitionBlock(index=2, num_partitions=5),
)

# Read shuffled rows from partition 0 of 10 partitions (shuffles within the partition).
SeedConfig(
    source=LocalFileSeedSource(path="my_data.parquet"),
    sampling_strategy=SamplingStrategy.SHUFFLE,
    selection_strategy=PartitionBlock(index=0, num_partitions=10),
)
```
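
To make the interaction between the two strategies concrete, the sketch below applies the selection first and then shuffles only within the selected rows, as described for `SHUFFLE`. The `sample_rows` helper and its `seed` argument are hypothetical and operate on a plain Python list. This illustrates the documented semantics and is not the library's implementation.

```python
import random


def sample_rows(rows, start, end, shuffle, seed=None):
    """Hypothetical helper: select the inclusive range [start, end], then optionally shuffle it."""
    # Selection happens first, so rows outside the range can never appear.
    selected = list(rows[start : end + 1])
    if shuffle:
        # Shuffling reorders only the selected rows.
        random.Random(seed).shuffle(selected)
    return selected


dataset = list(range(1000))
ordered = sample_rows(dataset, 100, 199, shuffle=False)
shuffled = sample_rows(dataset, 100, 199, shuffle=True, seed=0)
```

Both results contain exactly the rows 100 through 199; only their order differs.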

**Initialization:**

Create a new model by parsing and validating input data from keyword arguments.

Raises `pydantic_core.ValidationError` if the input data cannot be
validated to form a valid model.

`self` is explicitly positional-only to allow `self` as a field name.

```python
source: data_designer.config.seed_source_types.SeedSourceT
```

```python
sampling_strategy: data_designer.config.seed.SamplingStrategy
```

```python
selection_strategy: data_designer.config.seed.IndexRange | data_designer.config.seed.PartitionBlock | None
```