***
layout: overview
slug: nemo-curator/nemo\_curator/tasks/interleaved
title: nemo\_curator.tasks.interleaved
--------------------------------------
Interleaved task type and schema for row-wise interleaved multimodal records.
Schema columns fall into two categories:
**Reserved columns** (`RESERVED_COLUMNS`) -- managed by pipeline stages:
\================== ============= =========== ===============================================
Column Type Category Description
\================== ============= =========== ===============================================
`sample_id` string (req) Identity Unique document/sample identifier
`position` int32 (req) Identity Position within sample (-1 for metadata rows)
`modality` string (req) Identity Row modality -- built-in values are `text`,
`image`, and `metadata`; extensible to
`audio`, `table`, `generated_image`, etc.
`content_type` string Content MIME type (e.g. `text/plain`, `image/jpeg`)
`text_content` string Content Text payload for text rows
`binary_content` large\_binary Content Image bytes (populated by materialization)
`source_ref` string Internal JSON locator `{path, member,
byte_offset, byte_size, frame_index}`.
`path` alone = direct/remote read;
* `member` = tar extract;
* `byte_offset/size` = range read (fastest).
`path` accepts local or remote (`s3://`) URIs.
`materialize_error` string Internal Error message if materialization failed
\================== ============= =========== ===============================================
**User columns** (passthrough) -- extra fields from source data added via the
`fields` parameter on the reader. These flow through the pipeline untouched.
## Module Contents
### Classes
| Name | Description |
| ---------------------------------------------------------------------- | ------------------------------------------ |
| [`InterleavedBatch`](#nemo_curator-tasks-interleaved-InterleavedBatch) | Task carrying row-wise multimodal records. |
### Data
[`INTERLEAVED_SCHEMA`](#nemo_curator-tasks-interleaved-INTERLEAVED_SCHEMA)
[`RESERVED_COLUMNS`](#nemo_curator-tasks-interleaved-RESERVED_COLUMNS)
### API
```python
class nemo_curator.tasks.interleaved.InterleavedBatch(
task_id: str,
dataset_name: str,
data: pyarrow.Table | pandas.DataFrame = (lambda: pa.Table.from_pyli...,
_stage_perf: list[nemo_curator.utils.performance_utils.StagePerfStats] = list(),
_metadata: dict[str, typing.Any] = dict(),
REQUIRED_COLUMNS: frozenset[str] = frozenset(name for name, f ...
)
```
Dataclass
**Bases:** [Task\[Table | DataFrame\]](/nemo-curator/nemo_curator/tasks/tasks#nemo_curator-tasks-tasks-Task)
Task carrying row-wise multimodal records.
See module docstring for the full schema reference (reserved vs user columns).
Number of unique samples (distinct `sample_id` values).
```python
nemo_curator.tasks.interleaved.InterleavedBatch.add_rows(
rows: pyarrow.Table | pandas.DataFrame | list[dict],
sample_id: str | None = None,
auto_position: bool = True
) -> nemo_curator.tasks.interleaved.InterleavedBatch
```
Add rows to this task.
**Parameters:**
New rows to append. Must contain required columns unless
overridden by *sample\_id* / *auto\_position*.
If provided, assign this `sample_id` to all new rows.
If `True`, auto-assign `position` values
continuing from the existing maximum per sample.
```python
nemo_curator.tasks.interleaved.InterleavedBatch.build_source_ref(
path: str | None,
member: str | None,
byte_offset: int | None = None,
byte_size: int | None = None,
frame_index: int | None = None
) -> str
```
staticmethod
Build a `source_ref` JSON locator string.
```python
nemo_curator.tasks.interleaved.InterleavedBatch.count(
modality: str | None = None
) -> int
```
Return row count, optionally filtered by modality.
Examples::
task.count() # total rows
task.count(modality="image") # image rows only
task.count(modality="text") # text rows only
```python
nemo_curator.tasks.interleaved.InterleavedBatch.delete_rows(
mask: pandas.Series
) -> nemo_curator.tasks.interleaved.InterleavedBatch
```
Delete rows where *mask* is `True`.
**Parameters:**
Boolean Series aligned to the data. `True` marks a row
for deletion.
```python
nemo_curator.tasks.interleaved.InterleavedBatch.get_columns() -> list[str]
```
```python
nemo_curator.tasks.interleaved.InterleavedBatch.parse_source_ref(
source_value: str | None
) -> dict[str, str | int | None]
```
staticmethod
Parse a `source_ref` JSON string into a locator dict.
```python
nemo_curator.tasks.interleaved.InterleavedBatch.to_pandas() -> pandas.DataFrame
```
```python
nemo_curator.tasks.interleaved.InterleavedBatch.to_pyarrow() -> pyarrow.Table
```
```python
nemo_curator.tasks.interleaved.InterleavedBatch.validate() -> bool
```
```python
nemo_curator.tasks.interleaved.InterleavedBatch.with_parsed_source_ref_columns(
prefix: str = '_src_'
) -> pandas.DataFrame
```
Return a DataFrame copy with parsed `source_ref` columns added.
Columns: `{prefix}path`, `{prefix}member`, `{prefix}byte_offset`,
`{prefix}byte_size`, `{prefix}frame_index`.
```python
nemo_curator.tasks.interleaved.INTERLEAVED_SCHEMA = pa.schema([pa.field('sample_id', pa.string(), nullable=False), pa.field('positio...
```
```python
nemo_curator.tasks.interleaved.RESERVED_COLUMNS: frozenset[str] = frozenset(INTERLEAVED_SCHEMA.names)
```