# nemo_automodel.components.datasets.diffusion.text_to_video_dataset

## Module Contents

### Classes

| Name                                                                                                           | Description                                                     |
| -------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| [`TextToVideoDataset`](#nemo_automodel-components-datasets-diffusion-text_to_video_dataset-TextToVideoDataset) | Text-to-Video dataset with multiresolution bucket organization. |

### Functions

| Name                                                                                                                                 | Description                                                          |
| ------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------- |
| [`collate_optional_video_fields`](#nemo_automodel-components-datasets-diffusion-text_to_video_dataset-collate_optional_video_fields) | Concatenate optional video fields present in batch into result dict. |
| [`load_optional_video_fields`](#nemo_automodel-components-datasets-diffusion-text_to_video_dataset-load_optional_video_fields)       | Extract optional model-specific fields and move them to the device.  |

### Data

[`VIDEO_OPTIONAL_FIELDS`](#nemo_automodel-components-datasets-diffusion-text_to_video_dataset-VIDEO_OPTIONAL_FIELDS)

### API

```python
class nemo_automodel.components.datasets.diffusion.text_to_video_dataset.TextToVideoDataset(
    cache_dir: str,
    model_type: str = 'wan',
    device: str = 'cpu'
)
```

**Bases:** [BaseMultiresolutionDataset](/nemo-automodel/nemo_automodel/components/datasets/diffusion/base_dataset#nemo_automodel-components-datasets-diffusion-base_dataset-BaseMultiresolutionDataset)

Text-to-Video dataset with multiresolution bucket organization.

Loads preprocessed `.meta` files organized by resolution bucket.
Compatible with `SequentialBucketSampler` for multiresolution training.

```python
nemo_automodel.components.datasets.diffusion.text_to_video_dataset.TextToVideoDataset.__getitem__(
    idx: int
) -> typing.Dict[str, torch.Tensor]
```

Load a single video sample from its `.meta` file.

```python
nemo_automodel.components.datasets.diffusion.text_to_video_dataset.collate_optional_video_fields(
    batch: typing.List[typing.Dict],
    result: dict
) -> None
```

Concatenate the optional video fields present in `batch` into the `result` dict. The function returns `None`, so `result` is updated in place.
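
As a rough illustration of the contract implied by the signature (a `None` return, so `result` is presumably mutated by the call), the sketch below mimics the behavior with a stand-in function. It is not the library's implementation. `collate_optional_sketch` and the local `OPTIONAL_FIELDS` tuple are hypothetical names, and two details are assumptions rather than documented behavior: that a field is collated only when every sample provides it, and that tensors are concatenated along dimension 0.

```python
from typing import Dict, List

import torch

# Mirrors the documented value of VIDEO_OPTIONAL_FIELDS on this page.
OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


def collate_optional_sketch(batch: List[Dict], result: dict) -> None:
    """Hypothetical stand-in: add concatenated optional fields to `result` in place."""
    for name in OPTIONAL_FIELDS:
        # Assumption: a field is collated only if every sample provides it.
        if batch and all(name in sample for sample in batch):
            # Assumption: each tensor carries a leading batch dimension of size 1.
            result[name] = torch.cat([sample[name] for sample in batch], dim=0)


# Two samples that carry only two of the four optional fields.
batch = [
    {"text_mask": torch.ones(1, 4), "image_embeds": torch.zeros(1, 8)},
    {"text_mask": torch.ones(1, 4), "image_embeds": torch.zeros(1, 8)},
]
result = {}
ret = collate_optional_sketch(batch, result)
print(ret, sorted(result), tuple(result["text_mask"].shape))
```

Fields that no sample provides (`text_embeddings_2` and `text_mask_2` here) are simply left out of `result` in this sketch, so downstream code can test for them with `in`.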

```python
nemo_automodel.components.datasets.diffusion.text_to_video_dataset.load_optional_video_fields(
    data: dict,
    device: str = 'cpu'
) -> dict
```

Extract the optional model-specific fields from `data` and move them to `device`.
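
One way to picture this behavior is the minimal sketch below, which is a stand-in and not the library's code. `load_optional_sketch`, the local `OPTIONAL_FIELDS` tuple, and the `video_latents` key are hypothetical names used only for illustration. The sketch assumes every optional value is a `torch.Tensor`; the real function may treat missing or non-tensor values differently.

```python
import torch

# Mirrors the documented value of VIDEO_OPTIONAL_FIELDS on this page.
OPTIONAL_FIELDS = ("text_mask", "text_embeddings_2", "text_mask_2", "image_embeds")


def load_optional_sketch(data: dict, device: str = "cpu") -> dict:
    """Hypothetical stand-in: return the optional fields found in `data`, placed on `device`."""
    out = {}
    for name in OPTIONAL_FIELDS:
        if name in data:
            # Assumption: the stored value is a tensor, so .to(device) applies.
            out[name] = data[name].to(device)
    return out


# `video_latents` is a made-up required key; it is not an optional field,
# so the sketch leaves it out of the returned dict.
data = {
    "video_latents": torch.zeros(1, 16, 2, 4, 4),
    "text_mask": torch.ones(1, 4),
}
fields = load_optional_sketch(data, device="cpu")
print(list(fields), fields["text_mask"].device.type)
```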

```python
nemo_automodel.components.datasets.diffusion.text_to_video_dataset.VIDEO_OPTIONAL_FIELDS = ('text_mask', 'text_embeddings_2', 'text_mask_2', 'image_embeds')
```