---
layout: overview
slug: nemo-curator/nemo_curator/tasks/document
title: nemo_curator.tasks.document
---
## Module Contents
### Classes
| Name | Description |
| ------------------------------------------------------------- | ---------------------------------------------- |
| [`DocumentBatch`](#nemo_curator-tasks-document-DocumentBatch) | Task for processing batches of text documents. |
### API
```python
class nemo_curator.tasks.document.DocumentBatch(
    task_id: str,
    dataset_name: str,
    data: pyarrow.Table | pandas.DataFrame = pa.Table(),
    _stage_perf: list[nemo_curator.utils.performance_utils.StagePerfStats] = list(),
    _metadata: dict[str, typing.Any] = dict()
)
```
Dataclass
**Bases:** [Task\[Table | DataFrame\]](/nemo-curator/nemo_curator/tasks/tasks#nemo_curator-tasks-tasks-Task)
Task for processing batches of text documents.
Documents are stored as a dataframe (PyArrow Table or Pandas DataFrame).
Get the number of documents in this batch.
```python
nemo_curator.tasks.document.DocumentBatch.get_columns() -> list[str]
```
Get column names from the data.
```python
nemo_curator.tasks.document.DocumentBatch.to_pandas() -> pandas.DataFrame
```
Convert the data to a pandas DataFrame.
```python
nemo_curator.tasks.document.DocumentBatch.to_pyarrow() -> pyarrow.Table
```
Convert the data to a PyArrow Table.
```python
nemo_curator.tasks.document.DocumentBatch.validate() -> bool
```
Validate the task data.