tasks.document
Module Contents
Classes
DocumentBatch | Task for processing batches of text documents. Documents are stored as a dataframe (PyArrow Table or Pandas DataFrame).
API
- class tasks.document.DocumentBatch
Bases: tasks.tasks.Task[pyarrow.Table | pandas.DataFrame]
Task for processing batches of text documents. Documents are stored as a dataframe (PyArrow Table or Pandas DataFrame).
- data: pyarrow.Table | pandas.DataFrame
‘field(…)’
- get_columns() -> list[str]
Get the column names from the data.
- property num_items: int
Get the number of documents in this batch.
- to_pandas() -> pandas.DataFrame
Convert the data to a pandas DataFrame.
- to_pyarrow() -> pyarrow.Table
Convert the data to a PyArrow Table.
- validate() -> bool
Validate the task data.