morpheus.io.data_manager.DataManager

class DataManager(storage_type='in_memory', file_format='parquet')

Bases: object

Manages the storage and retrieval of data sources (DataFrames or files) using either in-memory or filesystem storage.

Attributes:
manifest

Retrieve a mapping of UUIDs to their filenames or labels.

num_rows

Get the number of rows in a source given its source ID.

records
storage_type

Get the storage type used by the DataManager instance.

Methods

get_record(source_id)

Get a DataRecord instance given a source ID.

load(source_id)

Load a cuDF DataFrame given a source ID.

remove(source_id)

Remove a source using its source ID.

store(data_source[, copy_from_source, ...])

Store a DataFrame or file path as a source and return the source ID.

get_record(source_id)

Get a DataRecord instance given a source ID.

Parameters:

source_id (UUID) – UUID of the source to be retrieved.

Returns:

DataRecord instance.

Return type:

DataRecord

load(source_id)

Load a cuDF DataFrame given a source ID.

Parameters:

source_id (UUID) – UUID of the source to be loaded.

Returns:

Loaded cuDF DataFrame.

Return type:

cudf.DataFrame

property manifest: dict

Retrieve a mapping of UUIDs to their filenames or labels.

Returns:

A dictionary containing UUID to filename/label mappings.
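Because the manifest maps each source UUID to its filename or label, it can serve as a reverse lookup when only the label is known. The helper below is a minimal sketch of that idea; `source_ids_for_label` is a hypothetical name, not part of Morpheus, and it assumes only that `manager.manifest` is a plain dict as documented above.

```python
def source_ids_for_label(manager, label):
    """Return every source ID whose manifest entry equals `label`.

    Hypothetical helper: `manager` is assumed to expose the documented
    `manifest` property (a dict of UUID -> filename or label).
    """
    # Labels are not guaranteed to be unique here, so collect every match.
    return [sid for sid, name in manager.manifest.items() if name == label]
```

An empty list means no stored source carries that label.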

property num_rows: int

Get the number of rows in a source given its source ID.

remove(source_id)

Remove a source using its source ID.

Parameters:

source_id (UUID) – UUID of the source to be removed.

property storage_type: str

Get the storage type used by the DataManager instance.

Returns:

Storage type as a string.

store(data_source, copy_from_source=False, data_label=None)

Store a DataFrame or file path as a source and return the source ID.

Parameters:
  • data_source (cudf.DataFrame | pandas.DataFrame | str) – DataFrame or file path to store as a source.

  • copy_from_source (bool) – Whether to copy the file on disk when data_source is a file path and the storage type is 'filesystem'.

  • data_label (str | None) – Optional label for the stored data.

Returns:

UUID of the stored source.

Return type:

UUID
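Since `store` returns the source ID that `load` and `remove` later expect, the three calls form a natural lifecycle. The sketch below wraps that lifecycle in a context manager so a stored source is always removed afterwards. `temporary_source` is a hypothetical helper, not part of Morpheus; it assumes only the `store` and `remove` signatures documented above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_source(manager, data_source, label=None):
    """Store `data_source`, yield its source ID, and remove it on exit.

    Hypothetical helper: `manager` is assumed to be a DataManager, or any
    object exposing the documented `store` and `remove` methods.
    """
    # store() returns the UUID that identifies the new source.
    source_id = manager.store(data_source, data_label=label)
    try:
        yield source_id
    finally:
        # Release the stored source even if the body of the `with` raises.
        manager.remove(source_id)
```

Given a `DataManager` instance `dm` and a DataFrame `df`, a block opened with `with temporary_source(dm, df, label="batch-0") as sid:` could call `dm.load(sid)` inside its body, and the source would be removed when the block exits.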