morpheus.io.data_manager.DataManager

class DataManager(storage_type='in_memory', file_format='parquet')

Bases: object

Manages the storage and retrieval of files using either in-memory ('in_memory', the default) or filesystem ('filesystem') storage.

Attributes
manifest

Retrieve a mapping of UUIDs to their filenames or labels.

num_rows

Get the number of rows in a source given its source ID.

records

storage_type

Get the storage type used by the DataManager instance.

Methods

get_record(source_id)

Get a DataRecord instance given a source ID.

load(source_id)

Load a cuDF DataFrame given a source ID.

remove(source_id)

Remove a source using its source ID.

store(data_source[, copy_from_source, ...])

Store a DataFrame or file path as a source and return the source ID.

get_record(source_id)

Get a DataRecord instance given a source ID.

Parameters:

source_id (uuid.UUID) – UUID of the source to be retrieved.

Returns:

DataRecord instance.

Return type:

morpheus.io.data_record.DataRecord

load(source_id)

Load a cuDF DataFrame given a source ID.

Parameters:

source_id (uuid.UUID) – UUID of the source to be loaded.

Returns:

Loaded cuDF DataFrame.

Return type:

cudf.DataFrame

property manifest: dict

Retrieve a mapping of UUIDs to their filenames or labels.

Returns:

A dictionary containing UUID to filename/label mappings.
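The mapping returned by manifest can be searched like any other dictionary. The sketch below assumes the keys are uuid.UUID objects and the values are plain strings; the helper name find_ids_by_label and the hand-built example_manifest are hypothetical and stand in for a real DataManager.manifest value.

```python
import uuid
from typing import Dict, List


def find_ids_by_label(manifest: Dict[uuid.UUID, str], label: str) -> List[uuid.UUID]:
    """Return every source ID whose filename or label equals `label`."""
    return [source_id for source_id, name in manifest.items() if name == label]


# Hand-built stand-in for the value returned by `DataManager.manifest`
# (assumption: UUID keys, string values).
first, second = uuid.uuid4(), uuid.uuid4()
example_manifest = {first: "train.parquet", second: "validation"}

matches = find_ids_by_label(example_manifest, "validation")
missing = find_ids_by_label(example_manifest, "no-such-label")
```

A list is returned rather than a single ID because nothing in the description above guarantees that labels are unique.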

property num_rows: int

Get the number of rows in a source given its source ID.

remove(source_id)

Remove a source using its source ID.

Parameters:

source_id (uuid.UUID) – UUID of the source to be removed.
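One way to make sure a source is always removed after use is to pair store and remove in a context manager. This is only a sketch: temporary_source is a hypothetical helper, and FakeManager is a stdlib-only stand-in that mirrors the documented method names so the example runs without Morpheus or a GPU. It is not the Morpheus implementation.

```python
import uuid
from contextlib import contextmanager


@contextmanager
def temporary_source(manager, data_source, data_label=None):
    """Store `data_source`, yield its source ID, and remove it on exit."""
    source_id = manager.store(data_source, data_label=data_label)
    try:
        yield source_id
    finally:
        # Runs even if the body of the `with` block raises.
        manager.remove(source_id)


class FakeManager:
    """Hypothetical in-memory stand-in, NOT the Morpheus DataManager."""

    def __init__(self):
        self._sources = {}

    def store(self, data_source, copy_from_source=False, data_label=None):
        source_id = uuid.uuid4()
        self._sources[source_id] = (data_source, data_label)
        return source_id

    def remove(self, source_id):
        del self._sources[source_id]

    @property
    def manifest(self):
        return {sid: label for sid, (_, label) in self._sources.items()}


manager = FakeManager()
with temporary_source(manager, [{"a": 1}], data_label="scratch") as sid:
    inside = dict(manager.manifest)  # the source is registered here
after = dict(manager.manifest)       # and gone once the block exits
```

With a real DataManager, the same helper should work unchanged as long as store and remove behave as documented here.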

property storage_type: str

Get the storage type used by the DataManager instance.

Returns:

Storage type as a string.
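The storage type string matters mainly when deciding whether the copy_from_source argument of store is relevant. The sketch below encodes that rule as described in the store parameter notes; copy_flag_applies is a hypothetical helper, and the file path is made up for illustration.

```python
def copy_flag_applies(storage_type: str, data_source) -> bool:
    """True when `copy_from_source` is relevant per the `store` parameter notes.

    The flag is described as applying only when the input is a file path
    (a string) and the storage type is 'filesystem'.
    """
    return storage_type == "filesystem" and isinstance(data_source, str)


# Hypothetical path, used only to exercise the helper.
applies_for_path = copy_flag_applies("filesystem", "/tmp/events.parquet")
applies_in_memory = copy_flag_applies("in_memory", "/tmp/events.parquet")
applies_for_frame = copy_flag_applies("filesystem", object())
```

In practice the first argument would come from the storage_type property of a DataManager instance.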

store(data_source, copy_from_source=False, data_label=None)

Store a DataFrame or file path as a source and return the source ID.

Parameters:
  • data_source (Union[cudf.DataFrame, pandas.DataFrame, str]) – DataFrame or file path to store as a source.

  • copy_from_source (bool) – Whether to copy the data on disk when the input is a file path and the storage type is 'filesystem'.

  • data_label (Optional[str]) – Optional label for the stored data.

Returns:

UUID of the stored source.

Return type:

uuid.UUID
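Because store returns a fresh UUID for each source, callers usually need to keep track of which ID belongs to which dataset. A minimal sketch of that bookkeeping follows. store_all is a hypothetical helper that uses only the documented store signature, and FakeManager is a stdlib-only stand-in (not Morpheus code) whose load simply returns the stored object, whereas the real load returns a cudf.DataFrame.

```python
import uuid


def store_all(manager, labelled_sources):
    """Store each (label, data_source) pair and return {label: source_id}."""
    return {
        label: manager.store(data_source, data_label=label)
        for label, data_source in labelled_sources.items()
    }


class FakeManager:
    """Hypothetical in-memory stand-in, NOT the Morpheus DataManager."""

    def __init__(self):
        self._sources = {}

    def store(self, data_source, copy_from_source=False, data_label=None):
        source_id = uuid.uuid4()
        self._sources[source_id] = data_source
        return source_id

    def load(self, source_id):
        # The real method returns a cudf.DataFrame; this returns the raw object.
        return self._sources[source_id]


manager = FakeManager()
ids = store_all(manager, {"train": [{"x": 1}], "test": [{"x": 2}]})
loaded = manager.load(ids["train"])
```

Keeping the returned dictionary around avoids having to search the manifest by label later.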

© Copyright 2023, NVIDIA. Last updated on Aug 23, 2023.