morpheus.models.dfencoder.dataloader.DFEncoderDataLoader

class DFEncoderDataLoader(*args, **kwargs)

Bases: torch.utils.data.dataloader.DataLoader

Attributes
multiprocessing_context

Methods

get_distributed_training_dataloader_from_dataset(...)

Returns a distributed training DataLoader given a dataset and other arguments.

get_distributed_training_dataloader_from_df(...)

A helper function to get a distributed training DataLoader given a pandas dataframe.

get_distributed_training_dataloader_from_path(...)

A helper function to get a distributed training DataLoader given a path to a folder containing data.

check_worker_number_rationality

static get_distributed_training_dataloader_from_dataset(dataset, rank, world_size, pin_memory=False, num_workers=0)

Returns a distributed training DataLoader given a dataset and other arguments.

Parameters
dataset : Dataset

The dataset to load the data from.

rank : int

The rank of the current process.

world_size : int

The number of processes to distribute the data across.

pin_memory : bool, optional

Whether to pin memory when loading data, by default False.

num_workers : int, optional

The number of worker processes to use for loading data, by default 0.

Returns
DataLoader

The training DataLoader with DistributedSampler for distributed training.
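
How `rank` and `world_size` divide a dataset can be illustrated without PyTorch. The sketch below mimics the index partitioning that `torch.utils.data.DistributedSampler` performs when shuffling is disabled and `drop_last=False`: the index list is padded by repeating its leading indices until it divides evenly, then each rank takes every `world_size`-th index starting at its own rank. The function name `shard_indices` is hypothetical and is not part of Morpheus or PyTorch; this is an illustration under those assumptions, not the Morpheus implementation.

```python
import math


def shard_indices(dataset_len: int, rank: int, world_size: int) -> list[int]:
    """Return the dataset indices one rank would read (no shuffling).

    Hypothetical helper that mirrors DistributedSampler's default padding
    behaviour; it is an illustration, not the Morpheus implementation.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")

    # Every rank receives the same number of samples.
    per_rank = math.ceil(dataset_len / world_size)
    total = per_rank * world_size

    # Pad by wrapping around to the start so the list divides evenly.
    indices = list(range(dataset_len))
    if indices:
        while len(indices) < total:
            indices += indices[: total - len(indices)]

    # Interleaved split: rank r takes r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total:world_size]


if __name__ == "__main__":
    for r in range(3):
        print(r, shard_indices(10, r, 3))
```

With 10 samples and 3 processes, every rank receives 4 indices, so two samples are read twice per epoch to keep the shards the same size.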

static get_distributed_training_dataloader_from_df(model, df, rank, world_size, pin_memory=False, num_workers=0)

A helper function to get a distributed training DataLoader given a pandas dataframe.

Parameters
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

df : pandas.DataFrame

The pandas dataframe containing the data.

rank : int

The rank of the current process.

world_size : int

The number of processes to distribute the data across.

pin_memory : bool, optional

Whether to pin memory when loading data, by default False.

num_workers : int, optional

The number of worker processes to use for loading data, by default 0.

Returns
DFEncoderDataLoader

The training DataLoader with DistributedSampler for distributed training.
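
A call site usually has to obtain `rank` and `world_size` from the process launcher. The following is a minimal sketch, assuming the job is started with `torchrun`, which exports `RANK` and `WORLD_SIZE` as environment variables. The helper names `read_rank_and_world_size` and `build_training_loader` are hypothetical, and `model` is assumed to be an already constructed `AutoEncoder`. The Morpheus import is deferred so the helpers can be defined without Morpheus installed.

```python
import os
from typing import Mapping, Optional, Tuple


def read_rank_and_world_size(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[int, int]:
    """Read RANK and WORLD_SIZE, falling back to a single-process setup."""
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    if not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    return rank, world_size


def build_training_loader(model, df, env=None):
    """Hypothetical wrapper around the documented static method."""
    # Imported lazily: requires a Morpheus installation at call time.
    from morpheus.models.dfencoder.dataloader import DFEncoderDataLoader

    rank, world_size = read_rank_and_world_size(env)
    return DFEncoderDataLoader.get_distributed_training_dataloader_from_df(
        model=model,
        df=df,
        rank=rank,
        world_size=world_size,
        pin_memory=False,  # documented default
        num_workers=0,  # documented default
    )
```

Passing the environment as a mapping keeps the parsing logic testable without launching multiple processes.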

static get_distributed_training_dataloader_from_path(model, data_folder, rank, world_size, load_data_fn=pd.read_csv, pin_memory=False, num_workers=0)

A helper function to get a distributed training DataLoader given a path to a folder containing data.

Parameters
model : AutoEncoder

The autoencoder model used to get relevant params and the preprocessing func.

data_folder : str

The path to the folder containing the data.

rank : int

The rank of the current process.

world_size : int

The number of processes to distribute the data across.

load_data_fn : function, optional

A function for loading data from a provided file path into a pandas.DataFrame, by default pd.read_csv.

pin_memory : bool, optional

Whether to pin memory when loading data, by default False.

num_workers : int, optional

The number of worker processes to use for loading data, by default 0.

Returns
DFEncoderDataLoader

The training DataLoader with DistributedSampler for distributed training.
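
`load_data_fn` can be swapped when the files in `data_folder` are not plain comma-separated CSV. Here is a minimal sketch, assuming the loader is invoked with a single file path and must return a `pandas.DataFrame`, as the parameter description states. The name `load_tsv` is hypothetical, and tab-separated input is only an assumed example format.

```python
import pandas as pd


def load_tsv(path: str) -> pd.DataFrame:
    """Hypothetical loader: read one tab-separated file into a DataFrame."""
    return pd.read_csv(path, sep="\t")
```

Such a function would be passed as `load_data_fn=load_tsv` to `get_distributed_training_dataloader_from_path`.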

© Copyright 2023, NVIDIA. Last updated on Apr 11, 2023.