aistore.pytorch.multishard_dataset
Multishard Stream Dataset for AIS.
Copyright (c) 2024-2025, NVIDIA CORPORATION. All rights reserved.
Module Contents
Classes
API
Bases: IterableDataset
An iterable-style dataset that iterates over multiple shard streams and yields combined samples.
Parameters:
data_sources
List of DataShard objects
Returns:
Iterable over the combined samples, where each sample is a tuple containing the bytes of one object from each shard stream
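The combining behavior described above can be sketched in plain Python. This is a minimal sketch, not the library's implementation. It assumes that each shard stream is any iterable of object bytes, and that "combining" means pairing the k-th object of every stream with `zip`, which stops at the shortest stream. The class name `MultiStreamSketch` and the `streams` parameter are hypothetical. The sketch deliberately does not subclass `torch.utils.data.IterableDataset`, so it runs without PyTorch installed.

```python
from typing import Iterable, Iterator, List, Tuple


class MultiStreamSketch:
    """Hypothetical stand-in for the multishard dataset (not the AIS class).

    Each element of `streams` plays the role of one shard stream: an
    iterable that yields the bytes of one object at a time.
    """

    def __init__(self, streams: List[Iterable[bytes]]):
        self.streams = streams

    def __iter__(self) -> Iterator[Tuple[bytes, ...]]:
        # Assumption: samples are combined positionally, so the k-th sample
        # holds the k-th object from every stream. zip() stops as soon as
        # the shortest stream is exhausted.
        return zip(*(iter(s) for s in self.streams))


# Lists (not generators) are used so the sketch can be iterated more than once.
images = [b"img-0", b"img-1", b"img-2"]
labels = [b"cat", b"dog", b"bird"]

for sample in MultiStreamSketch([images, labels]):
    print(sample)
# (b'img-0', b'cat')
# (b'img-1', b'dog')
# (b'img-2', b'bird')
```

Under this sketch's positional-pairing assumption, every stream should contain the same number of objects in matching order; with `zip`, surplus objects in a longer stream are silently dropped.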
Create an iterable over all the objects in the given shards.
Parameters:
bucket
Bucket containing the shards
prefix
Prefix of the object names
etl_name
Name of the ETL to apply to each object
Returns: Iterable[bytes]
Iterable over the contents of all the objects in the given shards, yielding the bytes of one object per iteration
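The per-stream helper described above can be sketched in the same way. The sketch is hypothetical throughout. An in-memory dict stands in for the AIS bucket, mapping each shard name to the payloads of the objects it holds. An optional Python callable stands in for the ETL, which in AIS is selected by name rather than passed as a function. Neither `iter_shard_objects` nor the dict layout is part of the aistore API, and visiting shards in sorted name order is an assumption of the sketch.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical in-memory "bucket": shard name -> payloads of the objects it holds.
FakeBucket = Dict[str, List[bytes]]


def iter_shard_objects(
    bucket: FakeBucket,
    prefix: str = "",
    etl: Optional[Callable[[bytes], bytes]] = None,
) -> Iterator[bytes]:
    """Yield the bytes of every object in every shard whose name starts with `prefix`."""
    # Visit shards in name order so the stream order is deterministic.
    for shard_name in sorted(bucket):
        if not shard_name.startswith(prefix):
            continue
        for payload in bucket[shard_name]:
            # Stand-in for the ETL: transform each object before yielding it.
            yield etl(payload) if etl is not None else payload


bucket = {
    "train/shard-0.tar": [b"a", b"b"],
    "train/shard-1.tar": [b"c"],
    "val/shard-0.tar": [b"z"],
}

print(list(iter_shard_objects(bucket, prefix="train/", etl=bytes.upper)))
# [b'A', b'B', b'C']
```

Each value yielded here is a single object's bytes. Combining one value from each stream into a sample is the dataset's job, not this helper's.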