aistore.pytorch.shard_reader


AIS Shard Reader for PyTorch

PyTorch Dataset and DataLoader for AIS.

Copyright (c) 2024-2025, NVIDIA CORPORATION. All rights reserved.

Module Contents

Classes

Name | Description
AISShardReader | An iterable-style dataset that iterates over objects stored as WebDataset shards

API

class aistore.pytorch.shard_reader.AISShardReader(
bucket_list: typing.Union[aistore.sdk.Bucket, typing.List[aistore.sdk.Bucket]],
prefix_map: typing.Dict[aistore.sdk.Bucket, typing.Union[str, typing.List[str]]] = {},
etl_name: str = None,
show_progress: bool = False
)

Bases: AISBaseIterDataset

An iterable-style dataset that iterates over objects stored as WebDataset shards and yields each sample as a tuple of basename (str) and contents (dictionary).

Parameters:

bucket_list
Union[Bucket, List[Bucket]]

A single Bucket object or a list of Bucket objects to load data from

prefix_map
Dict[Bucket, Union[str, List[str]]], defaults to {}

Map of Bucket objects to a prefix or list of prefixes; only objects matching the given prefixes are read from each bucket

etl_name
str, defaults to None

Optional ETL on the AIS cluster to apply to each object

show_progress
bool, defaults to False

Enables a console progress indicator for shard reading
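
The parameters above can be combined as in the following minimal sketch. It assumes a reachable AIS cluster and an existing bucket of shards; the helper name build_shard_reader, the endpoint, the bucket name, and the prefix are hypothetical placeholders, not part of this module.

```python
def build_shard_reader(endpoint: str, bucket_name: str, prefix: str):
    """Hypothetical helper: build an AISShardReader limited to one prefix."""
    # Imports live inside the function so that this sketch can be loaded on a
    # machine that does not have the aistore package installed.
    from aistore.sdk import Client
    from aistore.pytorch.shard_reader import AISShardReader

    client = Client(endpoint)
    bucket = client.bucket(bucket_name)

    return AISShardReader(
        bucket_list=bucket,            # a single Bucket; a list is also accepted
        prefix_map={bucket: prefix},   # a str or a list of str per Bucket
        show_progress=False,
    )


# Example (requires a running cluster; all values are placeholders):
# reader = build_shard_reader("http://localhost:8080", "my-shards", "train/")
# for basename, contents in reader:
#     print(basename, sorted(contents))
```

Passing etl_name is omitted here because it depends on an ETL already being set up on the cluster.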

_observed_keys = set()

aistore.pytorch.shard_reader.AISShardReader.__iter__() -> typing.Iterator

aistore.pytorch.shard_reader.AISShardReader.__len__()

Returns the length of the dataset. Note that calling this will iterate through the dataset, taking O(N) time.

NOTE: If you want the length of the dataset after iterating through it, use for i, data in enumerate(dataset) instead.
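
The note above can be illustrated with a small sketch that counts samples during the single pass that also processes them, rather than calling len() first and paying for a second O(N) pass. The generator fake_shard_samples is a hypothetical stand-in for the dataset so that the example runs without a cluster.

```python
from typing import Dict, Iterable, Iterator, Tuple

Sample = Tuple[str, Dict[str, bytes]]


def fake_shard_samples() -> Iterator[Sample]:
    """Hypothetical stand-in for the dataset: yields (basename, contents)."""
    yield "sample_000", {"jpg": b"\xff\xd8", "cls": b"0"}
    yield "sample_001", {"jpg": b"\xff\xd8", "cls": b"1"}
    yield "sample_002", {"jpg": b"\xff\xd8", "cls": b"0"}


def consume_and_count(dataset: Iterable[Sample]) -> int:
    """Process every sample once and return how many were seen."""
    count = 0
    # start=1 makes the loop index equal to the running number of samples.
    for count, (basename, contents) in enumerate(dataset, start=1):
        pass  # per-sample processing would go here
    return count


total = consume_and_count(fake_shard_samples())
```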

aistore.pytorch.shard_reader.AISShardReader._read_samples_from_shards(
shard_content
) -> typing.Dict
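
The source does not show how this method is implemented. As a rough illustration of the WebDataset convention it relies on, where files in a tar shard that share a basename form one sample keyed by extension, here is a standalone sketch using only the standard library. The function names group_tar_members and make_shard are hypothetical, and the splitting rule (extension starts at the first dot of the file name) is an assumption based on the WebDataset format rather than on this module's code.

```python
import io
import tarfile
from typing import Dict


def group_tar_members(shard_bytes: bytes) -> Dict[str, Dict[str, bytes]]:
    """Group files in a tar archive into {basename: {extension: content}}."""
    samples: Dict[str, Dict[str, bytes]] = {}
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            directory, _, filename = member.name.rpartition("/")
            stem, dot, extension = filename.partition(".")
            if not stem or not dot:
                continue  # skip names without a "stem.extension" shape
            basename = f"{directory}/{stem}" if directory else stem
            content = tar.extractfile(member).read()
            samples.setdefault(basename, {})[extension] = content
    return samples


def make_shard(files: Dict[str, bytes]) -> bytes:
    """Build a small in-memory tar archive for demonstration purposes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


shard = make_shard({
    "train/0001.jpg": b"img",
    "train/0001.cls": b"7",
    "train/0002.seg.png": b"mask",
})
grouped = group_tar_members(shard)
```

Each key of grouped corresponds to the basename in the (basename, contents) tuples that the dataset yields, and each value to the contents dictionary.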