PyTorch: Loading Data from AIStore
Note: The torchdata.datapipes module has been deprecated and removed in recent versions of torchdata. Some information in this blog post may be outdated.
This post shows how to list and load data from AIS buckets (buckets that are not backed by a 3rd party backend) and from remote cloud buckets (buckets backed by a 3rd party cloud backend) using AISFileLister and AISFileLoader.
AIStore (AIS for short) fully supports Amazon S3, Google Cloud, and Microsoft Azure backends, providing a unified namespace across multiple connected backends and/or other AIS clusters, and more.
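The object URLs that appear later in this post follow a provider://bucket/object shape (for example ais://... and gcp://...). As a minimal sketch of how that unified naming can be taken apart on the client side, the helper below splits such a URL into its three parts. The names ObjectRef and parse_object_url are hypothetical and are not part of AIStore or torchdata.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ObjectRef(NamedTuple):
    provider: str  # backend prefix, e.g. "ais" or "gcp"
    bucket: str    # bucket name
    key: str       # object name inside the bucket ("" for a bucket-level URL)


def parse_object_url(url: str) -> ObjectRef:
    """Split 'provider://bucket/key' into its parts (hypothetical helper)."""
    parts = urlparse(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected provider://bucket/key, got {url!r}")
    return ObjectRef(parts.scheme, parts.netloc, parts.path.lstrip("/"))


if __name__ == "__main__":
    print(parse_object_url("ais://caltech256/002.american-flag/002_0001.jpg"))
    print(parse_object_url("gcp://webdataset-testing/coco-train2014-seg-000000.tar"))
```

Only the provider prefix differs between an AIS bucket and a cloud bucket here, which is why the same listing and loading code can serve both.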
In the following example, we use two datasets. The Caltech-256 Object Category Dataset, which contains 256 object categories and a total of 30607 images, is stored in an AIS bucket. The Microsoft COCO Dataset, which has 330K images with over 200K labels of more than 1.5 million object instances across 80 object categories, is stored on Google Cloud.
Running the AIStore Cluster
Getting started with AIS takes only a few minutes (the prerequisites boil down to having a Linux machine with a disk). See the AIStore documentation for the available deployment options.
To keep this example simple, we will be running a minimal container-based deployment of AIStore.
To create the bucket and put the objects (the dataset) into it, I am going to use the AIS CLI, but the Python SDK can do the same.
OUTPUT:
"ais://caltech256" created (see https://github.com/NVIDIA/aistore/blob/main/docs/bucket.md#bucket-properties)
Files to upload:
EXTENSION    COUNT    SIZE
             1        3.06KiB
.jpg         30607    1.08GiB
TOTAL        30608    1.08GiB
PUT 30608 objects to "ais://caltech256"
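The upload summary above groups files by extension (the row with a blank extension is presumably a single file without one). If you want a similar tally of a local dataset directory before uploading it, a minimal sketch using only the standard library could look like this. The function name summarize_by_extension is hypothetical and this is not how the AIS CLI computes its report.

```python
from collections import defaultdict
from pathlib import Path


def summarize_by_extension(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            entry = totals[path.suffix.lower()]  # "" for files with no extension
            entry[0] += 1
            entry[1] += path.stat().st_size
    return {ext: (count, size) for ext, (count, size) in totals.items()}


if __name__ == "__main__":
    # Summarize the current directory; replace "." with the dataset folder.
    for ext, (count, size) in sorted(summarize_by_extension(".").items()):
        print(f"{ext or '(none)':<12}{count:<9}{size} B")
```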
Preloaded dataset
The following assumes that an AIS cluster is running and that one of its buckets contains the Caltech-256 dataset.
OUTPUT:
['ais://caltech256/002.american-flag/002_0001.jpg',
'ais://caltech256/002.american-flag/002_0002.jpg',
'ais://caltech256/002.american-flag/002_0003.jpg',
'ais://caltech256/002.american-flag/002_0004.jpg',
'ais://caltech256/002.american-flag/002_0005.jpg']
OUTPUT:
image url: ais://caltech256/002.american-flag/002_0001.jpg
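To make the list-then-load flow behind the outputs above concrete without a running cluster, here is a minimal sketch that imitates it with an in-memory dictionary. It does not call torchdata or AIStore: FAKE_STORE, list_urls, load_objects, and class_from_url are all hypothetical names, the stored bytes are made up, and the 003.backpack entry is only there for illustration. The last helper assumes the folder naming visible in the URLs above (a numeric index, a dot, then the category name).

```python
import io

# Stand-in for a cluster: object URL -> object bytes (contents are made up).
FAKE_STORE = {
    "ais://caltech256/002.american-flag/002_0001.jpg": b"jpeg-bytes-1",
    "ais://caltech256/002.american-flag/002_0002.jpg": b"jpeg-bytes-2",
    "ais://caltech256/003.backpack/003_0001.jpg": b"jpeg-bytes-3",
}


def list_urls(store, prefixes):
    """Yield every object URL that starts with one of the given prefixes."""
    for prefix in prefixes:
        for url in sorted(store):
            if url.startswith(prefix):
                yield url


def load_objects(store, urls):
    """Yield (url, file-like object) pairs for the given URLs."""
    for url in urls:
        yield url, io.BytesIO(store[url])


def class_from_url(url):
    """Derive (index, name) from the parent folder, e.g. '002.american-flag'."""
    folder = url.rsplit("/", 2)[-2]
    index, _, name = folder.partition(".")
    return int(index), name


if __name__ == "__main__":
    urls = list_urls(FAKE_STORE, ["ais://caltech256/002.american-flag/"])
    for url, stream in load_objects(FAKE_STORE, urls):
        print("image url:", url, "class:", class_from_url(url), "bytes:", len(stream.read()))
```

Both stages are generators, so nothing is read until the consumer asks for the next item; that lazy chaining is the idea the lister and loader pair is built around.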
Remote cloud buckets
AIStore supports multiple remote backends. With AIS, accessing cloud buckets doesn’t require any additional setup, provided that you have the credentials for those buckets.
For the following example, AIStore must be built with the --gcp build tag.
--gcp, --aws, and a number of other build tags are the mechanism we use to include optional libraries in the build.
OUTPUT:
['gcp://webdataset-testing/coco-train2014-seg-000000.tar',
'gcp://webdataset-testing/coco-train2014-seg-000001.tar',
'gcp://webdataset-testing/coco-train2014-seg-000002.tar',
'gcp://webdataset-testing/coco-train2014-seg-000003.tar',
'gcp://webdataset-testing/coco-train2014-seg-000004.tar']
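The objects listed above are .tar shards, so once a shard has been loaded as a byte stream the individual samples still have to be read out of the archive. The following is a minimal sketch of that step using Python's tarfile module in streaming mode. The helper names and the member names in the demo shard are hypothetical; I am not asserting anything about the actual contents of the COCO shards.

```python
import io
import tarfile


def iter_tar_members(stream):
    """Yield (member_name, payload_bytes) for each regular file in a tar stream."""
    # Mode "r|*" reads sequentially, so the stream does not need to be seekable.
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        for member in archive:
            if member.isfile():
                yield member.name, archive.extractfile(member).read()


def make_demo_shard(files):
    """Build a small in-memory tar from {name: bytes} (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    shard = make_demo_shard({"000001.jpg": b"img", "000001.json": b"{}"})
    for name, payload in iter_tar_members(shard):
        print(name, len(payload))
```

Reading in streaming mode means each member must be consumed in order, which suits data that arrives over the network as a single forward-only stream.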