# AIStore Blog

## 2026

**[Eliminating Cluster Authentication Risks: AIStore with RSA and OIDC Issuer Discovery](/aistore/blog/rsa-and-oidc)** — Apr 09, 2026
> Back in February 1997, [RFC 2104](https://datatracker.ietf.org/doc/html/rfc2104) introduced HMAC as a mechanism for authenticating messages based on a shared secret key.

**[Native Bucket Inventory: Up to 17x Faster Remote Bucket Listing](/aistore/blog/native-bucket-inventory)** — Apr 06, 2026
> AIStore 4.3 introduces Native Bucket Inventory (NBI), a new mechanism for accelerating large remote-bucket listings by turning a repeatedly expensive operation into a local, reusable metadata path. In...

**[Parallel Download: 9x Lower Latency for Large-Object Reads](/aistore/blog/parallel-download)** — Mar 25, 2026
> In AIStore 4.3, we introduced parallel download APIs to accelerate reads of large objects in an AIS cluster. Instead of pulling the entire object through one long sequential GET request stream, parall...

## 2025

**[The Many Lives of a Dataset Called 'data'](/aistore/blog/s3-data-with-namespace)** — Dec 15, 2025
> For whatever reason, a bucket called `s3://data` shows up with remarkable frequency as we deploy AIStore (AIS) clusters and populate them with user datasets. Likely for the same reason that `password ...

**[Blob Downloader: Accelerate Remote Object Fetching with Concurrent Range-Reads](/aistore/blog/blob-downloader)** — Nov 26, 2025
> In AIStore 4.1, we extended [blob downloader](https://github.com/NVIDIA/aistore/blob/main/docs/blob_downloader.md) to leverage the chunked object representation and speed up fetching remote objects. T...

**[GetBatch API: faster data retrieval for ML workloads](/aistore/blog/get-batch-sequential)** — Oct 06, 2025
> ML training and inference typically operate on batches of samples or data items. To simplify such workflows, AIStore 4.0 introduces the `GetBatch` API.

**[Automated API Documentation Generation with GenDocs](/aistore/blog/automated-api-documentation-generation-with-gendocs)** — Aug 29, 2025
> Maintaining accurate and up-to-date HTTP API documentation is critical for the developer experience when building and debugging SDKs. Clear HTTP documentation saves developers from digging through AIS...

**[AIStore + HuggingFace: Distributed Downloads for Large-Scale Machine Learning](/aistore/blog/huggingface-integration)** — Aug 22, 2025
> Machine learning teams increasingly rely on large datasets from [HuggingFace](https://huggingface.co/) to power their models. But traditional download tools struggle with terabyte-scale datasets conta...

**[The Perfect Line](/aistore/blog/smooth-max-line-speed)** — Jul 26, 2025
> I didn't want to write this blog.

**[Single-Object Copy/Transform Capability](/aistore/blog/single-object-copy-transformation-capability)** — Jul 25, 2025
> In version 3.30, AIStore introduced a lightweight, flexible API to copy or transform a single object between buckets. It provides a simpler alternative to existing batch-style operations, ideal for fa...

**[AIStore v3.28: Boost ETL Performance with Optimized Data Movement and Specialized Web Server Framework](/aistore/blog/etl-optimized-data-movement-and-server-framework)** — May 15, 2025
> The current state of the art involves executing data pre-processing, augmentation, and a wide variety of custom ETL workflows on individual client machines. This approach lacks scalability and often r...

**[AIStore Python SDK: Maintaining Resilient Connectivity During Lifecycle Events](/aistore/blog/python-retry)** — Apr 02, 2025
> In distributed systems, maintaining seamless connectivity during [lifecycle events](https://aistore.nvidia.com/docs/lifecycle_node) is a key challenge. If the cluster’s state changes while read operat...

**[Unified Rate Limiting: Frontend and Backend](/aistore/blog/rate-limit-blog)** — Mar 19, 2025
> AIStore v3.28 introduces a unified **rate-limiting** capability that works at both the frontend (client-facing) and backend (cloud-facing) layers. It enables proactive control to prevent hitting limit...

**[Comparing OCI's Native Object Storage and S3 API Backends](/aistore/blog/oci-object-native-vs-s3-api)** — Feb 26, 2025
> The newly available support for Oracle Cloud Infrastructure ("OCI") Object Storage was made

**[Split-brain is Inevitable](/aistore/blog/split-brain-blog)** — Feb 16, 2025
> Split-brain is inevitable. The way it approaches varies greatly but there are telltale signs that, in hindsight, you wish you'd taken more seriously.

**[Arrival of native backed OCI Object Storage support](/aistore/blog/oci-object-storage-support)** — Feb 06, 2025
> Oracle Cloud Infrastructure ("OCI") has been supported via OCI's Amazon S3 Compatibility

## 2024

**[Adding Data to AIStore -- PUT Performance](/aistore/blog/put-performance)** — Nov 22, 2024
> AI training workloads primarily _read_ data, and lots of it.

**[Enhancing ObjectFile Performance with Zero-Copy Techniques](/aistore/blog/enhancing-object-file-performance-with-zero-copy-techniques)** — Nov 21, 2024
> In our [previous blog post](https://aistore.nvidia.com/blog/2024/09/26/resilient-streaming-with-object-file), we introduced `ObjectFile`, a resilient, file-like interface in the AIStore Python SDK des...

**[Resilient Data Loading with ObjectFile](/aistore/blog/resilient-streaming-with-object-file)** — Sep 26, 2024
> Massively parallel loading of terabytes of data in a distributed system presents reliability challenges. This holds true even for data centers where network stability is supposed to be stellar. Consid...

**[Google Colab + AIStore: Easier Cloud Data Access for AI/ML Experiments](/aistore/blog/google-colab-aistore)** — Sep 18, 2024
> Working with data stored in cloud services like GCP, AWS, Azure, and OCI in [Google Colab](https://colab.research.google.com/) can be challenging. The entire process—from installing libraries and conf...

**[Accelerating AI Workloads with AIStore and PyTorch](/aistore/blog/pytorch-integration)** — Aug 28, 2024
> As AI workloads are becoming increasingly demanding, our models need more and more data to train.[<sup>[1]</sup>](#references) These massive datasets can overwhelm filesystems, both local and network-...

**[Initial Sharding of Machine Learning Datasets](/aistore/blog/ishard)** — Aug 16, 2024
> Over the past decade, and especially in the last 3-4 years, the size of AI datasets has grown significantly, often exceeding the combined capacity of block storage devices that can be attached to a si...

**[Very large](/aistore/blog/very-large)** — May 20, 2024
> The idea of _extremely large_ is constantly shifting, evolving. As time passes by we quickly adopt its new numeric definition and only rarely, with a mild sense of amusement, recall the old one.

**[AIS on NFS](/aistore/blog/ais-on-nfs)** — Mar 30, 2024
> > This is an excerpt from an article that I posted at storagetarget.com. The full text can be found at:

**[Maximizing Cluster Bandwidth with AIS Multihoming](/aistore/blog/multihome-bench)** — Feb 16, 2024
> Identifying bottlenecks in high-performance systems is critical to optimize the hardware and associated costs.

## 2023

**[AIStore as a Fast Tier Storage Solution: Enhancing Petascale Deep Learning Across Remote Cloud Backends](/aistore/blog/aistore-fast-tier)** — Nov 27, 2023
> The challenges associated with loading petascale datasets, crucial for training models in both vision and language processing, pose significant hurdles in the field of deep learning. These datasets, o...

**[AIStore with WebDataset Part 3 -- Building a Pipeline for Model Training](/aistore/blog/aisio-transforms-with-webdataset-pt-3)** — Jun 09, 2023
> In the previous posts ([pt1](https://aiatscale.org/blog/2023/05/05/aisio-transforms-with-webdataset-pt-1), [pt2](https://aiatscale.org/blog/2023/05/11/aisio-transforms-with-webdataset-pt-2)), we discu...

**[AIStore with WebDataset Part 2 -- Transforming WebDataset Shards in AIS](/aistore/blog/aisio-transforms-with-webdataset-pt-2)** — May 11, 2023
> > **Note:** This blog post references `init_code` which has been removed and replaced with `init_class`. For the most up-to-date ETL initialization methods, please refer to the [init_class documentati...

**[AIStore with WebDataset Part 1 -- Storing WebDataset format in AIS](/aistore/blog/aisio-transforms-with-webdataset-pt-1)** — May 08, 2023
> Training AI models is expensive, so it's important to keep GPUs fed with all the data they need as fast as they can consume it. WebDataset and AIStore each address different parts of this problem indi...

**[Transforming non-existing datasets](/aistore/blog/tco-any-to-any)** — Apr 10, 2023
> There's an old trick that never quite gets old: you run a high-velocity exercise that generates a massive amount of traffic through some sort of a multi-part system, whereby some of those parts are (s...

**[AIStore SDK & ETL: Transform an image dataset with AIS SDK and load into PyTorch](/aistore/blog/transform-images-with-python-sdk)** — Apr 03, 2023
> > **Note:** This blog post references `init_code` which has been removed and replaced with `init_class`. For the most up-to-date ETL initialization methods, please refer to the [init_class documentati...

## 2022

**[AIStore 3.12 Release Notes](/aistore/blog/relnotes-3.12)** — Nov 13, 2022
> This AIStore release, version 3.12, has been in development for almost four months. It includes a number of significant changes that can be further detailed and grouped as follows:

**[AIStore: Data Analysis w/ DataFrames](/aistore/blog/dask-data-analysis)** — Aug 15, 2022
> [Dask](https://www.dask.org/) is a new and flexible open-source Python library for *parallel/distributed computing* and *optimized memory usage*. Dask extends many of today's popular Python libraries ...

**[Python SDK: Getting Started](/aistore/blog/python-sdk)** — Jul 20, 2022
> Python has grounded itself as a popular language of choice among data scientists and machine learning developers. Python's recent popularity in the field can be attributed to Python's general *ease-of...

**[PyTorch: Loading Data from AIStore](/aistore/blog/aisio-pytorch)** — Jul 11, 2022
> > **Note:** The `torchdata.datapipes` module has been [deprecated and removed](https://github.com/pytorch/data?tab=readme-ov-file#torchdata-see-note-below-on-current-status) in recent versions of

**[Promoting local and shared files](/aistore/blog/promote)** — Mar 17, 2022
> When it comes to working with files, the first question often is *how*? How to easily and quickly move or copy existing file datasets into AIS clusters?

**[What's new in AIS v3.9](/aistore/blog/whats-new-in-v3.9)** — Mar 15, 2022
> AIS **v3.9** is a substantial [productization and performance-improving release](https://github.com/NVIDIA/aistore/releases/tag/3.9). Much of the codebase has been refactored for consistency, with micro...

## 2021

**[What's new in AIS v3.8](/aistore/blog/whats-new-in-v3.8)** — Dec 15, 2021
> AIStore v3.8 is a significant upgrade delivering [long-awaited features, stabilization fixes, and performance improvements](https://github.com/NVIDIA/aistore/releases/tag/3.8). There's also the cumula...

**[Copying existing file datasets in two easy steps](/aistore/blog/cp-files-to-ais)** — Dec 07, 2021
> AIStore supports [numerous ways](https://github.com/NVIDIA/aistore/blob/main/docs/overview.md#existing-datasets) to copy, download, or otherwise transfer existing datasets. Much depends on *where is* ...

**[AIStore & ETL: Using WebDataset to train on a sharded dataset (post #3)](/aistore/blog/ais-etl-3)** — Oct 29, 2021
> **Deprecated** -- WDTransform is no longer included as part of the AIS client, so this post only remains for educational purposes. ETL is in development and additional transformation tools will be inc...

**[AIStore & ETL: Using AIS/PyTorch connector to transform ImageNet (post #2)](/aistore/blog/ais-etl-2)** — Oct 22, 2021
> The goal now is to deploy our first ETL and have AIStore run it on each storage node, harnessing the distributed power (and close to data - meaning, **fast**). For the problem statement, background an...

**[AIStore & ETL: Introduction (post #1)](/aistore/blog/ais-etl-1)** — Oct 21, 2021
> [AIStore](https://github.com/NVIDIA/aistore) (AIS) is a reliable lightweight storage cluster that deploys anywhere, runs user containers and functions, and scales linearly with no limitation. The deve...

**[Go: append a file to a TAR archive](/aistore/blog/tar-append)** — Aug 10, 2021
> AIStore supports a whole gamut of "archival" operations for reading, writing, and listing archives such as .tar, .tgz, and .zip. When we started working on **appending** content to existing archives...

**[Integrated Storage Stack for Training, Inference, and Transformations](/aistore/blog/etl)** — Jul 30, 2021
> In the end, the choice, like the majority of important choices, comes down to a binary: either this or that. Either you go to storage, or you don’t. Either you cache a dataset in question (and then tr...

**[AIStore: an open system for petascale deep learning](/aistore/blog/aistore)** — Jul 30, 2021
> AIStore (or AIS) has been in development for more than three years so far and has accumulated a fairly long list of capabilities, all duly noted via release notes on the corresponding GitHub pages. At...