AIStore Blog

2026

Eliminating Cluster Authentication Risks: AIStore with RSA and OIDC Issuer Discovery — Apr 09, 2026

Back in February 1997, RFC 2104 introduced HMAC as a mechanism for authenticating messages based on a shared secret key.
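Here is a minimal illustration of the shared-secret model that RFC 2104 describes. It is not AIStore's token code. The sketch signs and verifies a message with Python's standard `hmac` module, and the key and message values are placeholders. The same key both creates and checks the tag, so any party that can verify a token can also mint one.

```python
import hashlib
import hmac

# Placeholder secret (assumption): with HMAC, every verifier must hold this same key.
SHARED_KEY = b"example-shared-secret"


def sign(message: bytes, key: bytes = SHARED_KEY) -> str:
    """Return the hex-encoded HMAC-SHA256 tag for a message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag with the shared key and compare in constant time."""
    expected = sign(message, key)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    msg = b'{"user": "alice", "exp": 1700000000}'
    tag = sign(msg)
    print(verify(msg, tag))                 # True
    print(verify(msg + b" tampered", tag))  # False
```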

Native Bucket Inventory: Up to 17x Faster Remote Bucket Listing — Apr 06, 2026

AIStore 4.3 introduces Native Bucket Inventory (NBI), a new mechanism for accelerating large remote-bucket listings by turning a repeatedly expensive operation into a local, reusable metadata path. In…

Parallel Download: 9x Lower Latency for Large-Object Reads — Mar 25, 2026

In AIStore 4.3, we introduced parallel download APIs to accelerate reads of large objects in an AIS cluster. Instead of pulling the entire object through one long sequential GET request stream, parall…
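The following is a minimal, library-agnostic sketch of the general idea: one large GET is split into fixed-size byte ranges, the ranges are fetched concurrently, and the results are reassembled in order. It does not use the AIStore SDK. `fetch_range` is a hypothetical stand-in for whatever issues an HTTP `Range: bytes=start-end` request, and the chunk size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def byte_ranges(size: int, chunk: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) pairs, as HTTP Range headers expect."""
    if size <= 0:
        return []
    return [(start, min(start + chunk, size) - 1) for start in range(0, size, chunk)]


def parallel_get(
    size: int,
    fetch_range: Callable[[int, int], bytes],  # hypothetical: returns bytes start..end inclusive
    chunk: int = 8 * 1024 * 1024,
    workers: int = 8,
) -> bytes:
    """Fetch all ranges concurrently; executor.map yields results in submission order."""
    ranges = byte_ranges(size, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


if __name__ == "__main__":
    blob = bytes(range(256)) * 1000  # 256,000-byte in-memory stand-in for a remote object

    def fake_fetch(start: int, end: int) -> bytes:
        # Simulates a server answering "Range: bytes=start-end" (end inclusive).
        return blob[start:end + 1]

    data = parallel_get(len(blob), fake_fetch, chunk=10_000, workers=4)
    print(data == blob)         # True
    print(byte_ranges(25, 10))  # [(0, 9), (10, 19), (20, 24)]
```

The inclusive end offsets follow the HTTP `Range` convention. Ordering is preserved because `ThreadPoolExecutor.map` returns results in input order even when chunks finish out of order. This simple version buffers the whole object in memory, which a production reader would likely avoid.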

2025

The Many Lives of a Dataset Called ‘data’ — Dec 15, 2025

For whatever reason, a bucket called s3://data shows up with remarkable frequency as we deploy AIStore (AIS) clusters and populate them with user datasets. Likely for the same reason that `password …

Blob Downloader: Accelerate Remote Object Fetching with Concurrent Range-Reads — Nov 26, 2025

In AIStore 4.1, we extended blob downloader to leverage the chunked object representation and speed up fetching remote objects. T…

GetBatch API: faster data retrieval for ML workloads — Oct 06, 2025

ML training and inference typically operate on batches of samples or data items. To simplify such workflows, AIStore 4.0 introduces the GetBatch API.

Automated API Documentation Generation with GenDocs — Aug 29, 2025

Maintaining accurate and up-to-date HTTP API documentation is critical for the developer experience when building and debugging SDKs. Clear HTTP documentation saves developers from digging through AIS…

AIStore + HuggingFace: Distributed Downloads for Large-Scale Machine Learning — Aug 22, 2025

Machine learning teams increasingly rely on large datasets from HuggingFace to power their models. But traditional download tools struggle with terabyte-scale datasets conta…

The Perfect Line — Jul 26, 2025

I didn’t want to write this blog.

Single-Object Copy/Transform Capability — Jul 25, 2025

In version 3.30, AIStore introduced a lightweight, flexible API to copy or transform a single object between buckets. It provides a simpler alternative to existing batch-style operations, ideal for fa…

AIStore v3.28: Boost ETL Performance with Optimized Data Movement and Specialized Web Server Framework — May 15, 2025

The current state of the art involves executing data pre-processing, augmentation, and a wide variety of custom ETL workflows on individual client machines. This approach lacks scalability and often r…

AIStore Python SDK: Maintaining Resilient Connectivity During Lifecycle Events — Apr 02, 2025

In distributed systems, maintaining seamless connectivity during lifecycle events is a key challenge. If the cluster’s state changes while read operat…

Unified Rate Limiting: Frontend and Backend — Mar 19, 2025

AIStore v3.28 introduces a unified rate-limiting capability that works at both the frontend (client-facing) and backend (cloud-facing) layers. It enables proactive control to prevent hitting limit…

Comparing OCI’s Native Object Storage and S3 API Backends — Feb 26, 2025

The newly available support for Oracle Cloud Infrastructure (“OCI”) Object Storage was made…

Split-brain is Inevitable — Feb 16, 2025

Split-brain is inevitable. The way it approaches varies greatly, but there are telltale signs that, in hindsight, you wish you’d taken more seriously.

Arrival of native-backed OCI Object Storage support — Feb 06, 2025

Oracle Cloud Infrastructure (“OCI”) has been supported via OCI’s Amazon S3 Compatibility…

2024

Adding Data to AIStore — PUT Performance — Nov 22, 2024

AI training workloads primarily read data, and lots of it.

Enhancing ObjectFile Performance with Zero-Copy Techniques — Nov 21, 2024

In our previous blog post, we introduced ObjectFile, a resilient, file-like interface in the AIStore Python SDK des…

Resilient Data Loading with ObjectFile — Sep 26, 2024

Massively parallel loading of terabytes of data in a distributed system presents reliability challenges. This holds true even for data centers where network stability is supposed to be stellar. Consid…

Google Colab + AIStore: Easier Cloud Data Access for AI/ML Experiments — Sep 18, 2024

Working with data stored in cloud services like GCP, AWS, Azure, and OCI in Google Colab can be challenging. The entire process—from installing libraries and conf…

Accelerating AI Workloads with AIStore and PyTorch — Aug 28, 2024

As AI workloads are becoming increasingly demanding, our models need more and more data to train.[1] These massive datasets can overwhelm filesystems, both local and network-…

Initial Sharding of Machine Learning Datasets — Aug 16, 2024

Over the past decade, and especially in the last 3-4 years, the size of AI datasets has grown significantly, often exceeding the combined capacity of block storage devices that can be attached to a si…

Very large — May 20, 2024

The idea of extremely large is constantly shifting, evolving. As time passes, we quickly adopt its new numeric definition and only rarely, with a mild sense of amusement, recall the old one.

AIS on NFS — Mar 30, 2024

This is an excerpt from an article that I posted at storagetarget.com. The full text can be found at:

Maximizing Cluster Bandwidth with AIS Multihoming — Feb 16, 2024

Identifying bottlenecks in high-performance systems is critical for optimizing both hardware utilization and the associated costs.

2023

AIStore as a Fast Tier Storage Solution: Enhancing Petascale Deep Learning Across Remote Cloud Backends — Nov 27, 2023

The challenges associated with loading petascale datasets, crucial for training models in both vision and language processing, pose significant hurdles in the field of deep learning. These datasets, o…

AIStore with WebDataset Part 3 — Building a Pipeline for Model Training — Jun 09, 2023

In the previous posts (pt1, pt2), we discu…

AIStore with WebDataset Part 2 — Transforming WebDataset Shards in AIS — May 11, 2023

Note: This blog post references init_code which has been removed and replaced with init_class. For the most up-to-date ETL initialization methods, please refer to the [init_class documentati…

AIStore with WebDataset Part 1 — Storing WebDataset format in AIS — May 08, 2023

Training AI models is expensive, so it’s important to keep GPUs fed with all the data they need as fast as they can consume it. WebDataset and AIStore each address different parts of this problem indi…

Transforming non-existing datasets — Apr 10, 2023

There’s an old trick that never quite gets old: you run a high-velocity exercise that generates a massive amount of traffic through some sort of a multi-part system, whereby some of those parts are (s…

AIStore SDK & ETL: Transform an image dataset with AIS SDK and load into PyTorch — Apr 03, 2023

Note: This blog post references init_code which has been removed and replaced with init_class. For the most up-to-date ETL initialization methods, please refer to the [init_class documentati…

2022

AIStore 3.12 Release Notes — Nov 13, 2022

This AIStore release, version 3.12, has been in development for almost four months. It includes a number of significant changes that can be further detailed and grouped as follows:

AIStore: Data Analysis w/ DataFrames — Aug 15, 2022

Dask is a new and flexible open-source Python library for parallel/distributed computing and optimized memory usage. Dask extends many of today’s popular Python libraries …

Python SDK: Getting Started — Jul 20, 2022

Python has established itself as a popular language of choice among data scientists and machine learning developers. Python’s recent popularity in the field can be attributed to Python’s general *ease-of…

PyTorch: Loading Data from AIStore — Jul 11, 2022

Note: The torchdata.datapipes module has been deprecated and removed in recent versions of…

Promoting local and shared files — Mar 17, 2022

When it comes to working with files, the first question often is how? How to easily and quickly move or copy existing file datasets into AIS clusters?

What’s new in AIS v3.9 — Mar 15, 2022

AIS v3.9 is a substantial productization and performance-improving release. Much of the codebase has been refactored for consistency, with micro…

2021

What’s new in AIS v3.8 — Dec 15, 2021

AIStore v3.8 is a significant upgrade delivering long-awaited features, stabilization fixes, and performance improvements. There’s also the cumula…

Copying existing file datasets in two easy steps — Dec 07, 2021

AIStore supports numerous ways to copy, download, or otherwise transfer existing datasets. Much depends on where is…

AIStore & ETL: Using WebDataset to train on a sharded dataset (post #3) — Oct 29, 2021

Deprecated — WDTransform is no longer included as part of the AIS client, so this post only remains for educational purposes. ETL is in development and additional transformation tools will be inc…

AIStore & ETL: Using AIS/PyTorch connector to transform ImageNet (post #2) — Oct 22, 2021

The goal now is to deploy our first ETL and have AIStore run it on each storage node, harnessing the distributed power (and close to the data, which means fast). For the problem statement, background an…

AIStore & ETL: Introduction (post #1) — Oct 21, 2021

AIStore (AIS) is a reliable lightweight storage cluster that deploys anywhere, runs user containers and functions, and scales linearly with no limitation. The deve…

Go: append a file to a TAR archive — Aug 10, 2021

AIStore supports a whole gamut of “archival” operations that allow users to read, write, and list archives such as .tar, .tgz, and .zip. When we started working on appending content to existing archives…

Integrated Storage Stack for Training, Inference, and Transformations — Jul 30, 2021

In the end, the choice, like the majority of important choices, comes down to a binary: either this or that. Either you go to storage, or you don’t. Either you cache a dataset in question (and then tr…

AIStore: an open system for petascale deep learning — Jul 30, 2021

AIStore (or AIS) has been in development for more than three years so far and has accumulated a fairly long list of capabilities, all duly noted via release notes on the corresponding GitHub pages. At…