
Copyright © 2026, NVIDIA Corporation.

Backend providers

Introduction

Terminology first:

Backend Provider is a designed-in backend interface abstraction and, simultaneously, an API-supported option that makes it possible to distinguish between “remote” and “local” buckets with respect to a given AIS cluster.

AIStore natively integrates with multiple backend providers:

| Backend | Schema(s) | Description |
|---|---|---|
| ais | ais://, ais://@remote_uuid | AIStore bucket provider; can also refer to a remote AIS cluster |
| aws | aws://, s3:// | Amazon Cloud Storage |
| azure | azure://, az:// | Azure Cloud Storage |
| gcp | gcp://, gs:// | Google Cloud Storage |
| oci | oc://, oci:// | Oracle Cloud Storage (see footnote 1) |
| ht | ht:// | HTTP(S) based dataset |
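
To make the schema table concrete, here is a minimal sketch of how a bucket URI could be mapped to its backend. It is not AIS code: `BucketRef` and `parse_bucket_uri` are hypothetical names introduced for illustration, and only the scheme-to-backend mapping is taken from the table above.

```python
from typing import NamedTuple

# Scheme-to-backend mapping, copied from the table above.
SCHEME_TO_BACKEND = {
    "ais": "ais",
    "aws": "aws", "s3": "aws",
    "azure": "azure", "az": "azure",
    "gcp": "gcp", "gs": "gcp",
    "oci": "oci", "oc": "oci",
    "ht": "ht",
}


class BucketRef(NamedTuple):
    """Hypothetical bucket identity: (backend, namespace, name)."""
    backend: str    # canonical backend name, e.g. "aws"
    namespace: str  # remote cluster UUID or alias ("" when absent)
    name: str       # bucket name (may be empty, e.g. for ais://@remote_uuid)


def parse_bucket_uri(uri: str) -> BucketRef:
    """Split 'scheme://[@namespace/]bucket' into its parts (illustrative only)."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in SCHEME_TO_BACKEND:
        raise ValueError(f"unsupported bucket URI: {uri!r}")
    namespace = ""
    if rest.startswith("@"):
        # '@...' up to the first '/' names a remote AIS cluster.
        namespace, _, rest = rest[1:].partition("/")
    name = rest.split("/", 1)[0]
    return BucketRef(SCHEME_TO_BACKEND[scheme], namespace, name)


# Same-name buckets from different backends map to distinct keys.
buckets = {parse_bucket_uri(u): u for u in ("ais://data", "s3://data", "gs://data")}
```

Because the backend is part of the key, `ais://data`, `s3://data`, and `gs://data` stay distinct entries, which is the property a unified namespace relies on.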

Native integration, in turn, implies:

  • utilizing vendor’s SDK libraries to operate on the respective remote backends;
  • providing unified namespace (where, e.g., two same-name buckets from different backends can co-exist with no conflicts);
  • on-the-fly populating AIS own bucket metadata with the properties of remote buckets.

The last bullet deserves a little more explanation. First, there’s a piece of cluster-wide metadata that we call BMD (bucket metadata). Like every other type of metadata, BMD is versioned, checksummed, and replicated, a process carried out by the currently elected primary.

BMD contains all bucket definitions and per-bucket configurable management policies - local and remote.

Here’s what happens upon the very first (read or write or list, etc.) access to a remote bucket that is not yet in the BMD:

  1. Behind the scenes, AIS will try to confirm the bucket’s existence and accessibility.
  2. If confirmed, AIS will atomically add the bucket to the BMD (along with its remote properties).
  3. Once all of the above is done, AIS will go ahead and perform the original (read or write or list, etc.) operation.

There are advanced-usage options to skip Steps 1 and 2 above; see, e.g., the LisObjsMsg flags.
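
The three-step flow above can be modeled with a small toy. This is only a sketch under stated assumptions, not the AIS implementation: the `BMD` class, `access`, and the `head_remote` callback are hypothetical stand-ins, and a single in-process lock stands in for whatever cluster-wide mechanism makes the update atomic.

```python
import threading
from typing import Callable, Dict, Optional


class BMD:
    """Toy stand-in for bucket metadata: versioned and guarded by a lock."""

    def __init__(self) -> None:
        self.version = 0
        self._buckets: Dict[str, dict] = {}
        self._lock = threading.Lock()

    def get(self, bucket: str) -> Optional[dict]:
        with self._lock:
            return self._buckets.get(bucket)

    def add(self, bucket: str, props: dict) -> None:
        # Add-if-absent and the version bump happen under one lock.
        with self._lock:
            if bucket not in self._buckets:
                self._buckets[bucket] = props
                self.version += 1


def access(
    bmd: BMD,
    bucket: str,
    head_remote: Callable[[str], Optional[dict]],
    operation: Callable[[str], object],
) -> object:
    """Run `operation` on `bucket`, registering the bucket on first access."""
    if bmd.get(bucket) is None:
        props = head_remote(bucket)  # Step 1: confirm existence and accessibility
        if props is None:
            raise LookupError(f"bucket {bucket!r} does not exist or is not accessible")
        bmd.add(bucket, props)       # Step 2: add to BMD with its remote properties
    return operation(bucket)         # Step 3: perform the original operation
```

On a second call for the same bucket, `head_remote` is not invoked again, because the bucket is already present in the toy BMD.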

The full taxonomy of the supported backends is shown below (note that AIS can also serve as a backend for another AIS cluster):

[Figure: Supported Backends]

For types of supported buckets (AIS, Cloud, remote AIS, etc.), bucket identity, properties, lifecycle, associated policies, storage services, and usage examples, see the comprehensive guide:

  • AIS Buckets: Design and Operations

And further:

  • For API reference, see APIs.
  • For AIS command-line management tool, see CLI.

Remote AIS cluster

In addition to the 3rd party Cloud storage services and the non-Cloud HTTP(S) backend listed above, any given pair of AIS clusters can be organized so that one cluster provides a fully accessible backend to the other.

Terminology:

| Term | Comment |
|---|---|
| attach remote cluster | Allow one cluster to see remote datasets, cache those datasets on demand, copy remote buckets, list, create, and destroy remote buckets, read and write remote buckets, etc. |
| detach remote cluster | Operation that (as the name implies) removes the corresponding attachment |
| alias | An optional user-friendly alias that can be assigned at attachment time and then used in all subsequent operations instead of the remote cluster’s UUID |
| global namespace | Refers to the capability to unambiguously indicate and access any dataset in an arbitrary network (or DAG, to be precise) of AIS clusters whereby some clusters are attached to other ones. By attaching AIS clusters we are, effectively and ad hoc, forming a unified global namespace of all individually hosted datasets. |

An example of working with a remote AIS cluster (as well as easy-to-use scripts) can be found in the README for developers.

Unified Global Namespace

Examples first. The following two commands attach and then show the remote cluster at the address my.remote.ais:51080:

$ ais cluster remote-attach alias111=http://my.remote.ais:51080
Remote cluster (alias111=http://my.remote.ais:51080) successfully attached
$ ais show remote-cluster
UUID URL Alias Primary Smap Targets Online
eKyvPyHr my.remote.ais:51080 alias111 p[80381p11080] v27 10 yes

Notice two aspects of this:

  • user-defined aliasing whereby a user can assign an arbitrary name (aka alias) to a given remote cluster
  • the remote cluster does not have to be online at attachment time; offline or currently not reachable clusters are shown as follows:
$ ais show remote-cluster
UUID URL Alias Primary Smap Targets Online
eKyvPyHr my.remote.ais:51080 alias111 p[primary1] v27 10 no
<alias222> <other.remote.ais:51080> n/a n/a n/a no

Notice the difference between the first and the second lines in the printout above: while both clusters appear to be currently offline (see the rightmost column), the first one was accessible at some earlier time and therefore we do show that it has (in this example) 10 storage nodes and other details.

To detach any of the previously configured associations, simply run:

$ ais cluster remote-detach alias111
$ ais show remote-cluster
UUID URL Alias Primary Smap Targets Online
<alias222> <other.remote.ais:51080> n/a n/a n/a no

Configuration-wise, the following two examples specify single-URL and multi-URL attachments that can also be configured prior to runtime (or added at runtime via the ais cluster remote-attach CLI, as shown above):

  • Example: single URL

    "backend": {
      "ais": {
        "remote-cluster-alias": ["http://10.233.84.233:51080"]
      }
    }
  • Example: multiple URLs

    "backend": {
      "ais": {
        "remote-cluster-alias": [
          "http://10.233.84.217",
          "https://cluster.aistore.org"
        ]
      }
    }

Multiple remote URLs can be provided for the usual reasons, fault tolerance among them. Once connected, however, we rely on the remote cluster map to retry upon connection errors and to load-balance.
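
The fault-tolerance aspect of a multi-URL attachment can be sketched as "try each configured URL until one answers". The snippet below is a minimal illustration, not AIS logic: `first_reachable`, `connect_all`, and the `probe` callback are hypothetical, and the JSON fragment from the multi-URL example is wrapped in braces so that it parses as a standalone document.

```python
import json
from typing import Callable, Dict, Iterable, Optional

# The multi-URL configuration fragment, wrapped in braces to form valid JSON.
CONFIG = json.loads("""
{
  "backend": {
    "ais": {
      "remote-cluster-alias": [
        "http://10.233.84.217",
        "https://cluster.aistore.org"
      ]
    }
  }
}
""")


def first_reachable(urls: Iterable[str], probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which probe() succeeds, or None."""
    for url in urls:
        try:
            if probe(url):
                return url
        except OSError:
            continue  # treat a connection error as "try the next URL"
    return None


def connect_all(config: dict, probe: Callable[[str], bool]) -> Dict[str, Optional[str]]:
    """Map each configured alias to the first URL that answered (None if none did)."""
    remotes = config["backend"]["ais"]
    return {alias: first_reachable(urls, probe) for alias, urls in remotes.items()}
```

In practice `probe` would issue a real request to the remote cluster; it is left as a parameter here so that the sketch stays free of network calls.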

For more usage examples, please see:

  • working with remote AIS clusters
  • example: minimal remote cluster

And one more comment:

You can run ais cluster remote-attach and/or ais show remote-cluster CLI to refresh remote configuration: check availability and reload remote cluster maps.

In other words, repeating the same ais cluster remote-attach command will have the side effect of refreshing all the currently configured attachments. Or, use ais show remote-cluster CLI for the same exact purpose.

Cloud object storage

Cloud-based object storage backends include:

  • aws - Amazon S3
  • azure - Microsoft Azure Blob Storage
  • gcp - Google Cloud Storage
  • oci - Oracle Cloud Storage (see footnote 1)

In each case, we use the vendor’s own SDK/API to provide transparent access to Cloud storage with the additional capability of persistently caching all read data in the AIStore’s remote buckets.

The term “persistent caching” is used to indicate much more than what’s conventionally understood as “caching”: irrespective of its origin and source, all data inside an AIStore cluster is end-to-end checksummed and protected by the storage services configured both globally and on a per-bucket basis. For instance, both remote buckets and ais buckets can be erasure coded.

Nevertheless, remote buckets will often serve as a fast cache or a fast tier in front of a given 3rd party Cloud storage.

Note that AIS provides multiple easy ways to populate its remote buckets, including, but not limited to, the conventional on-demand, self-populating mechanism dubbed cold GET.
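
As a rough mental model of cold GET, consider a read-through cache that stores a checksum alongside every object it fetches. The sketch below is a toy under stated assumptions: `CachingBucket` and `fetch_remote` are hypothetical names, an in-memory dict stands in for persistent storage, and SHA-256 is used purely as a stand-in checksum.

```python
import hashlib
from typing import Callable, Dict, Tuple


class CachingBucket:
    """Toy read-through cache: the first read of an object is a 'cold GET'."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote                # hypothetical backend reader
        self._store: Dict[str, Tuple[bytes, str]] = {}   # name -> (data, checksum)
        self.cold_gets = 0

    def get(self, name: str) -> bytes:
        if name in self._store:
            # Warm GET: serve the local copy after validating its checksum.
            data, checksum = self._store[name]
            if hashlib.sha256(data).hexdigest() != checksum:
                raise IOError(f"checksum mismatch for {name!r}")
            return data
        # Cold GET: fetch from the remote backend, then keep a checksummed copy.
        data = self._fetch_remote(name)
        self.cold_gets += 1
        self._store[name] = (data, hashlib.sha256(data).hexdigest())
        return data
```

Only the first `get` of a given object reaches the remote backend; later reads are served locally, which is what lets a remote bucket act as a fast tier.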

Example: accessing Cloud storage via remote AIS

There are, essentially, two different capabilities:

  • attach other AIS clusters
  • redirect AIS bucket to read, write, and otherwise operate on a different bucket

Here’s a quick and commented example where we access (e.g.) s3://data indirectly, via another bucket called ais://nnn.

Notice that the cluster that contains ais://nnn does not necessarily have to have AWS credentials to access s3://data.

Step 1: show remote cluster

$ ais show remote-cluster

UUID URL Alias Primary Smap Targets Uptime
A9A78a_cSc http://aistore:51080 remais v2145 4 64d23h

Step 2: list s3://data directly via remote cluster

$ AIS_ENDPOINT=http://aistore:51080 ais ls s3://data

NAME SIZE
aaa/bbb/ccc 16.26KiB
aaa/bbb/eee 16.26KiB
aaa/ddd 16.26KiB
aaabbb 16.26KiB
aaaccc 16.26KiB
bbb/111 16.26KiB
ttt/hhh 16.26KiB
ttt/qqq 16.26KiB

Step 3: redirect “local” ais://nnn to Cloud-based s3://data via remote AIS cluster

$ ais bucket props set ais://nnn <TAB-TAB>

backend_bck.name checksum.validate_cold_get lru.enabled ec.bundle_multiplier features
backend_bck.provider checksum.validate_warm_get mirror.copies ec.data_slices write_policy.data
versioning.enabled checksum.validate_obj_move mirror.burst_buffer ec.parity_slices write_policy.md
versioning.validate_warm_get checksum.enable_read_range mirror.enabled ec.enabled
versioning.synchronize lru.dont_evict_time ec.objsize_limit ec.disk_only
checksum.type lru.capacity_upd_time ec.compression access

$ ais bucket props set ais://nnn backend_bck=s3://@A9A78a_cSc/data

"backend_bck.name" set to: "data" (was: "")
"backend_bck.provider" set to: "aws" (was: "")

Bucket props successfully updated.

Note that attached clusters have (human-readable) aliases that often may be easier to use, e.g.:

$ ais bucket props set ais://nnn backend_bck=s3://@remais/data

In other words, the actual cluster UUID (A9A78a_cSc above) and its alias (remais) can be used interchangeably.
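
The interchangeability of UUID and alias amounts to a lookup that accepts either form and always yields the UUID. Here is a minimal sketch, assuming a simple alias table populated from this example's values; `ALIASES` and `resolve_cluster` are hypothetical and not part of any AIS API.

```python
from typing import Dict

# alias -> UUID; the single entry mirrors the values used in this example.
ALIASES: Dict[str, str] = {"remais": "A9A78a_cSc"}


def resolve_cluster(ref: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Return the cluster UUID for an '@alias' or '@uuid' reference."""
    key = ref.lstrip("@")
    if key in aliases:
        return aliases[key]       # reference was an alias
    if key in aliases.values():
        return key                # reference was already a UUID
    raise KeyError(f"unknown remote cluster: {ref!r}")
```

With this table, `resolve_cluster("@remais")` and `resolve_cluster("@A9A78a_cSc")` return the same UUID, so either spelling identifies the same remote cluster.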

Step 4: finally, list s3://data via “local” ais://nnn

$ ais ls ais://nnn

aaa/bbb/ccc 16.26KiB
aaa/bbb/eee 16.26KiB
aaa/ddd 16.26KiB
aaabbb 16.26KiB
aaaccc 16.26KiB
bbb/111 16.26KiB
ttt/hhh 16.26KiB
ttt/qqq 16.26KiB

HTTP(S) based dataset

An AIS bucket may be implicitly defined by an HTTP(S) based dataset, where files such as, for instance:

  • https://a/b/c/imagenet/train-000000.tar
  • https://a/b/c/imagenet/train-123456.tar
  • …
  • https://a/b/c/imagenet/train-999999.tar

would all be stored in a single AIS bucket that would have a protocol prefix ht:// and a bucket name derived from the directory part of the URL Path (“a/b/c/imagenet”, in this case).
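
To illustrate which part of a URL identifies the dataset and which part identifies the object, the sketch below splits a URL into its directory part and file name. It only performs the split: how AIS actually derives the ht:// bucket name from the directory part is not specified here, and `split_dataset_url` is a hypothetical helper.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlsplit


def split_dataset_url(url: str) -> Tuple[str, str]:
    """Return (directory part, object name) for an HTTP(S) dataset URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) URL: {url!r}")
    directory, object_name = posixpath.split(parts.path)
    # Host plus directory path, e.g. "a/b/c/imagenet".
    return parts.netloc + directory, object_name
```

All of the `train-*.tar` URLs listed above share the same directory part, which is why they would end up in a single bucket.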

WARNING: Currently, HTTP(S) based datasets can only be used with clients that support overriding the proxy for certain hosts (e.g., curl ... --noproxy=$(curl -s G/v1/cluster?what=target_ips)). Otherwise, the client gets stuck in a redirect loop, because the request to the target gets redirected via the proxy.

Footnotes

  1. Note: OCI support is currently experimental and may have limited functionality or stability.