AIS Buckets: Design and Operations
A bucket is a named container for objects - monolithic files or chunked representations - with associated metadata.
Buckets are the primary unit of data organization and policy application in AIStore (AIS).
Object metadata includes checksum, version, size, access time, replica/EC placement, unique bucket ID (BID), and custom user-defined attributes. For remote buckets, AIS may also store backend-specific metadata such as ETag, LastModified timestamps, backend version identifiers, and provider checksums when available. Metadata v2 includes additional flags used by AIS features (for example, chunked object representation).
AIS uses a flat hierarchy: bucket-name/object-name key space. It supports virtual directories through prefix-based naming with recursive and non-recursive operations.
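To illustrate how virtual directories can fall out of prefix-based naming, here is a minimal sketch over an in-memory list of object names. The function `list_objects` and the convention of marking a virtual directory with a trailing `/` are assumptions made for this example, not the AIS API.

```python
def list_objects(keys, prefix="", recursive=True):
    """List names under `prefix` in a flat key space (illustrative only)."""
    results = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        if recursive:
            # Recursive listing returns every matching object name as-is.
            results.add(key)
            continue
        # Non-recursive listing keeps only the next path component.
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        # A trailing "/" marks a virtual directory rather than an object.
        results.add(prefix + head + sep)
    return sorted(results)


keys = ["images/cat.jpg", "images/dogs/rex.jpg", "labels.csv"]
print(list_objects(keys, "images/"))
print(list_objects(keys, "images/", recursive=False))
```

The bucket itself stores only flat names; the directory view is computed at listing time from the prefix.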
This document is organized in two parts:
AIS does not treat a bucket as a passive container. A bucket is a logical namespace that AIS materializes lazily (for remote backends), configures dynamically, and manages cluster-wide.
The idea is to provide a unified storage abstraction. Instead of maintaining different APIs for in-cluster storage, Cloud providers, and other remote backends, AIS exposes everything through a single, consistent bucket abstraction.
The design goals were (and remain):
Users interact with buckets uniformly, regardless of where they live:
The provider and namespace differentiate the backend; the API stays the same.
Another core design goal was to eliminate boilerplate: if a bucket exists in the remote backend (Cloud, Remote AIS, etc.) and is accessible, AIS makes it immediately usable. Remote buckets are added lazily, on first reference, without a separate creation step.
Explicit creation is supported when additional control is required - credentials, endpoints, namespaces, or properties that must be set before first access.
For further details, see the Bucket Lifecycle section below.
Once added to BMD, a bucket’s identity becomes cluster-wide and immutable:
Identity = Provider + Namespace + Name
AIS never guesses or rewrites identity. s3://#ns1/bucket and s3://#ns2/bucket are distinct buckets.
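The identity rule can be sketched as a small parser that splits a bucket URI into its three components. This is an illustration only: `BucketID` and `parse_bucket` are hypothetical names, and the sketch handles just the `provider://#namespace/name` form used in this document, treating a missing `#namespace` as the global namespace.

```python
from typing import NamedTuple


class BucketID(NamedTuple):
    provider: str
    namespace: str  # empty string stands for the global namespace
    name: str


def parse_bucket(uri: str) -> BucketID:
    """Split 'provider://[#namespace/]name' into its identity components."""
    provider, sep, rest = uri.partition("://")
    if not sep or not provider:
        raise ValueError(f"missing provider in {uri!r}")
    namespace = ""
    if rest.startswith("#"):
        namespace, sep, rest = rest[1:].partition("/")
        if not sep or not namespace:
            raise ValueError(f"malformed namespace in {uri!r}")
    # Anything after the first "/" would be an object name, not identity.
    name = rest.split("/", 1)[0]
    if not name:
        raise ValueError(f"missing bucket name in {uri!r}")
    return BucketID(provider, namespace, name)


a = parse_bucket("s3://#ns1/bucket")
b = parse_bucket("s3://#ns2/bucket")
print(a, b, a == b)
```

Because identity is the full tuple, two buckets that differ only in namespace compare as different.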
Example: S3 bucket with namespace
Example: AIS bucket with backend
Bucket names are limited to 64 bytes and may contain only letters, digits, dashes (-), underscores (_), and single dots (.). Consecutive dots (..) are not allowed.
Names that start with . are reserved for system buckets. User-defined buckets therefore cannot use a leading dot, and any unrecognized .-prefixed name is rejected.
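The naming rules above can be expressed as a short validator. This is a sketch rather than the AIS implementation: it assumes "letters" means ASCII letters, and it assumes the set of known system buckets contains only `.sys-inventory`, the one system bucket named in this document.

```python
import re

MAX_NAME_BYTES = 64
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")
# Assumption: the only known system bucket is the one this document names.
KNOWN_SYSTEM_BUCKETS = {".sys-inventory"}


def validate_bucket_name(name: str) -> None:
    """Raise ValueError if `name` breaks any of the naming rules."""
    if not name:
        raise ValueError("bucket name is empty")
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        raise ValueError("bucket name exceeds 64 bytes")
    if not _ALLOWED.fullmatch(name):
        raise ValueError("only letters, digits, '-', '_' and '.' are allowed")
    if ".." in name:
        raise ValueError("consecutive dots are not allowed")
    if name.startswith(".") and name not in KNOWN_SYSTEM_BUCKETS:
        raise ValueError("dot-prefixed names are reserved for system buckets")


def is_valid(name: str) -> bool:
    try:
        validate_bucket_name(name)
    except ValueError:
        return False
    return True


for candidate in ["my-bucket_1.0", "a..b", ".hidden", ".sys-inventory"]:
    print(candidate, is_valid(candidate))
```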
Indicates the storage backend:
Remote AIS clusters use the ais provider with a namespace referencing the cluster alias or UUID:
Namespaces disambiguate buckets that share the same name.
Originally, all cloud buckets had an implicit global namespace. That model breaks when:
Namespaces fix this:
These resolve to:
They are independent in every way - separate BMD entries, credentials, and on-disk paths.
Note: The Ns struct has two fields: UUID (for remote AIS clusters) and Name (for logical namespaces). For cloud buckets, the namespace identifier (e.g., #prod in s3://#prod/bucket) enables multiple same-name buckets with different credentials or endpoints.
For remote AIS clusters, the namespace additionally carries the cluster’s UUID:
Note: The bucket namespace you choose - whether it represents an AWS profile, a GCS account, or simply a human-readable alias - becomes part of the bucket’s physical on-disk path. What starts as a logical identifier materializes into on-disk naming structure.
Bucket properties - stored in BMD, inherited from cluster config, overridable per-bucket - control data protection (checksums, EC, mirroring), chunked representation, versioning and synchronization with remote sources, LRU eviction, rate limiting, access permissions, provider-specific settings, and more.
The properties:
ais bucket props set or the corresponding Go or Python API;
At the top level:
Feature flags are a 64-bit bitmask controlling assorted runtime behaviors. Most flags are cluster-wide, but a subset can be configured per-bucket.
For the full list, see this separate Feature Flags document.
integrity+ - enhances data safety
integrity- - trades safety for performance
perf - performance optimization
overhead - may impact performance
s3,compat - S3 compatibility

Some flags are mutually exclusive. For example, Disable-Cold-GET and Streaming-Cold-GET cannot both be set - the system will reject the configuration. For complete details on all feature flags (cluster-wide and bucket-level), see Feature Flags.
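A mutual-exclusion check on a flag bitmask might look like the following sketch. The two flag names come from the text above; the bit positions, the `Feature` enum, and `validate_features` are hypothetical and chosen only for illustration.

```python
from enum import IntFlag


class Feature(IntFlag):
    # Hypothetical bit positions; the real values are defined by AIS.
    DISABLE_COLD_GET = 1 << 0
    STREAMING_COLD_GET = 1 << 1


# Pairs of flags that must not be set together.
MUTUALLY_EXCLUSIVE = [(Feature.DISABLE_COLD_GET, Feature.STREAMING_COLD_GET)]


def validate_features(flags: Feature) -> Feature:
    """Reject a bitmask that sets both members of an exclusive pair."""
    for a, b in MUTUALLY_EXCLUSIVE:
        if a in flags and b in flags:
            raise ValueError(f"{a.name} and {b.name} cannot both be set")
    return flags


validate_features(Feature.STREAMING_COLD_GET)  # accepted
try:
    validate_features(Feature.DISABLE_COLD_GET | Feature.STREAMING_COLD_GET)
except ValueError as err:
    print("rejected:", err)
```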
The distinction between implicit bucket discovery and explicit creation is best summarized by the AIS CLI itself.
When you run ais create --help, it outlines the specific scenarios where ‘on-the-fly’ discovery isn’t enough:
On first reference:
AIS:
cmn.Bck)
HEAD(bucket) to validate access
This behavior is foundational, motivated by removing the operational overhead of bucket management.
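The add-on-first-reference behavior can be sketched as a lookup that falls back to a HEAD request. Everything here is a stand-in: `bmd` is a plain dictionary playing the role of BMD, `head_bucket` is a caller-supplied function that returns whether the remote bucket exists and is accessible, and the stored properties are a placeholder.

```python
def get_or_add_remote(bmd: dict, bck: tuple, head_bucket) -> dict:
    """Return the bucket's properties, adding it to `bmd` on first reference."""
    props = bmd.get(bck)
    if props is not None:
        return props  # already known: no remote call is needed
    # First reference: validate existence and access with HEAD(bucket).
    if not head_bucket(bck):
        raise LookupError(f"bucket {bck} does not exist or is not accessible")
    props = {"added_on_first_reference": True}  # placeholder properties
    bmd[bck] = props
    return props


calls = []


def fake_head(bck):
    calls.append(bck)
    return True


bmd = {}
bck = ("s3", "", "dataset")
get_or_add_remote(bmd, bck, fake_head)
get_or_add_remote(bmd, bck, fake_head)
print(len(calls))  # the remote HEAD happens only on the first reference
```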
Invoked with:
AIS:
HEAD (unless --skip-lookup or bucket already in BMD)
Use --skip-lookup when default credentials cannot access the bucket:
AIS buckets:
Destroys the bucket and all objects permanently.
Cloud buckets:
Removes AIS state (BMD entry, cached objects). Cloud data remains untouched.
Eviction options:
Eviction is namespace-aware:
See also: Three Ways to Evict Remote Bucket
Namespaces solve real-world scenarios that a single global namespace cannot handle:
Examples:
Metadata-wise, each bucket receives:
Bprops)
AIS clusters can attach to each other, forming a global namespace of distributed datasets.
The alias resolves to the remote cluster’s UUID, stored in the namespace:
See also: Remote AIS Cluster
Backend buckets represent indirection - an AIS bucket that proxies to a remote bucket. This is fundamentally different from namespaces.
Note: See section Working with Same-Name Remote Buckets below for further guidelines and usage examples.
Now reads/writes to ais://cache transparently forward to s3://origin.
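As a rough mental model of this indirection, the sketch below fronts one dictionary with another. It assumes read-through caching on GET and write-through on PUT; the class `FrontedBucket` and its methods are hypothetical and do not correspond to AIS code.

```python
class FrontedBucket:
    """Toy model of an AIS bucket that proxies to a backend bucket."""

    def __init__(self, backend: dict):
        self.backend = backend  # stands in for s3://origin
        self.local = {}         # stands in for objects held in ais://cache

    def get(self, name: str) -> bytes:
        if name in self.local:
            return self.local[name]  # served from in-cluster storage
        data = self.backend[name]    # cold read; KeyError if absent remotely
        self.local[name] = data      # keep a copy for subsequent reads
        return data

    def put(self, name: str, data: bytes) -> None:
        # Assumption: writes go to the backend and to the local copy.
        self.backend[name] = data
        self.local[name] = data


origin = {"a.bin": b"remote-bytes"}
cache = FrontedBucket(origin)
cache.get("a.bin")
cache.put("b.bin", b"new")
print(sorted(cache.local), sorted(origin))
```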
Hot cache for cold storage:
Dataset aliasing:
Access control:
Cached objects remain in the AIS bucket.
See also: Backend Bucket CLI examples
AIS distinguishes between user buckets and system buckets.
User buckets are created by users (or lazily on first remote access) and follow standard naming rules: alphanumeric characters, dashes, underscores, and single dots are allowed, up to 64 bytes.
System buckets are AIS-internal infrastructure. They are created automatically when needed and are identified by a reserved dot-prefix: names starting with . are reserved for system use. Any attempt to create a user bucket with a .-prefixed name is rejected unless it matches a known system bucket.
The current naming convention is .sys-*. The first system bucket is:
System buckets are visible in regular ais ls output and can be listed and read with appropriate permissions. They are not intended for direct user writes - AIS creates and destroys them behind the scenes, and manages their content.
System buckets are created on demand (for example, ais://.sys-inventory is created on the fly upon the first inventory creation request) and follow the same cluster-wide replication and metasync lifecycle as user buckets.
The .sys-* namespace is designed to accommodate additional AIS-internal services over time. Planned and potential uses include:
ais://.sys-inventory
A common scenario: you have buckets with identical names across different AWS accounts, S3-compatible endpoints, or cloud providers. AIS handles this in two ways.
Create each bucket with its own namespace and credentials:
Now s3://#prod/data and s3://#dev/data are distinct buckets - separate BMD entries, separate on-disk paths, separate credentials. Access them directly:
Create AIS buckets that front the remote buckets:
Access through the AIS buckets:
Namespaces give you direct access with minimal overhead. Backend buckets add a layer of indirection but unlock full AIS bucket capabilities - LRU, mirroring, erasure coding, and transformation pipelines.
Note that Option B requires the namespaced S3 bucket to exist first. You can’t skip straight to backend_bck=s3://data with custom credentials - AIS needs to resolve the backend bucket, which requires proper credentials already in place. Create the namespaced cloud bucket first, then front it with an AIS bucket if needed.
See also: AWS Profiles and S3 Endpoints
See also: GCP Per-Bucket Credentials
AIS clusters can be attached to each other, forming a global namespace of all individually hosted datasets. For background and configuration details, see Remote AIS Cluster.
Proactively fetch objects from remote storage into AIS cache:
Remove cached objects (cloud data untouched):
Note: The terms “cached” and “in-cluster” are used interchangeably. A “cached” object is one that exists in AIS storage regardless of its origin.
Bucket access is controlled by a 64-bit access property. Bits map to operations:
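Checking an operation against an access bitmask can be sketched as follows. The bit positions and constant names are hypothetical, chosen only to show the mechanics; the actual bit assignments are defined by AIS.

```python
# Hypothetical bit positions, for illustration only.
ACCESS_GET = 1 << 0
ACCESS_PUT = 1 << 1
ACCESS_DELETE = 1 << 2


def is_allowed(access: int, op: int) -> bool:
    """Return True if every bit required by `op` is set in `access`."""
    return (access & op) == op


def revoke(access: int, op: int) -> int:
    """Clear the bits of `op`, keeping the mask within 64 bits."""
    return access & ~op & 0xFFFF_FFFF_FFFF_FFFF


read_write = ACCESS_GET | ACCESS_PUT | ACCESS_DELETE
read_only = revoke(revoke(read_write, ACCESS_PUT), ACCESS_DELETE)
print(is_allowed(read_only, ACCESS_GET), is_allowed(read_only, ACCESS_DELETE))
```

A mask that clears the delete bit in this way also illustrates how a misconfigured access value could block deletes.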
Note: When enabled, access permissions are enforced by AIS and apply to both local and backend operations; misconfiguration can block cold GETs or deletes. See version 4.1 release notes for additional pointers on the topics of authentication and security.
See also: Authentication and Access Control
See also: AWS Profiles and S3 Endpoints
The ListObjects and ListObjectsPage APIs (Go and Python) return object names and properties. For large buckets containing many millions of objects, we strongly recommend the paginated version of the API.
AIS CLI supports both. As always with AIS CLI, a quick look at the command's help (ais ls --help in this case) shows the many supported options and may save time.
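The reason to prefer pagination for very large buckets is that the client never holds the full listing in memory. A generic token-based loop is sketched here; `fetch_page` is a hypothetical stand-in for whichever paginated call is used, assumed to return one page of names plus a continuation token that is empty on the last page.

```python
def iter_objects(fetch_page, page_size=1000):
    """Yield object names one page at a time until the token runs out."""
    token = ""
    while True:
        names, token = fetch_page(token, page_size)
        yield from names
        if not token:
            return


def make_fake_fetch(all_names):
    """Build an in-memory `fetch_page` whose token is a stringified offset."""
    def fetch_page(token, page_size):
        start = int(token or 0)
        end = start + page_size
        next_token = str(end) if end < len(all_names) else ""
        return all_names[start:end], next_token
    return fetch_page


names = [f"obj-{i}" for i in range(5)]
for name in iter_objects(make_fake_fetch(names), page_size=2):
    print(name)
```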
More basic examples follow below:
Request specific properties with --props:
For large buckets, results are paginated:
See also: CLI: List Objects
All operations respect namespaces. ais ls s3://#ns1/bucket and ais ls s3://#ns2/bucket operate on different buckets.
Note: This section is provided for advanced troubleshooting and debugging only.
The bucket identity you specify in CLI or API - provider, namespace, bucket name - materializes as directory structure on every mountpath. This isn’t just metadata; it’s physical layout.
Say we have an S3 bucket called s3://dataset with an object images/cat.jpg in it. Given two different bucket namespaces, the respective fully qualified names (FQNs) inside AIStore may look like:
where:
Note: disk partitioning is not recommended and may degrade performance.
The namespace you choose - whether it maps to an AWS profile, a SwiftStack account, or just a human-readable tag like #prod - becomes a physical directory on every target node. This guarantees:
s3://#acct1/data and s3://#acct2/data never share storage paths
What starts as a logical identifier in ais create s3://#prod/bucket ends up as /mpath/@aws/#prod/bucket/ on disk.
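The mapping from bucket identity to a bucket directory can be sketched as below. Only the `s3` to `@aws` mapping and the `/mpath/@aws/#prod/bucket/` shape are taken from this document; the function name, the fallback for other providers, and the omission of the `#` segment for the global namespace are assumptions.

```python
import posixpath

# Only this mapping appears in the text; other providers are passed through
# unchanged, which is an assumption made for this sketch.
PROVIDER_DIR = {"s3": "aws"}


def bucket_dir(mountpath: str, provider: str, namespace: str, name: str) -> str:
    """Build the per-mountpath directory for a bucket identity."""
    parts = [mountpath, "@" + PROVIDER_DIR.get(provider, provider)]
    if namespace:
        parts.append("#" + namespace)
    # Assumption: with no namespace, the "#..." segment is simply omitted.
    parts.append(name)
    return posixpath.join(*parts) + "/"


print(bucket_dir("/mpath", "s3", "prod", "bucket"))
print(bucket_dir("/mpath", "s3", "acct1", "data"))
print(bucket_dir("/mpath", "s3", "acct2", "data"))
```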
For details, see On-Disk Layout document.