Bucket operations
Background and Introduction
A bucket is a named container for objects - monolithic files or chunked representations - with associated metadata. It is the fundamental unit of data organization and data management.
AIS buckets are categorized by their provider and origin. Native ais:// buckets managed by this cluster are always created explicitly (via ais create or the respective Go and/or Python APIs).
Remote buckets (including s3://, gs://, etc., and ais:// buckets in remote AIS clusters) are usually discovered and auto-added on-the-fly on first access.
In a cluster, every bucket is assigned a unique, cluster-wide bucket ID (BID). Same-name remote buckets with different namespaces get different IDs.
Every object a) belongs to exactly one bucket and b) is identified by a unique name within that bucket.
Bucket properties define data protection (checksums, mirroring, erasure coding), chunked representation, versioning and synchronization with remote sources, access control, backend linkage, feature flags, rate-limit settings, and more.
For types of supported buckets (AIS, Cloud, remote AIS, etc.), bucket identity, properties, lifecycle, and associated policies, storage services and usage examples, see the comprehensive:
It is easy to see all CLI operations on buckets:
For convenience, a few of the most popular verbs are also aliased:
Table of Contents
- Create bucket
- Delete bucket
- List buckets
- List objects
- Evict remote bucket
- Move or Rename a bucket
- Copy (list, range, and/or prefix) selected objects or entire (in-cluster or remote) buckets
- Show bucket summary
- Start N-way Mirroring
- Start Erasure Coding
- Show bucket properties
- Set bucket properties
- Archive multiple objects
- Show and set AWS-specific properties
- Reset bucket properties to cluster defaults
- Show bucket metadata
Create bucket
ais create BUCKET [BUCKET...]
Create bucket(s).
Examples
Create AIS bucket
Create buckets bucket_name1 and bucket_name2, both with AIS provider.
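For example (the bucket names come from the description above):

```console
$ ais create ais://bucket_name1 ais://bucket_name2
```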
Create AIS bucket in local namespace
Create bucket bucket_name in ml namespace.
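The `#` character separates the (optional) namespace from the bucket name, e.g.:

```console
$ ais create ais://#ml/bucket_name
```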
Create bucket in remote AIS cluster
Create bucket bucket_name in global namespace of AIS remote cluster with Bghort1l UUID.
Create bucket bucket_name in ml namespace of AIS remote cluster with Bghort1l UUID.
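Using the `@uuid` notation to reference the remote cluster (the UUID and names are taken from the two descriptions above):

```console
$ ais create ais://@Bghort1l/bucket_name
$ ais create ais://@Bghort1l#ml/bucket_name
```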
Create bucket with custom properties
Create bucket bucket_name with custom properties specified.
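A sketch, assuming we want to enable 2-way mirroring at creation time (the specific key=value pairs are illustrative; run `ais create --help` for the current flags):

```console
$ ais create ais://bucket_name --props="mirror.enabled=true mirror.copies=2"
```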
Incorrect bucket creation
See also
Delete bucket
ais bucket rm BUCKET [BUCKET...]
Delete an ais bucket or buckets.
Examples
Remove AIS buckets
Remove AIS buckets bucket_name1 and bucket_name2.
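For example:

```console
$ ais bucket rm ais://bucket_name1 ais://bucket_name2
```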
Remove AIS bucket in local namespace
Remove bucket bucket_name from ml namespace.
Remove bucket in remote AIS cluster
Remove bucket bucket_name from global namespace of AIS remote cluster with Bghort1l UUID.
Remove bucket bucket_name from ml namespace of AIS remote cluster with Bghort1l UUID.
Incorrect bucket removal
Removing remote buckets is not supported.
List buckets
ais ls PROVIDER:[//BUCKET_NAME] [command options]
Notice the optional [//BUCKET_NAME]. When there’s no bucket, ais ls will list buckets. Otherwise, it’ll list objects.
Usage
Assorted options
The options are numerous. Here’s a non-exhaustive list (for the most recent update, run ais ls --help)
ais ls --regex "ngn*"
List all buckets matching the ngn* regex expression.
ais ls aws: or (same) ais ls s3
List all existing buckets for the specific provider.
ais ls aws: --all or (same) ais ls s3: --all
List absolutely all buckets that the cluster can “see”, including those that are not necessarily present in the cluster.
ais ls ais:// or (same) ais ls ais
List all AIS buckets.
ais ls ais://#name
List all buckets for the ais provider and name namespace.
ais ls ais://@uuid#namespace
List all remote AIS buckets that have uuid#namespace namespace. Note that:
- the `uuid` must be the remote cluster UUID (or its alias)
- the `namespace` is the optional name of the remote namespace
As a rule of thumb, when a (logical) #namespace in the bucket’s name is omitted we use the global namespace that always exists.
List objects
ais ls is one of those commands that only keeps growing, in terms of supported options and capabilities.
The command:
ais ls PROVIDER:[//BUCKET_NAME] [command options]
can conveniently list buckets (with or without “summarizing” object counts and sizes) and objects.
The command’s inline help is also quite extensive, with (inline) examples followed by numerous supported options:
Assorted options
Footer Information:
When listing objects, a footer will be displayed showing:
- Total number of objects listed
- For remote buckets with the `--cached` option: number of objects present in-cluster
- For the `--paged` option: current page number
- For the `--count-only` option: time elapsed to fetch the list
Examples of footer variations:
- Listed 12345 names
- Listed 12345 names (in-cluster: 456)
- Page 123: 1000 names (in-cluster: none)
Examples
List AIS and Cloud buckets with all defaults
1. List objects in the AIS bucket bucket_name.
2. List objects in the remote bucket bucket_name.
3. List objects from a remote AIS cluster with a namespace:
4. List objects with paged output (showing page numbers):
5. List cached objects from a remote bucket:
6. Count objects in a bucket:
7. Count objects with paged output:
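The numbered cases above can be sketched as follows (bucket names and the namespace are placeholders):

```console
$ ais ls ais://bucket_name
$ ais ls s3://bucket_name
$ ais ls ais://@uuid#namespace/bucket_name
$ ais ls s3://bucket_name --paged
$ ais ls s3://bucket_name --cached
$ ais ls s3://bucket_name --count-only
$ ais ls s3://bucket_name --count-only --paged
```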
Notes:
- When using `--paged` with remote buckets, the footer will show both the page number and the in-cluster object count, when applicable
- The `--diff` option requires a remote backend that supports some form of versioning (e.g., object version, checksum, and/or ETag)
- For more information on working with archived content, see docs/archive.md
- To fully synchronize in-cluster content with a remote backend, see the documentation on out-of-band updates
Include all properties
List bucket from AIS remote cluster
List objects in the bucket bucket_name and ml namespace contained on AIS remote cluster with Bghort1l UUID.
With prefix
List objects which match given prefix.
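For instance (the prefix is illustrative):

```console
$ ais ls ais://bucket_name --prefix "subdir/"
```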
Bucket inventory
Here’s a quick 4-steps sequence to demonstrate the functionality:
1. In the beginning, the bucket is accessible (notice --all) and empty, as far as its in-cluster content goes
2. The first (remote) list-objects will have the side effect of loading the remote inventory
3. The second and subsequent list-objects will run much faster
4. Finally, observe that the in-cluster content now includes the inventory (.csv) itself
List archived content
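A sketch, assuming a shard `shard.tar` already exists in the bucket (the name is illustrative) - the `--archive` option lists the files archived inside it:

```console
$ ais ls ais://bucket_name/shard.tar --archive
```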
List anonymously (i.e., list public-access Cloud bucket)
Use `--prefix` that crosses shard boundary
For starters, we archive all aistore docs:
To list a certain virtual subdirectory inside this newly created shard:
or, same:
Evict remote bucket
AIS supports multiple storage backends:
See Unified Namespace for details on remote AIS clusters.
One major distinction between an AIS bucket (e.g., ais://mybucket) and a remote bucket (e.g., ais://@cluster/mybucket, s3://dataset, etc.) boils down to the fact that - for a variety of real-life reasons - in-cluster content of the remote bucket may be different from its remote content.
Note that the terms in-cluster and cached are used interchangeably throughout the entire documentation and CLI.
Remote buckets can be prefetched and evicted from AIS, entirely or selectively:
Some of the supported functionality can be quickly demonstrated with the following examples:
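A sketch of both variants (bucket and prefix names are placeholders; run `ais evict --help` for the current options):

```console
# evict the bucket's entire in-cluster content, keeping bucket metadata
$ ais evict s3://bucket_name --keep-md

# evict selectively, using an embedded prefix
$ ais evict s3://bucket_name/subdir/
```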
Here’s a more complete example that lists remote bucket, then reads and evicts a given object:
See also
Move or Rename a bucket
ais bucket mv BUCKET NEW_BUCKET
Move (i.e., rename) an AIS bucket.
If the NEW_BUCKET already exists, the mv operation will not proceed.
Cloud bucket move is not supported.
Examples
Move AIS bucket
Move AIS bucket bucket_name to AIS bucket new_bucket_name.
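For example:

```console
$ ais bucket mv ais://bucket_name ais://new_bucket_name
```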
Copy (list, range, and/or prefix) selected objects or entire (in-cluster or remote) buckets
ais cp SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET [command options]
Source bucket must exist. When the destination bucket is remote (e.g. in the Cloud) it must also exist and be writeable.
NOTE: there’s no requirement that either of the buckets is present in aistore.
NOTE: do not confuse in-cluster presence with existence - a remote object may exist remotely without being present in the cluster.
NOTE: to fully synchronize in-cluster content with remote backend, please refer to out of band updates.
Moreover, when the destination is AIS (ais://) or remote AIS (ais://@remote-alias) bucket, the existence is optional: the destination will be created on the fly, with bucket properties copied from the source (SRC_BUCKET).
NOTE: similar to delete, evict, and prefetch operations, `cp` also supports an embedded prefix - see disambiguating multi-object operation.
Finally, the option to copy remote bucket onto itself is also supported - syntax-wise. Here’s an example that’ll shed some light:
Incidentally, notice the `--cached` difference:
Examples
Copy non-existing remote bucket to a non-existing in-cluster destination
Copy AIS bucket
Copy AIS bucket src_bucket to AIS bucket dst_bucket.
Copy AIS bucket and wait until the job finishes
The same as above, but wait until copying is finished.
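The two examples above can be sketched as (the `--wait` flag blocks until the job completes):

```console
$ ais cp ais://src_bucket ais://dst_bucket
$ ais cp ais://src_bucket ais://dst_bucket --wait
```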
Copy cloud bucket to another cloud bucket
Copy AWS bucket src_bucket to AWS bucket dst_bucket.
Use (list, range, and/or prefix) options to copy selected objects
Example 1. Copy objects obj1.tar and obj1.info from bucket ais://bck1 to ais://bck2, and wait until the operation finishes
Example 2. Copy objects matching the Bash brace-expansion obj{2..4}; do not wait for the operation to finish.
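A sketch of Examples 1 and 2 (object names come from the text above):

```console
$ ais cp ais://bck1 ais://bck2 --list "obj1.tar,obj1.info" --wait
$ ais cp ais://bck1 ais://bck2 --template "obj{2..4}"
```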
Example 3. Use --sync option to copy remote virtual subdirectory
In the example, --sync synchronizes destination bucket with its remote (e.g., Cloud) source.
In particular, the option will make sure that aistore has the latest versions of remote objects, and it may also entail removing objects that no longer exist remotely.
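A sketch, assuming a Cloud source bucket with a virtual subdirectory `virtual-subdir/` (both names are illustrative):

```console
$ ais cp s3://src_bucket/virtual-subdir/ ais://dst_bucket --sync
```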
See also
Example copying buckets
This example demonstrates how to copy objects between buckets using the AIStore CLI, and how to monitor the progress of the copy operation. AIStore supports all possible permutations of copying: Cloud to AIStore, Cloud to another (or same) Cloud, AIStore to Cloud, and between AIStore buckets.
To copy all objects with a common prefix from an S3 bucket to an AIStore bucket:
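A sketch using an embedded prefix (bucket and prefix names are placeholders):

```console
$ ais cp s3://src_bucket/common-prefix/ ais://dst_bucket
```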
Note: The “Warning” message is benign and will only appear if the destination bucket does not exist.
Monitoring progress
You can monitor the progress of the copy operation using the ais show job copy command. Add the --refresh flag followed by a time in seconds to get automatic updates:
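For example, to refresh the statistics every 5 seconds:

```console
$ ais show job copy --refresh 5
```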
The output shows statistics for each node in the AIStore cluster:
- NODE: The name of the node
- ID: The job ID
- KIND: The type of operation
- SRC BUCKET: Source bucket
- DST BUCKET: Destination bucket
- OBJECTS: Number of objects processed
- BYTES: Amount of data transferred
- START: Job start time
- END: Job end time (empty if job is still running)
- STATE: Current job state
The output also includes a “Total” row at the bottom that provides cluster-wide aggregated values for the number of objects processed and bytes transferred. The checkmark (✓) indicates that all nodes are reporting byte statistics.
Stopping all jobs
To stop all in-progress jobs:
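For example:

```console
$ ais stop --all
```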
In our example, there’d be a single job with ID `tco-goDbhCxtf`.
Example copying buckets and multi-objects with simultaneous synchronization
There’s a script that we use for testing. When run, it produces the following output:
The script executes a sequence of steps (above).
Notice a certain limitation (that also shows up as the last step #15):

- As of version 3.22, aistore `cp` commands will always synchronize deleted and updated remote content.
- However, to see out-of-band added content, you currently need to run a multi-object copy, with multiple source objects specified using `--list` or `--template`.
See also
- `ais cp --help` for the most recently updated options
- to fully synchronize in-cluster content with a remote backend, please refer to out-of-band updates
Show bucket summary
ais storage summary PROVIDER:[//BUCKET_NAME] [command options] - show bucket sizes and the respective percentages of used capacity on a per-bucket basis.
ais bucket summary - same as above.
Options
If BUCKET is omitted, the command applies to all AIS buckets.
The output includes the total number of objects in a bucket, the bucket’s size (bytes, megabytes, etc.), and the percentage of the total capacity used by the bucket.
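A sketch (the bucket name is a placeholder; `--validate` is discussed below):

```console
$ ais storage summary ais://
$ ais storage summary s3://bucket_name --validate
```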
A few additional words must be said about --validate. The option is provided to run integrity checks, namely: locations of objects, replicas, and EC slices in the bucket, the number of replicas (and whether this number agrees with the bucket configuration), and more.
Location of each stored object must at any point in time correspond to the current cluster map and, within each storage target, to the target’s mountpaths. A failure to abide by location rules is called misplacement; misplaced objects - if any - must be migrated to their proper locations via automated processes called global rebalance and resilver:
Notes
- `--validate` may take considerable time to execute (depending, of course, on the sizes of the datasets in question and the capabilities of the underlying hardware);
- a non-zero number of misplaced objects in the (validated) output is a direct indication that the cluster requires rebalancing and/or resilvering;
- an alternative way to execute validation is to run `ais storage validate` or (simply) `ais scrub`:
For details and additional examples, please see:
Examples
Start N-way Mirroring
ais start mirror BUCKET --copies <value>
Start an extended action to bring a given bucket to a certain redundancy level (value copies). Read more about this feature here.
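For example, to maintain 3 copies of each object in the bucket (the bucket name is a placeholder):

```console
$ ais start mirror ais://bucket_name --copies 3
```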
Options
Start Erasure Coding
ais start ec-encode BUCKET --data-slices <value> --parity-slices <value>
Start an extended action that encodes and recovers all objects and slices in a given bucket. The action enables erasure coding if it is disabled, and runs the encoding for all objects in the bucket in the background. If erasure coding for the bucket was enabled beforehand, the extended action recovers missing objects and slices if possible.
When running the extended action on a bucket that already has erasure coding enabled, you must pass the correct number of data and parity slices on the command line.
Run ais bucket props show <bucket-name> ec to get the current erasure coding settings.
Read more about this feature here.
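For example, to encode the bucket with 2 data and 2 parity slices (the bucket name is a placeholder):

```console
$ ais start ec-encode ais://bucket_name --data-slices 2 --parity-slices 2
```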
Options
All options are required and must be greater than 0.
Show bucket properties
Overall, the topic called “bucket properties” is rather involved and includes sub-topics “bucket property inheritance” and “cluster-wide global defaults”. For background, please first see:
Now, as far as CLI, run the following to list properties of the specified bucket. By default, a certain compact form of bucket props sections is presented.
ais bucket props show BUCKET [PROP_PREFIX] [command options]
When PROP_PREFIX is set, only props that start with PROP_PREFIX will be displayed.
Useful PROP_PREFIX are: access, checksum, ec, lru, mirror, provider, versioning.
`ais bucket show` is an alias for `ais show bucket` - both can be used interchangeably.
Options
Examples
Show bucket props with provided section
Show only lru section of bucket props for bucket_name bucket.
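For example:

```console
$ ais bucket props show ais://bucket_name lru
```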
Set bucket properties
ais bucket props set [OPTIONS] BUCKET JSON_SPECIFICATION|KEY=VALUE [KEY=VALUE...]
Set bucket properties. For the available options, see bucket-properties.
If JSON_SPECIFICATION is used, all properties of the bucket are set based on the values in the JSON object.
Options
When JSON specification is not used, some properties support user-friendly aliases:
Examples
Enable mirroring for a bucket
Set the mirror.enabled and mirror.copies properties to true and 2 respectively, for the bucket bucket_name
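For example:

```console
$ ais bucket props set ais://bucket_name mirror.enabled=true mirror.copies=2
```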
Make a bucket read-only
Set read-only access to the bucket bucket_name.
All PUT and DELETE requests will fail.
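For example:

```console
$ ais bucket props set ais://bucket_name access=ro
```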
Configure custom AWS S3 endpoint
When a bucket is hosted by an S3 compliant backend (such as, e.g., minio), we may want to specify an alternative S3 endpoint, so that AIS nodes use it when reading, writing, listing, and generally, performing all operations on remote S3 bucket(s).
Globally, the S3 endpoint can be overridden for all S3 buckets via the `S3_ENDPOINT` environment variable. If you decide to make the change, you may need to restart the AIS cluster while making sure that `S3_ENDPOINT` is available to the AIS nodes when they start up.
But it can also be done - and will take precedence over the global setting - on a per-bucket basis.
Here are some examples:
Global
The `export S3_ENDPOINT=...` override is static and read-only. Use it with extreme caution, as it applies to all buckets.
On the other hand, for any given `s3://bucket`, its S3 endpoint can be set, unset, and otherwise changed at any time - at runtime, as shown above.
Connect/Disconnect AIS bucket to/from cloud bucket
Set backend bucket for AIS bucket bucket_name to the GCP cloud bucket cloud_bucket.
Once the backend bucket is set, operations (get, put, list, etc.) with ais://bucket_name will be exactly as we would do with gcp://cloud_bucket.
It’s like a symlink to a cloud bucket.
The only difference is that all objects will be cached into ais://bucket_name (and reflected in the cloud as well) instead of gcp://cloud_bucket.
To disconnect cloud bucket do:
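Both operations can be sketched as follows (bucket names come from the description above):

```console
# connect: make gcp://cloud_bucket the backend of ais://bucket_name
$ ais bucket props set ais://bucket_name backend_bck=gcp://cloud_bucket

# disconnect
$ ais bucket props set ais://bucket_name backend_bck=none
```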
Ignore non-critical errors
To create an erasure-encoded bucket or enable EC for an existing bucket, AIS requires at least ec.data_slices + ec.parity_slices + 1 targets.
At the same time, for small objects (size is less than ec.objsize_limit) it is sufficient to have only ec.parity_slices + 1 targets.
Option --force allows creating erasure-encoded buckets when the number of targets is not enough but the number exceeds ec.parity_slices.
Note that if the number of targets is less than ec.data_slices + ec.parity_slices + 1, the cluster accepts only objects smaller than ec.objsize_limit.
Bigger objects are rejected on PUT.
In examples a cluster with 6 targets is used:
Once erasure encoding is enabled for a bucket, the number of data and parity slices cannot be modified.
The minimum object size ec.objsize_limit can be changed on the fly.
To avoid accidental modification when EC for a bucket is enabled, the option --force must be used.
Set bucket properties with JSON
Set all bucket properties for bucket_name bucket based on the provided JSON specification.
If not all properties are mentioned in the JSON, the missing ones are set to zero values (empty / false / nil):
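A sketch, assuming we only want versioning-related properties set (the JSON keys shown are illustrative - all unmentioned sections would be reset to zero values):

```console
$ ais bucket props set ais://bucket_name '{"versioning": {"enabled": true, "validate_warm_get": true}}'
```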
Archive multiple objects
ais archive bucket - Archive selected or matching objects from SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] as (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted object (a.k.a. shard).
See also:
Show and set AWS-specific properties
AIStore supports AWS-specific configuration on a per-s3-bucket basis. Any bucket that is backed by an AWS S3 bucket (**) can be configured to use alternative:
- named AWS profiles (with alternative credentials and/or region)
- alternative s3 endpoints
For background and usage examples, please see AWS-specific bucket configuration.
(**) Terminology-wise, “s3 bucket” is a shortcut phrase indicating a bucket in an AIS cluster that either (A) has the same name (e.g., `s3://abc`), or (B) is a differently named AIS bucket whose `backend_bck` property specifies the s3 bucket in question.
Reset bucket properties to cluster defaults
ais bucket props reset BUCKET
Reset bucket properties to cluster defaults.
Examples
Show bucket metadata
ais show cluster bmd
Show bucket metadata (BMD).