Bucket operations

Background and Introduction

A bucket is a named container for objects (monolithic files or chunked representations) with associated metadata. It is the fundamental unit of data organization and data management.

AIS buckets are categorized by their provider and origin. Native ais:// buckets managed by this cluster are always created explicitly (via ais create or the respective Go and/or Python APIs).

Remote buckets (including s3://, gs://, etc., and ais:// buckets in remote AIS clusters) are usually discovered and auto-added on-the-fly on first access.

In a cluster, every bucket is assigned a unique, cluster-wide bucket ID (BID). Same-name remote buckets with different namespaces get different IDs. Every object a) belongs to exactly one bucket and b) is identified by a unique name within that bucket.

Bucket properties define data protection (checksums, mirroring, erasure coding), chunked representation, versioning and synchronization with remote sources, access control, backend linkage, feature flags, rate-limit settings, and more.

For types of supported buckets (AIS, Cloud, remote AIS, etc.), bucket identity, properties, lifecycle, associated policies, storage services, and usage examples, see the comprehensive in-depth bucket documentation.

It is easy to see all CLI operations on buckets:

```console
$ ais bucket <TAB-TAB>

ls        validate  evict     show      cp        etl       rm
summary   lru       prefetch  create    archive   mv        props
```

For convenience, a few of the most popular verbs are also aliased:

```console
$ ais alias | grep bucket
cp        bucket cp
create    bucket create
evict     bucket evict
ls        bucket ls
rmb       bucket rm
```

Create bucket

ais create BUCKET [BUCKET...]

Create bucket(s).

```console
$ ais create --help
NAME:
   ais create - (alias for "bucket create") Create AIS buckets or explicitly attach remote buckets with non-default credentials/properties.
   Normally, AIS auto-adds remote buckets on first access (ls/get/put): when a user references a new bucket,
   AIS looks it up behind the scenes, confirms its existence and accessibility, and "on-the-fly" updates its
   cluster-wide global (BMD) metadata containing bucket definitions, management policies, and properties.
   Use this command when you need to:
   1) create an ais:// bucket in this cluster;
   2) create a bucket in a remote AIS cluster (e.g., 'ais://@remais/BUCKET');
   3) set up a cloud bucket with a custom profile and/or endpoint/region;
   4) set bucket properties before first access;
   5) attach multiple same-name cloud buckets under different namespaces (e.g., 's3://#ns1/bucket', 's3://#ns2/bucket');
   6) and finally, register a cloud bucket that is not (yet) accessible (advanced-usage '--skip-lookup' option).
   Examples:
   - ais create ais://mybucket - create AIS bucket 'mybucket' (must be done explicitly);
   - ais create ais://@remais/BUCKET - create a bucket in a remote AIS cluster referenced by the cluster's alias or UUID;
   - ais create s3://mybucket - add existing cloud (S3) bucket; normally AIS would auto-add it on first access;
   - ais create s3://mybucket --props='extra.aws.profile=prod extra.aws.multipart_size=333M' - add S3 bucket using a non-default cloud profile;
   - ais create s3://#myaccount/mybucket --props='extra.aws.profile=swift extra.aws.endpoint=$S3_ENDPOINT' - attach S3-compatible bucket via namespace '#myaccount';
   - ais create oc://#phx/mybucket --props='extra.oci.region=us-phoenix-1' - add OCI bucket using a non-default region and namespace '#phx';
   - ais create s3://mybucket --skip-lookup --props='extra.aws.profile=...' - advanced: register bucket without verifying its existence/accessibility (use with care);
   - ais create gs://mybucket --skip-lookup --props='extra.gcp.application_creds=/mnt/vault/sa.json' - GCS bucket with per-bucket service-account credentials.

USAGE:
   ais create BUCKET [BUCKET...] [command options]

OPTIONS:
   --force, -f     Force execution of the command (caution: advanced usage only)
   --ignore-error  Ignore "soft" failures such as "bucket already exists", etc.
   --props         Create bucket with the specified (non-default) properties, e.g.:
                   * ais create ais://mmm --props="versioning.validate_warm_get=false versioning.synchronize=true"
                   * ais create ais://nnn --props='mirror.enabled=true mirror.copies=4 checksum.type=md5'
                   * ais create s3://bbb --props='extra.cloud.profile=prod extra.cloud.endpoint=https://s3.example.com'
                   Tips:
                   1) Use '--props' to override properties that a new bucket would normally inherit from cluster config at creation time.
                   2) Use '--props' to set up an existing cloud bucket with a custom profile and/or custom endpoint/region.
                   See also: 'ais bucket props show' and 'ais bucket props set'
   --skip-lookup   Do not execute HEAD(bucket) request to lookup remote bucket and its properties; possible usage scenarios include:
                   1) adding remote bucket to aistore without first checking the bucket's accessibility
                      (e.g., to configure the bucket's aistore properties with alternative security profile and/or endpoint)
                   2) listing public-access Cloud buckets where certain operations (e.g., 'HEAD(bucket)') may be disallowed
   --help, -h      Show help
```

Examples

Create AIS bucket

Create buckets bucket_name1 and bucket_name2, both with AIS provider.

```console
$ ais create ais://bucket_name1 ais://bucket_name2
"ais://bucket_name1" bucket created
"ais://bucket_name2" bucket created
```

Create AIS bucket in local namespace

Create bucket bucket_name in ml namespace.

```console
$ ais create ais://#ml/bucket_name
"ais://#ml/bucket_name" bucket created
```

Create bucket in remote AIS cluster

Create bucket bucket_name in global namespace of AIS remote cluster with Bghort1l UUID.

```console
$ ais create ais://@Bghort1l/bucket_name
"ais://@Bghort1l/bucket_name" bucket created
```

Create bucket bucket_name in ml namespace of AIS remote cluster with Bghort1l UUID.

```console
$ ais create ais://@Bghort1l#ml/bucket_name
"ais://@Bghort1l#ml/bucket_name" bucket created
```

Create bucket with custom properties

Create bucket bucket_name with custom properties specified.

```console
$ # Key-value format
$ ais create ais://@Bghort1l/bucket_name --props="mirror.enabled=true mirror.copies=2"
"ais://@Bghort1l/bucket_name" bucket created
$
$ # JSON format
$ ais create ais://@Bghort1l/bucket_name --props='{"versioning": {"enabled": true, "validate_warm_get": true}}'
"ais://@Bghort1l/bucket_name" bucket created
```
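The key-value form of `--props` above is a space-separated list of dotted `name=value` pairs. Its shape can be illustrated with a small parser; this is a hedged sketch of the notation only, and `parse_props` is a hypothetical helper, not part of the AIS CLI or SDK:

```python
def parse_props(spec: str) -> dict:
    """Parse space-separated 'dotted.name=value' pairs, e.g.
    "mirror.enabled=true mirror.copies=2", into a nested dict."""
    out: dict = {}
    for pair in spec.split():
        name, _, raw = pair.partition("=")
        # best-effort typing (an assumption - the real CLI knows each
        # property's actual type): bools, then ints, else plain strings
        if raw.lower() in ("true", "false"):
            val = raw.lower() == "true"
        elif raw.isdigit():
            val = int(raw)
        else:
            val = raw
        node = out
        *path, leaf = name.split(".")
        for key in path:
            node = node.setdefault(key, {})
        node[leaf] = val
    return out
```

For example, `parse_props("mirror.enabled=true mirror.copies=2")` yields `{"mirror": {"enabled": True, "copies": 2}}` - the same nested structure the JSON form of `--props` spells out explicitly.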

Incorrect bucket creation

```console
$ ais create aws://bucket_name
Create bucket "aws://bucket_name" failed: creating a bucket for any of the cloud or HTTP providers is not supported
```

Delete bucket

ais bucket rm BUCKET [BUCKET...]

Delete AIS bucket(s).

Examples

Remove AIS buckets

Remove AIS buckets bucket_name1 and bucket_name2.

```console
$ ais bucket rm ais://bucket_name1 ais://bucket_name2
"ais://bucket_name1" bucket destroyed
"ais://bucket_name2" bucket destroyed
```

Remove AIS bucket in local namespace

Remove bucket bucket_name from ml namespace.

```console
$ ais bucket rm ais://#ml/bucket_name
"ais://#ml/bucket_name" bucket destroyed
```

Remove bucket in remote AIS cluster

Remove bucket bucket_name from global namespace of AIS remote cluster with Bghort1l UUID.

```console
$ ais bucket rm ais://@Bghort1l/bucket_name
"ais://@Bghort1l/bucket_name" bucket destroyed
```

Remove bucket bucket_name from ml namespace of AIS remote cluster with Bghort1l UUID.

```console
$ ais bucket rm ais://@Bghort1l#ml/bucket_name
"ais://@Bghort1l#ml/bucket_name" bucket destroyed
```

Incorrect bucket removal

Removing remote buckets is not supported.

```console
$ ais bucket rm aws://bucket_name
Operation "destroy-bck" is not supported by "aws://bucket_name"
```

List buckets

ais ls PROVIDER:[//BUCKET_NAME] [command options]

Notice the optional [//BUCKET_NAME]. When there’s no bucket, ais ls will list buckets. Otherwise, it’ll list objects.

Usage

```console
$ ais ls --help
NAME:
   ais ls - (alias for "bucket ls") List buckets, objects in buckets, and files in (.tar, .tgz, .tar.gz, .zip, .tar.lz4)-formatted objects,
   e.g.:
     * ais ls - list all buckets in a cluster (all providers);
     * ais ls ais://abc --props name,size,copies,location - list objects with only these specific properties;
     * ais ls ais://abc --props all - list objects with all available properties;
     * ais ls ais://abc --page-size 20 --refresh 3s - list large bucket (20 items per page), progress every 3s;
     * ais ls ais://abc --page-size 20 --refresh 3 - same as above;
     * ais ls ais - list all ais buckets;
     * ais ls s3 - list all s3 buckets present in the cluster;
     * ais ls s3 --all - list all s3 buckets (both in-cluster and remote).
   list archive contents:
     * ais ls ais://abc/sample.tar --archive - list files inside a tar archive;
   list in pages (continues until '--max-pages', '--limit', Ctrl-C, or end of bucket):
     * ais ls s3://abc --paged --limit 1234000 - limited paged output (1234 pages), with default properties;
     * ais ls s3://abc --paged --limit 1234000 --nr - same as above, non-recursively (skips nested directories);
   with template, regex, and/or prefix:
     * ais ls gs: --regex "^abc" --all - list all accessible GCP buckets with names starting with "abc";
     * ais ls ais://abc --regex "\.md$" --props size,checksum - list markdown files with size and checksum;
     * ais ls gs://abc --template images/ - list all objects from virtual subdirectory "images";
     * ais ls gs://abc --prefix images/ - same as above (for more examples, see '--template' below);
     * ais ls gs://abc/images/ - same as above.
   with in-cluster vs remote content comparison (diff):
     * ais ls s3://abc --check-versions - for each remote object: check for identical in-cluster copy and show missing objects;
     * ais ls s3://abc --check-versions --cached - for each in-cluster object: check for identical remote copy and show deleted objects.
   with summary (bucket sizes and numbers of objects):
     * ais ls ais://nnn --summary --prefix=aaa/bbb - summarize objects matching the given prefix;
     * ais ls ais://nnn/aaa/bbb --summary - same as above;
     * ais ls az://azure-bucket --count-only - fastest way to count objects in a bucket;
     * ais ls s3 --summary - for each s3 bucket: print object count and total size;
     * ais ls s3 --summary --all - summary report for all s3 buckets including remote/non-present;
     * ais ls s3 --summary --all --dont-add - same, without adding non-present buckets to cluster metadata.
...
...
```
Assorted options

The options are numerous. Here's a non-exhaustive list (for the most recent update, run `ais ls --help`):

```console
OPTIONS:
   --all             depending on the context:
                     - all objects in a given bucket, including misplaced and copies, or
                     - all buckets, including accessible (visible) remote buckets that are _not present_ in the cluster
   --cached          list only those objects from a remote bucket that are present ("cached")
   --name-only       faster request to retrieve only the names of objects (if defined, '--props' flag will be ignored)
   --props value     comma-separated list of object properties including name, size, version, copies and more; e.g.:
                     --props all
                     --props name,size,cached
                     --props "ec, copies, custom, location"
   --regex value     regular expression; use it to match either bucket names or objects in a given bucket, e.g.:
                     ais ls --regex "(m|n)" - match buckets such as ais://nnn, s3://mmm, etc.;
                     ais ls ais://nnn --regex "^A" - match object names starting with letter A
   --summary         show object numbers, bucket sizes, and used capacity; applies _only_ to buckets and objects that are _present_ in the cluster
   --units value     show statistics and/or parse command-line specified sizes using one of the following _units of measurement_:
                     iec - IEC format, e.g.: KiB, MiB, GiB (default)
                     si - SI (metric) format, e.g.: KB, MB, GB
                     raw - do not convert to (or from) human-readable format
   --no-headers, -H  display tables without headers
   --no-footers      display tables without footers
```

ais ls --regex "ngn*"

List all buckets matching the ngn* regular expression.

ais ls aws: or (same) ais ls s3

List all existing buckets for the specific provider.

ais ls aws --all or (same) ais ls s3: --all

List absolutely all buckets that the cluster can "see", including those that are not necessarily present in the cluster.

ais ls ais:// or (same) ais ls ais

List all AIS buckets.

ais ls ais://#name

List all buckets for the ais provider and name namespace.

ais ls ais://@uuid#namespace

List all remote AIS buckets that have uuid#namespace namespace. Note that:

  • the uuid must be the remote cluster's UUID (or its alias)
  • the namespace is the optional name of the remote namespace

As a rule of thumb, when a (logical) #namespace in the bucket's name is omitted, we use the global namespace that always exists.
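The naming convention above - PROVIDER://[@REMOTE_UUID][#NAMESPACE]/BUCKET - can be sketched as a small parser. This is an illustration of the notation only; `parse_bucket_uri` is a hypothetical helper, not an AIS API, and the "global" default reflects the rule of thumb just stated:

```python
import re

# Sketch of the convention: PROVIDER://[@UUID][#NAMESPACE]/BUCKET[/PREFIX]
_BUCKET_URI = re.compile(
    r"^(?P<provider>[a-z0-9]+)://"   # ais, s3, gs, az, oc, ...
    r"(?:@(?P<uuid>[^#/]+))?"        # optional remote-cluster UUID or alias
    r"(?:#(?P<ns>[^/]+))?"           # optional namespace
    r"/?(?P<bucket>[^/]*)"           # bucket name (may be empty when listing)
    r"(?:/(?P<prefix>.*))?$"         # optional object-name prefix
)

def parse_bucket_uri(uri: str) -> dict:
    m = _BUCKET_URI.match(uri)
    if m is None:
        raise ValueError(f"not a bucket URI: {uri}")
    parts = m.groupdict()
    # an omitted #namespace means the global namespace, which always exists
    parts["ns"] = parts["ns"] or "global"
    return parts
```

For example, `parse_bucket_uri("ais://@Bghort1l#ml/bucket_name")` splits into provider `ais`, remote UUID `Bghort1l`, namespace `ml`, and bucket `bucket_name`.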

List objects

ais ls is one of those commands that only keeps growing, in terms of supported options and capabilities.

The command:

ais ls PROVIDER:[//BUCKET_NAME] [command options]

can conveniently list buckets (with or without “summarizing” object counts and sizes) and objects.

Notice the optional [//BUCKET_NAME]. When there’s no bucket, ais ls will list buckets. Otherwise, it’ll list objects.

The command’s inline help is also quite extensive, with (inline) examples followed by numerous supported options:

```console
$ ais ls --help
NAME:
   ais ls - (alias for "bucket ls") List buckets, objects in buckets, and files in (.tar, .tgz, .tar.gz, .zip, .tar.lz4)-formatted objects,
   e.g.:
     * ais ls - list all buckets in a cluster (all providers);
     * ais ls ais://abc --props name,size,copies,location - list objects with only these specific properties;
     * ais ls ais://abc --props all - list objects with all available properties;
     * ais ls ais://abc --page-size 20 --refresh 3s - list large bucket (20 items per page), progress every 3s;
     * ais ls ais://abc --page-size 20 --refresh 3 - same as above;
     * ais ls ais - list all ais buckets;
     * ais ls s3 - list all s3 buckets present in the cluster;
     * ais ls s3 --all - list all s3 buckets (both in-cluster and remote).
   list archive contents:
     * ais ls ais://abc/sample.tar --archive - list files inside a tar archive;
   list in pages (continues until '--max-pages', '--limit', Ctrl-C, or end of bucket):
     * ais ls s3://abc --paged --limit 1234000 - limited paged output (1234 pages), with default properties;
     * ais ls s3://abc --paged --limit 1234000 --nr - same as above, non-recursively (skips nested directories);
   with template, regex, and/or prefix:
     * ais ls gs: --regex "^abc" --all - list all accessible GCP buckets with names starting with "abc";
     * ais ls ais://abc --regex "\.md$" --props size,checksum - list markdown files with size and checksum;
     * ais ls gs://abc --template images/ - list all objects from virtual subdirectory "images";
     * ais ls gs://abc --prefix images/ - same as above (for more examples, see '--template' below);
     * ais ls gs://abc/images/ - same as above.
   with in-cluster vs remote content comparison (diff):
     * ais ls s3://abc --check-versions - for each remote object: check for identical in-cluster copy and show missing objects;
     * ais ls s3://abc --check-versions --cached - for each in-cluster object: check for identical remote copy and show deleted objects.
   with summary (bucket sizes and numbers of objects):
     * ais ls ais://nnn --summary --prefix=aaa/bbb - summarize objects matching the given prefix;
     * ais ls ais://nnn/aaa/bbb --summary - same as above;
     * ais ls az://azure-bucket --count-only - fastest way to count objects in a bucket;
     * ais ls s3 --summary - for each s3 bucket: print object count and total size;
     * ais ls s3 --summary --all - summary report for all s3 buckets including remote/non-present;
     * ais ls s3 --summary --all --dont-add - same, without adding non-present buckets to cluster metadata.

USAGE:
   ais ls [BUCKET[/PREFIX]] [PROVIDER] [command options]

OPTIONS:
   --all                  Depending on the context, list:
                          - all buckets, including accessible (visible) remote buckets that are not in-cluster
                          - all objects in a given accessible (visible) bucket, including remote objects and misplaced copies
   --archive              List archived content (see docs/archive.md for details)
   --cached               Only list in-cluster objects, i.e., objects from the respective remote bucket that are present ("cached") in the cluster
   --count-only           Print only the resulting number of listed objects and elapsed time
   --diff                 Perform a bidirectional diff between in-cluster and remote content, which further entails:
                          - detecting remote version changes (a.k.a. out-of-band updates), and
                          - remotely deleted objects (out-of-band deletions (*));
                          the option requires remote backends supporting some form of versioning (e.g., object version, checksum, and/or ETag);
                          see related:
                          (*) options: --cached; --latest
                          commands: 'ais get --latest'; 'ais cp --sync'; 'ais prefetch --latest'
   --dont-add             List remote bucket without adding it to cluster's metadata - e.g.:
                          - let's say, s3://abc is accessible but not present in the cluster (e.g., 'ais ls' returns error);
                          - then, if we ask aistore to list remote buckets: 'ais ls s3://abc --all'
                            the bucket will be added (in effect, it'll be created);
                          - to prevent this from happening, either use this '--dont-add' flag or run 'ais evict' command later
   --dont-wait            When _summarizing_ buckets do not wait for the respective job to finish -
                          use the job's UUID to query the results interactively
   --inv-id value         Bucket inventory ID (optional; by default, we use bucket name as the bucket's inventory ID)
   --inv-name value       Bucket inventory name (optional; system default name is '.inventory')
   --inventory            List objects using _bucket inventory_ (docs/s3compat.md); requires s3:// backend; will provide significant performance
                          boost when used with very large s3 buckets; e.g. usage:
                          1) 'ais ls s3://abc --inventory'
                          2) 'ais ls s3://abc --inventory --paged --prefix=subdir/'
                          (see also: docs/s3compat.md)
   --limit value          The maximum number of objects to list, get, or otherwise handle (0 - unlimited; see also '--max-pages'),
                          e.g.:
                          - 'ais ls gs://abc/dir --limit 1234 --cached --props size,custom,atime' - list no more than 1234 objects
                          - 'ais get gs://abc /dev/null --prefix dir --limit 1234' - get --/--
                          - 'ais scrub gs://abc/dir --limit 1234' - scrub --/-- (default: 0)
   --max-pages value      Maximum number of pages to display (see also '--page-size' and '--limit')
                          e.g.: 'ais ls az://abc --paged --page-size 123 --max-pages 7' (default: 0)
   --name-only            Faster request to retrieve only the names of objects (if defined, '--props' flag will be ignored)
   --no-dirs              Do not return virtual subdirectories (applies to remote buckets only)
   --no-footers, -F       Display tables without footers
   --no-headers, -H       Display tables without headers
   --non-recursive, --nr  Non-recursive operation, e.g.:
                          - 'ais ls gs://bucket/prefix --nr' - list objects and/or virtual subdirectories with names starting with the specified prefix;
                          - 'ais ls gs://bucket/prefix/ --nr' - list contained objects and/or immediately nested virtual subdirectories _without_ recursing into the latter;
                          - 'ais prefetch s3://bck/abcd --nr' - prefetch a single named object (see 'ais prefetch --help' for details);
                          - 'ais rmo gs://bucket/prefix --nr' - remove a single object with the specified name (see 'ais rmo --help' for details)
   --page-size value      Maximum number of object names per page; when the flag is omitted or 0 (zero)
                          the maximum is defined by the corresponding backend; see also '--max-pages' and '--paged' (default: 0)
   --paged                List objects page by page - one page at a time (see also '--page-size' and '--limit')
                          note: recommended for use with very large buckets
   --prefix value         List objects with names starting with the specified prefix, e.g.:
                          '--prefix a/b/c' - list virtual directory a/b/c and/or objects from the virtual directory
                          a/b that have their names (relative to this directory) starting with the letter 'c'
   --props value          Comma-separated list of object properties including name, size, version, copies, and more; e.g.:
                          --props all
                          --props name,size,cached
                          --props "ec, copies, custom, location"
   --refresh value        Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
                          valid time units: ns, us (or µs), ms, s (default), m, h
   --regex value          Regular expression; use it to match either bucket names or objects in a given bucket, e.g.:
                          ais ls --regex "(m|n)" - match buckets such as ais://nnn, s3://mmm, etc.;
                          ais ls ais://nnn --regex "^A" - match object names starting with letter A
   --show-unmatched       List also objects that were not matched by regex and/or template (range)
   --silent               Server-side flag, an indication for aistore _not_ to log assorted errors (e.g., HEAD(object) failures)
   --skip-lookup          Do not execute HEAD(bucket) request to lookup remote bucket and its properties; possible usage scenarios include:
                          1) adding remote bucket to aistore without first checking the bucket's accessibility
                             (e.g., to configure the bucket's aistore properties with alternative security profile and/or endpoint)
                          2) listing public-access Cloud buckets where certain operations (e.g., 'HEAD(bucket)') may be disallowed
   --start-after value    List bucket's content alphabetically starting with the first name _after_ the specified
   --summary              Show object numbers, bucket sizes, and used capacity;
                          note: applies only to buckets and objects that are _present_ in the cluster
   --template value       Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
                          (with optional steps and gaps), e.g.:
                          --template "" # (an empty or '*' template matches everything)
                          --template 'dir/subdir/'
                          --template 'shard-{1000..9999}.tar'
                          --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
                          and similarly, when specifying files and directories:
                          --template '/home/dir/subdir/'
                          --template "/abc/prefix-{0010..9999..2}-suffix"
   --units value          Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
                          iec - IEC format, e.g.: KiB, MiB, GiB (default)
                          si - SI (metric) format, e.g.: KB, MB, GB
                          raw - do not convert to (or from) human-readable format
   --help, -h             Show help
```

Assorted options

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `--regex` | `string` | regular expression to match and select items in question | `""` |
| `--template` | `string` | template for matching object names, e.g.: `shard-{900..999}.tar` | `""` |
| `--prefix` | `string` | list objects matching a given prefix | `""` |
| `--page-size` | `int` | maximum number of names per page (0 - the maximum is defined by the corresponding backend) | `0` |
| `--props` | `string` | comma-separated list of object properties including name, size, version, copies, EC data and parity info, custom metadata, location and more; to include all properties, type `--props all` | `"name,size"` |
| `--limit` | `int` | limit object name count (0 - unlimited) | `0` |
| `--show-unmatched` | `bool` | list objects that were not matched by regex and/or template | `false` |
| `--all` | `bool` | depending on context: all objects (including misplaced ones and copies) or all buckets (including remote buckets that are not present in the cluster) | `false` |
| `--no-headers, -H` | `bool` | display tables without headers | `false` |
| `--no-footers` | `bool` | display tables without footers | `false` |
| `--paged` | `bool` | list objects page by page, one page at a time (see also `--page-size` and `--limit`) | `false` |
| `--max-pages` | `int` | display up to this number of pages of bucket objects | `0` |
| `--marker` | `string` | list bucket's content alphabetically starting with the first name after the specified | `""` |
| `--start-after` | `string` | object name (marker) after which the listing should start | `""` |
| `--cached` | `bool` | list only those objects from a remote bucket that are present ("cached") | `false` |
| `--skip-lookup` | `bool` | list public-access Cloud buckets that may disallow certain operations (e.g., HEAD(bucket)); use this option for performance or to read Cloud buckets that allow anonymous access | `false` |
| `--archive` | `bool` | list archived content | `false` |
| `--check-versions` | `bool` | check whether listed remote objects and their in-cluster copies are identical, i.e., have the same versions; applies to remote backends that maintain at least some form of versioning information (e.g., version, checksum, ETag) | `false` |
| `--summary` | `bool` | show bucket sizes and used capacity; by default, applies only to the buckets that are present in the cluster (use `--all` to override) | `false` |
| `--bytes` | `bool` | show sizes in bytes (i.e., do not convert to KiB, MiB, GiB, etc.) | `false` |
| `--name-only` | `bool` | fast request to retrieve only the names of objects in the bucket; if defined, all comma-separated fields in the `--props` flag will be ignored with only two exceptions: name and status | `false` |
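The `--template` brace ranges (`shard-{1000..9999}.tar`, with an optional `..step`) can be approximated in a few lines. This is a sketch of the syntax only; `expand_template` is a hypothetical helper that ignores anything the CLI may support beyond simple zero-padded numeric ranges:

```python
import re
from itertools import product

def expand_template(template):
    """Expand '{start..end[..step]}' numeric ranges, zero-padded to the
    width of 'start' - a sketch of '--template', not the CLI's code."""
    # re.split with 3 capture groups yields [text, start, end, step, text, ...]
    parts = re.split(r"\{(\d+)\.\.(\d+)(?:\.\.(\d+))?\}", template)
    literals = parts[0::4]
    ranges = []
    for start, end, step in zip(parts[1::4], parts[2::4], parts[3::4]):
        vals = range(int(start), int(end) + 1, int(step) if step else 1)
        ranges.append([str(v).zfill(len(start)) for v in vals])
    out = []
    for combo in product(*ranges):
        pieces = [literals[0]]
        for val, lit in zip(combo, literals[1:]):
            pieces += [val, lit]
        out.append("".join(pieces))
    return out
```

Under this sketch, `expand_template("prefix-{0010..0013..2}-gap-{1..2}-suffix")` produces the four names `prefix-0010-gap-1-suffix` ... `prefix-0012-gap-2-suffix`, matching the ranges-with-steps-and-gaps example in the help text.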

When listing objects, a footer will be displayed showing:

  • Total number of objects listed
  • For remote buckets with --cached option: number of objects present in-cluster
  • For --paged option: current page number
  • For --count-only option: time elapsed to fetch the list

Examples of footer variations:

  • Listed 12345 names
  • Listed 12345 names (in-cluster: 456)
  • Page 123: 1000 names (in-cluster: none)
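The footer variations above can be reproduced with a tiny formatter. This is purely illustrative (`footer` is a hypothetical name, not aistore's actual rendering code):

```python
def footer(listed, in_cluster=None, page=None):
    """Mirror the footer shapes above: plain total, total with in-cluster
    count, and per-page counts ('none' when nothing is cached)."""
    if page is not None:
        cached = "none" if not in_cluster else str(in_cluster)
        return f"Page {page}: {listed} names (in-cluster: {cached})"
    if in_cluster is not None:
        return f"Listed {listed} names (in-cluster: {in_cluster})"
    return f"Listed {listed} names"
```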

Examples

List AIS and Cloud buckets with all defaults

1. List objects in the AIS bucket bucket_name.

```console
$ ais ls ais://bucket_name
NAME SIZE
shard-0.tar 16.00KiB
shard-1.tar 16.00KiB
...
```

2. List objects in the remote bucket bucket_name.

```console
$ ais ls aws://bucket_name
NAME SIZE
shard-0.tar 16.00KiB
shard-1.tar 16.00KiB
...
```

3. List objects from a remote AIS cluster with a namespace:

```console
$ ais ls ais://@Bghort1l#ml/bucket_name
NAME SIZE VERSION
shard-0.tar 16.00KiB 1
shard-1.tar 16.00KiB 1
...
```

4. List objects with paged output (showing page numbers):

```console
$ ais ls ais://bucket_name --paged --limit 100
[... object listing ...]
Page 1: 100 names
```

5. List cached objects from a remote bucket:

```console
$ ais ls s3://bucket_name --cached
[... listing of only in-cluster objects ...]
Listed 456789 names
```

6. Count objects in a bucket:

```console
$ ais ls s3://bucket_name/aprefix --count-only
Listed 28,230 names in 5.62s
```

7. Count objects with paged output:

```console
$ ais ls s3://bucket_name/bprefix --count-only --paged
Page 1: 1,000 names in 772ms
Page 2: 1,000 names in 180ms
Page 3: 1,000 names in 265ms
...
Page 29: 230 names in 130ms
```
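The interplay of `--paged`, `--page-size`, and `--limit` in the examples above can be sketched as a client-side loop. This is purely illustrative; the real paging happens server-side, page tokens and all, and `paginate` is a hypothetical helper:

```python
from itertools import islice

def paginate(names, page_size, limit=0):
    """Yield pages of at most page_size names; stop once 'limit' names
    have been produced in total (0 means unlimited) - a sketch of the
    '--paged'/'--page-size'/'--limit' semantics, not the actual client."""
    it = iter(names)
    total = 0
    while True:
        want = page_size if not limit else min(page_size, limit - total)
        page = list(islice(it, want))
        if not page:
            return
        total += len(page)
        yield page
        if limit and total >= limit:
            return
```

For instance, paging 25 names with `page_size=10` yields pages of 10, 10, and 5 names; adding `limit=15` truncates that to 10 and 5.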

Notes:

  • When using --paged with remote buckets, the footer will show both page number and in-cluster object count when applicable
  • The --diff option requires remote backends supporting some form of versioning (e.g., object version, checksum, and/or ETag)
  • For more information on working with archived content, see docs/archive.md
  • To fully synchronize in-cluster content with remote backend, see documentation on out-of-band updates

Include all properties

```console
$ ais ls gs://webdataset-abc --skip-lookup --props all
NAME SIZE CHECKSUM ATIME VERSION CACHED TARGET URL STATUS COPIES
coco-train2014-seg-000000.tar 958.48MiB bdb89d1b854040b6050319e80ef44dde 1657297128665686 no http://aistore:8081 ok 0
coco-train2014-seg-000001.tar 958.47MiB 8b94939b7d166114498e794859fb472c 1657297129387272 no http://aistore:8081 ok 0
coco-train2014-seg-000002.tar 958.47MiB 142a8e81f965f9bcafc8b04eda65a0ce 1657297129904067 no http://aistore:8081 ok 0
coco-train2014-seg-000003.tar 958.22MiB 113024d5def81365cbb6c404c908efb1 1657297130555590 no http://aistore:8081 ok 0
...
```

List bucket from AIS remote cluster

List objects in the bucket bucket_name and ml namespace contained on AIS remote cluster with Bghort1l UUID.

```console
$ ais ls ais://@Bghort1l#ml/bucket_name
NAME SIZE VERSION
shard-0.tar 16.00KiB 1
shard-1.tar 16.00KiB 1
...
```

With prefix

List objects that match a given prefix.

```console
$ ais ls ais://bucket_name --prefix "shard-1"
NAME SIZE VERSION
shard-1.tar 16.00KiB 1
shard-10.tar 16.00KiB 1
```

Bucket inventory

Here's a quick four-step sequence demonstrating the functionality:

1. In the beginning, the bucket is accessible (notice --all) but empty, as far as its in-cluster content goes:

```console
$ ais ls s3://abc --cached --all
NAME SIZE
```

2. The first (remote) list-objects will have the side-effect of loading remote inventory

```console
$ ais ls s3://abc --inventory --count-only
Note: listing remote objects in s3://abc may take a while
(Tip: use '--cached' to speed up and/or '--paged' to show pages)

Listed 2,319,231 names in 23.91s
```

3. The second and later list-objects will run much faster

```console
$ ais ls s3://abc --inventory --count-only
Listed 2,319,231 names in 4.18s
```

4. Finally, observe that the in-cluster content now includes the inventory (.csv) itself:

```console
$ ais ls s3://abc --cached
NAME SIZE
.inventory/ais-vm.csv 143.61MiB
```

List archived content

```console
$ ais ls ais://abc/ --prefix log
NAME SIZE
log.tar.gz 3.11KiB

$ ais ls ais://abc/ --prefix log --archive
NAME SIZE
log.tar.gz 3.11KiB
    log2.tar.gz/t_2021-07-27_14-08-50.log 959B
    log2.tar.gz/t_2021-07-27_14-10-36.log 959B
    log2.tar.gz/t_2021-07-27_14-12-18.log 959B
    log2.tar.gz/t_2021-07-27_14-13-23.log 295B
    log2.tar.gz/t_2021-07-27_14-13-31.log 1.02KiB
    log2.tar.gz/t_2021-07-27_14-14-16.log 1.71KiB
    log2.tar.gz/t_2021-07-27_14-15-15.log 1.90KiB
```
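What `--archive` adds can be mimicked locally with Python's tarfile module: the listing shows each file inside a shard under the name `<shard>/<member>`, indented beneath the shard's own entry. A rough sketch only; `list_with_archive` is hypothetical and the indentation merely imitates the output above:

```python
import io
import tarfile

def list_with_archive(shard_name, shard_bytes):
    """Return the shard's own name followed by its members, each listed
    as '<shard>/<member>' - imitating 'ais ls --archive' output."""
    rows = [shard_name]
    with tarfile.open(fileobj=io.BytesIO(shard_bytes)) as tf:
        rows += [f"    {shard_name}/{m.name}" for m in tf.getmembers() if m.isfile()]
    return rows
```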

List anonymously (i.e., list public-access Cloud bucket)

```console
$ ais ls gs://webdataset-abc --skip-lookup
NAME SIZE
coco-train2014-seg-000000.tar 958.48MiB
coco-train2014-seg-000001.tar 958.47MiB
coco-train2014-seg-000002.tar 958.47MiB
coco-train2014-seg-000003.tar 958.22MiB
coco-train2014-seg-000004.tar 958.56MiB
coco-train2014-seg-000005.tar 958.19MiB
...
```

Use '--prefix' that crosses shard boundaries

For starters, we archive all aistore docs:

```console
$ ais put docs ais://nnn/A.tar --archive -r
```

To list a certain virtual subdirectory inside this newly created shard:

```console
$ ais archive ls ais://nnn --prefix "A.tar/tutorials"
NAME SIZE
    A.tar/tutorials/README.md 561B
    A.tar/tutorials/etl/compute_md5.md 8.28KiB
    A.tar/tutorials/etl/etl_imagenet_pytorch.md 4.16KiB
    A.tar/tutorials/etl/etl_webdataset.md 3.97KiB
Listed: 4 names
```

or, same:

```console
$ ais ls ais://nnn --prefix "A.tar/tutorials" --archive
NAME SIZE
    A.tar/tutorials/README.md 561B
    A.tar/tutorials/etl/compute_md5.md 8.28KiB
    A.tar/tutorials/etl/etl_imagenet_pytorch.md 4.16KiB
    A.tar/tutorials/etl/etl_webdataset.md 3.97KiB
Listed: 4 names
```

Evict remote bucket

AIS supports multiple storage backends:

| Type | Description | Example Name |
| --- | --- | --- |
| AIS Bucket | Native bucket managed by AIS | `ais://mybucket` |
| Remote AIS Bucket | Bucket in a remote AIS cluster | `ais://@cluster/mybucket` |
| Cloud Bucket | Remote bucket (e.g., S3, GCS, Azure) | `s3://dataset` |
| Backend Bucket | AIS bucket linked to a remote bucket | `ais://cachebucket` → `s3://x` |

See Unified Namespace for details on remote AIS clusters.

One major distinction between an AIS bucket (e.g., ais://mybucket) and a remote bucket (e.g., ais://@cluster/mybucket, s3://dataset, etc.) is that, for a variety of real-life reasons, the in-cluster content of a remote bucket may differ from its remote content.

Note that the terms in-cluster and cached are used interchangeably throughout the entire documentation and CLI.

Remote buckets can be prefetched and evicted from AIS, entirely or selectively.

Some of the supported functionality can be quickly demonstrated with the following examples:

```console
$ ais bucket evict aws://abc
"aws://abc" bucket evicted

# Dry run: the cluster will not be modified
$ ais bucket evict --dry-run aws://abc
[DRY RUN] No modifications on the cluster
EVICT: "aws://abc"

# Only evict the remote bucket's data (AIS will retain the bucket's metadata)
$ ais bucket evict --keep-md aws://abc
"aws://abc" bucket evicted
```

Here’s a more complete example that lists remote bucket, then reads and evicts a given object:

```console
$ ais ls gs://wrQkliptRt
NAME SIZE
TDXBNBEZNl.tar 8.50KiB
qFpwOOifUe.tar 8.50KiB
thmdpZXetG.tar 8.50KiB

$ ais get gcp://wrQkliptRt/qFpwOOifUe.tar /tmp/qFpwOOifUe.tar
GET "qFpwOOifUe.tar" from bucket "gcp://wrQkliptRt" as "/tmp/qFpwOOifUe.tar" [8.50KiB]

$ ais ls gs://wrQkliptRt --props all
NAME SIZE CHECKSUM ATIME VERSION CACHED STATUS COPIES
TDXBNBEZNl.tar 8.50KiB 33345a69bade096a30abd42058da4537 1622133976984266 no ok 0
qFpwOOifUe.tar 8.50KiB 47dd59e41f6b7723 28 May 21 12:02 PDT 1622133846120151 yes ok 1
thmdpZXetG.tar 8.50KiB cfe0c386e91daa1571d6a659f49b1408 1622137609269706 no ok 0

$ ais bucket evict gcp://wrQkliptRt
"gcp://wrQkliptRt" bucket evicted

$ ais ls gs://wrQkliptRt --props all
NAME SIZE CHECKSUM ATIME VERSION CACHED STATUS COPIES
TDXBNBEZNl.tar 8.50KiB 33345a69bade096a30abd42058da4537 1622133976984266 no ok 0
qFpwOOifUe.tar 8.50KiB 8b5919c0850a07d931c3c46ed9101eab 1622133846120151 no ok 0
thmdpZXetG.tar 8.50KiB cfe0c386e91daa1571d6a659f49b1408 1622137609269706 no ok 0
```

See also

Move or Rename a bucket

ais bucket mv BUCKET NEW_BUCKET

Move (i.e., rename) an AIS bucket. If NEW_BUCKET already exists, the mv operation will not proceed.

Cloud bucket move is not supported.
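The rename precondition can be sketched as follows (toy Python, hypothetical bucket names - not the AIS implementation):

```python
# Rename rule: the move does not proceed when the destination exists.
def mv_bucket(buckets, src, dst):
    if dst in buckets:
        raise ValueError(f"destination {dst!r} already exists")
    buckets[dst] = buckets.pop(src)  # contents move under the new name

buckets = {"ais://bucket_name": ["obj1", "obj2"]}
mv_bucket(buckets, "ais://bucket_name", "ais://new_bucket_name")
assert "ais://bucket_name" not in buckets
assert buckets["ais://new_bucket_name"] == ["obj1", "obj2"]
```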

Examples

Move AIS bucket

Move AIS bucket bucket_name to AIS bucket new_bucket_name.

1$ ais bucket mv ais://bucket_name ais://new_bucket_name
2Moving bucket "ais://bucket_name" to "ais://new_bucket_name" in progress.
3To check the status, run: ais show job xaction mvlb ais://new_bucket_name

Copy (list, range, and/or prefix) selected objects or entire (in-cluster or remote) buckets

ais cp SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET [command options]

1$ ais cp --help
2NAME:
3 ais cp - (alias for "bucket cp") Copy entire bucket or selected objects (to select, use '--list', '--template', or '--prefix'),
4 e.g.:
5 - 'ais cp gs://webdataset-coco ais://dst' - copy entire Cloud bucket;
6 - 'ais cp s3://abc ais://nnn --all' - copy Cloud bucket that may _not_ be present in cluster (and create destination if doesn't exist);
7 - 'ais cp s3://abc ais://nnn --all --num-workers 16' - same as above employing 16 concurrent workers;
8 - 'ais cp s3://abc ais://nnn --all --num-workers 16 --prefix dir/subdir/' - same as above, but limit copying to a given virtual subdirectory;
9 - 'ais cp s3://abc gs://xyz --all' - copy Cloud bucket to another Cloud.
10 similar to prefetch:
11 - 'ais cp s3://data s3://data --all' - copy remote source (and create namesake destination in-cluster bucket if doesn't exist).
12 synchronize with out-of-band updates:
13 - 'ais cp s3://abc ais://nnn --latest' - copy Cloud bucket; make sure that already present in-cluster copies are updated to the latest versions;
14 - 'ais cp s3://abc ais://nnn --sync' - same as above, but in addition delete in-cluster copies that do not exist (any longer) in the remote source.
15 with template, prefix, and progress:
16 - 'ais cp s3://abc ais://nnn --prepend backup/' - copy objects into 'backup/' virtual subdirectory in destination bucket;
17 - 'ais cp ais://nnn/111 ais://mmm' - copy all ais://nnn objects that match prefix '111';
18 - 'ais cp gs://webdataset-coco ais://dst --template d-tokens/shard-{000000..000999}.tar.lz4' - copy up to 1000 objects that share the specified prefix;
19 - 'ais cp gs://webdataset-coco ais://dst --prefix d-tokens/ --progress --all' - show progress while copying virtual subdirectory 'd-tokens';
20 - 'ais cp gs://webdataset-coco/d-tokens/ ais://dst --progress --all' - same as above;
21 - 'ais cp s3://abc/dir/ ais://dst --nr' - copy only immediate contents of 'dir/' (non-recursive).
22
23USAGE:
24 ais cp SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET [command options]
25
26OPTIONS:
27 --all Copy all objects from a remote bucket including those that are not present (not cached) in cluster
28 --cont-on-err Keep running archiving xaction (job) in presence of errors in any given multi-object transaction
29 --dry-run Show total size of new objects without really creating them
30 --force, -f Force execution of the command (caution: advanced usage only)
31 --latest Check in-cluster metadata and, possibly, GET, download, prefetch, or otherwise copy the latest object version
32 from the associated remote bucket;
33 the option provides operation-level control over object versioning (and version synchronization)
34 without the need to change the corresponding bucket configuration: 'versioning.validate_warm_get';
35 see also:
36 - 'ais show bucket BUCKET versioning'
37 - 'ais bucket props set BUCKET versioning'
38 - 'ais ls --check-versions'
39 supported commands include:
40 - 'ais cp', 'ais prefetch', 'ais get'
41 --list value Comma-separated list of object or file names, e.g.:
42 --list 'o1,o2,o3'
43 --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
44 or, when listing files and/or directories:
45 --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
46 --non-recursive, --nr Non-recursive operation, e.g.:
47 - 'ais ls gs://bucket/prefix --nr' - list objects and/or virtual subdirectories with names starting with the specified prefix;
48 - 'ais ls gs://bucket/prefix/ --nr' - list contained objects and/or immediately nested virtual subdirectories _without_ recursing into the latter;
49 - 'ais prefetch s3://bck/abcd --nr' - prefetch a single named object (see 'ais prefetch --help' for details);
50 - 'ais rmo gs://bucket/prefix --nr' - remove a single object with the specified name (see 'ais rmo --help' for details)
51 --non-verbose, --nv Non-verbose (quiet) output, minimized reporting, fewer warnings
52 --num-workers value Number of concurrent workers (readers); defaults to a number of target mountpaths if omitted or zero;
53 use (-1) to indicate single-threaded serial execution (ie., no workers);
54 any positive value will be adjusted _not_ to exceed the number of target CPUs (default: 0)
55 --prefix value Select virtual directories or objects with names starting with the specified prefix, e.g.:
56 '--prefix a/b/c' - matches names 'a/b/c/d', 'a/b/cdef', and similar;
57 '--prefix a/b/c/' - only matches objects from the virtual directory a/b/c/
58 --prepend value Prefix to prepend to every object name during operation (copy or transform), e.g.:
59 --prepend=abc - prefix all object names with "abc"
60 --prepend=abc/ - use "abc" as a virtual directory (note trailing filepath separator)
61 - during 'copy', this flag applies to copied objects
62 - during 'transform', this flag applies to transformed objects
63 --progress Show progress bar(s) and progress of execution in real time
64 --refresh value Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
65 valid time units: ns, us (or µs), ms, s (default), m, h
66 --sync Fully synchronize in-cluster content of a given remote bucket with its (Cloud or remote AIS) source;
67 the option is, effectively, a stronger variant of the '--latest' (option):
68 in addition to bringing existing in-cluster objects in-sync with their respective out-of-band updates (if any)
69 it also entails removing in-cluster objects that are no longer present remotely;
70 like '--latest', this option provides operation-level control over synchronization
71 without requiring to change the corresponding bucket configuration: 'versioning.synchronize';
72 see also:
73 - 'ais show bucket BUCKET versioning'
74 - 'ais bucket props set BUCKET versioning'
75 - 'ais ls --check-versions'
76 --template value Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
77 (with optional steps and gaps), e.g.:
78 --template "" # (an empty or '*' template matches everything)
79 --template 'dir/subdir/'
80 --template 'shard-{1000..9999}.tar'
81 --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
82 and similarly, when specifying files and directories:
83 --template '/home/dir/subdir/'
84 --template "/abc/prefix-{0010..9999..2}-suffix"
85 --timeout value Maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
86 valid time units: ns, us (or µs), ms, s (default), m, h
87 --wait Wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
88 --help, -h Show help

The source bucket must exist. When the destination bucket is remote (e.g., in the Cloud), it must also exist and be writable.

NOTE: there’s no requirement that either of the buckets is present in aistore.

NOTE: do not confuse in-cluster presence with existence: a remote object may exist in its remote bucket without being present ("cached") in the cluster.

NOTE: to fully synchronize in-cluster content with remote backend, please refer to out of band updates.

Moreover, when the destination is an AIS (ais://) or remote AIS (ais://@remote-alias) bucket, its existence is optional: the destination will be created on the fly, with bucket properties copied from the source (SRC_BUCKET).

NOTE: similar to delete, evict and prefetch operations, cp also supports embedded prefix - see disambiguating multi-object operation
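The two --prefix matching modes quoted in the help above can be illustrated with a toy matcher (not the server-side implementation): 'a/b/c' is a plain name prefix, while 'a/b/c/' selects only the virtual directory's contents.

```python
# Toy prefix matcher mirroring the '--prefix' semantics in the help text.
def match(names, prefix):
    return sorted(n for n in names if n.startswith(prefix))

names = ["a/b/c/d", "a/b/cdef", "a/b/x"]
assert match(names, "a/b/c") == ["a/b/c/d", "a/b/cdef"]   # name prefix
assert match(names, "a/b/c/") == ["a/b/c/d"]              # virtual dir only
```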

Finally, the option to copy remote bucket onto itself is also supported - syntax-wise. Here’s an example that’ll shed some light:

1## 1. at first, we don't have any gs:// buckets in the cluster
2
3$ ais ls gs
4No "gs://" buckets in the cluster. Use '--all' option to list matching remote buckets, if any.
5
6## 2. notwithstanding, we go ahead and start copying gs://coco-dataset
7
8$ ais cp gs://coco-dataset gs://coco-dataset --prefix d-tokens --progress --all
9Copied objects: 282/393 [===========================================>------------------] 72 %
10Copied size: 719.48 MiB / 1000.08 MiB [============================================>-----------------] 72 %
11
12## 3. and done: all 393 objects from the remote bucket are now present ("cached") in the cluster
13
14$ ais ls gs://coco-dataset --cached | grep Listed
15Listed: 393 names

Incidentally, notice the --cached difference:

1$ ais ls gs://coco-dataset --cached | grep Listed
2Listed: 393 names
3
4## vs _all_ including remote:
5
6$ ais ls gs://coco-dataset | grep Listed
7Listed: 2,290 names

Examples

Copy non-existing remote bucket to a non-existing in-cluster destination

1$ ais ls s3
2No "s3://" buckets in the cluster. Use '--all' option to list matching remote buckets, if any.
3
4$ ais cp s3://abc ais://nnn --all
5Warning: destination ais://nnn doesn't exist and will be created with configuration copied from the source (s3://abc)
6Copying s3://abc => ais://nnn. To monitor the progress, run 'ais show job tco-JcTKbhvFy'

Copy AIS bucket

Copy AIS bucket src_bucket to AIS bucket dst_bucket.

1$ ais cp ais://src_bucket ais://dst_bucket
2Copying bucket "ais://bucket_name" to "ais://dst_bucket" in progress.
3To check the status, run: ais show job xaction copy-bck ais://dst_bucket

Copy AIS bucket and wait until the job finishes

The same as above, but wait until copying is finished.

1$ ais cp ais://src_bucket ais://dst_bucket --wait

Copy cloud bucket to another cloud bucket

Copy AWS bucket src_bucket to AWS bucket dst_bucket.

1# Make sure that both buckets exist.
2$ ais ls aws://
3AWS Buckets (2)
4 aws://src_bucket
5 aws://dst_bucket
6$ ais cp aws://src_bucket aws://dst_bucket
7Copying bucket "aws://src_bucket" to "aws://dst_bucket" in progress.
8To check the status, run: ais show job xaction copy-bck aws://dst_bucket

Use (list, range, and/or prefix) options to copy selected objects

Example 1. Copy objects obj1.tar and obj1.info from bucket ais://bck1 to ais://bck2, and wait until the operation finishes

1$ ais cp ais://bck1 ais://bck2 --list obj1.tar,obj1.info --wait
2copying objects operation ("ais://bck1" => "ais://bck2") is in progress...
3copying objects operation succeeded.

Example 2. Copy objects matching Bash brace-expansion obj{2..4}; do not wait for the operation to finish.

1$ ais cp ais://bck1 ais://bck2 --template "obj{2..4}"
2copying objects operation ("ais://bck1" => "ais://bck2") is in progress...
3To check the status, run: ais show job xaction copy-bck ais://bck2
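A minimal sketch of the Bash-style brace expansion behind such templates (it handles a single {start..end} range only; the real --template parser also supports steps, gaps, and zero-padded ranges like {000000..000999}):

```python
import re

# Expand a single Bash-style {start..end} range in a template string.
def expand(template):
    m = re.search(r"\{(\d+)\.\.(\d+)\}", template)
    if not m:
        return [template]
    lo, hi = int(m.group(1)), int(m.group(2))
    return [template[:m.start()] + str(i) + template[m.end():]
            for i in range(lo, hi + 1)]

assert expand("obj{2..4}") == ["obj2", "obj3", "obj4"]
assert expand("shard-{10..12}.tar") == [
    "shard-10.tar", "shard-11.tar", "shard-12.tar"]
```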

Example 3. Use --sync option to copy remote virtual subdirectory

1$ ais cp gs://coco-dataset --sync --prefix d-tokens
2Copying objects gs://coco-dataset. To monitor the progress, run 'ais show job tco-kJPUtYJld'

In the example, --sync synchronizes the destination bucket with its remote (e.g., Cloud) source.

In particular, the option makes sure that aistore has the latest versions of remote objects and may also entail removing objects that no longer exist remotely.
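A toy model (illustration only, not AIS code) contrasting the two options: --latest refreshes in-cluster versions to match the remote source, while --sync additionally deletes in-cluster objects that are gone from the source.

```python
# '--latest': refresh/add in-cluster versions from the remote source.
def copy_latest(in_cluster, remote):
    for name in remote:
        in_cluster[name] = remote[name]

# '--sync': like '--latest', plus delete what no longer exists remotely.
def copy_sync(in_cluster, remote):
    copy_latest(in_cluster, remote)
    for name in list(in_cluster):
        if name not in remote:
            del in_cluster[name]

remote = {"a": "v2", "b": "v1"}
cached = {"a": "v1", "stale": "v1"}

copy_latest(cached, remote)
assert cached == {"a": "v2", "b": "v1", "stale": "v1"}  # 'stale' survives

copy_sync(cached, remote)
assert cached == {"a": "v2", "b": "v1"}                 # 'stale' removed
```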

See also

Example copying buckets

This example demonstrates how to copy objects between buckets using the AIStore CLI, and how to monitor the progress of the copy operation. AIStore supports all possible permutations of copying: Cloud to AIStore, Cloud to another (or same) Cloud, AIStore to Cloud, and between AIStore buckets.

To copy all objects with a common prefix from an S3 bucket to an AIStore bucket:

1$ ais cp s3://src-bucket/a ais://dst-bucket --all
2
3Warning: destination ais://dst-bucket doesn't exist and will be created with configuration copied from the source (s3://src-bucket)
4Copying objects s3://src-bucket => ais://dst-bucket. To monitor the progress, run 'ais show job tco-goDbhCxtf'

Note: The “Warning” message is benign and will only appear if the destination bucket does not exist.

Monitoring progress

You can monitor the progress of the copy operation using the ais show job copy command. Add the --refresh flag followed by a time in seconds to get automatic updates:

1$ ais show job copy --refresh 10
2
3copy-objects[tco-goDbhCxtf] (ctl: s3://src-bucket=>ais://dst-bucket prefix:a, parallelism: w[6])
4NODE ID KIND SRC BUCKET DST BUCKET OBJECTS BYTES START END STATE
5KactABCD tco-goDbhCxtf copy-listrange s3://src-bucket ais://dst-bucket 82 11.00MiB 18:04:15 - Running
6XXytEFGH tco-goDbhCxtf copy-listrange s3://src-bucket ais://dst-bucket 80 8.00MiB 18:04:15 - Running
7YMjtIJKL tco-goDbhCxtf copy-listrange s3://src-bucket ais://dst-bucket 104 23.00MiB 18:04:15 - Running
8oJXtMNOP tco-goDbhCxtf copy-listrange s3://src-bucket ais://dst-bucket 134 18.00MiB 18:04:15 - Running
9vWrtQRST tco-goDbhCxtf copy-listrange s3://src-bucket ais://dst-bucket 118 12.00MiB 18:04:15 - Running
10ybTtUVWX tco-goDbhCxtf copy-listrange s3://src-bucket ais://dst-bucket 71 10.02MiB 18:04:15 - Running
11 Total: 589 82.02MiB ✓

The output shows statistics for each node in the AIStore cluster:

  • NODE: The name of the node
  • ID: The job ID
  • KIND: The type of operation
  • SRC BUCKET: Source bucket
  • DST BUCKET: Destination bucket
  • OBJECTS: Number of objects processed
  • BYTES: Amount of data transferred
  • START: Job start time
  • END: Job end time (empty if job is still running)
  • STATE: Current job state

The output also includes a “Total” row at the bottom that provides cluster-wide aggregated values for the number of objects processed and bytes transferred. The checkmark (✓) indicates that all nodes are reporting byte statistics.
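As a sanity check, the Total row can be re-computed from the per-node rows of the sample output above:

```python
# (objects, MiB) per node, taken from the sample 'ais show job copy' output
rows = [(82, 11.00), (80, 8.00), (104, 23.00),
        (134, 18.00), (118, 12.00), (71, 10.02)]

total_objects = sum(objs for objs, _ in rows)
total_mib = sum(mib for _, mib in rows)

assert total_objects == 589              # matches the Total row
assert round(total_mib, 2) == 82.02      # 82.02MiB, as reported
```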

Stopping all jobs

To stop all in-progress jobs:

1$ ais stop --all
2Stopped copy-listrange[tco-goDbhCxtf]

In our example, there’d be a single job with ID tco-goDbhCxtf.

Example copying buckets and multi-objects with simultaneous synchronization

There’s a script that we use for testing. When run, it produces the following output:

1$ ./ais/test/scripts/cp-sync-remais-out-of-band.sh --bucket gs://abc
2
3 1. generate and write 500 random shards => gs://abc
4 2. copy gs://abc => ais://dst-9408
5 3. remove 10 shards from the source
6 4. copy gs://abc => ais://dst-9408 w/ synchronization ('--sync' option)
7 5. remove another 10 shards
8 6. copy multiple objects using bash-expansion defined range and '--sync'
9 #
10 # out of band DELETE using remote AIS (remais)
11 #
12 7. use remote AIS cluster ("remais") to out-of-band remove 10 shards from the source
13 8. copy gs://abc => ais://dst-9408 w/ --sync
14 9. when copying, we always synchronize content of the in-cluster source as well
1510. use remais to out-of-band remove 10 more shards from gs://abc source
1611. copy a range of shards from gs://abc to ais://dst-9408, and compare
1712. and again: when copying, we always synchronize content of the in-cluster source as well
18 #
19 # out of band ADD using remote AIS (remais)
20 #
2113. use remais to out-of-band add (i.e., PUT) 17 new shards
2214. copy a range of shards from gs://abc to ais://dst-9408, and check whether the destination has new shards
2315. compare the contents but NOTE: as of v3.22, this part requires multi-object copy (using '--list' or '--template')

The script executes a sequence of steps (above).

Notice a certain limitation (that also shows up as the last step #15):

  • As of version 3.22, aistore cp commands always synchronize deleted and updated remote content.

  • However, to see out-of-band added content, you currently need to run a multi-object copy, with multiple source objects specified using --list or --template.

See also

  • ais cp --help for the most recently updated options
  • to fully synchronize in-cluster content with remote backend, please refer to out of band updates

Show bucket summary

ais storage summary PROVIDER:[//BUCKET_NAME] [command options] - show bucket sizes and the respective percentages of used capacity on a per-bucket basis.

ais bucket summary - same as above.

Options

1$ ais storage summary --help
2
3NAME:
4 ais storage summary - Show bucket sizes and %% of used capacity on a per-bucket basis
5
6USAGE:
7 ais storage summary [BUCKET[/PREFIX]] [PROVIDER] [command options]
8
9OPTIONS:
10 --cached Only list in-cluster objects, i.e., objects from the respective remote bucket that are present ("cached") in the cluster
11 --count value Used together with '--refresh' to limit the number of generated reports, e.g.:
12 '--refresh 10 --count 5' - run 5 times with 10s interval (default: 0)
13 --dont-wait When _summarizing_ buckets do not wait for the respective job to finish -
14 use the job's UUID to query the results interactively
15 --no-headers, -H Display tables without headers
16 --prefix value For each bucket, select only those objects (names) that start with the specified prefix, e.g.:
17 '--prefix a/b/c' - sum up sizes of the virtual directory a/b/c and objects from the virtual directory
18 a/b that have names (relative to this directory) starting with the letter c
19 --refresh value Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
20 valid time units: ns, us (or µs), ms, s (default), m, h
21 --units value Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
22 iec - IEC format, e.g.: KiB, MiB, GiB (default)
23 si - SI (metric) format, e.g.: KB, MB, GB
24 raw - do not convert to (or from) human-readable format
25 --verbose, -v Verbose output
26 --help, -h Show help

If BUCKET is omitted, the command applies to all AIS buckets.

The output includes the total number of objects in a bucket, the bucket’s size (bytes, megabytes, etc.), and the percentage of the total capacity used by the bucket.

A few additional words must be said about --validate. The option is provided to run integrity checks, namely: locations of objects, replicas, and EC slices in the bucket, the number of replicas (and whether this number agrees with the bucket configuration), and more.

Location of each stored object must at any point in time correspond to the current cluster map and, within each storage target, to the target’s mountpaths. A failure to abide by location rules is called misplacement; misplaced objects - if any - must be migrated to their proper locations via automated processes called global rebalance and resilver:

Notes

--validate may take considerable time to execute, depending on the sizes of the datasets in question and the capabilities of the underlying hardware. A non-zero count of misplaced objects in the (validated) output is a direct indication that the cluster requires rebalancing and/or resilvering. An alternative way to execute validation is to run ais storage validate or (simply) ais scrub:

1$ ais scrub --help
2
3NAME:
4 ais scrub - (alias for "storage validate") Check in-cluster content for misplaced objects, objects that have insufficient numbers of copies, zero size, and more
5 e.g.:
6 * ais storage validate - validate all in-cluster buckets;
7 * ais scrub - same as above;
8 * ais storage validate ais - validate (a.k.a. scrub) all ais:// buckets;
9 * ais scrub s3 - ditto, all s3:// buckets;
10 * ais scrub s3 --refresh 10 - same as above while refreshing runtime counter(s) every 10s;
11 * ais scrub gs://abc/images/ - validate part of the gcp bucket under 'images/';
12 * ais scrub gs://abc --prefix images/ - same as above.
13
14USAGE:
15 ais scrub [BUCKET[/PREFIX]] [PROVIDER] [command options]
16
17OPTIONS:
18 --all-columns Show all columns, including those with only zero values
19 --cached Only visit in-cluster objects, i.e., objects from the respective remote bucket that are present ("cached") in the cluster
20 --count value Used together with '--refresh' to limit the number of generated reports, e.g.:
21 '--refresh 10 --count 5' - run 5 times with 10s interval (default: 0)
22 --large-size value Count and report all objects that are larger or equal in size (e.g.: 4mb, 1MiB, 1048576, 128k; default: 5 GiB)
23 --limit value The maximum number of objects to list, get, or otherwise handle (0 - unlimited; see also '--max-pages'),
24 e.g.:
25 - 'ais ls gs://abc/dir --limit 1234 --cached --props size,custom,atime' - list no more than 1234 objects
26 - 'ais get gs://abc /dev/null --prefix dir --limit 1234' - get --/--
27 - 'ais scrub gs://abc/dir --limit 1234' - scrub --/-- (default: 0)
28 --max-pages value Maximum number of pages to display (see also '--page-size' and '--limit')
29 e.g.: 'ais ls az://abc --paged --page-size 123 --max-pages 7 (default: 0)
30 --no-headers, -H Display tables without headers
31 --non-recursive, --nr Non-recursive operation, e.g.:
32 - 'ais ls gs://bucket/prefix --nr' - list objects and/or virtual subdirectories with names starting with the specified prefix;
33 - 'ais ls gs://bucket/prefix/ --nr' - list contained objects and/or immediately nested virtual subdirectories _without_ recursing into the latter;
34 - 'ais prefetch s3://bck/abcd --nr' - prefetch a single named object (see 'ais prefetch --help' for details);
35 - 'ais rmo gs://bucket/prefix --nr' - remove a single object with the specified name (see 'ais rmo --help' for details)
36 --page-size value Maximum number of object names per page; when the flag is omitted or 0
37 the maximum is defined by the corresponding backend; see also '--max-pages' and '--paged' (default: 0)
38 --prefix value For each bucket, select only those objects (names) that start with the specified prefix, e.g.:
39 '--prefix a/b/c' - sum up sizes of the virtual directory a/b/c and objects from the virtual directory
40 a/b that have names (relative to this directory) starting with the letter c
41 --refresh value Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
42 valid time units: ns, us (or µs), ms, s (default), m, h
43 --small-size value Count and report all objects that are smaller or equal in size (e.g.: 4, 4b, 1k, 128kib; default: 0)
44 --help, -h Show help

For details and additional examples, please see:

Examples

1# 1. show summary for a specific bucket
2$ ais bucket summary ais://abc
3NAME OBJECTS SIZE ON DISK USAGE(%)
4ais://abc 10902 5.38GiB 1%
5
6For min/avg/max object sizes, use `--fast=false`.
1# 2. "summarize" all buckets(*)
2$ ais bucket summary
3NAME OBJECTS SIZE ON DISK USAGE(%)
4ais://abc 10902 5.38GiB 1%
5ais://nnn 49873 200.00MiB 0%
1# 3. "summarize" all s3:// buckets; count both "cached" and remote objects:
2$ ais bucket summary s3: --all
1# 4. same as above with progress updates every 3 seconds:
2$ ais bucket summary s3: --all --refresh 3
1# 5. "summarize" a given gs:// bucket; start the job and exit without waiting for it to finish
2# (see prompt below):
3$ ais bucket summary gs://abc --all --dont-wait
4
5Job summary[wl-s5lIWA] has started. To monitor, run 'ais storage summary gs://abc wl-s5lIWA --dont-wait' or 'ais show job wl-s5lIWA';
6see '--help' for details

Start N-way Mirroring

ais start mirror BUCKET --copies <value>

Start an extended action to bring a given bucket to a certain redundancy level (value copies). Read more about this feature here.
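A toy placement sketch (not AIS’s actual mirroring logic) of the basic constraint behind n-way mirroring: replicas are stored on distinct mountpaths, so the requested number of copies cannot exceed the number of mountpaths on a target.

```python
# Toy replica placement: each copy must land on a distinct mountpath.
def place_copies(mountpaths, copies):
    if copies > len(mountpaths):
        raise ValueError("not enough mountpaths for requested copies")
    return mountpaths[:copies]

mpaths = ["/ais/mp1", "/ais/mp2", "/ais/mp3"]
assert place_copies(mpaths, 3) == mpaths   # 3-way mirror on 3 mountpaths

try:
    place_copies(mpaths, 4)                # 4 copies won't fit
    assert False, "expected ValueError"
except ValueError:
    pass
```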

Options

1$ ais start mirror --help
2
3NAME:
4 ais start mirror - Configure (or unconfigure) bucket as n-way mirror, and run the corresponding batch job, e.g.:
5 - 'ais start mirror ais://m --copies 3' - configure ais://m as a 3-way mirror;
6 - 'ais start mirror ais://m --copies 1' - configure ais://m for no redundancy (no extra copies).
7 (see also: 'ais start ec-encode')
8
9USAGE:
10 ais start mirror BUCKET [command options]
11
12OPTIONS:
13 --copies value Number of object replicas (default: 1)
14 --non-verbose, --nv Non-verbose (quiet) output, minimized reporting, fewer warnings
15 --help, -h Show help

Start Erasure Coding

ais start ec-encode BUCKET --data-slices <value> --parity-slices <value>

Start an extended action that encodes and recovers all objects and slices in a given bucket. The action enables erasure coding if it is disabled, and runs the encoding for all objects in the bucket in the background. If erasure coding for the bucket was enabled beforehand, the extended action recovers missing objects and slices if possible.

When running the extended action on a bucket that already has erasure coding enabled, you must pass the correct number of data and parity slices on the command line. Run ais bucket props show <bucket-name> ec to get the current erasure coding settings. Read more about this feature here.
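For intuition on the data/parity trade-off, here is a toy single-parity XOR code: with d data slices plus one parity slice, any single lost slice is recoverable. This is only a sketch of the recovery idea; AIS itself uses Reed-Solomon coding and supports multiple parity slices.

```python
from functools import reduce

# Bytewise XOR of two equal-size slices.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

data = [b"AAAA", b"BBBB", b"CCCC"]       # 3 equal-size data slices
parity = reduce(xor, data)               # single parity slice

lost = 1                                 # pretend slice 1 is lost
survivors = [s for i, s in enumerate(data) if i != lost]
recovered = reduce(xor, survivors + [parity])  # XOR of everything else
assert recovered == data[lost]
```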

Options

1$ ais start ec-encode --help
2
3NAME:
4 ais start ec-encode - Erasure code entire bucket, e.g.:
5 - 'ais start ec-encode ais://nnn -d 8 -p 2' - erasure-code ais://nnn for 8 data and 2 parity slices;
6 - 'ais start ec-encode ais://nnn --data-slices 8 --parity-slices 2' - same as above;
7 - 'ais start ec-encode ais://nnn --recover' - check and make sure that every ais://nnn object is properly erasure-coded.
8 see also: 'ais start mirror'
9
10USAGE:
11 ais start ec-encode BUCKET [command options]
12
13OPTIONS:
14 --data-slices value, -d value Number of data slices (default: 2)
15 --non-verbose, --nv Non-verbose (quiet) output, minimized reporting, fewer warnings
16 --parity-slices value, -p value Number of parity slices (default: 2)
17 --recover Check and make sure that each and every object is properly erasure coded
18 --help, -h Show help

The data and parity slice counts are both required and must be greater than 0.

Show bucket properties

Overall, the topic called “bucket properties” is rather involved and includes sub-topics “bucket property inheritance” and “cluster-wide global defaults”. For background, please first see:

Now, as far as the CLI goes, run the following to list the properties of a specified bucket. By default, a compact form of the bucket props sections is presented.

ais bucket props show BUCKET [PROP_PREFIX] [command options]

When PROP_PREFIX is set, only props that start with PROP_PREFIX will be displayed. Useful PROP_PREFIX values include: access, checksum, ec, lru, mirror, provider, versioning.

ais bucket show is an alias for ais show bucket - both can be used interchangeably.

Options

1$ ais bucket props show --help
2
3NAME:
4 ais bucket props show - Show bucket properties
5
6USAGE:
7 ais bucket props show BUCKET [PROP_PREFIX] [command options]
8
9OPTIONS:
10 --add Add remote bucket to cluster's metadata
11 - let's say, s3://abc is accessible but not present in the cluster (e.g., 'ais ls' returns error);
12 - most of the time, there's no need to worry about it as aistore handles presence/non-presence
13 transparently behind the scenes;
14 - but if you do want to (explicitly) add the bucket, you could also use '--add' option
15 --compact, -c Display properties grouped in human-readable mode
16 --json, -j JSON input/output
17 --no-headers, -H Display tables without headers
18 --help, -h Show help

Examples

Show bucket props with provided section

Show only the lru section of bucket props for a given bucket (the first command below lists all sections in compact form, for comparison).

1$ ais bucket props show s3://bucket-name --compact
2PROPERTY VALUE
3access GET,HEAD-OBJECT,PUT,APPEND,DELETE-OBJECT,MOVE-OBJECT,PROMOTE,UPDATE-OBJECT,HEAD-BUCKET,LIST-OBJECTS,PATCH,SET-BUCKET-ACL,LIST-BUCKETS,SHOW-CLUSTER,CREATE-BUCKET,DESTROY-BUCKET,MOVE-BUCKET,ADMIN
4checksum Type: xxhash | Validate: Nothing
5created 2024-01-31T15:42:59-08:00
6ec Disabled
7lru lru.dont_evict_time=2h0m, lru.capacity_upd_time=10m
8mirror Disabled
9present yes
10provider aws
11versioning Disabled
12
13$ ais bucket props show s3://bucket_name lru --compact
14PROPERTY VALUE
15lru lru.dont_evict_time=2h0m, lru.capacity_upd_time=10m
16
17$ ais bucket props show s3://ais-abhishek lru
18PROPERTY VALUE
19lru.capacity_upd_time 10m
20lru.dont_evict_time 2h0m
21lru.enabled true

Set bucket properties

ais bucket props set [OPTIONS] BUCKET JSON_SPECIFICATION|KEY=VALUE [KEY=VALUE...]

Set bucket properties. For the available options, see bucket-properties.

If JSON_SPECIFICATION is used, all properties of the bucket are set based on the values in the JSON object.

Options

1$ ais bucket props set --help
2
3NAME:
4 ais bucket props set - Update bucket properties; the command accepts both JSON-formatted input and plain Name=Value pairs, e.g.:
5 * ais bucket props set ais://nnn backend_bck=s3://mmm
6 * ais bucket props set ais://nnn backend_bck=none
7 * ais bucket props set gs://vvv versioning.validate_warm_get=false versioning.synchronize=true
8 * ais bucket props set gs://vvv mirror.enabled=true mirror.copies=4 checksum.type=md5
9 * ais bucket props set s3://mmm ec.enabled true ec.data_slices 6 ec.parity_slices 4 --force
10 References:
11 * for details and many more examples, see docs/cli/bucket.md
12 * to show bucket properties (names and current values), use 'ais bucket show'
13
14USAGE:
15 ais bucket props set BUCKET JSON-formatted-KEY-VALUE | KEY=VALUE [KEY=VALUE...] [command options]
16
17OPTIONS:
18 --force, -f Force execution of the command (caution: advanced usage only)
19 --skip-lookup Do not execute HEAD(bucket) request to lookup remote bucket and its properties; possible usage scenarios include:
20 1) adding remote bucket to aistore without first checking the bucket's accessibility
21 (e.g., to configure the bucket's aistore properties with alternative security profile and/or endpoint)
22 2) listing public-access Cloud buckets where certain operations (e.g., 'HEAD(bucket)') may be disallowed
23 --help, -h Show help

When JSON specification is not used, some properties support user-friendly aliases:

| Property | Value alias | Description |
| --- | --- | --- |
| access | ro | Disables bucket modifications: denies PUT, DELETE, and ColdGET requests |
| access | rw | Enables object modifications: allows PUT, DELETE, and ColdGET requests |
| access | su | Enables full access: all rw permissions, bucket deletion, and changing bucket permissions |
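The aliases nest, which a small sketch can make explicit. Permission names follow the table; the exact server-side sets may differ - use ais bucket props show BUCKET access for ground truth.

```python
# Sketch of the access aliases as nested permission sets.
READ = {"GET", "HEAD-OBJECT", "HEAD-BUCKET", "LIST-OBJECTS"}
WRITE = {"PUT", "DELETE-OBJECT", "MOVE-OBJECT"}
ADMIN = {"DESTROY-BUCKET", "SET-BUCKET-ACL"}

ALIASES = {
    "ro": READ,
    "rw": READ | WRITE,
    "su": READ | WRITE | ADMIN,
}

assert ALIASES["ro"] < ALIASES["rw"] < ALIASES["su"]  # strictly nested
assert "PUT" not in ALIASES["ro"]                     # ro denies writes
```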

Examples

Enable mirroring for a bucket

Set the mirror.enabled and mirror.copies properties to true and 2 respectively, for the bucket bucket_name

1$ ais bucket props set ais://bucket_name 'mirror.enabled=true' 'mirror.copies=2'
2Bucket props successfully updated
3"mirror.enabled" set to:"true" (was:"false")

Make a bucket read-only

Set read-only access to the bucket bucket_name. All PUT and DELETE requests will fail.

1$ ais bucket props set ais://bucket_name 'access=ro'
2Bucket props successfully updated
3"access" set to:"GET,HEAD-OBJECT,HEAD-BUCKET,LIST-OBJECTS" (was:"<PREV_ACCESS_LIST>")

Configure custom AWS S3 endpoint

When a bucket is hosted by an S3-compliant backend (such as MinIO), we may want to specify an alternative S3 endpoint, so that AIS nodes use it when reading, writing, listing, and generally performing all operations on remote S3 bucket(s).

Globally, the S3 endpoint can be overridden for all S3 buckets via the "S3_ENDPOINT" environment variable. If you decide to make the change, you may need to restart the AIS cluster while making sure that "S3_ENDPOINT" is available to the AIS nodes when they start up.

But it can also be done - and will take precedence over the global setting - on a per-bucket basis.

Here are some examples:

1# Let's say, there exists a bucket called s3://abc:
2$ ais ls s3://abc
3NAME SIZE
4README.md 8.96KiB
5
6# First, we explicitly set the (empty by default) endpoint property in the bucket's configuration.
7# To see that a non-empty value *applies* and works, we will use the default AWS S3 endpoint: https://s3.amazonaws.com
8$ ais bucket props set s3://abc extra.aws.endpoint=s3.amazonaws.com
9Bucket "aws://abc": property "extra.aws.endpoint=s3.amazonaws.com", nothing to do
10$ ais ls s3://abc
11NAME SIZE
12README.md 8.96KiB
13
14# Second, set endpoint=foo (or any other invalid value), and observe that the bucket becomes unreachable:
15$ ais bucket props set s3://abc extra.aws.endpoint=foo
16Bucket props successfully updated
17"extra.aws.endpoint" set to: "foo" (was: "s3.amazonaws.com")
18$ ais ls s3://abc
19RequestError: send request failed: dial tcp: lookup abc.foo: no such host
20
21# Finally, revert the endpoint back to empty, and check that the bucket is visible again:
22$ ais bucket props set s3://abc extra.aws.endpoint=""
23Bucket props successfully updated
24"extra.aws.endpoint" set to: "" (was: "foo")
25$ ais ls s3://abc
26NAME SIZE
27README.md 8.96KiB

The global export S3_ENDPOINT=... override is static and read-only. Use it with extreme caution as it applies to all buckets.

For any given s3://bucket, on the other hand, its S3 endpoint can be set, unset, and otherwise changed at any time - at runtime - as shown above.
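The resulting precedence (per-bucket property over global environment over the AWS default) can be sketched as follows; the function and constant names are illustrative, not the actual AIS internals:

```python
import os

# Illustrative sketch of the precedence described above:
# per-bucket 'extra.aws.endpoint' > global S3_ENDPOINT env > AWS default.
AWS_DEFAULT_ENDPOINT = "s3.amazonaws.com"

def effective_s3_endpoint(bucket_props: dict) -> str:
    # 1. per-bucket property: changeable at runtime, always wins
    ep = bucket_props.get("extra.aws.endpoint", "")
    if ep:
        return ep
    # 2. static, cluster-wide environment override
    ep = os.environ.get("S3_ENDPOINT", "")
    if ep:
        return ep
    # 3. fall back to the default AWS endpoint
    return AWS_DEFAULT_ENDPOINT

print(effective_s3_endpoint({"extra.aws.endpoint": "foo"}))  # per-bucket wins
```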

Connect/Disconnect AIS bucket to/from cloud bucket

Set the backend bucket for AIS bucket bucket_name to the GCP cloud bucket cloud_bucket. Once the backend bucket is set, operations (get, put, list, etc.) on ais://bucket_name behave exactly as they would on gcp://cloud_bucket - much like a symlink to the cloud bucket. The only difference is that all objects are cached in ais://bucket_name (and reflected in the cloud as well) instead of gcp://cloud_bucket.

1$ ais bucket props set ais://bucket_name backend_bck=gcp://cloud_bucket
2Bucket props successfully updated
3"backend_bck.name" set to: "cloud_bucket" (was: "")
4"backend_bck.provider" set to: "gcp" (was: "")

To disconnect the cloud bucket:

1$ ais bucket props set ais://bucket_name backend_bck=none
2Bucket props successfully updated
3"backend_bck.name" set to: "" (was: "cloud_bucket")
4"backend_bck.provider" set to: "" (was: "gcp")
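The "symlink" semantics can be sketched as a tiny name-resolution step; this is purely illustrative, not AIS code:

```python
def resolve_bucket(bucket: str, props: dict) -> str:
    """If backend_bck is set, I/O is effectively redirected to the
    backend bucket; otherwise the bucket itself is used. Illustrative only."""
    be = props.get("backend_bck", {})
    if be.get("name"):
        return f"{be['provider']}://{be['name']}"
    return bucket

# Connected: reads/writes go to the GCP backend
print(resolve_bucket("ais://bucket_name",
                     {"backend_bck": {"name": "cloud_bucket",
                                      "provider": "gcp"}}))
# Disconnected (backend_bck=none): the AIS bucket stands alone
print(resolve_bucket("ais://bucket_name",
                     {"backend_bck": {"name": "", "provider": ""}}))
```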

Ignore non-critical errors

To create an erasure-encoded bucket or enable EC for an existing bucket, AIS requires at least ec.data_slices + ec.parity_slices + 1 targets. At the same time, for small objects (smaller than ec.objsize_limit) it is sufficient to have only ec.parity_slices + 1 targets. The --force option allows creating erasure-encoded buckets when the number of targets is not sufficient, as long as it still exceeds ec.parity_slices.

Note that if the number of targets is less than ec.data_slices + ec.parity_slices + 1, the cluster accepts only objects smaller than ec.objsize_limit. Bigger objects are rejected on PUT.
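The target-count arithmetic above can be summarized in a small helper; this is an illustration of the stated rules, not AIS source code:

```python
def ec_target_requirements(data_slices: int, parity_slices: int):
    """Minimum targets per the rules above: full EC needs
    data + parity + 1; small objects need only parity + 1."""
    return data_slices + parity_slices + 1, parity_slices + 1

def can_put(obj_size, objsize_limit, targets, data_slices, parity_slices):
    full, small = ec_target_requirements(data_slices, parity_slices)
    if targets >= full:
        return True                      # any object size is accepted
    if targets >= small:
        return obj_size < objsize_limit  # only small objects are accepted
    return False                         # too few targets even for replicas

# 6 targets, EC 6+4 (as in the examples that follow):
print(can_put(1 << 20, 262144, 6, 6, 4))  # False: large objects rejected
print(can_put(1024, 262144, 6, 6, 4))     # True: small objects accepted
```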

The examples below use a cluster with 6 targets:

1$ # Creating a bucket
2$ ais create ais://bck --props "ec.enabled=true ec.data_slices=6 ec.parity_slices=4"
3Create bucket "ais://bck" failed: EC config (6 data, 4 parity) slices requires at least 11 targets (have 6)
4$
5$ ais create ais://bck --props "ec.enabled=true ec.data_slices=6 ec.parity_slices=4" --force
6"ais://bck" bucket created
7$
8$ # If the number of targets is less than or equal to ec.parity_slices, even `--force` does not help
9$
10$ ais bucket props set ais://bck ec.enabled true ec.data_slices 6 ec.parity_slices 8
11EC config (6 data, 8 parity) slices requires at least 15 targets (have 6). To show bucket properties, run "ais show bucket BUCKET -v".
12$
13$ ais bucket props set ais://bck ec.enabled true ec.data_slices 6 ec.parity_slices 8 --force
14EC config (6 data, 8 parity) slices requires at least 15 targets (have 6). To show bucket properties, run "ais show bucket BUCKET -v".
15$
16$ # Use force to enable EC if the number of targets is sufficient to keep `ec.parity_slices+1` replicas
17$
18$ ais bucket props set ais://bck ec.enabled true ec.data_slices 6 ec.parity_slices 4
19EC config (6 data, 4 parity) slices requires at least 11 targets (have 6). To show bucket properties, run "ais show bucket BUCKET -v".
20$
21$ ais bucket props set ais://bck ec.enabled true ec.data_slices 6 ec.parity_slices 4 --force
22Bucket props successfully updated
23"ec.enabled" set to: "true" (was: "false")
24"ec.parity_slices" set to: "4" (was: "2")

Once erasure encoding is enabled for a bucket, the number of data and parity slices cannot be modified. The minimum object size ec.objsize_limit, however, can be changed on the fly - but to guard against accidental modification while EC is enabled, the --force option must be used.

1$ ais bucket props set ais://bck ec.enabled true
2Bucket props successfully updated
3"ec.enabled" set to: "true" (was: "false")
4$
5$ ais bucket props set ais://bck ec.objsize_limit 320000
6P[dBbfp8080]: once enabled, EC configuration can be only disabled but cannot change. To show bucket properties, run "ais show bucket BUCKET -v".
7$
8$ ais bucket props set ais://bck ec.objsize_limit 320000 --force
9Bucket props successfully updated
10"ec.objsize_limit" set to:"320000" (was:"262144")

Set bucket properties with JSON

Set all bucket properties for bucket_name bucket based on the provided JSON specification.

$ ais bucket props set ais://bucket_name '{
>     "provider": "ais",
>     "versioning": {
>         "enabled": true,
>         "validate_warm_get": false
>     },
>     "checksum": {
>         "type": "xxhash",
>         "validate_cold_get": true,
>         "validate_warm_get": false,
>         "validate_obj_move": false,
>         "enable_read_range": false
>     },
>     "lru": {
>         "dont_evict_time": "20m",
>         "capacity_upd_time": "1m",
>         "enabled": true
>     },
>     "mirror": {
>         "copies": 2,
>         "burst_buffer": 512,
>         "enabled": false
>     },
>     "ec": {
>         "objsize_limit": 256000,
>         "data_slices": 2,
>         "parity_slices": 2,
>         "enabled": true
>     },
>     "access": "255"
> }'
"access" set to: "GET,HEAD-OBJECT,PUT,APPEND,DELETE-OBJECT,MOVE-OBJECT,PROMOTE,UPDATE-OBJECT" (was: "GET,HEAD-OBJECT,PUT,APPEND,DELETE-OBJECT,MOVE-OBJECT,PROMOTE,UPDATE-OBJECT,HEAD-BUCKET,LIST-OBJECTS,PATCH,SET-BUCKET-ACL,LIST-BUCKETS,SHOW-CLUSTER,CREATE-BUCKET,DESTROY-BUCKET,MOVE-BUCKET,ADMIN")
"ec.enabled" set to: "true" (was: "false")
"ec.objsize_limit" set to: "256000" (was: "262144")
"lru.capacity_upd_time" set to: "1m" (was: "10m")
"lru.dont_evict_time" set to: "20m" (was: "1s")
"lru.enabled" set to: "true" (was: "false")
"mirror.enabled" set to: "false" (was: "true")

Bucket props successfully updated.
1$ ais show bucket ais://bucket_name --compact
2PROPERTY VALUE
3access GET,HEAD-OBJECT,PUT,APPEND,DELETE-OBJECT,MOVE-OBJECT,PROMOTE,UPDATE-OBJECT
4checksum Type: xxhash | Validate: ColdGET
5created 2024-02-02T12:57:17-08:00
6ec 2:2 (250KiB)
7lru lru.dont_evict_time=20m, lru.capacity_upd_time=1m
8mirror Disabled
9present yes
10provider ais
11versioning Enabled | Validate on WarmGET: no
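In the JSON above, access is given as a decimal bitmask: "255" sets the eight lowest permission bits, which the CLI then reports by name. Here is a hedged sketch of that decoding; the bit order below is an assumption inferred from the CLI output, not the authoritative AIS encoding:

```python
# Assumed bit order, inferred from the CLI output above (lowest bit first);
# the actual AIS permission encoding may differ.
PERMS = ["GET", "HEAD-OBJECT", "PUT", "APPEND", "DELETE-OBJECT",
         "MOVE-OBJECT", "PROMOTE", "UPDATE-OBJECT"]

def decode_access(mask: int) -> list[str]:
    """Return the permission names whose bits are set in the mask."""
    return [name for i, name in enumerate(PERMS) if mask & (1 << i)]

# 255 == 0b11111111 -> all eight object-level permissions
print(",".join(decode_access(255)))
```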

If not all properties are mentioned in the JSON, the missing ones are set to zero values (empty / false / nil):

$ ais bucket props set ais://bucket-name '{
>     "mirror": {
>         "enabled": true,
>         "copies": 2
>     },
>     "versioning": {
>         "enabled": true,
>         "validate_warm_get": true
>     }
> }'
"mirror.enabled" set to: "true" (was: "false")
"versioning.validate_warm_get" set to: "true" (was: "false")

Bucket props successfully updated.

$ ais show bucket ais://bucket-name --compact
PROPERTY     VALUE
access       GET,HEAD-OBJECT,PUT,APPEND,DELETE-OBJECT,MOVE-OBJECT,PROMOTE,UPDATE-OBJECT,HEAD-BUCKET,LIST-OBJECTS,PATCH,SET-BUCKET-ACL,LIST-BUCKETS,SHOW-CLUSTER,CREATE-BUCKET,DESTROY-BUCKET,MOVE-BUCKET,ADMIN
checksum     Type: xxhash | Validate: Nothing
created      2024-02-02T12:52:30-08:00
ec           Disabled
lru          lru.dont_evict_time=2h0m, lru.capacity_upd_time=10m
mirror       2 copies
present      yes
provider     ais
versioning   Enabled | Validate on WarmGET: yes
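One way to think about this "set all props" semantics: the supplied JSON replaces the entire property bag, so omitted fields revert to zero/default values rather than keeping their previous settings. A minimal Python sketch of that replace-not-merge behavior (the defaults dict and field subset are illustrative, not the real cluster config):

```python
import copy

# Illustrative zero/default values for a trimmed subset of bucket
# properties; the real defaults live in the AIS cluster configuration.
DEFAULT_PROPS = {
    "mirror": {"enabled": False, "copies": 0},
    "versioning": {"enabled": False, "validate_warm_get": False},
    "ec": {"enabled": False, "data_slices": 0, "parity_slices": 0},
}

def set_props_json(spec: dict) -> dict:
    """Replace-not-merge: start from zero/default values, then apply the
    spec; anything absent from the spec does NOT keep its old value."""
    new = copy.deepcopy(DEFAULT_PROPS)
    for section, fields in spec.items():
        new.setdefault(section, {}).update(fields)
    return new

updated = set_props_json({"mirror": {"enabled": True, "copies": 2}})
print(updated["mirror"]["copies"])  # 2 -- explicitly set
print(updated["ec"]["enabled"])     # False -- omitted, reverts to default
```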

Archive multiple objects

ais archive bucket - Archive selected or matching objects from SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] as (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted object (a.k.a. shard).

1$ ais archive bucket --help
2NAME:
3 ais archive bucket - Archive selected or matching objects from SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] as
4 (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted object (a.k.a. "shard"):
5 - 'ais archive bucket ais://src gs://dst/a.tar.lz4 --template "trunk-{001..997}"' - archive (prefix+range) matching objects from ais://src;
6 - 'ais archive bucket "ais://src/trunk-{001..997}" gs://dst/a.tar.lz4' - same as above (notice double quotes);
7 - 'ais archive bucket "ais://src/trunk-{998..999}" gs://dst/a.tar.lz4 --append-or-put' - add two more objects to an existing shard;
8 - 'ais archive bucket s3://src/trunk-00 ais://dst/b.tar' - archive "trunk-00" prefixed objects from an s3 bucket as a given TAR destination
9
10
11USAGE:
12 ais archive bucket SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET/SHARD_NAME [command options]
13
14OPTIONS:
15 append-or-put Append to an existing destination object ("archive", "shard") iff exists; otherwise PUT a new archive (shard);
16 note that PUT (with subsequent overwrite if the destination exists) is the default behavior when the flag is omitted
17 cont-on-err Keep running archiving xaction (job) in presence of errors in any given multi-object transaction
18 dry-run Preview the results without really running the action
19 include-src-bck Prefix the names of archived files with the source bucket name
20 list Comma-separated list of object or file names, e.g.:
21 --list 'o1,o2,o3'
22 --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
23 or, when listing files and/or directories:
24 --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
25 non-recursive,nr Non-recursive operation, e.g.:
26 - 'ais ls gs://bck/sub --nr' - list objects and/or virtual subdirectories with names starting with the specified prefix;
27 - 'ais ls gs://bck/sub/ --nr' - list only immediate contents of 'sub/' subdirectory (non-recursive);
28 - 'ais prefetch s3://bck/abcd --nr' - prefetch a single named object;
29 - 'ais evict gs://bck/sub/ --nr' - evict only immediate contents of 'sub/' subdirectory (non-recursive);
30 - 'ais evict gs://bck --prefix=sub/ --nr' - same as above
31 prefix Select virtual directories or objects with names starting with the specified prefix, e.g.:
32 '--prefix a/b/c' - matches names 'a/b/c/d', 'a/b/cdef', and similar;
33 '--prefix a/b/c/' - only matches objects from the virtual directory a/b/c/
34 skip-lookup Skip checking source and destination buckets' existence (trading off extra lookup for performance)
35
36 template Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
37 (with optional steps and gaps), e.g.:
38 --template "" # (an empty or '*' template matches everything)
39 --template 'dir/subdir/'
40 --template 'shard-{1000..9999}.tar'
41 --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
42 and similarly, when specifying files and directories:
43 --template '/home/dir/subdir/'
44 --template "/abc/prefix-{0010..9999..2}-suffix"
45 wait Wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
46 help, h Show help
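
The --template ranges above follow a brace-expansion style: a prefix, then one or more {start..end[..step]} ranges, with zero-padding taken from the width of the bounds. A hedged sketch of the expansion (simplified: numeric ranges only, no input validation; the real CLI parser may differ):

```python
import itertools
import re

def expand_template(template: str) -> list[str]:
    """Expand '{start..end[..step]}' ranges, preserving zero-padding.
    Simplified sketch: numeric ranges only, no validation."""
    parts = re.split(r"\{(\d+)\.\.(\d+)(?:\.\.(\d+))?\}", template)
    # re.split with 3 groups yields: text, start, end, step, text, ...
    pieces = [parts[0]]
    ranges = []
    for i in range(1, len(parts), 4):
        start, end, step = parts[i], parts[i + 1], parts[i + 2]
        width = len(start)  # zero-padding width taken from the lower bound
        ranges.append([f"{n:0{width}d}" for n in
                       range(int(start), int(end) + 1, int(step or 1))])
        pieces.append(parts[i + 3])
    out = []
    for combo in itertools.product(*ranges):
        name = pieces[0]
        for num, tail in zip(combo, pieces[1:]):
            name += num + tail
        out.append(name)
    return out

print(expand_template("shard-{001..003}.tar"))
```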


Show and set AWS-specific properties

AIStore supports AWS-specific configuration on a per-S3-bucket basis. Any bucket that is backed by an AWS S3 bucket (**) can be configured to use alternative:

  • named AWS profiles (with alternative credentials and/or region)
  • alternative s3 endpoints

For background and usage examples, please see AWS-specific bucket configuration.

(**) Terminology-wise, “s3 bucket” is a shortcut phrase for a bucket in an AIS cluster that either (A) has the same name as the S3 bucket in question (e.g., s3://abc), or (B) is a differently named AIS bucket whose backend_bck property points to that S3 bucket.

Reset bucket properties to cluster defaults

ais bucket props reset BUCKET

Reset bucket properties to cluster defaults.

Examples

1$ ais bucket props reset bucket_name
2Bucket props successfully reset

Show bucket metadata

ais show cluster bmd

Show bucket metadata (BMD).

Examples

1$ ais show cluster bmd
2PROVIDER NAMESPACE NAME BACKEND COPIES EC(D/P, minsize) CREATED
3ais test 2 25 Mar 21 18:28 PDT
4ais validation 25 Mar 21 18:29 PDT
5ais train 25 Mar 21 18:28 PDT
6
7Version: 9
8UUID: jcUfFDyTN