Working with Archives (Shards)

In AIStore, archives (also called shards) are special objects that contain multiple files packaged together in formats like TAR, TGZ, ZIP, or TAR.LZ4. Working with archives is essential for efficiently managing collections of related files and for operations like distributed sorting.

In this document:

Commands to read, write, extract, and list archives - objects formatted as TAR, TGZ (or TAR.GZ), ZIP, or TAR.LZ4.

For the most recently updated list of supported archival formats, please refer to this source.

Subcommands

The corresponding subset of subcommands starts with ais archive, from where you can <TAB-TAB> to the actual operation:

ais archive ls		# List archived content
ais archive put		# Archive files from local filesystem
ais archive get		# Extract content from archives
ais archive bucket	# Archive objects from a bucket
ais archive gen-shards	# Generate test archives

For detailed help on any command, use the --help option (for example, ais archive put --help).

Archive files and directories (`ais archive put`)

Archive multiple files.

$ ais archive put --help

NAME:
   ais archive put - Archive a file, a directory, or multiple files and/or directories as
     (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted object - aka "shard".
     Both APPEND (to an existing shard) and PUT (a new version of the shard) are supported.
     Examples:
     - 'local-file s3://q/shard-00123.tar.lz4 --append --archpath name-in-archive' - append file to a given shard,
        optionally, rename it (inside archive) as specified;
     - 'local-file s3://q/shard-00123.tar.lz4 --append-or-put --archpath name-in-archive' - append file to a given shard if exists,
        otherwise, create a new shard (and name it shard-00123.tar.lz4, as specified);
     - 'src-dir gs://w/shard-999.zip --append' - archive entire 'src-dir' directory; iff the destination .zip doesn't exist create a new one;
     - '"sys, docs" ais://dst/CCC.tar --dry-run -y -r --archpath ggg/' - dry-run to recursively archive two directories.
     Tips:
     - use '--dry-run' if in doubt;
     - to archive objects from a ais:// or remote bucket, run 'ais archive bucket' (see --help for details).

USAGE:
   ais archive put [-|FILE|DIRECTORY[/PATTERN]] BUCKET/SHARD_NAME [command options]

OPTIONS:
   --append             Add newly archived content to the destination object ("archive", "shard") that must exist
   --append-or-put      Append to an existing destination object ("archive", "shard") iff exists; otherwise PUT a new archive (shard);
                        note that PUT (with subsequent overwrite if the destination exists) is the default behavior when the flag is omitted
   --archpath value     Filename in an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4
   --cont-on-err        Keep running archiving xaction (job) in presence of errors in a any given multi-object transaction
   --dry-run            Preview the results without really running the action
   --include-src-dir    Prefix the names of archived files with the (root) source directory
   --list value         Comma-separated list of object or file names, e.g.:
                        --list 'o1,o2,o3'
                        --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
                        or, when listing files and/or directories:
                        --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
   --num-workers value  Number of concurrent client-side workers (to execute PUT or append requests);
                        use (-1) to indicate single-threaded serial execution (ie., no workers);
                        any positive value will be adjusted _not_ to exceed twice the number of client CPUs (default: 10)
   --progress           Show progress bar(s) and progress of execution in real time
   --recursive, -r      Recursive operation
   --refresh value      Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
                        valid time units: ns, us (or µs), ms, s (default), m, h
   --skip-vc            Skip loading object metadata (and the associated checksum & version related processing)
   --template value     Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
                        (with optional steps and gaps), e.g.:
                        --template "" # (an empty or '*' template matches everything)
                        --template 'dir/subdir/'
                        --template 'shard-{1000..9999}.tar'
                        --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
                        and similarly, when specifying files and directories:
                        --template '/home/dir/subdir/'
                        --template "/abc/prefix-{0010..9999..2}-suffix"
   --timeout value      Maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
                        valid time units: ns, us (or µs), ms, s (default), m, h
   --units value        Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
                        iec - IEC format, e.g.: KiB, MiB, GiB (default)
                        si  - SI (metric) format, e.g.: KB, MB, GB
                        raw - do not convert to (or from) human-readable format
   --verbose, -v        Verbose output
   --wait               Wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
   --yes, -y            Assume 'yes' to all questions
   --help, -h           Show help

The operation accepts either an explicitly defined list or template-defined range of file names (to archive).
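The help text above recommends '--dry-run' when in doubt. A minimal sketch of that habit as a shell helper follows. The function name preview_put and the AIS variable are illustrative and not part of the CLI; only flags listed in the help output are used.

```shell
#!/bin/sh
# AIS names the CLI binary; setting AIS=echo prints the command instead of running it.
AIS=${AIS:-ais}

# Hypothetical helper (not part of ais): preview archiving a directory into a shard.
# usage: preview_put <src-dir> <bucket/shard>
preview_put() {
    # --append-or-put: append if the shard exists, otherwise create it
    # --recursive:     descend into subdirectories
    # --dry-run:       preview only, nothing is written
    "$AIS" archive put "$1" "$2" --append-or-put --recursive --dry-run
}
```

For example, preview_put sys ais://nnn/sys.tar.lz4 previews the operation; removing --dry-run from the helper would perform it.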

NOTE:

ais archive put works with locally accessible (source) files and shall not be confused with ais archive bucket command (below).

Also, note that the ais put command with its --archpath option provides an alternative way to archive multiple objects.

For the most recently updated list of supported archival formats, please see this source.

Append files and directories to an existing archive

The APPEND operation appends files to existing archives (shards). As such, APPEND is a variation of PUT (above) with two additional boolean flags:

Name	Description
`--append`	add newly archived content to the destination object (“archive”, “shard”) that must exist
`--append-or-put`	if destination object (“archive”, “shard”) exists append to it, otherwise archive a new one
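The interplay between the default PUT and the two flags can be summarized as a small decision function. This is only a model of the semantics described in the table, not AIS code: the name archive_action and its arguments are hypothetical, and treating a missing destination under --append as an error is an assumption based on the "must exist" wording.

```shell
#!/bin/sh
# Hypothetical model of the documented flag semantics (not part of the ais CLI).
# usage: archive_action <flag> <shard_exists>
#   flag:         "" (no flag) | append | append-or-put
#   shard_exists: yes | no
archive_action() {
    flag=$1
    exists=$2
    case "$flag" in
        append)
            # --append: the destination shard must already exist (assumed to fail otherwise)
            if [ "$exists" = yes ]; then echo append; else echo error; fi
            ;;
        append-or-put)
            # --append-or-put: append if the shard exists, otherwise create a new one
            if [ "$exists" = yes ]; then echo append; else echo put; fi
            ;;
        *)
            # no flag: PUT a new shard, overwriting the destination if present
            echo put
            ;;
    esac
}
```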

Example 1: add file to archive

step 1. create archive (by archiving a given source dir)

$ ais archive put sys ais://nnn/sys.tar.lz4
Warning: multi-file 'archive put' operation requires either '--append' or '--append-or-put' option
Proceed to execute 'archive put --append-or-put'? [Y/N]: y
Files to upload:
EXTENSION        COUNT   SIZE
.go              11      17.46KiB
TOTAL            11      17.46KiB
APPEND 11 files (one directory, non-recursive) => ais://nnn/sys.tar.lz4? [Y/N]: y
Done

step 2. add a single file to existing archive

$ ais archive put README.md ais://nnn/sys.tar.lz4 --archpath=docs/README --append
APPEND README.md to ais://nnn/sys.tar.lz4 as "docs/README"

step 3. list the entire bucket with the `--archive` option to show all archived entries

$ ais ls ais://nnn --archive
NAME                             SIZE
sys.tar.lz4                      16.84KiB
    sys.tar.lz4/api_linux.go     1.07KiB
    sys.tar.lz4/cpu.go           1.07KiB
    sys.tar.lz4/cpu_darwin.go    802B
    sys.tar.lz4/cpu_linux.go     2.14KiB
    sys.tar.lz4/docs/README      13.85KiB
    sys.tar.lz4/mem.go           1.16KiB
    sys.tar.lz4/mem_darwin.go    2.04KiB
    sys.tar.lz4/mem_linux.go     2.81KiB
    sys.tar.lz4/proc.go          784B
    sys.tar.lz4/proc_darwin.go   369B
    sys.tar.lz4/proc_linux.go    1.40KiB
    sys.tar.lz4/sys_test.go      3.88KiB
Listed: 13 names

Alternatively, use regex to select:

$ ais ls ais://nnn --archive --regex docs
NAME                             SIZE
    sys.tar.lz4/docs/README      13.85KiB

Example 2: use `--template` flag to add source files

Generally, the --template option combines (an optional) prefix and/or one or more ranges (e.g., bash brace expansions).

In this case, the template we use is a simple prefix with no ranges.

$ ls -l /tmp/w
total 32
-rw-r--r-- 1 root root 14180 Dec 11 18:18 111
-rw-r--r-- 1 root root 14180 Dec 11 18:18 222

$ ais archive put ais://nnn/shard-001.tar --template /tmp/w/ --append
Files to upload:
EXTENSION        COUNT   SIZE
                 2       27.70KiB
TOTAL            2       27.70KiB
APPEND 2 files (one directory, non-recursive) => ais://nnn/shard-001.tar? [Y/N]: y
Done
$ ais ls ais://nnn/shard-001.tar --archive
NAME                                             SIZE
shard-001.tar                                    37.50KiB
    shard-001.tar/111                            13.85KiB
    shard-001.tar/222                            13.85KiB
    shard-001.tar/23ed44d8bf3952a35484-1.test    1.00KiB
    shard-001.tar/452938788ebb87807043-4.test    1.00KiB
    shard-001.tar/7925bc9b5eb1daa12ed0-2.test    1.00KiB
    shard-001.tar/8264574b49bd188a4b27-0.test    1.00KiB
    shard-001.tar/f1f25e52c5edd768e0ec-3.test    1.00KiB
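For templates that do contain a range, it can help to see which names a range denotes before running the command. The sketch below expands a single "prefix-{start..end..step}suffix" range in plain shell. The function expand_range is hypothetical and only approximates the template syntax shown in the help text; it assumes numeric bounds and takes the zero-padded width from the start value.

```shell
#!/bin/sh
# Hypothetical helper: print the names denoted by one "prefix{start..end..step}suffix" range.
# usage: expand_range <prefix> <start> <end> <step> <suffix>
expand_range() {
    prefix=$1; start=$2; end=$3; step=$4; suffix=$5
    width=${#start}                       # zero-padded width, e.g. 4 for 0010
    # strip leading zeros so that values such as 0010 are not read as octal
    i=$(echo "$start" | sed 's/^0*\([0-9]\)/\1/')
    last=$(echo "$end" | sed 's/^0*\([0-9]\)/\1/')
    while [ "$i" -le "$last" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + step))
    done
}
```

For instance, expand_range shard- 1000 1002 1 .tar prints shard-1000.tar, shard-1001.tar, and shard-1002.tar, one per line.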

Example 3: add file to archive and rename it with `--archpath`

In this example, we assume that arch.tar already exists.

# contents _before_:
$ ais archive ls ais://abc/arch.tar
NAME                SIZE
arch.tar            4.5KiB
    arch.tar/obj1   1.0KiB
    arch.tar/obj2   1.0KiB

# add file to existing archive:
$ ais archive put /tmp/obj1.bin ais://abc/arch.tar --archpath bin/obj1 --append
APPEND "/tmp/obj1.bin" to object "ais://abc/arch.tar[/bin/obj1]"

# contents _after_:
$ ais archive ls ais://abc/arch.tar
NAME                    SIZE
arch.tar                6KiB
    arch.tar/bin/obj1   2.KiB
    arch.tar/obj1       1.0KiB
    arch.tar/obj2       1.0KiB

Example 4: add file to archive using a fully qualified `--archpath`

# contents _before_:

$ ais archive ls ais://nnn/shard-2.tar
NAME                                             SIZE
shard-2.tar                                      5.50KiB
    shard-2.tar/0379f37cbb0415e7eaea-3.test      1.00KiB
    shard-2.tar/504c563d14852368575b-5.test      1.00KiB
    shard-2.tar/c7bcb7014568b5e7d13b-4.test      1.00KiB

# append and note that `--archpath` can specify a fully qualified destination name

$ ais archive put LICENSE ais://nnn/shard-2.tar --archpath shard-2.tar/license.test --append
APPEND "/go/src/github.com/NVIDIA/aistore/LICENSE" to "ais://nnn/shard-2.tar[/shard-2.tar/license.test]"

# contents _after_:
$ ais archive ls ais://nnn/shard-2.tar
NAME                                             SIZE
shard-2.tar                                      7.50KiB
    shard-2.tar/0379f37cbb0415e7eaea-3.test      1.00KiB
    shard-2.tar/504c563d14852368575b-5.test      1.00KiB
    shard-2.tar/c7bcb7014568b5e7d13b-4.test      1.00KiB
    shard-2.tar/license.test                     1.05KiB

Archive multiple objects (`ais archive bucket`)

The ais archive bucket command creates archives (shards) from multiple objects stored in a bucket. This is a powerful operation that:

- Takes objects from a specified source bucket
- Archives them as a single shard in the specified destination bucket

Features

- Source and destination buckets can be the same or different
- Supports multiple selection methods (lists, templates, prefixes)
- Supports all backend providers
- Supports various archival formats (.tar, .tar.gz/.tgz, .zip, .tar.lz4)
- Executes asynchronously and in parallel across all AIS nodes for maximum performance

Usage

$ ais archive bucket --help
NAME:
   ais archive bucket - Archive selected or matching objects from SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] as
   (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted object (a.k.a. "shard"):
     - 'ais archive bucket ais://src gs://dst/a.tar.lz4 --template "trunk-{001..997}"'       - archive (prefix+range) matching objects from ais://src;
     - 'ais archive bucket "ais://src/trunk-{001..997}" gs://dst/a.tar.lz4'                  - same as above (notice double quotes);
     - 'ais archive bucket "ais://src/trunk-{998..999}" gs://dst/a.tar.lz4 --append-or-put'  - add two more objects to an existing shard;
     - 'ais archive bucket s3://src/trunk-00 ais://dst/b.tar'                                - archive "trunk-00" prefixed objects from an s3 bucket as a given TAR destination

USAGE:
   ais archive bucket SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET/SHARD_NAME [command options]

OPTIONS:
   append-or-put     Append to an existing destination object ("archive", "shard") iff exists; otherwise PUT a new archive (shard);
                     note that PUT (with subsequent overwrite if the destination exists) is the default behavior when the flag is omitted
   cont-on-err       Keep running archiving xaction (job) in presence of errors in a any given multi-object transaction
   dry-run           Preview the results without really running the action
   include-src-bck   Prefix the names of archived files with the source bucket name
   list              Comma-separated list of object or file names, e.g.:
                     --list 'o1,o2,o3'
                     --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
                     or, when listing files and/or directories:
                     --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
   non-recursive,nr  Non-recursive operation, e.g.:
                     - 'ais ls gs://bck/sub --nr'               - list objects and/or virtual subdirectories with names starting with the specified prefix;
                     - 'ais ls gs://bck/sub/ --nr'              - list only immediate contents of 'sub/' subdirectory (non-recursive);
                     - 'ais prefetch s3://bck/abcd --nr'        - prefetch a single named object;
                     - 'ais evict gs://bck/sub/ --nr'           - evict only immediate contents of 'sub/' subdirectory (non-recursive);
                     - 'ais evict gs://bck --prefix=sub/ --nr'  - same as above
   prefix            Select virtual directories or objects with names starting with the specified prefix, e.g.:
                     '--prefix a/b/c'   - matches names 'a/b/c/d', 'a/b/cdef', and similar;
                     '--prefix a/b/c/'  - only matches objects from the virtual directory a/b/c/
   skip-lookup       Skip checking source and destination buckets' existence (trading off extra lookup for performance)

   template   Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
              (with optional steps and gaps), e.g.:
              --template "" # (an empty or '*' template matches everything)
              --template 'dir/subdir/'
              --template 'shard-{1000..9999}.tar'
              --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
              and similarly, when specifying files and directories:
              --template '/home/dir/subdir/'
              --template "/abc/prefix-{0010..9999..2}-suffix"
   wait       Wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
   help, h    Show help

Selection Options

The command provides multiple ways to select objects for archiving:

Template matching: Use patterns with ranges to select objects

ais archive bucket ais://src gs://dst/a.tar --template "trunk-{001..997}"

List-based selection: Specify a comma-separated list of objects

ais archive bucket ais://bck/arch.tar --list obj1,obj2,obj3

Prefix-based selection: Select objects that share a common prefix

ais archive bucket ais://src ais://dst/archive.tar --prefix data/logs/
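The difference between a prefix with and without a trailing slash is easy to get wrong. As described in the help text, '--prefix a/b/c' also matches 'a/b/cdef', while '--prefix a/b/c/' stays inside the virtual directory. The sketch below models that as a plain string-prefix test; the helper has_prefix is hypothetical and is not how AIS implements selection.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <name> starts with <prefix> (literal string match).
# usage: has_prefix <name> <prefix>
has_prefix() {
    case "$1" in
        "$2"*) return 0 ;;   # the quoted prefix is matched literally, not as a glob
        *)     return 1 ;;
    esac
}

# has_prefix a/b/cdef a/b/c    succeeds: no trailing slash, so sibling names match too
# has_prefix a/b/cdef a/b/c/   fails:    the trailing slash limits the match to a/b/c/
```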

Non-Recursive Option (--nr)

The --nr (or --non-recursive) flag limits the scope of the archiving operation to only include objects at the specified directory level, without descending into subdirectories.

Examples with Non-Recursive Flag

Archive only the files directly in a directory (not its subdirectories):

ais archive bucket ais://nnn/aaa/ ais://dst/archive.tar --nr

This will only archive objects directly in the aaa/ directory, skipping any objects in subdirectories like aaa/bbb/.

Compare with recursive archiving (default behavior):

ais archive bucket ais://nnn/aaa/ ais://dst/archive.tar

This will archive all objects under the aaa/ prefix, including those in subdirectories like aaa/bbb/.

Visual Example

For a bucket with this structure:

ais://nnn
├── aaa/
│   ├── 777
│   ├── 888
│   ├── 999
│   └── bbb/
│       ├── 111
│       ├── 222
│       └── 333

With --nr flag:

ais archive bucket ais://nnn/aaa/ ais://dst/f.tar --nr

Result:

f.tar
├── aaa/777
├── aaa/888
└── aaa/999

Without --nr flag:

ais archive bucket ais://nnn/aaa/ ais://dst/g.tar

Result:

g.tar
├── aaa/777
├── aaa/888
├── aaa/999
├── aaa/bbb/111
├── aaa/bbb/222
└── aaa/bbb/333
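The two results above differ only in whether names nested below aaa/ are kept. A minimal model of that filter, assuming object names arrive one per line on standard input, might look like this. The function immediate_children is hypothetical and illustrates the selection rule only.

```shell
#!/bin/sh
# Hypothetical model of --nr: read object names on stdin and keep only those
# located directly under <dir/> (no further '/' after the prefix).
# usage: immediate_children <dir/>
immediate_children() {
    dir=$1
    while IFS= read -r name; do
        case "$name" in
            "$dir"*)
                rest=${name#"$dir"}       # part of the name after the directory prefix
                case "$rest" in
                    */*|"") ;;            # nested deeper, or the directory itself: skip
                    *) echo "$name" ;;
                esac
                ;;
        esac
    done
}
```

Feeding the six names from the bucket structure above through immediate_children aaa/ keeps aaa/777, aaa/888, and aaa/999, matching the contents of f.tar.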

Additional Options

- --append-or-put: Append to an existing archive if it exists; otherwise create a new one
- --cont-on-err: Continue archiving despite errors in multi-object transactions
- --dry-run: Preview the results without executing
- --include-src-bck: Prefix archived file names with the source bucket name
- --skip-lookup: Skip checking bucket existence for better performance
- --wait: Wait for the asynchronous operation to complete

Complete Examples

1. Archive objects with a specific prefix, non-recursively:

$ ais archive bucket s3://src-bck/aaa/ ais://dst/example.tar --nr
Archived s3://src-bck/aaa/ => ais://dst/example.tar

$ ais ls ais://dst/example.tar --archive
NAME                             SIZE
example.tar                      106.00KiB
    example.tar/aaa/777          16.84KiB
    example.tar/aaa/888          16.84KiB
    example.tar/aaa/999          16.84KiB
    example.tar/aaa/trunk-777    16.84KiB
    example.tar/aaa/trunk-888    16.84KiB
    example.tar/aaa/trunk-999    16.84KiB

2. Archive objects using a template range:

$ ais archive bucket ais://src ais://dst/range.tar --template "obj-{0..9}"
Archiving "ais://dst/range.tar" ...

$ ais archive ls ais://dst/range.tar

NAME                     SIZE
range.tar                92.60KiB
    range.tar/obj-0      9.26KiB
    range.tar/obj-1      9.26KiB
    ...
    range.tar/obj-9      9.26KiB

3. Incrementally append to an existing archive:

$ ais archive bucket ais://bck/incremental.tar --template "obj{1..3}"
Archived "ais://bck/incremental.tar" ...

$ ais archive bucket ais://bck/incremental.tar --template "obj{4..5}" --append-or-put
Archived "ais://bck/incremental.tar"

4. Archive a list of objects from a given bucket:

$ ais archive bucket ais://bck/arch.tar --list obj1,obj2
Archiving "ais://bck/arch.tar" ...

Resulting ais://bck/arch.tar contains objects ais://bck/obj1 and ais://bck/obj2.

5. Archive objects from a different bucket, use template (range):

$ ais archive bucket ais://src ais://dst/arch.tar --template "obj-{0..9}"

Archiving "ais://dst/arch.tar" ...

ais://dst/arch.tar now contains 10 objects from bucket ais://src: ais://src/obj-0, ais://src/obj-1 … ais://src/obj-9.

6. Archive 3 objects and then append 2 more:

$ ais archive bucket ais://bck/arch1.tar --template "obj{1..3}"
Archived "ais://bck/arch1.tar" ...
$ ais archive ls ais://bck/arch1.tar
NAME                     SIZE
arch1.tar                31.00KiB
    arch1.tar/obj1       9.26KiB
    arch1.tar/obj2       9.26KiB
    arch1.tar/obj3       9.26KiB

$ ais archive bucket ais://bck/arch1.tar --template "obj{4..5}" --append-or-put
Archived "ais://bck/arch1.tar"

$ ais archive ls ais://bck/arch1.tar
NAME                     SIZE
arch1.tar                51.00KiB
    arch1.tar/obj1       9.26KiB
    arch1.tar/obj2       9.26KiB
    arch1.tar/obj3       9.26KiB
    arch1.tar/obj4       9.26KiB
    arch1.tar/obj5       9.26KiB

Notes

- ais archive bucket must not be confused with ais archive put:
  - archive bucket archives objects in the cluster (more precisely, objects accessible by the cluster)
  - archive put archives files from your local or locally accessible (NFS, SMB) directories
- The operation runs asynchronously:
  - use --wait to wait for completion; see --help for details
- When using the --nr (non-recursive) flag, only the immediate contents of the specified virtual directory are archived
- For more information on multi-object operations, please see the operations on lists and ranges documentation.
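Because the operation is asynchronous, a script that reads the shard right after creating it would typically pass --wait. A small wrapper along these lines is one way to make that the default. The function archive_and_wait and the AIS variable are illustrative, not part of the CLI; the flags are the ones listed in the help output above.

```shell
#!/bin/sh
# AIS names the CLI binary; setting AIS=echo prints the command instead of running it.
AIS=${AIS:-ais}

# Hypothetical wrapper: archive template-matching objects and block until the job finishes.
# usage: archive_and_wait <src-bucket> <dst-bucket/shard> <template>
archive_and_wait() {
    "$AIS" archive bucket "$1" "$2" --template "$3" --wait
}
```

For example: archive_and_wait ais://src ais://dst/range.tar 'obj-{0..9}'. Quoting the template keeps the shell from expanding the braces itself.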

List archived content

NAME:
   ais archive ls - list archived content (supported formats: .tar, .tgz or .tar.gz, .zip, .tar.lz4)

USAGE:
   ais archive ls BUCKET[/SHARD_NAME] [command options]

List archived content as a tree with archive (“shard”) name as a root and archived files as leaves. Filenames are always sorted alphabetically.
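The layout described above (shard name as the root, sorted file names as indented leaves) can be mimicked in a few lines of shell, which may be handy when post-processing name lists from other tools. The formatter print_shard_tree is hypothetical; it reproduces the shape of the output only, without sizes.

```shell
#!/bin/sh
# Hypothetical formatter: print <shard> as the root and the archived file names
# (read from stdin, one per line) as alphabetically sorted, indented leaves.
# usage: print_shard_tree <shard>
print_shard_tree() {
    shard=$1
    echo "$shard"
    # LC_ALL=C gives a stable byte-wise ordering regardless of the user's locale
    LC_ALL=C sort | while IFS= read -r f; do
        printf '    %s/%s\n' "$shard" "$f"
    done
}
```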

Options

$ ais archive ls --help

NAME:
   ais archive ls - List archived content (supported formats: .tar, .tgz or .tar.gz, .zip, .tar.lz4)

USAGE:
   ais archive ls BUCKET[/SHARD_NAME] [command options]

OPTIONS:
   --all                  Depending on the context, list:
                          - all buckets, including accessible (visible) remote buckets that are not in-cluster
                          - all objects in a given accessible (visible) bucket, including remote objects and misplaced copies
   --cached               Only list in-cluster objects, i.e., objects from the respective remote bucket that are present ("cached") in the cluster
   --count-only           Print only the resulting number of listed objects and elapsed time
   --diff                 Perform a bidirectional diff between in-cluster and remote content, which further entails:
                          - detecting remote version changes (a.k.a. out-of-band updates), and
                          - remotely deleted objects (out-of-band deletions (*));
                            the option requires remote backends supporting some form of versioning (e.g., object version, checksum, and/or ETag);
                          see related:
                               (*) options: --cached; --latest
                               commands:    'ais get --latest'; 'ais cp --sync'; 'ais prefetch --latest'
   --dont-add             List remote bucket without adding it to cluster's metadata - e.g.:
                            - let's say, s3://abc is accessible but not present in the cluster (e.g., 'ais ls' returns error);
                            - then, if we ask aistore to list remote buckets: `ais ls s3://abc --all'
                              the bucket will be added (in effect, it'll be created);
                            - to prevent this from happening, either use this '--dont-add' flag or run 'ais evict' command later
   --dont-wait            When _summarizing_ buckets do not wait for the respective job to finish -
                          use the job's UUID to query the results interactively
   --inv-id value         Bucket inventory ID (optional; by default, we use bucket name as the bucket's inventory ID)
   --inv-name value       Bucket inventory name (optional; system default name is '.inventory')
   --inventory            List objects using _bucket inventory_ (docs/s3compat.md); requires s3:// backend; will provide significant performance
                          boost when used with very large s3 buckets; e.g. usage:
                            1) 'ais ls s3://abc --inventory'
                            2) 'ais ls s3://abc --inventory --paged --prefix=subdir/'
                          (see also: docs/s3compat.md)
   --limit value          The maximum number of objects to list, get, or otherwise handle (0 - unlimited; see also '--max-pages'),
                          e.g.:
                          - 'ais ls gs://abc/dir --limit 1234 --cached --props size,custom,atime'  - list no more than 1234 objects
                          - 'ais get gs://abc /dev/null --prefix dir --limit 1234'                 - get --/--
                          - 'ais scrub gs://abc/dir --limit 1234'                                  - scrub --/-- (default: 0)
   --max-pages value      Maximum number of pages to display (see also '--page-size' and '--limit')
                          e.g.: 'ais ls az://abc --paged --page-size 123 --max-pages 7 (default: 0)
   --name-only            Faster request to retrieve only the names of objects (if defined, '--props' flag will be ignored)
   --no-dirs              Do not return virtual subdirectories (applies to remote buckets only)
   --no-footers, -F       Display tables without footers
   --no-headers, -H       Display tables without headers
   --non-recursive, --nr  Non-recursive operation, e.g.:
                          - 'ais ls gs://bucket/prefix --nr'   - list objects and/or virtual subdirectories with names starting with the specified prefix;
                          - 'ais ls gs://bucket/prefix/ --nr'  - list contained objects and/or immediately nested virtual subdirectories _without_ recursing into the latter;
                          - 'ais prefetch s3://bck/abcd --nr'  - prefetch a single named object (see 'ais prefetch --help' for details);
                          - 'ais rmo gs://bucket/prefix --nr'  - remove a single object with the specified name (see 'ais rmo --help' for details)
   --page-size value      Maximum number of object names per page; when the flag is omitted or 0
                          the maximum is defined by the corresponding backend; see also '--max-pages' and '--paged' (default: 0)
   --paged                List objects page by page - one page at a time (see also '--page-size' and '--limit')
                          note: recommended for use with very large buckets
   --prefix value         List objects with names starting with the specified prefix, e.g.:
                          '--prefix a/b/c' - list virtual directory a/b/c and/or objects from the virtual directory
                          a/b that have their names (relative to this directory) starting with the letter 'c'
   --props value          Comma-separated list of object properties including name, size, version, copies, and more; e.g.:
                          --props all
                          --props name,size,cached
                          --props "ec, copies, custom, location"
   --refresh value        Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
                          valid time units: ns, us (or µs), ms, s (default), m, h
   --regex value          Regular expression; use it to match either bucket names or objects in a given bucket, e.g.:
                          ais ls --regex "(m|n)"         - match buckets such as ais://nnn, s3://mmm, etc.;
                          ais ls ais://nnn --regex "^A"  - match object names starting with letter A
   --show-unmatched       List also objects that were not matched by regex and/or template (range)
   --silent               Server-side flag, an indication for aistore _not_ to log assorted errors (e.g., HEAD(object) failures)
   --skip-lookup          Do not execute HEAD(bucket) request to lookup remote bucket and its properties; possible usage scenarios include:
                           1) adding remote bucket to aistore without first checking the bucket's accessibility
                              (e.g., to configure the bucket's aistore properties with alternative security profile and/or endpoint)
                           2) listing public-access Cloud buckets where certain operations (e.g., 'HEAD(bucket)') may be disallowed
   --start-after value    List bucket's content alphabetically starting with the first name _after_ the specified
   --summary              Show object numbers, bucket sizes, and used capacity;
                          note: applies only to buckets and objects that are _present_ in the cluster
   --template value       Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
                          (with optional steps and gaps), e.g.:
                          --template "" # (an empty or '*' template matches everything)
                          --template 'dir/subdir/'
                          --template 'shard-{1000..9999}.tar'
                          --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
                          and similarly, when specifying files and directories:
                          --template '/home/dir/subdir/'
                          --template "/abc/prefix-{0010..9999..2}-suffix"
   --units value          Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
                          iec - IEC format, e.g.: KiB, MiB, GiB (default)
                          si  - SI (metric) format, e.g.: KB, MB, GB
                          raw - do not convert to (or from) human-readable format
   --help, -h             Show help

Examples

$ ais archive ls ais://bck/arch.tar
NAME                SIZE
arch.tar            4.5KiB
    arch.tar/obj1   1.0KiB
    arch.tar/obj2   1.0KiB

Example: use `--prefix` that crosses shard boundary

For starters, we recursively archive all aistore docs:

$ ais put docs ais://nnn/A.tar --archive -r

To list a virtual subdirectory inside this newly created shard (e.g.):

$ ais archive ls ais://nnn --prefix "A.tar/tutorials"
NAME                                             SIZE
    A.tar/tutorials/README.md                    561B
    A.tar/tutorials/etl/compute_md5.md           8.28KiB
    A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
    A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
Listed: 4 names

or, same:

$ ais ls ais://nnn --prefix "A.tar/tutorials" --archive
NAME                                             SIZE
    A.tar/tutorials/README.md                    561B
    A.tar/tutorials/etl/compute_md5.md           8.28KiB
    A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
    A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
Listed: 4 names
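A name such as A.tar/tutorials/README.md combines a shard name with a path inside that shard. As a rough illustration of where the boundary falls, the sketch below splits a name at the first path component that carries one of the supported archive extensions. The helper split_archpath is hypothetical and says nothing about how AIS resolves such names internally.

```shell
#!/bin/sh
# Hypothetical helper: split "<shard>/<path-in-archive>" at the first path component
# ending with a supported archive extension; prints "<shard> <path>".
# usage: split_archpath <name>
split_archpath() {
    name=$1
    shard=
    rest=$name
    while [ -n "$rest" ]; do
        comp=${rest%%/*}                  # next path component
        shard=${shard:+$shard/}$comp      # components consumed so far
        case "$rest" in
            */*) rest=${rest#*/} ;;       # drop the consumed component
            *)   rest= ;;
        esac
        case "$comp" in
            *.tar|*.tgz|*.tar.gz|*.zip|*.tar.lz4)
                echo "$shard $rest"
                return 0
                ;;
        esac
    done
    return 1                              # no archive component found
}
```

With this sketch, split_archpath A.tar/tutorials/README.md prints "A.tar tutorials/README.md".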

Get archived content (`ais archive get`)

$ ais archive get --help
NAME:
   ais archive get - Get a shard and extract its content; get an archived file;
              write the content locally with destination options including: filename, directory, STDOUT ('-'), or '/dev/null' (discard);
              assorted options further include:
              - '--prefix' to get multiple shards in one shot (empty prefix for the entire bucket);
              - '--progress' and '--refresh' to watch progress bar;
              - '-v' to produce verbose output when getting multiple objects.
   'ais archive get' examples:
              - ais://abc/trunk-0123.tar.lz4 /tmp/out - get and extract entire shard to /tmp/out/trunk/*
              - ais://abc/trunk-0123.tar.lz4 --archpath file45.jpeg /tmp/out - extract one named file
              - ais://abc/trunk-0123.tar.lz4/file45.jpeg /tmp/out - same as above (and note that '--archpath' is implied)
              - ais://abc/trunk-0123.tar.lz4/file45 /tmp/out/file456.new - same as above, with destination explicitly (re)named
   'ais archive get' multi-selection examples:
              - ais://abc/trunk-0123.tar 111.tar --archregx=jpeg --archmode=suffix - return 111.tar with all *.jpeg files from a given shard
              - ais://abc/trunk-0123.tar 222.tar --archregx=file45 --archmode=wdskey - return 222.tar with all file45.* files --/--
              - ais://abc/trunk-0123.tar 333.tar --archregx=subdir/ --archmode=prefix - 333.tar with all subdir/* files --/--

USAGE:
   ais archive get BUCKET[/SHARD_NAME] [OUT_FILE|OUT_DIR|-] [command options]

OPTIONS:
   archive         List archived content (see docs/archive.md for details)
   archmime        Expected format (mime type) of an object ("shard") formatted as .tar, .tgz or .tar.gz, .zip, .tar.lz4;
                   especially usable for shards with non-standard extensions
   archmode        Enumerated "matching mode" that tells aistore how to handle '--archregx', one of:
                     * regexp - general purpose regular expression;
                     * prefix - matching filename starts with;
                     * suffix - matching filename ends with;
                     * substr - matching filename contains;
                     * wdskey - WebDataset key
                   example:
                     given a shard containing (subdir/aaa.jpg, subdir/aaa.json, subdir/bbb.jpg, subdir/bbb.json, ...)
                     and wdskey=subdir/aaa, aistore will match and return (subdir/aaa.jpg, subdir/aaa.json)
   archpath        Extract the specified file from an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
                   see also: '--archregx'
   archregx        Specifies prefix, suffix, substring, WebDataset key, _or_ a general-purpose regular expression
                   to select possibly multiple matching archived files from a given shard;
                   is used in combination with '--archmode' ("matching mode") option
   blob-download   Utilize built-in blob-downloader (and the corresponding alternative datapath) to read very large remote objects
   cached          Only get in-cluster objects, i.e., objects from the respective remote bucket that are present ("cached") in the cluster
   checksum        Validate checksum
   chunk-size      Chunk size in IEC or SI units, or "raw" bytes (e.g.: 4mb, 1MiB, 1048576, 128k; see '--units')
   encode-objname  Encode object names that contain special symbols (; : ' " < > / \ | ? #) that may otherwise break shell parsing or URL interpretation
   extract,x       Extract all files from archive(s)
   inv-id          Bucket inventory ID (optional; by default, we use bucket name as the bucket's inventory ID)
   inv-name        Bucket inventory name (optional; system default name is '.inventory')
   inventory       List objects using _bucket inventory_ (docs/s3compat.md); requires s3:// backend; will provide significant performance
                   boost when used with very large s3 buckets; e.g. usage:
                     1) 'ais ls s3://abc --inventory'
                     2) 'ais ls s3://abc --inventory --paged --prefix=subdir/'
                   (see also: docs/s3compat.md)
   latest          Check in-cluster metadata and, possibly, GET, download, prefetch, or otherwise copy the latest object version
                   from the associated remote bucket;
                   the option provides operation-level control over object versioning (and version synchronization)
                   without the need to change the corresponding bucket configuration: 'versioning.validate_warm_get';
                   see also:
                     - 'ais show bucket BUCKET versioning'
                     - 'ais bucket props set BUCKET versioning'
                     - 'ais ls --check-versions'
                   supported commands include:
                     - 'ais cp', 'ais prefetch', 'ais get'
   limit           The maximum number of objects to list, get, or otherwise handle (0 - unlimited; see also '--max-pages'),
                   e.g.:
                   - 'ais ls gs://abc/dir --limit 1234 --cached --props size,custom,atime'  - list no more than 1234 objects
                   - 'ais get gs://abc /dev/null --prefix dir --limit 1234'                 - get --/--
                   - 'ais scrub gs://abc/dir --limit 1234'                                  - scrub --/--
   num-workers     Number of concurrent blob-downloading workers (readers); system default when omitted or zero
   prefix          Get objects with names starting with the specified prefix, e.g.:
                   '--prefix a/b/c' - get objects from the virtual directory a/b/c and objects from the virtual directory
                   a/b that have their names (relative to this directory) starting with 'c';
                   '--prefix ""' - get entire bucket (all objects)
   progress        Show progress bar(s) and progress of execution in real time
   refresh         Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
                   valid time units: ns, us (or µs), ms, s (default), m, h
   silent          Server-side flag, an indication for aistore _not_ to log assorted errors (e.g., HEAD(object) failures)
   skip-lookup     Do not execute HEAD(bucket) request to lookup remote bucket and its properties; possible usage scenarios include:
                    1) adding remote bucket to aistore without first checking the bucket's accessibility
                       (e.g., to configure the bucket's aistore properties with alternative security profile and/or endpoint)
                    2) listing public-access Cloud buckets where certain operations (e.g., 'HEAD(bucket)') may be disallowed
   units           Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
                   iec - IEC format, e.g.: KiB, MiB, GiB (default)
                   si  - SI (metric) format, e.g.: KB, MB, GB
                   raw - do not convert to (or from) human-readable format
   verbose,v       Verbose output
   yes,y           Assume 'yes' to all questions
   help, h         Show help

Example: extract one file

$ ais archive get ais://dst/A.tar.gz /tmp/w --archpath 111.ext1
GET 111.ext1 from ais://dst/A.tar.gz as "/tmp/w/111.ext1" (12.56KiB)

$ ls /tmp/w
111.ext1

Alternatively, use the fully qualified name:

$ ais archive get ais://dst/A.tar.gz/111.ext1 /tmp/w

Example: extract one file using its fully qualified name

$ ais archive get ais://nnn/A.tar/tutorials/README.md /tmp/out
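
The fully qualified form works because the shard name carries a recognizable archive extension, so everything after it can be read as the path inside the archive. The following is only a sketch of that idea in POSIX shell, not the CLI's actual parser; the `split_shard_path` helper and its hard-coded extension list are assumptions made for illustration.

```shell
# split_shard_path OBJECT_PATH
# Hypothetical helper: walk the path one component at a time and stop at the
# first component that ends with a known archive extension.
split_shard_path() {
    rest=$1
    shard=
    while [ -n "$rest" ]; do
        comp=${rest%%/*}                      # next path component
        shard=${shard:+$shard/}$comp          # path accumulated so far
        case $rest in
            */*) rest=${rest#*/} ;;           # more components remain
            *)   rest= ;;
        esac
        case $comp in
            *.tar|*.tgz|*.tar.gz|*.zip|*.tar.lz4)
                printf 'shard=%s archpath=%s\n' "$shard" "$rest"
                return 0 ;;
        esac
    done
    return 1                                  # no archive extension found
}

split_shard_path A.tar.gz/111.ext1            # shard=A.tar.gz archpath=111.ext1
split_shard_path A.tar/tutorials/README.md    # shard=A.tar archpath=tutorials/README.md
```

With the two names from the examples above, the part after the shard is exactly what would otherwise be passed via '--archpath'.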

Example: extract all files from a single shard

Let's say we have a shard in a bucket:

$ ais ls ais://dst --archive
NAME                     SIZE
A.tar.gz                 5.18KiB
    A.tar.gz/111.ext1    12.56KiB
    A.tar.gz/222.ext1    12.56KiB
    A.tar.gz/333.ext2    12.56KiB

We can then GET the shard and extract it to a local directory, e.g.:

$ ais archive get ais://dst/A.tar.gz /tmp/www --extract
GET A.tar.gz from ais://dst as "/tmp/www/A.tar.gz" (5.18KiB) and extract to /tmp/www/A/

$ ls /tmp/www/A
111.ext1  222.ext1  333.ext2

But here’s an alternative syntax to achieve the same:

$ ais get ais://dst --archive --prefix A.tar.gz /tmp/www

or even:

$ ais get ais://dst --archive --prefix A.tar.gz /tmp/www --progress --refresh 1 -y

GET 51 objects from ais://dst/tmp/ggg (total size 1.08MiB)
Objects:                   51/51 [==============================================================] 100 %
Total size:  1.08 MiB / 1.08 MiB [==============================================================] 100 %

The difference is that in the first case we ask for a specific shard, while in the second (and third) we filter the bucket's content using a prefix, relying on the convention that archived filenames are prefixed with their parent (shard) name.
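
The prefix-based form relies only on string matching over the flattened names that `ais ls --archive` prints. A minimal sketch in POSIX shell, using the names from the listing above plus one extra shard (B.tar.lz4) for contrast; `filter_prefix` is a hypothetical helper, not part of the CLI:

```shell
# Names as 'ais ls --archive' prints them: shards plus "SHARD/FILE" entries.
names='A.tar.gz
A.tar.gz/111.ext1
A.tar.gz/222.ext1
A.tar.gz/333.ext2
B.tar.lz4'

# filter_prefix PREFIX: keep the stdin lines that start with PREFIX
filter_prefix() {
    while IFS= read -r n; do
        case $n in
            "$1"*) printf '%s\n' "$n" ;;
        esac
    done
}

# Selects the shard itself and its three archived files; B.tar.lz4 is skipped.
printf '%s\n' "$names" | filter_prefix A.tar.gz
```

Because the shard name is itself a prefix of every archived filename, a longer prefix such as A.tar.gz/2 narrows the selection to files inside the shard.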

Example: extract all files from all shards (with a given prefix)

Let's say there's a bucket ais://dst that contains:

$ ais ls ais://dst
NAME             SIZE
A.tar.gz         5.18KiB
B.tar.lz4        247.88KiB
C.tar.zip        4.15KiB
D.tar            2.00KiB

Next, we GET and extract them all into their respective subdirectories (note the verbose option, -v):

$ ais archive get ais://dst /tmp/w --prefix "" --extract -v

GET 4 objects from ais://dst to /tmp/w (total size 259.21KiB) [Y/N]: y
GET D.tar from ais://dst as "/tmp/w/D.tar" (2.00KiB) and extract as /tmp/w/D
GET A.tar.gz from ais://dst as "/tmp/w/A.tar.gz" (5.18KiB) and extract as /tmp/w/A
GET C.tar.zip from ais://dst as "/tmp/w/C.tar.zip" (4.15KiB) and extract as /tmp/w/C
GET B.tar.lz4 from ais://dst as "/tmp/w/B.tar.lz4" (247.88KiB) and extract as /tmp/w/B

Example: use '--prefix' that crosses shard boundary

For starters, we recursively archive all aistore docs:

$ ais put docs ais://nnn/A.tar --archive -r

To list a virtual subdirectory inside this newly created shard (e.g.):

$ ais archive ls ais://nnn --prefix A.tar/tutorials
NAME                                             SIZE
    A.tar/tutorials/README.md                    561B
    A.tar/tutorials/etl/compute_md5.md           8.28KiB
    A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
    A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
Listed: 4 names

Now, extract matching files from the bucket to /tmp/out:

$ ais archive get ais://nnn --prefix A.tar/tutorials /tmp/out
GET 6 objects from ais://nnn/tmp/out (total size 17.81MiB) [Y/N]: y

$ ls -al /tmp/out/tutorials/
total 20
drwxr-x--- 4 root root 4096 May 13 20:05 ./
drwxr-xr-x 3 root root 4096 May 13 20:05 ../
drwxr-x--- 2 root root 4096 May 13 20:05 etl/
-rw-r--r-- 1 root root  561 May 13 20:05 README.md
drwxr-x--- 2 root root 4096 May 13 20:05 various/

Get archived content: multiple selection

Generally, both single and multiple selection from a given source shard are realized using the following four options:

   --archpath value     extract the specified file from an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
                        see also: '--archregx'
   --archmime value     expected format (mime type) of an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
                        especially usable for shards with non-standard extensions
   --archregx value     string that specifies prefix, suffix, substring, WebDataset key, _or_ a general-purpose regular expression
                        to select possibly multiple matching archived files from a given shard;
                        is used in combination with '--archmode' ("matching mode") option
   --archmode value     enumerated "matching mode" that tells aistore how to handle '--archregx', one of:
                          * regexp - general purpose regular expression;
                          * prefix - matching filename starts with;
                          * suffix - matching filename ends with;
                          * substr - matching filename contains;
                          * wdskey - WebDataset key
                        example:
                          given a shard containing (subdir/aaa.jpg, subdir/aaa.json, subdir/bbb.jpg, subdir/bbb.json, ...)
                          and wdskey=subdir/aaa, aistore will match and return (subdir/aaa.jpg, subdir/aaa.json)

In particular, the '--archregx' and '--archmode' pair defines a multiple selection, as the following examples demonstrate.

But first, note that in all multi-selection cases the result is (currently) always formatted as a TAR archive containing the selected files.
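
To make the five matching modes concrete, here is a rough emulation in POSIX shell. It is a sketch under stated assumptions, not aistore's implementation: `arch_match` is a hypothetical helper, the `wdskey` branch assumes the WebDataset convention that a key is the directory plus the basename up to its first period, and `regexp` is approximated with `grep -E`, whose syntax differs in places from the regular expressions aistore accepts.

```shell
# arch_match MODE PATTERN NAME: exit status 0 when NAME would be selected
arch_match() {
    mode=$1
    pat=$2
    name=$3
    case $mode in
        prefix) case $name in "$pat"*)  return 0 ;; esac ;;
        suffix) case $name in *"$pat")  return 0 ;; esac ;;
        substr) case $name in *"$pat"*) return 0 ;; esac ;;
        wdskey)
            # assumed convention: key = directory + basename up to its first '.'
            base=${name##*/}
            dir=${name%"$base"}
            [ "$dir${base%%.*}" = "$pat" ] && return 0 ;;
        regexp)
            # approximation: POSIX extended regular expressions via grep
            printf '%s\n' "$name" | grep -Eq -- "$pat" && return 0 ;;
    esac
    return 1
}

# The example from the option description: wdskey=subdir/aaa
for f in subdir/aaa.jpg subdir/aaa.json subdir/bbb.jpg subdir/bbb.json; do
    if arch_match wdskey subdir/aaa "$f"; then
        echo "$f"                 # prints subdir/aaa.jpg and subdir/aaa.json
    fi
done
```

The three simple modes are plain string tests, which is why '--archregx=jpeg --archmode=suffix' needs no wildcard or escaping.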

Example: suffix match

Select all *.jpeg files from a given shard and return them all as 111.tar:

$ ais archive get ais://abc/trunk-0123.tar 111.tar --archregx=jpeg --archmode=suffix

Example: WebDataset key

Select all files that have a given WebDataset key; return the result as 222.tar:

$ ais archive get ais://abc/trunk-0123.tar 222.tar --archregx=file45 --archmode=wdskey

Example: prefix match

Similar to the above, except that in this case the '--archregx' value specifies a virtual subdirectory inside the named shard:

$ ais archive get ais://abc/trunk-0123.tar 333.tar --archregx=subdir/ --archmode=prefix

Generate shards

ais archive gen-shards "BUCKET/TEMPLATE.EXT"

Put randomly generated shards that can be used for dSort testing. The TEMPLATE must be a bash-like brace expansion (see examples), and .EXT must be one of the supported archive extensions: .tar, .tgz or .tar.gz, .zip, .tar.lz4.

Warning: Always quote the argument ("..."); otherwise, the shell will perform the brace expansion itself before the command sees the template.
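
To see how many shards a template stands for, the range can be expanded by hand. The sketch below is a POSIX shell approximation for a single `{START..END..STEP}` range; `strip_zeros` and `expand_range` are hypothetical helpers, and the zero-padding rule (pad to the width of START) is an assumption inferred from the shard names shown in this document.

```shell
# strip_zeros N: drop leading zeros so that "08" is not parsed as octal
strip_zeros() {
    n=${1#"${1%%[!0]*}"}
    printf '%s' "${n:-0}"
}

# expand_range PREFIX START END STEP SUFFIX: print one name per line
expand_range() {
    prefix=$1; start=$2; end=$3; step=$4; suffix=$5
    width=${#start}                      # pad to the width of the START token
    i=$(strip_zeros "$start")
    last=$(strip_zeros "$end")
    while [ "$i" -le "$last" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + step))
    done
}

# 'shard-{01..20..2}.tgz' stands for 10 names: shard-01.tgz, shard-03.tgz, ..., shard-19.tgz
expand_range shard- 01 20 2 .tgz
```

When the argument is left unquoted, an interactive bash performs an equivalent expansion first and passes the resulting names as separate arguments, so the command never receives the template.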

Options

$ ais archive gen-shards --help

NAME:
   ais archive gen-shards - Generate random (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted objects ("shards"), e.g.:
              - gen-shards 'ais://bucket1/shard-{001..999}.tar' - write 999 random shards (default sizes) to ais://bucket1
              - gen-shards "gs://bucket2/shard-{01..20..2}.tgz" - 10 random gzipped tarfiles to Cloud bucket
              (notice quotation marks in both cases)

USAGE:
   ais archive gen-shards "BUCKET/TEMPLATE.EXT" [command options]

OPTIONS:
   --cleanup            Remove old bucket and create it again (warning: removes the entire content of the old bucket)
   --fcount value       Number of files in a shard (default: 5)
   --fext value         Comma-separated list of file extensions (default ".test"), e.g.:
                        --fext .mp3
                        --fext '.mp3,.json,.cls' (or, same: ".mp3,  .json,  .cls")
   --fsize value        Size of the files in a shard (default: "1024")
   --num-workers value  Limits the number of shards created concurrently (default: 10)
   --tform value        TAR file format selection (one of "Unknown", "USTAR", "PAX", or "GNU")
   --help, -h           Show help

Examples

Generate shards with varying numbers of files and file sizes

Generate 10 shards, each containing 100 files of size 256KiB (262144 bytes), and put them inside the ais://dsort-testing bucket (the bucket is created if it does not exist). Shards will be named: shard-0.tar, shard-1.tar, …, shard-9.tar.

$ ais archive gen-shards "ais://dsort-testing/shard-{0..9}.tar" --fsize 262144 --fcount 100
Shards created: 10/10 [==============================================================] 100 %
$ ais ls ais://dsort-testing
NAME		SIZE		VERSION
shard-0.tar	25.05MiB	1
shard-1.tar	25.05MiB	1
shard-2.tar	25.05MiB	1
shard-3.tar	25.05MiB	1
shard-4.tar	25.05MiB	1
shard-5.tar	25.05MiB	1
shard-6.tar	25.05MiB	1
shard-7.tar	25.05MiB	1
shard-8.tar	25.05MiB	1
shard-9.tar	25.05MiB	1
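
The reported shard size is slightly larger than 100 × 256KiB because of TAR framing. A back-of-the-envelope estimate, assuming the standard TAR layout of one 512-byte header per file, file data padded to a multiple of 512 bytes, and a 1024-byte end-of-archive marker (no extended headers and no extra record padding); `tar_size` is a hypothetical helper:

```shell
# tar_size FILE_COUNT FILE_SIZE: estimated size in bytes of an uncompressed TAR
tar_size() {
    count=$1
    fsize=$2
    padded=$(( (fsize + 511) / 512 * 512 ))     # data rounded up to 512-byte blocks
    echo $(( count * (512 + padded) + 1024 ))   # per-file header + end-of-archive marker
}

tar_size 100 262144   # 26266624 bytes, about 25.05MiB
tar_size 15 1024      # 24064 bytes = 23.50KiB
```

The first result agrees with the 25.05MiB listed above; the second corresponds to a shard of 15 files with the default 1024-byte file size.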

Generate shards using custom naming template

Generate 100 shards, each containing 5 files of size 256KiB (262144 bytes), and put them inside the ais://dsort-testing bucket. Shards will be compressed and named: super_shard_000_last.tgz, super_shard_001_last.tgz, …, super_shard_099_last.tgz.

$ ais archive gen-shards "ais://dsort-testing/super_shard_{000..099}_last.tgz" --fsize 262144 --cleanup
Shards created: 100/100 [==============================================================] 100 %
$ ais ls ais://dsort-testing
NAME				SIZE	VERSION
super_shard_000_last.tgz	1.25MiB	1
super_shard_001_last.tgz	1.25MiB	1
super_shard_002_last.tgz	1.25MiB	1
super_shard_003_last.tgz	1.25MiB	1
super_shard_004_last.tgz	1.25MiB	1
super_shard_005_last.tgz	1.25MiB	1
super_shard_006_last.tgz	1.25MiB	1
super_shard_007_last.tgz	1.25MiB	1
...

Multi-extension example

$ ais archive gen-shards 'ais://nnn/shard-{01..99}.tar' --fext ".mp3,  .json,  .cls"

$ ais archive ls ais://nnn | head -n 20
NAME                                             SIZE
shard-01.tar                                     23.50KiB
    shard-01.tar/541701ae863f76d0f7e0-0.cls      1.00KiB
    shard-01.tar/541701ae863f76d0f7e0-0.json     1.00KiB
    shard-01.tar/541701ae863f76d0f7e0-0.mp3      1.00KiB
    shard-01.tar/8f8c5fa2934c90138833-1.cls      1.00KiB
    shard-01.tar/8f8c5fa2934c90138833-1.json     1.00KiB
    shard-01.tar/8f8c5fa2934c90138833-1.mp3      1.00KiB
    shard-01.tar/9a42bd12d810d890ea86-3.cls      1.00KiB
    shard-01.tar/9a42bd12d810d890ea86-3.json     1.00KiB
    shard-01.tar/9a42bd12d810d890ea86-3.mp3      1.00KiB
    shard-01.tar/c5bd7c7a34e12ebf3ad3-2.cls      1.00KiB
    shard-01.tar/c5bd7c7a34e12ebf3ad3-2.json     1.00KiB
    shard-01.tar/c5bd7c7a34e12ebf3ad3-2.mp3      1.00KiB
    shard-01.tar/f13522533ecafbad4fe5-4.cls      1.00KiB
    shard-01.tar/f13522533ecafbad4fe5-4.json     1.00KiB
    shard-01.tar/f13522533ecafbad4fe5-4.mp3      1.00KiB
shard-02.tar                                     23.50KiB
    shard-02.tar/095e6ae644ff4fd1778b-7.cls      1.00KiB
    shard-02.tar/095e6ae644ff4fd1778b-7.json     1.00KiB
...

In this document:

Commands to read, write, extract, and list archives - objects formatted as TAR, TGZ (or TAR.GZ) , ZIP, or TAR.LZ4.

For the most recently updated list of supported archival formats, please refer to this source.

The corresponding subset of CLI commands starts with ais archive, from where you can <TAB-TAB> to the actual (reading, writing, etc.) operation.

Subcommands

The corresponding subset of subcommands starts with ais archive, from where you can <TAB-TAB> to the actual operation:

ais archive ls		# List archived content
ais archive put		# Archive files from local filesystem
ais archive get		# Extract content from archives
ais archive bucket	# Archive objects from a bucket
ais archive gen-shards	# Generate test archives

For detailed help on any command, use the --help option:

Archive files and directories (`ais archive put`)

Archive multiple files.

1 $ ais archive put --help
2 
3 NAME:
4    ais archive put - Archive a file, a directory, or multiple files and/or directories as
5      (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted object - aka "shard".
6      Both APPEND (to an existing shard) and PUT (a new version of the shard) are supported.
7      Examples:
8      - 'local-file s3://q/shard-00123.tar.lz4 --append --archpath name-in-archive' - append file to a given shard,
9         optionally, rename it (inside archive) as specified;
10      - 'local-file s3://q/shard-00123.tar.lz4 --append-or-put --archpath name-in-archive' - append file to a given shard if exists,
11         otherwise, create a new shard (and name it shard-00123.tar.lz4, as specified);
12      - 'src-dir gs://w/shard-999.zip --append' - archive entire 'src-dir' directory; iff the destination .zip doesn't exist create a new one;
13      - '"sys, docs" ais://dst/CCC.tar --dry-run -y -r --archpath ggg/' - dry-run to recursively archive two directories.
14      Tips:
15      - use '--dry-run' if in doubt;
16      - to archive objects from a ais:// or remote bucket, run 'ais archive bucket' (see --help for details).
17 
18 USAGE:
19    ais archive put [-|FILE|DIRECTORY[/PATTERN]] BUCKET/SHARD_NAME [command options]
20 
21 OPTIONS:
22    --append             Add newly archived content to the destination object ("archive", "shard") that must exist
23    --append-or-put      Append to an existing destination object ("archive", "shard") iff exists; otherwise PUT a new archive (shard);
24                         note that PUT (with subsequent overwrite if the destination exists) is the default behavior when the flag is omitted
25    --archpath value     Filename in an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4
26    --cont-on-err        Keep running archiving xaction (job) in presence of errors in a any given multi-object transaction
27    --dry-run            Preview the results without really running the action
28    --include-src-dir    Prefix the names of archived files with the (root) source directory
29    --list value         Comma-separated list of object or file names, e.g.:
30                         --list 'o1,o2,o3'
31                         --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
32                         or, when listing files and/or directories:
33                         --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
34    --num-workers value  Number of concurrent client-side workers (to execute PUT or append requests);
35                         use (-1) to indicate single-threaded serial execution (ie., no workers);
36                         any positive value will be adjusted _not_ to exceed twice the number of client CPUs (default: 10)
37    --progress           Show progress bar(s) and progress of execution in real time
38    --recursive, -r      Recursive operation
39    --refresh value      Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
40                         valid time units: ns, us (or µs), ms, s (default), m, h
41    --skip-vc            Skip loading object metadata (and the associated checksum & version related processing)
42    --template value     Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
43                         (with optional steps and gaps), e.g.:
44                         --template "" # (an empty or '*' template matches everything)
45                         --template 'dir/subdir/'
46                         --template 'shard-{1000..9999}.tar'
47                         --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
48                         and similarly, when specifying files and directories:
49                         --template '/home/dir/subdir/'
50                         --template "/abc/prefix-{0010..9999..2}-suffix"
51    --timeout value      Maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
52                         valid time units: ns, us (or µs), ms, s (default), m, h
53    --units value        Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
54                         iec - IEC format, e.g.: KiB, MiB, GiB (default)
55                         si  - SI (metric) format, e.g.: KB, MB, GB
56                         raw - do not convert to (or from) human-readable format
57    --verbose, -v        Verbose output
58    --wait               Wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
59    --yes, -y            Assume 'yes' to all questions
60    --help, -h           Show help

The operation accepts either an explicitly defined list or template-defined range of file names (to archive).

NOTE:

ais archive put works with locally accessible (source) files and shall not be confused with ais archive bucket command (below).

Also, note that ais put command with its --archpath option provides an alternative way to archive multiple objects:

For the most recently updated list of supported archival formats, please see:

this source.

Append files and directories to an existing archive

APPEND operation provides for appending files to existing archives (shards). As such, APPEND is a variation of PUT (above) with additional two boolean flags:

Name	Description
`--append`	add newly archived content to the destination object (“archive”, “shard”) that must exist
`--append-or-put`	if destination object (“archive”, “shard”) exists append to it, otherwise archive a new one

Example 1: add file to archive

step 1. create archive (by archiving a given source dir)

1 $ ais archive put sys ais://nnn/sys.tar.lz4
2 Warning: multi-file 'archive put' operation requires either '--append' or '--append-or-put' option
3 Proceed to execute 'archive put --append-or-put'? [Y/N]: y
4 Files to upload:
5 EXTENSION        COUNT   SIZE
6 .go              11      17.46KiB
7 TOTAL            11      17.46KiB
8 APPEND 11 files (one directory, non-recursive) => ais://nnn/sys.tar.lz4? [Y/N]: y
9 Done

step 2. add a single file to existing archive

1 $ ais archive put README.md ais://nnn/sys.tar.lz4 --archpath=docs/README --append
2 APPEND README.md to ais://nnn/sys.tar.lz4 as "docs/README"

step 3. list entire bucket with an `--archive` option to show all archived entries

1 $ ais ls ais://nnn --archive
2 NAME                             SIZE
3 sys.tar.lz4                      16.84KiB
4     sys.tar.lz4/api_linux.go     1.07KiB
5     sys.tar.lz4/cpu.go           1.07KiB
6     sys.tar.lz4/cpu_darwin.go    802B
7     sys.tar.lz4/cpu_linux.go     2.14KiB
8     sys.tar.lz4/docs/README      13.85KiB
9     sys.tar.lz4/mem.go           1.16KiB
10     sys.tar.lz4/mem_darwin.go    2.04KiB
11     sys.tar.lz4/mem_linux.go     2.81KiB
12     sys.tar.lz4/proc.go          784B
13     sys.tar.lz4/proc_darwin.go   369B
14     sys.tar.lz4/proc_linux.go    1.40KiB
15     sys.tar.lz4/sys_test.go      3.88KiB
16 Listed: 13 names

Alternatively, use regex to select:

1 $ ais ls ais://nnn --archive --regex docs
2 NAME                             SIZE
3     sys.tar.lz4/docs/README      13.85KiB

Example 2: use `--template` flag to add source files

Generally, the --template option combines (an optional) prefix and/or one or more ranges (e.g., bash brace expansions).

In this case, the template we use is a simple prefix with no ranges.

1 $ ls -l /tmp/w
2 total 32
3 -rw-r--r-- 1 root root 14180 Dec 11 18:18 111
4 -rw-r--r-- 1 root root 14180 Dec 11 18:18 222
5 
6 $ ais archive put ais://nnn/shard-001.tar --template /tmp/w/ --append
7 Files to upload:
8 EXTENSION        COUNT   SIZE
9                  2       27.70KiB
10 TOTAL            2       27.70KiB
11 APPEND 2 files (one directory, non-recursive) => ais://nnn/shard-001.tar? [Y/N]: y
12 Done
13 $ ais ls ais://nnn/shard-001.tar --archive
14 NAME                                             SIZE
15 shard-001.tar                                    37.50KiB
16     shard-001.tar/111                            13.85KiB
17     shard-001.tar/222                            13.85KiB
18     shard-001.tar/23ed44d8bf3952a35484-1.test    1.00KiB
19     shard-001.tar/452938788ebb87807043-4.test    1.00KiB
20     shard-001.tar/7925bc9b5eb1daa12ed0-2.test    1.00KiB
21     shard-001.tar/8264574b49bd188a4b27-0.test    1.00KiB
22     shard-001.tar/f1f25e52c5edd768e0ec-3.test    1.00KiB

Example 3: add file to archive

In this example, we assume that arch.tar already exists.

1 # contents _before_:
2 $ ais archive ls ais://abc/arch.tar
3 NAME                SIZE
4 arch.tar            4.5KiB
5     arch.tar/obj1   1.0KiB
6     arch.tar/obj2   1.0KiB
7 
8 # add file to existing archive:
9 $ ais archive put /tmp/obj1.bin ais://abc/arch.tar --archpath bin/obj1
10 APPEND "/tmp/obj1.bin" to object "ais://abc/arch.tar[/bin/obj1]"
11 
12 # contents _after_:
13 $ ais archive ls ais://abc/arch.tar
14 NAME                    SIZE
15 arch.tar                6KiB
16     arch.tar/bin/obj1   2.KiB
17     arch.tar/obj1       1.0KiB
18     arch.tar/obj2       1.0KiB

Example 4: add file to archive

1 # contents _before_:
2 
3 $ ais archive ls ais://nnn/shard-2.tar
4 NAME                                             SIZE
5 shard-2.tar                                      5.50KiB
6     shard-2.tar/0379f37cbb0415e7eaea-3.test      1.00KiB
7     shard-2.tar/504c563d14852368575b-5.test      1.00KiB
8     shard-2.tar/c7bcb7014568b5e7d13b-4.test      1.00KiB
9 
10 # append and note that `--archpath` can specify a fully qualified destination name
11 
12 $ ais archive put LICENSE ais://nnn/shard-2.tar --archpath shard-2.tar/license.test
13 APPEND "/go/src/github.com/NVIDIA/aistore/LICENSE" to "ais://nnn/shard-2.tar[/shard-2.tar/license.test]"
14 
15 # contents _after_:
16 $ ais archive ls ais://nnn/shard-2.tar
17 NAME                                             SIZE
18 shard-2.tar                                      7.50KiB
19     shard-2.tar/0379f37cbb0415e7eaea-3.test      1.00KiB
20     shard-2.tar/504c563d14852368575b-5.test      1.00KiB
21     shard-2.tar/c7bcb7014568b5e7d13b-4.test      1.00KiB
22     shard-2.tar/license.test                     1.05KiB

Archive multiple objects (`ais archive bucket`)

The ais archive bucket command creates archives (shards) from multiple objects stored in a bucket. This is a powerful operation that:

Takes objects from a specified source bucket
Archives them as a single shard in the specified destination bucket

Features

Source and destination buckets can be the same or different
Supports multiple selection methods (lists, templates, prefixes)
Supports all backend providers
Supports various archival formats (.tar, .tar.gz/.tgz, .zip, .tar.lz4)
Executes asynchronously and in parallel across all AIS nodes for maximum performance

Usage

1 $ ais archive bucket --help
2 NAME:
3    ais archive bucket - Archive selected or matching objects from SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] as
4    (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted object (a.k.a. "shard"):
5      - 'ais archive bucket ais://src gs://dst/a.tar.lz4 --template "trunk-{001..997}"'       - archive (prefix+range) matching objects from ais://src;
6      - 'ais archive bucket "ais://src/trunk-{001..997}" gs://dst/a.tar.lz4'                  - same as above (notice double quotes);
7      - 'ais archive bucket "ais://src/trunk-{998..999}" gs://dst/a.tar.lz4 --append-or-put'  - add two more objects to an existing shard;
8      - 'ais archive bucket s3://src/trunk-00 ais://dst/b.tar'                                - archive "trunk-00" prefixed objects from an s3 bucket as a given TAR destinati
9 on
10 
11 USAGE:
12    ais archive bucket SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET/SHARD_NAME [command options]
13 
14 OPTIONS:
   append-or-put     Append to an existing destination object ("archive", "shard") iff exists; otherwise PUT a new archive (shard);
                     note that PUT (with subsequent overwrite if the destination exists) is the default behavior when the flag is omitted
   cont-on-err       Keep running archiving xaction (job) in presence of errors in any given multi-object transaction
   dry-run           Preview the results without really running the action
   include-src-bck   Prefix the names of archived files with the source bucket name
   list              Comma-separated list of object or file names, e.g.:
                     --list 'o1,o2,o3'
                     --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
                     or, when listing files and/or directories:
                     --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
   non-recursive,nr  Non-recursive operation, e.g.:
                     - 'ais ls gs://bck/sub --nr'               - list objects and/or virtual subdirectories with names starting with the specified prefix;
                     - 'ais ls gs://bck/sub/ --nr'              - list only immediate contents of 'sub/' subdirectory (non-recursive);
                     - 'ais prefetch s3://bck/abcd --nr'        - prefetch a single named object;
                     - 'ais evict gs://bck/sub/ --nr'           - evict only immediate contents of 'sub/' subdirectory (non-recursive);
                     - 'ais evict gs://bck --prefix=sub/ --nr'  - same as above
   prefix            Select virtual directories or objects with names starting with the specified prefix, e.g.:
                     '--prefix a/b/c'   - matches names 'a/b/c/d', 'a/b/cdef', and similar;
                     '--prefix a/b/c/'  - only matches objects from the virtual directory a/b/c/
   skip-lookup       Skip checking source and destination buckets' existence (trading off extra lookup for performance)

   template   Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
              (with optional steps and gaps), e.g.:
              --template "" # (an empty or '*' template matches everything)
              --template 'dir/subdir/'
              --template 'shard-{1000..9999}.tar'
              --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
              and similarly, when specifying files and directories:
              --template '/home/dir/subdir/'
              --template "/abc/prefix-{0010..9999..2}-suffix"
   wait       Wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
   help, h    Show help

Selection Options

The command provides multiple ways to select objects for archiving:

Template matching: Use patterns with ranges to select objects

ais archive bucket ais://src gs://dst/a.tar --template "trunk-{001..997}"

List-based selection: Specify a comma-separated list of objects

ais archive bucket ais://bck/arch.tar --list obj1,obj2,obj3

Prefix-based selection: Select objects that share a common prefix

ais archive bucket ais://src ais://dst/archive.tar --prefix data/logs/
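
To make the range syntax concrete, here is a minimal Python sketch of how a template such as "trunk-{001..997}" could expand into object names. It is an illustration only, not AIStore's implementation: the function name expand_template is hypothetical, and the sketch assumes that the upper bound is inclusive and that zero-padding follows the width of the lower bound.

```python
import itertools
import re

# Matches {start..end} or {start..end..step}; illustrative only.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)(?:\.\.(\d+))?\}")


def expand_template(template):
    """Expand every {start..end[..step]} range in `template` into names."""
    literals = []  # literal text preceding each range
    choices = []   # formatted values for each range
    pos = 0
    for m in _RANGE.finditer(template):
        start, end, step = m.group(1), m.group(2), m.group(3) or "1"
        literals.append(template[pos:m.start()])
        # Assumption: pad to the width of the lower bound; upper bound inclusive.
        choices.append([
            str(n).zfill(len(start))
            for n in range(int(start), int(end) + 1, int(step))
        ])
        pos = m.end()
    tail = template[pos:]
    names = []
    # The cartesian product covers templates with more than one range.
    for combo in itertools.product(*choices):
        body = "".join(lit + val for lit, val in zip(literals, combo))
        names.append(body + tail)
    return names


print(expand_template("prefix-{0010..0013..2}-gap-{1..2}-suffix"))
```

Under these assumptions the call above yields four names (prefix-0010-gap-1-suffix through prefix-0012-gap-2-suffix), and a template without any range expands to itself.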

Non-Recursive Option (--nr)

The --nr (or --non-recursive) flag limits the scope of the archiving operation to only include objects at the specified directory level, without descending into subdirectories.

Examples with Non-Recursive Flag

Archive only the files directly in a directory (not its subdirectories):

ais archive bucket ais://nnn/aaa/ ais://dst/archive.tar --nr

This will only archive objects directly in the aaa/ directory, skipping any objects in subdirectories like aaa/bbb/.

Compare with recursive archiving (default behavior):

ais archive bucket ais://nnn/aaa/ ais://dst/archive.tar

This will archive all objects under the aaa/ prefix, including those in subdirectories like aaa/bbb/.

Visual Example

For a bucket with this structure:

ais://nnn
├── aaa/
│   ├── 777
│   ├── 888
│   ├── 999
│   └── bbb/
│       ├── 111
│       ├── 222
│       └── 333

With --nr flag:

ais archive bucket ais://nnn/aaa/ ais://dst/f.tar --nr

Result:

f.tar
├── aaa/777
├── aaa/888
└── aaa/999

Without --nr flag:

ais archive bucket ais://nnn/aaa/ ais://dst/g.tar

Result:

g.tar
├── aaa/777
├── aaa/888
├── aaa/999
├── aaa/bbb/111
├── aaa/bbb/222
└── aaa/bbb/333
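
The selection rule behind the two results above can be sketched in a few lines of Python. This is a hedged model rather than AIStore code: the select function is hypothetical, it only covers a prefix that ends with a slash (as in the example), and it assumes that any "/" remaining after the prefix marks a nested virtual subdirectory.

```python
def select(names, prefix, non_recursive=False):
    """Return names under `prefix`; with non_recursive, skip nested subdirectories."""
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # Assumption: a '/' in the remainder means the object lives in a
        # nested virtual subdirectory.
        if non_recursive and "/" in rest:
            continue
        selected.append(name)
    return selected


bucket = ["aaa/777", "aaa/888", "aaa/999", "aaa/bbb/111", "aaa/bbb/222", "aaa/bbb/333"]
print(select(bucket, "aaa/", non_recursive=True))  # content of f.tar in the example
print(select(bucket, "aaa/"))                      # content of g.tar in the example
```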

Additional Options

- --append-or-put: Append to an existing archive if it exists; otherwise create a new one
- --cont-on-err: Continue archiving despite errors in multi-object transactions
- --dry-run: Preview the results without executing
- --include-src-bck: Prefix archived file names with the source bucket name
- --skip-lookup: Skip checking bucket existence for better performance
- --wait: Wait for the asynchronous operation to complete

Complete Examples

1. Archive objects with a specific prefix, non-recursively:

$ ais archive bucket s3://src-bck/aaa/ ais://dst/example.tar --nr
Archived s3://src-bck/aaa/ => ais://dst/example.tar

$ ais ls ais://dst/example.tar --archive
NAME                          SIZE
example.tar                   106.00KiB
    example.tar/aaa/777       16.84KiB
    example.tar/aaa/888       16.84KiB
    example.tar/aaa/999       16.84KiB
    example.tar/aaa/trunk-777 16.84KiB
    example.tar/aaa/trunk-888 16.84KiB
    example.tar/aaa/trunk-999 16.84KiB

2. Archive objects using a template range:

$ ais archive bucket ais://src ais://dst/range.tar --template "obj-{0..9}"
Archiving "ais://dst/range.tar" ...

$ ais archive ls ais://dst/range.tar

NAME                     SIZE
range.tar                92.60KiB
    range.tar/obj-0      9.26KiB
    range.tar/obj-1      9.26KiB
    ...
    range.tar/obj-9      9.26KiB

3. Incrementally append to an existing archive:

$ ais archive bucket ais://bck/incremental.tar --template "obj{1..3}"
Archived "ais://bck/incremental.tar" ...

$ ais archive bucket ais://bck/incremental.tar --template "obj{4..5}" --append
Archived "ais://bck/incremental.tar"

4. Archive a list of objects from a given bucket:

$ ais archive bucket ais://bck/arch.tar --list obj1,obj2
Archiving "ais://bck/arch.tar" ...

Resulting ais://bck/arch.tar contains objects ais://bck/obj1 and ais://bck/obj2.

5. Archive objects from a different bucket, use template (range):

$ ais archive bucket ais://src ais://dst/arch.tar --template "obj-{0..9}"

Archiving "ais://dst/arch.tar" ...

ais://dst/arch.tar now contains 10 objects from bucket ais://src: ais://src/obj-0, ais://src/obj-1 … ais://src/obj-9.

6. Archive 3 objects and then append 2 more:

$ ais archive bucket ais://bck/arch1.tar --template "obj{1..3}"
Archived "ais://bck/arch1.tar" ...
$ ais archive ls ais://bck/arch1.tar
NAME                     SIZE
arch1.tar                31.00KiB
    arch1.tar/obj1       9.26KiB
    arch1.tar/obj2       9.26KiB
    arch1.tar/obj3       9.26KiB

$ ais archive bucket ais://bck/arch1.tar --template "obj{4..5}" --append
Archived "ais://bck/arch1.tar"

$ ais archive ls ais://bck/arch1.tar
NAME                     SIZE
arch1.tar                51.00KiB
    arch1.tar/obj1       9.26KiB
    arch1.tar/obj2       9.26KiB
    arch1.tar/obj3       9.26KiB
    arch1.tar/obj4       9.26KiB
    arch1.tar/obj5       9.26KiB
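
As a purely local analogy for the create-then-append workflow above, the following sketch uses Python's standard tarfile module to write three members and then append two more to the same TAR. It does not show how AIStore implements --append; the helper names write_members and list_members are hypothetical and the payload is dummy data.

```python
import io
import os
import tarfile
import tempfile


def write_members(tar_path, names, mode, payload=b"x" * 16):
    """Write (mode='w') or append (mode='a') in-memory members to a TAR file."""
    with tarfile.open(tar_path, mode) as tf:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))


def list_members(tar_path):
    """Return member names in archive order."""
    with tarfile.open(tar_path, "r") as tf:
        return tf.getnames()


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "arch1.tar")
    write_members(path, ["obj1", "obj2", "obj3"], mode="w")  # initial archive
    write_members(path, ["obj4", "obj5"], mode="a")          # append two more
    print(list_members(path))
```

Note that tarfile's append mode works only for uncompressed TAR files, which is one reason this is an analogy and not a model of the server-side behavior.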

Notes

- ais archive bucket must not be confused with ais archive put:
  - archive bucket archives objects in the cluster
    - more precisely, objects accessible by the cluster
  - archive put archives files from your local or locally accessible (NFS, SMB) directories
- The operation runs asynchronously:
  - use --wait to wait for completion; see --help for details
- When using the --nr (non-recursive) flag, only the immediate contents of the specified virtual directory are archived.
- For more information on multi-object operations, please see:
  - operations on lists and ranges documentation.

List archived content

NAME:
   ais archive ls - list archived content (supported formats: .tar, .tgz or .tar.gz, .zip, .tar.lz4)

USAGE:
   ais archive ls BUCKET[/SHARD_NAME] [command options]

List archived content as a tree with archive (“shard”) name as a root and archived files as leaves. Filenames are always sorted alphabetically.

Options

$ ais archive ls --help

NAME:
   ais archive ls - List archived content (supported formats: .tar, .tgz or .tar.gz, .zip, .tar.lz4)

USAGE:
   ais archive ls BUCKET[/SHARD_NAME] [command options]

OPTIONS:
   --all                  Depending on the context, list:
                          - all buckets, including accessible (visible) remote buckets that are not in-cluster
                          - all objects in a given accessible (visible) bucket, including remote objects and misplaced copies
   --cached               Only list in-cluster objects, i.e., objects from the respective remote bucket that are present ("cached") in the cluster
   --count-only           Print only the resulting number of listed objects and elapsed time
   --diff                 Perform a bidirectional diff between in-cluster and remote content, which further entails:
                          - detecting remote version changes (a.k.a. out-of-band updates), and
                          - remotely deleted objects (out-of-band deletions (*));
                            the option requires remote backends supporting some form of versioning (e.g., object version, checksum, and/or ETag);
                          see related:
                               (*) options: --cached; --latest
                               commands:    'ais get --latest'; 'ais cp --sync'; 'ais prefetch --latest'
   --dont-add             List remote bucket without adding it to cluster's metadata - e.g.:
                            - let's say, s3://abc is accessible but not present in the cluster (e.g., 'ais ls' returns error);
                            - then, if we ask aistore to list remote buckets: 'ais ls s3://abc --all'
                              the bucket will be added (in effect, it'll be created);
                            - to prevent this from happening, either use this '--dont-add' flag or run 'ais evict' command later
   --dont-wait            When _summarizing_ buckets do not wait for the respective job to finish -
                          use the job's UUID to query the results interactively
   --inv-id value         Bucket inventory ID (optional; by default, we use bucket name as the bucket's inventory ID)
   --inv-name value       Bucket inventory name (optional; system default name is '.inventory')
   --inventory            List objects using _bucket inventory_ (docs/s3compat.md); requires s3:// backend; will provide significant performance
                          boost when used with very large s3 buckets; e.g. usage:
                            1) 'ais ls s3://abc --inventory'
                            2) 'ais ls s3://abc --inventory --paged --prefix=subdir/'
                          (see also: docs/s3compat.md)
   --limit value          The maximum number of objects to list, get, or otherwise handle (0 - unlimited; see also '--max-pages'),
                          e.g.:
                          - 'ais ls gs://abc/dir --limit 1234 --cached --props size,custom,atime'  - list no more than 1234 objects
                          - 'ais get gs://abc /dev/null --prefix dir --limit 1234'                 - get --/--
                          - 'ais scrub gs://abc/dir --limit 1234'                                  - scrub --/-- (default: 0)
   --max-pages value      Maximum number of pages to display (see also '--page-size' and '--limit')
                          e.g.: 'ais ls az://abc --paged --page-size 123 --max-pages 7' (default: 0)
   --name-only            Faster request to retrieve only the names of objects (if defined, '--props' flag will be ignored)
   --no-dirs              Do not return virtual subdirectories (applies to remote buckets only)
   --no-footers, -F       Display tables without footers
   --no-headers, -H       Display tables without headers
   --non-recursive, --nr  Non-recursive operation, e.g.:
                          - 'ais ls gs://bucket/prefix --nr'   - list objects and/or virtual subdirectories with names starting with the specified prefix;
                          - 'ais ls gs://bucket/prefix/ --nr'  - list contained objects and/or immediately nested virtual subdirectories _without_ recursing into the latter;
                          - 'ais prefetch s3://bck/abcd --nr'  - prefetch a single named object (see 'ais prefetch --help' for details);
                          - 'ais rmo gs://bucket/prefix --nr'  - remove a single object with the specified name (see 'ais rmo --help' for details)
   --page-size value      Maximum number of object names per page; when the flag is omitted or 0
                          the maximum is defined by the corresponding backend; see also '--max-pages' and '--paged' (default: 0)
   --paged                List objects page by page - one page at a time (see also '--page-size' and '--limit')
                          note: recommended for use with very large buckets
   --prefix value         List objects with names starting with the specified prefix, e.g.:
                          '--prefix a/b/c' - list virtual directory a/b/c and/or objects from the virtual directory
                          a/b that have their names (relative to this directory) starting with the letter 'c'
   --props value          Comma-separated list of object properties including name, size, version, copies, and more; e.g.:
                          --props all
                          --props name,size,cached
                          --props "ec, copies, custom, location"
   --refresh value        Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
                          valid time units: ns, us (or µs), ms, s (default), m, h
   --regex value          Regular expression; use it to match either bucket names or objects in a given bucket, e.g.:
                          ais ls --regex "(m|n)"         - match buckets such as ais://nnn, s3://mmm, etc.;
                          ais ls ais://nnn --regex "^A"  - match object names starting with letter A
   --show-unmatched       List also objects that were not matched by regex and/or template (range)
   --silent               Server-side flag, an indication for aistore _not_ to log assorted errors (e.g., HEAD(object) failures)
   --skip-lookup          Do not execute HEAD(bucket) request to lookup remote bucket and its properties; possible usage scenarios include:
                           1) adding remote bucket to aistore without first checking the bucket's accessibility
                              (e.g., to configure the bucket's aistore properties with alternative security profile and/or endpoint)
                           2) listing public-access Cloud buckets where certain operations (e.g., 'HEAD(bucket)') may be disallowed
   --start-after value    List bucket's content alphabetically starting with the first name _after_ the specified
   --summary              Show object numbers, bucket sizes, and used capacity;
                          note: applies only to buckets and objects that are _present_ in the cluster
   --template value       Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
                          (with optional steps and gaps), e.g.:
                          --template "" # (an empty or '*' template matches everything)
                          --template 'dir/subdir/'
                          --template 'shard-{1000..9999}.tar'
                          --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
                          and similarly, when specifying files and directories:
                          --template '/home/dir/subdir/'
                          --template "/abc/prefix-{0010..9999..2}-suffix"
   --units value          Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
                          iec - IEC format, e.g.: KiB, MiB, GiB (default)
                          si  - SI (metric) format, e.g.: KB, MB, GB
                          raw - do not convert to (or from) human-readable format
   --help, -h             Show help

Examples

$ ais archive ls ais://bck/arch.tar
NAME                SIZE
arch.tar            4.5KiB
    arch.tar/obj1   1.0KiB
    arch.tar/obj2   1.0KiB

Example: use '--prefix' that crosses shard boundary

For starters, we recursively archive all aistore docs:

$ ais put docs ais://nnn/A.tar --archive -r

To list a virtual subdirectory inside this newly created shard (e.g.):

$ ais archive ls ais://nnn --prefix "A.tar/tutorials"
NAME                                             SIZE
    A.tar/tutorials/README.md                    561B
    A.tar/tutorials/etl/compute_md5.md           8.28KiB
    A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
    A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
Listed: 4 names

or, same:

$ ais ls ais://nnn --prefix "A.tar/tutorials" --archive
NAME                                             SIZE
    A.tar/tutorials/README.md                    561B
    A.tar/tutorials/etl/compute_md5.md           8.28KiB
    A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
    A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
Listed: 4 names

Get archived content (`ais archive get`)

$ ais archive get --help
NAME:
   ais archive get - Get a shard and extract its content; get an archived file;
              write the content locally with destination options including: filename, directory, STDOUT ('-'), or '/dev/null' (discard);
              assorted options further include:
              - '--prefix' to get multiple shards in one shot (empty prefix for the entire bucket);
              - '--progress' and '--refresh' to watch progress bar;
              - '-v' to produce verbose output when getting multiple objects.
   'ais archive get' examples:
              - ais://abc/trunk-0123.tar.lz4 /tmp/out - get and extract entire shard to /tmp/out/trunk/*
              - ais://abc/trunk-0123.tar.lz4 --archpath file45.jpeg /tmp/out - extract one named file
              - ais://abc/trunk-0123.tar.lz4/file45.jpeg /tmp/out - same as above (and note that '--archpath' is implied)
              - ais://abc/trunk-0123.tar.lz4/file45 /tmp/out/file456.new - same as above, with destination explicitly (re)named
   'ais archive get' multi-selection examples:
              - ais://abc/trunk-0123.tar 111.tar --archregx=jpeg --archmode=suffix - return 111.tar with all *.jpeg files from a given shard
              - ais://abc/trunk-0123.tar 222.tar --archregx=file45 --archmode=wdskey - return 222.tar with all file45.* files --/--
              - ais://abc/trunk-0123.tar 333.tar --archregx=subdir/ --archmode=prefix - 333.tar with all subdir/* files --/--

USAGE:
   ais archive get BUCKET[/SHARD_NAME] [OUT_FILE|OUT_DIR|-] [command options]

OPTIONS:
   archive         List archived content (see docs/archive.md for details)
   archmime        Expected format (mime type) of an object ("shard") formatted as .tar, .tgz or .tar.gz, .zip, .tar.lz4;
                   especially usable for shards with non-standard extensions
   archmode        Enumerated "matching mode" that tells aistore how to handle '--archregx', one of:
                     * regexp - general purpose regular expression;
                     * prefix - matching filename starts with;
                     * suffix - matching filename ends with;
                     * substr - matching filename contains;
                     * wdskey - WebDataset key
                   example:
                     given a shard containing (subdir/aaa.jpg, subdir/aaa.json, subdir/bbb.jpg, subdir/bbb.json, ...)
                     and wdskey=subdir/aaa, aistore will match and return (subdir/aaa.jpg, subdir/aaa.json)
   archpath        Extract the specified file from an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
                   see also: '--archregx'
   archregx        Specifies prefix, suffix, substring, WebDataset key, _or_ a general-purpose regular expression
                   to select possibly multiple matching archived files from a given shard;
                   is used in combination with '--archmode' ("matching mode") option
   blob-download   Utilize built-in blob-downloader (and the corresponding alternative datapath) to read very large remote objects
   cached          Only get in-cluster objects, i.e., objects from the respective remote bucket that are present ("cached") in the cluster
   checksum        Validate checksum
   chunk-size      Chunk size in IEC or SI units, or "raw" bytes (e.g.: 4mb, 1MiB, 1048576, 128k; see '--units')
   encode-objname  Encode object names that contain special symbols (; : ' " < > / \ | ? #) that may otherwise break shell parsing or URL interpretation
   extract,x       Extract all files from archive(s)
   inv-id          Bucket inventory ID (optional; by default, we use bucket name as the bucket's inventory ID)
   inv-name        Bucket inventory name (optional; system default name is '.inventory')
   inventory       List objects using _bucket inventory_ (docs/s3compat.md); requires s3:// backend; will provide significant performance
                   boost when used with very large s3 buckets; e.g. usage:
                     1) 'ais ls s3://abc --inventory'
                     2) 'ais ls s3://abc --inventory --paged --prefix=subdir/'
                   (see also: docs/s3compat.md)
   latest          Check in-cluster metadata and, possibly, GET, download, prefetch, or otherwise copy the latest object version
                   from the associated remote bucket;
                   the option provides operation-level control over object versioning (and version synchronization)
                   without the need to change the corresponding bucket configuration: 'versioning.validate_warm_get';
                   see also:
                     - 'ais show bucket BUCKET versioning'
                     - 'ais bucket props set BUCKET versioning'
                     - 'ais ls --check-versions'
                   supported commands include:
                     - 'ais cp', 'ais prefetch', 'ais get'
   limit           The maximum number of objects to list, get, or otherwise handle (0 - unlimited; see also '--max-pages'),
                   e.g.:
                   - 'ais ls gs://abc/dir --limit 1234 --cached --props size,custom,atime'  - list no more than 1234 objects
                   - 'ais get gs://abc /dev/null --prefix dir --limit 1234'                 - get --/--
                   - 'ais scrub gs://abc/dir --limit 1234'                                  - scrub --/--
   num-workers     Number of concurrent blob-downloading workers (readers); system default when omitted or zero
   prefix          Get objects with names starting with the specified prefix, e.g.:
                   '--prefix a/b/c' - get objects from the virtual directory a/b/c and objects from the virtual directory
                   a/b that have their names (relative to this directory) starting with 'c';
                   '--prefix ""' - get entire bucket (all objects)
   progress        Show progress bar(s) and progress of execution in real time
   refresh         Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
                   valid time units: ns, us (or µs), ms, s (default), m, h
   silent          Server-side flag, an indication for aistore _not_ to log assorted errors (e.g., HEAD(object) failures)
   skip-lookup     Do not execute HEAD(bucket) request to lookup remote bucket and its properties; possible usage scenarios include:
                    1) adding remote bucket to aistore without first checking the bucket's accessibility
                       (e.g., to configure the bucket's aistore properties with alternative security profile and/or endpoint)
                    2) listing public-access Cloud buckets where certain operations (e.g., 'HEAD(bucket)') may be disallowed
   units           Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
                   iec - IEC format, e.g.: KiB, MiB, GiB (default)
                   si  - SI (metric) format, e.g.: KB, MB, GB
                   raw - do not convert to (or from) human-readable format
   verbose,v       Verbose output
   yes,y           Assume 'yes' to all questions
   help, h         Show help

Example: extract one file

$ ais archive get ais://dst/A.tar.gz /tmp/w --archpath 111.ext1
GET 111.ext1 from ais://dst/A.tar.gz as "/tmp/w/111.ext1" (12.56KiB)

$ ls /tmp/w
111.ext1

Alternatively, use fully qualified name:

$ ais archive get ais://dst/A.tar.gz/111.ext1 /tmp/w

Example: extract one file using its fully qualified name:

$ ais archive get ais://nnn/A.tar/tutorials/README.md /tmp/out

Example: extract all files from a single shard

Let’s say, we have a certain shard in a certain bucket:

$ ais ls ais://dst --archive
NAME                     SIZE
A.tar.gz                 5.18KiB
    A.tar.gz/111.ext1    12.56KiB
    A.tar.gz/222.ext1    12.56KiB
    A.tar.gz/333.ext2    12.56KiB

We can then GET the shard and extract it to a local directory, e.g.:

$ ais archive get ais://dst/A.tar.gz /tmp/www --extract
GET A.tar.gz from ais://dst as "/tmp/www/A.tar.gz" (5.18KiB) and extract to /tmp/www/A/

$ ls /tmp/www/A
111.ext1  222.ext1  333.ext2

But here’s an alternative syntax to achieve the same:

$ ais get ais://dst --archive --prefix A.tar.gz /tmp/www

or even:

$ ais get ais://dst --archive --prefix A.tar.gz /tmp/www --progress --refresh 1 -y

GET 51 objects from ais://dst/tmp/ggg (total size 1.08MiB)
Objects:                   51/51 [==============================================================] 100 %
Total size:  1.08 MiB / 1.08 MiB [==============================================================] 100 %

The difference is that:

- in the first case we ask for a specific shard,
- while in the second (and third) we filter the bucket’s content using a certain prefix
  and the convention that archived filenames are prefixed with their parent (shard) name.
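
The naming convention can be illustrated with a small Python sketch. It is not AIStore code: flatten and by_prefix are hypothetical helpers, and the content of B.tar.lz4 is invented for the illustration. The point is that once archived files are named "shard/file", a plain prefix filter selects a whole shard or part of one.

```python
def flatten(shards):
    """Map {shard: [files]} to a sorted flat list of 'shard/file' names."""
    names = []
    for shard, files in shards.items():
        names.append(shard)
        names.extend(f"{shard}/{name}" for name in files)
    return sorted(names)


def by_prefix(names, prefix):
    """Keep only names that start with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


shards = {
    "A.tar.gz": ["111.ext1", "222.ext1", "333.ext2"],
    "B.tar.lz4": ["444.ext1"],  # hypothetical content
}
listing = flatten(shards)
print(by_prefix(listing, "A.tar.gz"))    # the shard itself and all of its files
print(by_prefix(listing, "A.tar.gz/2"))  # a prefix that reaches inside the shard
```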

Example: extract all files from all shards (with a given prefix)

Let’s say, there’s a bucket ais://dst that contains the following shards:

$ ais ls ais://dst
NAME             SIZE
A.tar.gz         5.18KiB
B.tar.lz4        247.88KiB
C.tar.zip        4.15KiB
D.tar            2.00KiB

Next, we GET and extract them all into their respective subdirectories (note the -v, or --verbose, option):

$ ais archive get ais://dst /tmp/w --prefix "" --extract -v

GET 4 objects from ais://dst to /tmp/w (total size 259.21KiB) [Y/N]: y
GET D.tar from ais://dst as "/tmp/w/D.tar" (2.00KiB) and extract as /tmp/w/D
GET A.tar.gz from ais://dst as "/tmp/w/A.tar.gz" (5.18KiB) and extract as /tmp/w/A
GET C.tar.zip from ais://dst as "/tmp/w/C.tar.zip" (4.15KiB) and extract as /tmp/w/C
GET B.tar.lz4 from ais://dst as "/tmp/w/B.tar.lz4" (247.88KiB) and extract as /tmp/w/B

Example: use '--prefix' that crosses shard boundary

For starters, we recursively archive all aistore docs:

$ ais put docs ais://nnn/A.tar --archive -r

To list a virtual subdirectory inside this newly created shard (e.g.):

$ ais archive ls ais://nnn --prefix A.tar/tutorials
NAME                                             SIZE
    A.tar/tutorials/README.md                    561B
    A.tar/tutorials/etl/compute_md5.md           8.28KiB
    A.tar/tutorials/etl/etl_imagenet_pytorch.md  4.16KiB
    A.tar/tutorials/etl/etl_webdataset.md        3.97KiB
Listed: 4 names

Now, extract matching files from the bucket to /tmp/out:

$ ais archive get ais://nnn --prefix A.tar/tutorials /tmp/out
GET 6 objects from ais://nnn/tmp/out (total size 17.81MiB) [Y/N]: y

$ ls -al /tmp/out/tutorials/
total 20
drwxr-x--- 4 root root 4096 May 13 20:05 ./
drwxr-xr-x 3 root root 4096 May 13 20:05 ../
drwxr-x--- 2 root root 4096 May 13 20:05 etl/
-rw-r--r-- 1 root root  561 May 13 20:05 README.md
drwxr-x--- 2 root root 4096 May 13 20:05 various/

Get archived content: multiple selection

Generally, both single and multiple selection from a given source shard are realized using the following 4 (four) options:

   --archpath value     extract the specified file from an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
                        see also: '--archregx'
   --archmime value     expected format (mime type) of an object ("shard") formatted as: .tar, .tgz or .tar.gz, .zip, .tar.lz4;
                        especially usable for shards with non-standard extensions
   --archregx value     string that specifies prefix, suffix, substring, WebDataset key, _or_ a general-purpose regular expression
                        to select possibly multiple matching archived files from a given shard;
                        is used in combination with '--archmode' ("matching mode") option
   --archmode value     enumerated "matching mode" that tells aistore how to handle '--archregx', one of:
                          * regexp - general purpose regular expression;
                          * prefix - matching filename starts with;
                          * suffix - matching filename ends with;
                          * substr - matching filename contains;
                          * wdskey - WebDataset key
                        example:
                          given a shard containing (subdir/aaa.jpg, subdir/aaa.json, subdir/bbb.jpg, subdir/bbb.json, ...)
                          and wdskey=subdir/aaa, aistore will match and return (subdir/aaa.jpg, subdir/aaa.json)

In particular, the '--archregx' and '--archmode' pair defines multiple selection, as the following examples demonstrate.

But first, note that in all multi-selection cases, the result is (currently) invariably formatted as .TAR (that contains the aforementioned selection).

Example: suffix match

Select all *.jpeg files from a given shard and return them all as 111.tar:

$ ais archive get ais://abc/trunk-0123.tar 111.tar --archregx=jpeg --archmode=suffix

Example: WebDataset key

Select all files that have a given WebDataset key; return the result as 222.tar:

$ ais archive get ais://abc/trunk-0123.tar 222.tar --archregx=file45 --archmode=wdskey

Example: prefix match

Similar to the above, except that in this case the '--archregx' value specifies a virtual subdirectory inside a given named shard:

$ ais archive get ais://abc/trunk-0123.tar 333.tar --archregx=subdir/ --archmode=prefix
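
The five matching modes can be summarized with a short Python sketch. It is a hedged model, not the server-side implementation: select_files and wds_key are hypothetical names, the regexp mode is assumed to behave like an unanchored search, and the WebDataset key is assumed to be the path with everything after the first '.' of the base name removed.

```python
import posixpath
import re


def wds_key(name):
    """Assumed WebDataset key: path up to the first '.' of the base name."""
    dirname, base = posixpath.split(name)
    stem = base.split(".", 1)[0]
    return posixpath.join(dirname, stem) if dirname else stem


def select_files(names, archregx, archmode):
    """Return archived file names selected by an (archregx, archmode) pair."""
    matchers = {
        "regexp": lambda n: re.search(archregx, n) is not None,
        "prefix": lambda n: n.startswith(archregx),
        "suffix": lambda n: n.endswith(archregx),
        "substr": lambda n: archregx in n,
        "wdskey": lambda n: wds_key(n) == archregx,
    }
    match = matchers[archmode]  # raises KeyError on an unknown mode
    return [n for n in names if match(n)]


shard = ["subdir/aaa.jpg", "subdir/aaa.json", "subdir/bbb.jpg", "subdir/bbb.json"]
print(select_files(shard, "subdir/aaa", "wdskey"))
```

With the shard from the help text, the wdskey call selects subdir/aaa.jpg and subdir/aaa.json, which agrees with the documented example.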

Generate shards

ais archive gen-shards "BUCKET/TEMPLATE.EXT"

Put randomly generated shards that can be used for dSort testing. The TEMPLATE must be a bash-like brace expansion (see examples) and .EXT must be one of: .tar, .tgz or .tar.gz, .zip, .tar.lz4.

Warning: Always quote the argument ("..." or '...'); otherwise, the shell performs the brace expansion itself before the CLI receives the argument.

Options

$ ais archive gen-shards --help

NAME:
   ais archive gen-shards - Generate random (.tar, .tgz or .tar.gz, .zip, .tar.lz4)-formatted objects ("shards"), e.g.:
              - gen-shards 'ais://bucket1/shard-{001..999}.tar' - write 999 random shards (default sizes) to ais://bucket1
              - gen-shards "gs://bucket2/shard-{01..20..2}.tgz" - 10 random gzipped tarfiles to Cloud bucket
              (notice quotation marks in both cases)

USAGE:
   ais archive gen-shards "BUCKET/TEMPLATE.EXT" [command options]

OPTIONS:
   --cleanup            Remove old bucket and create it again (warning: removes the entire content of the old bucket)
   --fcount value       Number of files in a shard (default: 5)
   --fext value         Comma-separated list of file extensions (default ".test"), e.g.:
                        --fext .mp3
                        --fext '.mp3,.json,.cls' (or, same: ".mp3,  .json,  .cls")
   --fsize value        Size of the files in a shard (default: "1024")
   --num-workers value  Limits the number of shards created concurrently (default: 10)
   --tform value        TAR file format selection (one of "Unknown", "USTAR", "PAX", or "GNU")
   --help, -h           Show help

Examples

Generate shards with varying numbers of files and file sizes

$ ais archive gen-shards "ais://dsort-testing/shard-{0..9}.tar" --fsize 262144 --fcount 100
Shards created: 10/10 [==============================================================] 100 %
$ ais ls ais://dsort-testing
NAME            SIZE        VERSION
shard-0.tar     25.05MiB    1
shard-1.tar     25.05MiB    1
shard-2.tar     25.05MiB    1
shard-3.tar     25.05MiB    1
shard-4.tar     25.05MiB    1
shard-5.tar     25.05MiB    1
shard-6.tar     25.05MiB    1
shard-7.tar     25.05MiB    1
shard-8.tar     25.05MiB    1
shard-9.tar     25.05MiB    1

Generate shards using custom naming template

$ ais archive gen-shards "ais://dsort-testing/super_shard_{000..099}_last.tgz" --fsize 262144 --cleanup
Shards created: 100/100 [==============================================================] 100 %
$ ais ls ais://dsort-testing
NAME                        SIZE       VERSION
super_shard_000_last.tgz    1.25MiB    1
super_shard_001_last.tgz    1.25MiB    1
super_shard_002_last.tgz    1.25MiB    1
super_shard_003_last.tgz    1.25MiB    1
super_shard_004_last.tgz    1.25MiB    1
super_shard_005_last.tgz    1.25MiB    1
super_shard_006_last.tgz    1.25MiB    1
super_shard_007_last.tgz    1.25MiB    1
...

Multi-extension example

$ ais archive gen-shards 'ais://nnn/shard-{01..99}.tar' --fext ".mp3,  .json,  .cls"

$ ais archive ls ais://nnn | head -n 20
NAME                                             SIZE
shard-01.tar                                     23.50KiB
    shard-01.tar/541701ae863f76d0f7e0-0.cls      1.00KiB
    shard-01.tar/541701ae863f76d0f7e0-0.json     1.00KiB
    shard-01.tar/541701ae863f76d0f7e0-0.mp3      1.00KiB
    shard-01.tar/8f8c5fa2934c90138833-1.cls      1.00KiB
    shard-01.tar/8f8c5fa2934c90138833-1.json     1.00KiB
    shard-01.tar/8f8c5fa2934c90138833-1.mp3      1.00KiB
    shard-01.tar/9a42bd12d810d890ea86-3.cls      1.00KiB
    shard-01.tar/9a42bd12d810d890ea86-3.json     1.00KiB
    shard-01.tar/9a42bd12d810d890ea86-3.mp3      1.00KiB
    shard-01.tar/c5bd7c7a34e12ebf3ad3-2.cls      1.00KiB
    shard-01.tar/c5bd7c7a34e12ebf3ad3-2.json     1.00KiB
    shard-01.tar/c5bd7c7a34e12ebf3ad3-2.mp3      1.00KiB
    shard-01.tar/f13522533ecafbad4fe5-4.cls      1.00KiB
    shard-01.tar/f13522533ecafbad4fe5-4.json     1.00KiB
    shard-01.tar/f13522533ecafbad4fe5-4.mp3      1.00KiB
shard-02.tar                                     23.50KiB
    shard-02.tar/095e6ae644ff4fd1778b-7.cls      1.00KiB
    shard-02.tar/095e6ae644ff4fd1778b-7.json     1.00KiB
...
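
Judging by the listing above, each base name appears once per extension, so a shard holds --fcount times the number of extensions (here 5 x 3 = 15 entries). The sketch below counts the extensions in a comma-separated value the way the --fext examples suggest (spaces after commas ignored) and then applies plain TAR arithmetic to arrive at the listed shard size. The entries_per_shard helper is hypothetical, and the per-extension behavior is an inference from this one listing rather than a documented guarantee.

```shell
# entries_per_shard FCOUNT FEXT_LIST
# Hypothetical helper: FCOUNT base names times the number of extensions in a
# comma-separated list; spaces around the commas are ignored.
entries_per_shard() {
  # one extension per line, blanks removed, then count the non-empty lines
  n_ext=$(printf '%s\n' "$2" | tr ',' '\n' | tr -d ' ' | grep -c .)
  echo $(( $1 * n_ext ))
}

entries=$(entries_per_shard 5 ".mp3,  .json,  .cls")
echo "$entries entries"    # 15 entries

# Plain TAR layout: 512-byte header + 1024 bytes of data per entry,
# plus a 1024-byte end-of-archive marker.
echo "$(( entries * (512 + 1024) + 1024 )) bytes"    # 24064 bytes, i.e. 23.50KiB
```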

Table of Contents

Subcommands

Archive files and directories (ais archive put)

Append files and directories to an existing archive

Example 1: add file to archive

step 1. create archive (by archiving a given source dir)

step 2. add a single file to existing archive

step 3. list entire bucket with an --archive option to show all archived entries

Example 2: use --template flag to add source files

Example 3: add file to archive

Example 4: add file to archive

Archive multiple objects (ais archive bucket)

Features

Usage

Selection Options

Non-Recursive Option (--nr)

Examples with Non-Recursive Flag

Visual Example

Additional Options

Complete Examples

Notes

List archived content

Options

Examples

Example: use '--prefix' that crosses shard boundary

Get archived content (ais archive get)

Example: extract one file

Example: extract one file using its fully-qualified name

Example: extract all files from a single shard

Example: extract all files from all shards (with a given prefix)

Example: use '--prefix' that crosses shard boundary

Get archived content: multiple selection

Example: suffix match

Example: WebDataset key

Example: prefix match

Generate shards

Options

Examples

Generate shards with varying numbers of files and file sizes

Generate shards using custom naming template

Multi-extension example
