This document contains ais object commands - the commands to read (GET), write (PUT), APPEND, PROMOTE, PREFETCH, EVICT etc. user data.
Use ais object get or, same, ais get to GET data from aistore. In other words, read data from the cluster and, optionally, save it locally.
ais get BUCKET[/OBJECT_NAME] [OUT_FILE|-] [command options]
where there’s a source (BUCKET[/OBJECT_NAME]) and a destination ([OUT_FILE] or standard output (-)).
Here’s in detail:
Get the imagenet_train-000010.tgz object from the imagenet bucket and write it to a local file, ~/train-10.tgz:
For comparison, the same GET using curl and the two supported variants of RESTful API:
If OUT_FILE is omitted, the local file name is implied from the object name.
Get the imagenet_train-000010.tgz object from the imagenet bucket and write it to a local file, imagenet_train-000010.tgz:
Get the imagenet_train-000010.tgz object from the imagenet AWS bucket and write it to standard output:
We say that “an object is cached” to indicate two separate things:
In other words, the term “cached” is simply a shortcut to indicate the object’s immediate availability without the need to go to the object’s original location. Being “cached” does not have any implications on an object’s persistence: “cached” objects, similar to objects that originated in a given AIS cluster, are stored with arbitrary (per bucket configurable) levels of redundancy, etc. In short, the same storage policies apply to “cached” and “non-cached”.
The following example checks whether imagenet_train-000010.tgz is “cached” in the bucket imagenet:
Get the contents of object list.txt from texts bucket starting from offset 1024 length 1024 and save it as ~/list.txt file:
Let’s say, bucket ais://src contains 4 copies of aistore readme in its virtual directory docs/:
The following reads 10 bytes from each copy and prints the result:
Same as above with automatic confirmation and writing results to /tmp/w:
Use --mpd for client-side concurrent range-based download with built-in progress bar. This is useful for large objects where you want to see download progress and potentially benefit from parallel chunk downloads.
--mpd vs --blob-download: these are different mechanisms for different purposes:
- --mpd (multipart download): client-side; downloads an object from AIStore to the client using concurrent range requests
- --blob-download: server-side; fetches an object from a remote backend (e.g., S3, GCS) into the AIStore cluster for caching; see blob_downloader.md
Note: --mpd is for single-object download only and includes its own progress bar automatically. For multi-object downloads (using --prefix), use --progress instead to track the number of objects processed.
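As a rough illustration of what "concurrent range requests" means, the sketch below splits an object of a known size into inclusive byte ranges and formats each one as a standard HTTP Range header value. This is not the CLI's implementation of --mpd; the function names and the chunk size are assumptions made for the example.

```python
def split_ranges(size: int, chunk: int) -> list[tuple[int, int]]:
    """Split [0, size) into inclusive (first, last) byte ranges of at most `chunk` bytes."""
    if size < 0 or chunk <= 0:
        raise ValueError("size must be >= 0 and chunk must be > 0")
    return [(start, min(start + chunk, size) - 1) for start in range(0, size, chunk)]


def range_header(first: int, last: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={first}-{last}"


if __name__ == "__main__":
    # Hypothetical 10-byte object fetched in 4-byte pieces; each piece could be
    # requested by a separate worker and written at its own offset.
    for first, last in split_ranges(10, 4):
        print(range_header(first, last))
```

Each range is independent of the others, which is what makes it possible to fetch them in parallel and to report progress per completed range.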
Use --prefix to download multiple objects at once. Note that destination in this case is a local directory and that (an empty) prefix indicates getting entire bucket; see --help for details.
Use --progress to show multi-object progress (number of objects processed). This is different from --mpd which has its own built-in progress bar for single-object downloads.
For objects formatted as (.tar, .tar.gz, .tar.lz4, or .zip), it is possible to GET and extract them in one shot. There are two “responsible” options:
Maybe the most basic:
assuming, ais://nnn/A.tar was previously created via (e.g.)
ais archive put docs ais://nnn/A.tar -r
Let’s say, there’s a bucket ais://dst with a virtual directory abc/ that in turn contains:
Next, we GET and extract them all in the respective sub-directories (note also the --verbose option):
For starters, we recursively archive all aistore docs:
To list a virtual subdirectory inside this newly created shard (e.g.):
Now, extract matching files from the bucket to /tmp/out:
The result:
NOTE: for more “archival” options and examples, please see docs/cli/archive.md.
ais object cat BUCKET/OBJECT_NAME
Get OBJECT_NAME from bucket BUCKET and print it to standard output.
Alias for ais get BUCKET/OBJECT_NAME -.
Print content of list.txt from local bucket texts to the standard output:
Print content of object list.txt starting from offset 1024 length 1024 to the standard output:
ais object show [--props PROP_LIST] BUCKET/OBJECT_NAME
Get object detailed information.
PROP_LIST is a comma-separated list of properties to display.
If PROP_LIST is omitted, default properties are shown.
Supported properties:
- cached - the object cached on local drives (always true for AIS buckets)
- size - object size
- version - object version (empty if versioning is disabled for the bucket)
- atime - object’s last access time
- copies - the number of object replicas per target (1 if bucket mirroring is disabled), and mountpath where object and its mirrors are located
- checksum - object’s checksum
- node - on which target the object is located
- ec - object’s EC info (empty if EC is disabled for the bucket; if EC is enabled it looks like DATA:PARITY[MODE], where DATA is the number of data slices, PARITY is the number of parity slices, and MODE is the protection mode selected for the object: replicated - the object has PARITY replicas on other targets; encoded - the object is erasure coded and other targets contain only encoded slices)
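For scripts that post-process the ec property, the following is a minimal parsing sketch. It assumes the value is printed literally as DATA:PARITY[MODE] with the square brackets included (for instance 2:1[encoded]); that exact rendering is an assumption based on the description above, and the ECInfo and parse_ec names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ECInfo(NamedTuple):
    data: int    # number of data slices
    parity: int  # number of parity slices
    mode: str    # "replicated" or "encoded"


# Assumed textual form: DATA:PARITY[MODE], e.g. "2:1[encoded]".
_EC_RE = re.compile(r"^(\d+):(\d+)\[(replicated|encoded)\]$")


def parse_ec(value: str) -> Optional[ECInfo]:
    """Parse an EC property string; an empty string means EC is disabled."""
    value = value.strip()
    if not value:
        return None
    match = _EC_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized EC info: {value!r}")
    return ECInfo(int(match.group(1)), int(match.group(2)), match.group(3))
```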
ais show object is an alias for ais object show - both can be used interchangeably.
Display default properties of object list.txt from bucket texts:
Display all properties of object list.txt from bucket texts:
Show only selected (size,version,ec) properties:
Briefly:
ais put [-|FILE|DIRECTORY[/PATTERN]] BUCKET[/OBJECT_NAME_or_PREFIX]1 [command options]
writes a single file, an entire directory (of files), or a typed content directly from STDIN (-) - into the specified (destination) bucket.
Notice the optional [/PATTERN] - a regular shell filename-matching primitive - to select files from the source directory.
If an object of the same name exists, it will be overwritten without confirmation,
but only if it is different content-wise: writing identical bits is optimized out.
If CLI detects that a user is going to put more than one file, it calculates the total number of files, total data size, and checks if the bucket is empty.
Then it shows all gathered info to the user and asks for confirmation to continue.
Confirmation request can be disabled with the option --yes for use in scripts.
When writing from STDIN, type Ctrl-D to terminate the input.
1 FILE|DIRECTORY should point to a file or a directory. Wildcards are supported, but they work a bit differently from shell wildcards.
Symbols * and ? can be used only in a file name pattern. Directory names cannot include wildcards. Only a file name is matched, not full file path, so /home/user/*.tar --recursive matches not only .tar files inside /home/user but any .tar file in any /home/user/ subdirectory.
This makes shell wildcards like ** redundant, and the following patterns won’t work in ais: /home/user/img-set-*/*.tar or /home/user/bck/**/*.tar.gz
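The matching rule described above (the pattern applies to the file name only, and recursion decides whether subdirectories are visited) can be sketched as follows. This is an illustration, not the CLI's code; match_files is a hypothetical helper, and Python's fnmatch also accepts [seq] character classes, which the text above does not mention.

```python
import fnmatch
import os
import tempfile


def match_files(root: str, pattern: str, recursive: bool) -> list[str]:
    """Return files under `root` whose *file name* (not full path) matches `pattern`."""
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        matched.extend(
            os.path.join(dirpath, name)
            for name in filenames
            if fnmatch.fnmatchcase(name, pattern)
        )
        if not recursive:
            break  # only the top-level directory is considered
    return sorted(matched)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("a.tar", "b.txt", os.path.join("sub", "c.tar")):
            open(os.path.join(root, rel), "w").close()
        print(match_files(root, "*.tar", recursive=True))
```

Because only the file name is compared, a recursive *.tar match already reaches every subdirectory, which is why a ** pattern adds nothing.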
FILE must point to an existing file.
File masks and directory uploading are NOT supported in single-file upload mode.
PUT command handles two possible ways to specify resulting object name if source references single file:
- ais put path/to/(..)/file.go bucket/ creates object file.go in bucket
- ais put path/to/(..)/file.go bucket/path/to/object.go creates object path/to/object.go in bucket
PUT command handles object naming with range syntax as follows:
- The part of the path up to the last / before the first { is excluded from the object name.
- OBJECT_NAME is prepended to each object name.
- Path abbreviations such as ../ are not supported at the moment.
PUT command handles object naming if its source references directories:
- For a source directory path p, resulting object names are the file paths with the p prefix trimmed.
- OBJECT_NAME is prepended to each object name.
- Path abbreviations such as ../ are not supported at the moment.
All examples below put into an empty bucket and the source directory structure is:
The current user HOME directory is /home/user.
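The naming rules above can be summarized in a short sketch. It models only what the text states (a single file keeps its base name unless an explicit object name is given, a trailing / on the destination acts as a virtual directory, and directory sources have their path prefix trimmed); both helper names are hypothetical and the code does not call AIS.

```python
import posixpath


def single_file_object_name(src_file: str, dest: str) -> str:
    """`dest` is whatever follows the bucket name in the destination argument."""
    base = posixpath.basename(src_file)
    if dest == "":
        return base          # bucket only: keep the source file name
    if dest.endswith("/"):
        return dest + base   # trailing slash: destination virtual directory
    return dest              # explicit object name


def directory_object_names(src_dir: str, files: list[str], object_prefix: str = "") -> list[str]:
    """Trim the source directory path from each file and prepend the object prefix."""
    names = []
    for path in files:
        rel = posixpath.relpath(path, src_dir)
        if rel.startswith(".."):
            raise ValueError(f"{path} is not under {src_dir}")
        names.append(object_prefix + rel)
    return names
```

For example, with the source directory /home/user/bck and the prefix tars/, a nested file /home/user/bck/extra/img1.tar maps to tars/extra/img1.tar.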
Motivation: There’s always a motivation to perform faster. One way to achieve this is by avoiding redundant writes of user data. A write operation can effectively become a no-op if the identical data already exists in the cluster. The conventional method to establish such identity is through content checksumming.
In short, here’s a CLI write-optimizing trick that utilizes client-side checksumming.
Note: Ideally, the checksum is provided with PUT API calls. The CLI takes it one step further: if client-side checksumming is requested but the checksum is empty, the CLI computes it automatically. The corresponding overhead must be taken into account when analyzing resulting performance.
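The idea can be illustrated with a small in-memory model: a write becomes a no-op when the incoming checksum equals the stored one. This is a sketch of the concept only. SHA-256 is used here as a stand-in (the checksum type actually used depends on bucket configuration), and InMemoryStore is a hypothetical class, not an AIS API.

```python
import hashlib
from typing import Optional


def content_checksum(data: bytes) -> str:
    """Stand-in checksum; chosen for illustration only."""
    return hashlib.sha256(data).hexdigest()


class InMemoryStore:
    def __init__(self) -> None:
        self._objects: dict[str, tuple[str, bytes]] = {}  # name -> (checksum, data)
        self.writes = 0

    def put(self, name: str, data: bytes, checksum: Optional[str] = None) -> bool:
        """Store `data`; return False when the write was skipped as identical."""
        if checksum is None:
            # Mirrors the CLI behavior described above: compute when not provided.
            checksum = content_checksum(data)
        existing = self._objects.get(name)
        if existing is not None and existing[0] == checksum:
            return False  # identical content already present: no-op
        self._objects[name] = (checksum, data)
        self.writes += 1
        return True
```

Note that computing the checksum on the client is itself a full pass over the data, which is the overhead the note above refers to.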
First, compare two simple examples:
In other words, a trailing forward slash in the destination name is interpreted as a destination directory
which is what one would expect from something like Bash:
cp README.md /nnn/ccc/
One other example: put a single file img1.tar into local bucket mybucket, name it img-set-1.tar.
Put a single file img1.tar into local bucket mybucket, with a content checksum flag
to override the default bucket checksum performed at the server side.
Optionally, the user can choose to provide a --compute-cksum flag for the checksum flag and
let the API take care of the computation.
Put a single file ~/bck/img1.tar into bucket mybucket, without explicit name.
Read unpacked content from STDIN and put it into bucket mybucket with name img-unpacked.
Note that content is put in chunks that can have a slight overhead.
--chunk-size allows for controlling the chunk size - the bigger the chunk size the better performance (but also higher memory usage).
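The trade-off can be seen in a minimal chunked reader: fewer, larger chunks mean fewer round trips but a larger buffer held in memory at a time. The function below is a generic sketch and not the CLI's implementation; read_chunks is a hypothetical name.

```python
import io
from typing import BinaryIO, Iterator


def read_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most `chunk_size` bytes until end of input."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # A real caller would pass sys.stdin.buffer; BytesIO keeps the demo self-contained.
    source = io.BytesIO(b"abcdefghij")
    for chunk in read_chunks(source, 4):
        print(len(chunk), chunk)
```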
Put two objects, /home/user/bck/img1.tar and /home/user/bck/img2.zip, into the root of bucket mybucket.
Note that the path /home/user/bck is a shortcut for /home/user/bck/* and that recursion is disabled by default.
Alternatively, to reference source directory we can use relative (../..) naming.
Also notice the progress bar (the --progress flag) and the g* wildcard that selects only the filenames that start with ‘g’.
NOTE double quotes to denote the "../../../../bin/g*" source above. With pattern matching, using quotation marks is a MUST. Single quotes can be used as well.
The multi-file source can be: a directory, a comma-separated list, a template-defined range - all of the above.
Examples follow below, but also notice:
Same as above, except now we make sure that destination is a virtual directory (notice trailing forward ’/’):
Same as above, with --template embedded into the source argument:
And finally, we can certainly PUT source directory:
The same as above, but without trailing /.
Same as above with source files in double quotes below, and with progress bar:
The list of sources that you want to upload (a) can comprise any number (and any mix) of comma-separated files and/or directories, and (b) must be embedded in double or single quotes.
Note the ’/’ suffix in my-virt-dir/ above - without a trailing filepath separator we would simply get a longer filename (filenames) at the root of the destination bucket.
We can now list them in the bucket ais://aaa the way we would list a directory:
Same as above, except that only files matching pattern *.tar are PUT, so the final bucket content is tars/img1.tar and tars/extra/img1.tar.
NOTE double quotes to denote the source. With pattern matching, using quotation marks is a MUST. Single quotes can be used as well.
Same as above with progress bar, recursion into nested directories, and matching characters anywhere in the filename:
The result will look as follows:
There are several equivalent ways to PUT a templated range of files:
Put 9 files to mybucket using a range request. Note the formatting of object names.
They exclude the longest parent directory of path which doesn’t contain a template ({a..b}).
Same as above but in addition destination object names will have additional prefix subdir/ (notice the trailing /)
In other words, this PUT in effect creates a virtual directory inside destination ais://mybucket
Next, PUT:
Finally, the same exact operation can be accomplished using --template option
--template is universally supported to specify a range of files or objects
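To make the notion of a template concrete, here is a minimal sketch that expands a prefix plus zero or more numeric {a..b} ranges into the list of names it denotes. It mimics Bash brace expansion for ascending numeric ranges, including zero padding, and is an approximation only: whatever else the real --template parser accepts is not modeled, and expand_template is a hypothetical name.

```python
import itertools
import re

_RANGE_RE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_template(template: str) -> list[str]:
    """Expand numeric {a..b} ranges; a template without ranges expands to itself."""
    # With two capture groups, split() yields: literal, lo, hi, literal, lo, hi, ...
    parts = _RANGE_RE.split(template)
    literals = parts[0::3]
    ranges = []
    for lo, hi in zip(parts[1::3], parts[2::3]):
        if int(lo) > int(hi):
            raise ValueError(f"descending range {{{lo}..{hi}}} is not handled in this sketch")
        # Bash-like zero padding when either bound has a leading zero.
        padded = (len(lo) > 1 and lo[0] == "0") or (len(hi) > 1 and hi[0] == "0")
        width = max(len(lo), len(hi)) if padded else 0
        ranges.append([str(n).zfill(width) for n in range(int(lo), int(hi) + 1)])
    results = []
    for combo in itertools.product(*ranges):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.extend((value, literal))
        results.append("".join(pieces))
    return results
```

A template that carries only a prefix (no ranges) comes back unchanged, which corresponds to using the prefix part of --template alone.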
There are several equivalent ways to PUT a list of files:
Alternatively, the same can be done using the --list flag:
--list is universally supported to specify a list of files or objects
The only difference from the two examples above is: trailing / in the destination name.
Preview the files that would be sent to the cluster, without actually putting them.
Generally, the --template option combines (an optional) prefix and/or one or more ranges (e.g., bash brace expansions).
In this example, we only use the “prefix” part of the --template to specify source directory.
Note: to PUT files into a virtual destination directory, use trailing ’/’, e.g.:
ais put ais://nnn/fff/ ...
First, let’s generate some files and directories (strictly for illustration purposes):
Next, PUT them all in one shot (notice quotation marks!):
Let’s now take a look at the result - and observe a PROBLEM:
So yes, the problem is that by default destination object names are sourced from the source file basenames.
In this example, we happen to have only 3 basenames: test0.txt, test1.txt, and test2.txt.
The workaround is to include respective parent directories in the destination naming:
As always, see ais put --help for usage examples and more options.
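The basename problem described above is easy to reproduce outside of AIS. The sketch below groups source paths by base name to expose collisions and then shows that keeping the parent directories makes the names unique again. The /data/... paths and both helper names are hypothetical.

```python
import posixpath
from collections import defaultdict


def basename_collisions(files: list[str]) -> dict[str, list[str]]:
    """Map each base name shared by more than one source file to those files."""
    by_name = defaultdict(list)
    for path in files:
        by_name[posixpath.basename(path)].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


def names_with_parents(files: list[str], root: str) -> list[str]:
    """Object names that keep the parent directories relative to `root`."""
    return [posixpath.relpath(path, root) for path in files]


if __name__ == "__main__":
    files = ["/data/a/test0.txt", "/data/b/test0.txt", "/data/b/test1.txt"]
    print(basename_collisions(files))          # two sources compete for one name
    print(names_with_parents(files, "/data"))  # unique once parents are kept
```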
Same as above, but note: alternative syntax, which is maybe more conventional:
--skip-vc option
The --skip-vc option allows AIS to skip loading existing object’s metadata to perform metadata-associated processing (such as comparing source and destination checksums, for instance). In certain scenarios (e.g., massive uploading of new files that cannot be present in the bucket) this can help reduce PUT latency.
Yes, ais put can be used to copy remote files - usage tips follow below. But first, a disclaimer.
Copying large amounts of data from remote (NFS, SMB) locations is not exactly an exercise for a single client machine. There are alternative designed-in ways, whereby all AIStore nodes partition remote source between themselves and do the copying - in parallel.
Performance-wise, the difference from copying via client (or by client) - is two-fold:
Needless to say, promoting files to objects, as it were, requires that all AIS nodes have connectivity and permissions to access the remote source.
Further references:
--retries option
Including --retries in your command will help resolve an occasional timeout and other intermittent failures. For example, --retries 5 will retry a failed request up to 5 (five) times.
--num-workers option
In other words, take advantage of the client side multi-threading. If you have sufficient resources, increase this number to allow more workers to transfer data in parallel.
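The combined effect of the two options can be pictured as a pool of workers, each of which retries its own transfer a bounded number of times. The sketch below is a generic illustration and not the CLI's code; with_retries and transfer_all are hypothetical names, and the transfer callable stands in for whatever actually moves the data.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def with_retries(fn: Callable[[], R], retries: int, delay: float = 0.0) -> R:
    """Call `fn`; on failure retry up to `retries` more times, then re-raise."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= retries:
                raise
            attempt += 1
            time.sleep(delay)


def transfer_all(items: Iterable[T], transfer: Callable[[T], R],
                 num_workers: int, retries: int) -> list[R]:
    """Run `transfer` over all items using `num_workers` threads, with per-item retries."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda item: with_retries(lambda: transfer(item), retries), items))
```

Retries address intermittent failures of individual requests, while the worker count bounds how many requests are in flight at once; the two settings are independent.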
Recursively copy the contents of (NFS-mounted) target_dir/ to the ais://nnn/target_dir/ bucket, using 64 client workers (OS threads) and retrying failed requests up to 3 times.
Same as above (and notice ais put shortcut and --include-src-dir option):
Same as above, but with additional capability to “continue on error” - skip errors that may arise when traversing the source tree:
Same as above, but in addition ask CLI to report all errors that may be skipped or ignored due to the --cont-on-err flag:
Be patient: copying from remote locations is subject to network and remote servers’ delays, both.
Also and separately, note that at the time of this writing AIS CLI does not support pagination of the remote directories that may contain millions of entries. Listing of the entire remote source is (currently) done in one shot, and prior to copying.
If ais put process seems to have paused, there’s a good chance it is still listing remote files or copying in the background.
Refrain from pressing Ctrl-C to interrupt it.
Waiting time may be even greater if you are copying data to an AIStore s3://, gs://, or az:// bucket. AIS uses write-through, so the same data is written to the remote backend and locally as one atomic transaction.
Copying, or generally, working in any shape and form with many (millions of) small files comes with significant and unavoidable overhead, both networking and storage-wise.
Use our ishard tool to convert and serialize your data using the preferred formatting (a.k.a. WebDataset convention):
Inline help follows below:
See above.
Promote /tmp/examples/example1.txt without specified object name.
Promote /tmp/examples/example1.txt as object with name example1.txt.
Make AIS objects out of /tmp/examples files (one file = one object).
/tmp/examples is a directory present on some (or all) of the deployed storage nodes.
Promote /tmp/examples files to AIS objects. Object names will have examples/ prefix.
Try to promote a file that does not exist.
ais object multipart-upload or, same, ais object mpu - Upload large objects in multiple parts for improved performance and reliability.
Multipart upload allows you to upload large objects by breaking them into smaller, manageable parts. This provides several benefits:
The multipart upload process consists of three main steps:
ais object mpu create BUCKET/OBJECT_NAME
Creates a new multipart upload session and returns an upload ID that must be used for all subsequent operations on this upload.
ais object mpu put-part BUCKET/OBJECT_NAME UPLOAD_ID PART_NUMBER FILE_PATH
Uploads individual parts for a multipart upload session. Parts can be uploaded in parallel and in any order.
Parts can be uploaded simultaneously from different terminals or scripts:
All three commands can be executed at the same time, allowing for faster upload of large files.
ais object mpu complete BUCKET/OBJECT_NAME UPLOAD_ID PART_NUMBERS
Completes a multipart upload by assembling all uploaded parts into the final object. Parts are assembled in the order specified by the part numbers.
ais object mpu abort BUCKET/OBJECT_NAME UPLOAD_ID
Aborts a multipart upload session and cleans up any uploaded parts. All uploaded parts are discarded and the object is not created.
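The life cycle of the four subcommands can be modeled in a few lines. The class below is an in-memory illustration of the semantics described above (parts may arrive in any order, complete assembles them in the order given, abort discards everything); it does not talk to AIS, and the MultipartStore name and the upload ID format are hypothetical.

```python
import itertools


class MultipartStore:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._uploads: dict[str, dict] = {}   # upload_id -> {"name", "parts"}
        self.objects: dict[str, bytes] = {}   # completed objects

    def create(self, name: str) -> str:
        upload_id = f"upload-{next(self._ids)}"  # hypothetical ID format
        self._uploads[upload_id] = {"name": name, "parts": {}}
        return upload_id

    def put_part(self, upload_id: str, part_number: int, data: bytes) -> None:
        self._uploads[upload_id]["parts"][part_number] = data

    def complete(self, upload_id: str, part_numbers: list[int]) -> None:
        upload = self._uploads[upload_id]
        # Assemble strictly in the order the caller lists the part numbers.
        data = b"".join(upload["parts"][n] for n in part_numbers)
        self.objects[upload["name"]] = data
        del self._uploads[upload_id]

    def abort(self, upload_id: str) -> None:
        self._uploads.pop(upload_id, None)  # parts are discarded, no object is created
```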
Here’s a complete example demonstrating the entire multipart upload process:
Append operation (not to be confused with appending or adding to an existing archive) can be executed in 3 different ways:
- using ais put with --append option;
- using ais object concat;
- and finally, writing from standard input with a chunk size (--chunk-size) small enough to require (appending) multiple chunks.
Here’re some examples:
ais object rm or (same) ais rmo - Delete an object or list/range of objects from a bucket.
For multi-object delete operation, see also Operations on Lists and Ranges (and entire buckets) below.
Let’s say, in its initial state the bucket consists of:
Notice that aaa here is both an object and a virtual directory.
That’s why:
And so, as per the Tip (above), we can go ahead and disambiguate one way or another, e.g.:
Delete object myobj.tgz from bucket mybucket.
Delete objects (obj1, obj2) from buckets (aisbck, cloudbck) respectively.
For --list or --template usage, please see: Operations on Lists and Ranges (and entire buckets) below.
Some of the supported functionality can be quickly demonstrated with the following examples:
Evict object(s) from a bucket that has remote backend.
For --list or --template usage, please see: Operations on Lists and Ranges (and entire buckets) below.
evict also supports embedded prefix - see disambiguating multi-object operation.
Put file.txt object to cloudbucket bucket and evict it locally.
ais object mv BUCKET/OBJECT_NAME NEW_OBJECT_NAME
Move (rename) an object within an ais bucket. Moving objects from one bucket to another bucket is not supported.
If the NEW_OBJECT_NAME already exists, it will be overwritten without confirmation.
ais object concat DIRNAME|FILENAME [DIRNAME|FILENAME...] BUCKET/OBJECT_NAME
Create an object in a bucket by concatenating the provided files in the order of the arguments provided. If an object of the same name exists, the object will be overwritten without confirmation.
If a directory is provided, files within the directory are sent in lexical order of filename to the cluster for concatenation. Recursive iteration through directories and wildcards is supported in the same way as the PUT operation.
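The ordering rules above (sources in argument order, files inside a directory in lexical order of file name) can be sketched locally as follows. This only illustrates the ordering: it concatenates on the client, does not recurse into subdirectories, and concat_sources is a hypothetical helper rather than anything the CLI exposes.

```python
import os
import tempfile


def concat_sources(sources: list[str]) -> bytes:
    """Concatenate files in argument order; directory entries go in lexical order."""
    out = bytearray()
    for src in sources:
        if os.path.isdir(src):
            paths = [
                os.path.join(src, name)
                for name in sorted(os.listdir(src))
                if os.path.isfile(os.path.join(src, name))
            ]
        else:
            paths = [src]
        for path in paths:
            with open(path, "rb") as f:
                out += f.read()
    return bytes(out)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dir_b = os.path.join(root, "dirB")
        os.makedirs(dir_b)
        for name, text in (("2.txt", b"second"), ("1.txt", b"first,")):
            with open(os.path.join(dir_b, name), "wb") as f:
                f.write(text)
        print(concat_sources([dir_b]))
```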
In two separate requests sends file1.txt and dir/file2.txt to the cluster, concatenates the files keeping the order and saves them as obj in bucket mybucket.
Same as above, but additionally shows progress bar of sending the files to the cluster.
Creates obj in bucket mybucket which is concatenation of sorted files from dirB with sorted files from dirA.
Generally, AIS objects have two kinds of properties: system and, optionally, custom (user-defined). Unlike the system-maintained properties, such as checksum and the number of copies (or EC parity slices, etc.), custom properties may have arbitrary user-defined names and values.
Custom properties are not impacted by object updates (PUTs) — a new version of an object simply inherits custom properties of the previous version as is with no changes.
The command’s syntax is similar to the one used to assign bucket properties
ais object set-custom BUCKET/OBJECT_NAME JSON_SPECIFICATION|KEY=VALUE [KEY=VALUE...] [command options]
for example:
To show the results:
Note the flag --props=all used to show all object’s properties including the custom ones, if available.
Generally, multi-object operations are supported in 2 different ways:
- using --list or --template options, whereby the latter supports Bash expansion syntax and can also contain a prefix, such as a virtual parent directory, etc.
This section documents and exemplifies AIS CLI operating on multiple (source) objects that you can specify either explicitly or implicitly using the --list or --template flags.
The number of objects “involved” in a single operation does not have any designed-in limitations: all AIS targets work on a given multi-object operation simultaneously and in parallel.
This is ais start prefetch or, same, ais prefetch command:
Note usage examples above. You can always run --help option to see the most recently updated inline help.
Note: Similar to delete, evict and copy operations, prefetch also supports embedded prefix - see:
This example demonstrates how to prefetch objects from a remote bucket, and how to monitor the progress of the operation.
First, let’s check which objects are currently stored in-cluster (if any):
To remove all in-cluster content while preserving the bucket’s metadata:
The terms in-cluster and cached are used interchangeably throughout the entire documentation and CLI.
To prefetch objects with a specific prefix from a cloud bucket:
The prefix in the example is “10”
You can monitor the progress of the prefetch operation using the ais show job prefetch command. Add the --refresh flag followed by a time in seconds to get automatic updates:
The output shows statistics for each node in the AIStore cluster:
The output also includes a “Total” row at the bottom that provides cluster-wide aggregated values for the number of objects prefetched and bytes transferred. The checkmark (✓) indicates that all nodes are reporting byte statistics.
You can see the progress over time with automatic refresh:
To stop all in-progress jobs:
This will stop all running jobs. To stop a specific job, use ais stop job JOB_ID.
Initially:
Now, let’s use --prefix option to - in this case - fetch a single object:
Since --template can optionally contain prefix and zero or more ranges, we could execute the above example as follows:
This, in fact, would produce the same result (see previous section).
But of course, “templated” match can also specify an actual range, for example:
NOTE: make sure to use double or single quotations to specify the list, as shown below.
ais object rm BUCKET[/OBJECT_NAME_or_TEMPLATE] [BUCKET[/OBJECT_NAME_or_TEMPLATE] ...] [command options]
Delete an object or list or range of objects from a bucket.
Alias: ais rmo.
Delete a list of objects (obj1, obj2, obj3) from bucket mybucket.
NOTE: when specifying a comma-delimited --list option, make sure to use double or single quotations as shown below.
And one other example (that also includes generating .tar shards):
To fully synchronize in-cluster content with remote backend, please refer to out of band updates
ais evict BUCKET[/OBJECT_NAME_or_TEMPLATE] [BUCKET[/OBJECT_NAME_or_TEMPLATE] ...] [command options]
Command ais evict is a shorter version of ais bucket evict.
Here’s inline help, and specifically notice the multi-object options: --template, --list, and --prefix:
Note usage examples above. You can always run --help option to see the most recently updated inline help.