Batch operations


Introduction

Extended actions (xactions) are batch operations, or jobs, that run asynchronously, report statistics (viewable at runtime and later), can be waited upon, and can be stopped (aborted).

The list of supported jobs can be viewed via the CLI show command and currently includes:

$ ais show job --help

ais show job - Show running and/or finished jobs:
archive blob-download cleanup copy-bucket copy-objects create-inventory
delete-objects download dsort ec-bucket ec-get ec-put
ec-resp elect-primary etl-bucket etl-inline etl-objects evict-objects
evict-remote-bucket get-batch list lru-eviction mirror prefetch-objects
promote-files put-copies rebalance rechunk rename-bucket resilver
summary warm-up-metadata
(use any of these names with 'ais show job' command, or try shortcuts: "evict", "prefetch", "copy", "delete", "ec")

e.g.:
- show job prefetch-listrange - show all running prefetch jobs;
- show job prefetch - same as above;
- show job prefetch --top 5 - show 5 most recent prefetch jobs;
- show job tco-cysbohAGL - show a given (multi-object copy/transform) job identified by its unique ID;
- show job copy-listrange - show all running multi-object copies;
- show job copy-objects - same as above (using display name);
- show job copy - show all copying jobs including both bucket-to-bucket and multi-object;
- show job copy-objects --all - show both running and already finished (or stopped) multi-object copies;
- show job copy-objects --all --top 10 - show 10 most recent multi-object copy jobs;
- show job rechunk - show all running rechunk jobs;
- show job ec - show all erasure-coding;
- show job list - show all running list-objects jobs;
- show job ls - same as above;
- show job ls --refresh 10 - same as above with periodic _refreshing_ every 10 seconds;
- show job ls --refresh 10 --count 4 - same as above but only for the first four 10-seconds intervals;
- show job prefetch --refresh 1m - show all running prefetch jobs at 1 minute intervals (until Ctrl-C);
- show job evict - all running bucket and/or data evicting jobs;
- show job --all - show absolutely all jobs, running and finished.

Terminology-wise, the code mostly uses xaction, the name of the corresponding software abstraction. Elsewhere, it is simply a job; the two terms are interchangeable.

In the source code, all supported xactions are enumerated here.

For users, there’s an API to start, stop, and wait for a job:

In CLI, there’s ais job command and its subcommands (<TAB-TAB> completions):

$ ais job
start stop wait rm show

$ ais start <TAB-TAB>

prefetch dsort rebalance mirror warm-up-metadata
blob-download lru resilver ec-encode copy-bck
download etl cleanup rechunk move-bck

Note that ais start is an alias for the ais job start command; the two can be used interchangeably.

Not all supported jobs are startable.

Not all supported jobs can be started via ais start or by the corresponding Go or Python API call. For example, the job to copy or (ETL) transform datasets has its own dedicated API (both Python and Go) and CLI.

See, e.g., ais cp --help.

The complete and most up-to-date list of supported jobs can be found in this table of job descriptors.

Last but not least is time. Job execution may take many seconds, sometimes minutes or hours.

Examples include erasure coding or n-way mirroring a dataset, resharding and reshuffling a dataset, and more.

Global rebalance is triggered automatically by any membership change (nodes joining, leaving, power-cycling, etc.), and its progress can be viewed via the ais show rebalance CLI.

Another example would be primary election. AIS proxies provide access points (“endpoints”) for the frontend API. At any point in time there is a single primary proxy that also controls versioning and distribution of the current cluster map. When and if the primary fails, another proxy is majority-elected to perform the (primary) role.

This election by simple majority is also a job that cannot be started via ais start or the corresponding API. Like global rebalance, it is event-driven, and there is a separate dedicated API to run it administratively.

Rebalance and a few other AIS jobs have their own CLI extensions. Generally, though, you can always monitor xactions via the ais show job xaction command, which also supports verbose mode and other options.

AIS subsystems integrate subsystem-specific stats - e.g.:

Related CLI documentation:


Operations on multiple selected objects

AIStore provides APIs to operate on batches of objects:

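| API Message (apc.ActionMsg) | Description |
| --- | --- |
| apc.ActCopyObjects | copy multiple objects |
| apc.ActDeleteObjects | delete multiple objects |
| apc.ActETLObjects | etl (transform) multiple objects |
| apc.ActEvictObjects | evict multiple objects |
| apc.ActPrefetchObjects | prefetch multiple objects |
| apc.ActArchive | archive multiple objects |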

For CLI documentation and examples, please see Operations on Lists and Ranges (and entire buckets).

There are two distinct ways to specify the objects: list their names explicitly, or specify a template.

The supported template syntax comes in three standalone, alternative formats:

  1. bash (or shell) brace expansion:
    • prefix-{0..100}-suffix
    • prefix-{00001..00010..2}-gap-{001..100..2}-suffix
  2. at style:
    • prefix-@100-suffix
    • prefix-@00001-gap-@100-suffix
  3. fmt style:
    • prefix-%06d-suffix

In all cases, prefix and/or suffix are optional.
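To make the brace-expansion format concrete, here is a minimal Python sketch of how such a template could be expanded client-side. It is an illustration only, not the AIS implementation: the function name expand_template is hypothetical, and the sketch assumes that ranges are inclusive, that the optional third field is a step, and that the zero-padding width is taken from the start bound. The at and fmt styles are not covered.

```python
import itertools
import re

# Matches one brace range: {start..end} or {start..end..step}
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)(?:\.\.(\d+))?\}")


def expand_template(template):
    """Expand every {start..end[..step]} range into concrete object names.

    Hypothetical helper for illustration; not part of any AIS SDK.
    """
    literals = []  # literal text preceding each range
    ranges = []    # one list of formatted numbers per range
    pos = 0
    for m in _RANGE.finditer(template):
        literals.append(template[pos:m.start()])
        start, end = int(m.group(1)), int(m.group(2))
        step = int(m.group(3)) if m.group(3) else 1
        if step < 1:
            raise ValueError("step must be a positive integer")
        # Assumption: pad to the width of the start bound ("07" -> width 2)
        width = len(m.group(1))
        # Assumption: the end bound is inclusive
        ranges.append([f"{n:0{width}d}" for n in range(start, end + 1, step)])
        pos = m.end()
    tail = template[pos:]

    # Cartesian product over all ranges, leftmost range varying slowest.
    # A template without ranges yields just the template itself here;
    # AIS treats such a template as a name prefix instead.
    names = []
    for combo in itertools.product(*ranges):
        body = "".join(lit + num for lit, num in zip(literals, combo))
        names.append(body + tail)
    return names


if __name__ == "__main__":
    for name in expand_template("dir-{0..1}/obj-{07..08}"):
        print(name)
```

Under these assumptions, a template with several ranges expands to the Cartesian product of those ranges, so the number of resulting names is the product of the individual range sizes.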

List

List APIs take a JSON array of object names, and initiate the operation on those objects.

| Parameter | Description |
| --- | --- |
| objnames | JSON array of object names |

Range

| Parameter | Description |
| --- | --- |
| template | The object name template with optional range parts. If a range is omitted, the template is used as an object name prefix |

Examples

All the following examples assume that the action is delete and the bucket name is bck, so only the value part of the request is shown:

"value": {"list": "[\"obj1\",\"dir/obj2\"]"} - deletes objects obj1 and dir/obj2 from the bucket bck

"value": {"template": "obj-{07..10}"} - removes the following objects from bck (note the leading zeroes in object names):

  • obj-07
  • obj-08
  • obj-09
  • obj-10

"value": {"template": "dir-{0..1}/obj-{07..08}"} - a template can contain more than one range; this example removes the following objects from bck (note the leading zeroes in object names):

  • dir-0/obj-07
  • dir-0/obj-08
  • dir-1/obj-07
  • dir-1/obj-08

"value": {"template": "dir-10/"} - the template defines no ranges, so the request deletes all objects whose names start with dir-10/
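The prefix fallback in the last example can be sketched in a few lines of Python. This is a hedged illustration rather than AIS code: has_range and select_by_prefix are hypothetical helper names, the range detection covers only the brace style, and the object names in the usage part are made up.

```python
import re

# Detects a brace-style range such as {07..10} or {1..9..2}
_RANGE = re.compile(r"\{\d+\.\.\d+(?:\.\.\d+)?\}")


def has_range(template):
    """Return True if the template contains at least one brace range."""
    return _RANGE.search(template) is not None


def select_by_prefix(template, names):
    """Pick the names a range-less template would select (plain prefix match)."""
    if has_range(template):
        raise ValueError("template defines ranges; prefix matching does not apply")
    return [name for name in names if name.startswith(template)]


if __name__ == "__main__":
    # Made-up bucket listing, for illustration only
    listing = ["dir-10/a", "dir-10/sub/b", "dir-1/a", "dir-100/c"]
    print(select_by_prefix("dir-10/", listing))
```

Note that the trailing slash matters: with the template dir-10/ the name dir-100/c is not selected, whereas the template dir-10 would also match it.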