Introduction, background, definitions

Batch operations that run asynchronously and may take seconds, minutes, or even hours to execute are called eXtended actions (xactions).

Internally, xaction is an abstraction at the root of the inheritance hierarchy that also contains specific user-visible jobs: copy-bucket, evict-objects, and more.

For the most recently updated list of all supported jobs and their respective compile-time properties, see the source.

All jobs run asynchronously, have start and stop times, and report common generic statistics.

Further, every job kind has its own display name, access permissions, scope (bucket and/or global), and a number of boolean properties, for example:

| Property | Description |
| --- | --- |
| Startable | true if user can start this job via generic job-start API |
| RefreshCap | the system must refresh capacity stats upon the job’s completion |

Many kinds of jobs can be started manually via the generic job API (which is, in turn, used by the ais start command described below).

Notable exceptions include electing new primary and listing objects in a given bucket - in both of those cases, there’s a separate, more convenient and intuitive API that does the job, so to speak.

Job starting, stopping (i.e., aborting), and monitoring commands all have equivalent shorter versions. For instance, ais job start download can be expressed as ais start download, while ais wait copy-bucket Z8WkHxwIrr is the same as ais wait Z8WkHxwIrr.

The rest of this document covers starting, stopping, and otherwise managing job kinds and specific job instances. For job monitoring, please use the ais show job command and its numerous subcommands and options.

See also

ais job command

Has the following static completions aka subcommands:

```console
$ ais job <TAB-TAB>
start stop wait rm show
```

and further:

```console
$ ais job --help
NAME:
   ais job - monitor, query, start/stop and manage jobs and eXtended actions (xactions)

USAGE:
   ais job command [arguments...] [command options]

COMMANDS:
   start   run batch job
   stop    terminate a single batch job or multiple jobs (press <TAB-TAB> to select, '--help' for options)
   wait    wait for a specific batch job to complete (press <TAB-TAB> to select, '--help' for options)
   rm      cleanup finished jobs
   show    show running and finished jobs ('--all' for all, or press <TAB-TAB> to select, '--help' for options)

OPTIONS:
   --help, -h   show help
```

Notice, though, that start, stop, and wait (verbs) have shorter versions, e.g.:

  • ais start is a built-in alias for ais job start, and so on.

For all configured pre-built and user-defined aliases (aka “shortcuts”), run ais alias or ais alias --help.

Start job

ais start <JOB_NAME> [arguments...]

Start a certain job. Some jobs require additional arguments such as bucket name to execute.

Note: job start download|dsort have slightly different options. Please see their respective documentation for details.

Examples

Start cluster-wide LRU

Starts the LRU xaction on all nodes.

```console
$ ais start lru
Started "lru" xaction.
```

An administrator may choose to run LRU on a subset of buckets by passing the --buckets flag with a comma-separated list of buckets, for instance --buckets bck1,gcp://bck2. Additionally, the --force (-f) option can be used to override the bucket’s lru.enabled property.

Note: To ensure safety, the force flag (-f) only works when a list of buckets is provided.

```console
$ ais start lru --buckets ais://buck1,aws://buck2 -f
```

Re-chunk objects

Re-chunking converts objects between monolithic and chunked representations based on the specified chunking parameters. The job processes objects in the bucket according to the configured threshold:

  • Objects below objsize_limit are stored as monolithic (single file)
  • Objects at or above objsize_limit are split into chunks of chunk_size
  • When objsize_limit is 0, chunking is disabled and all objects are restored as monolithic
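The threshold rule above can be sketched in a few lines of Python. This is only an illustration of the three bullets, not the actual AIStore implementation: the function name `plan_layout` and the return shape are hypothetical, the chunk count assumes the last chunk may be shorter than `chunk_size`, and the sketch deliberately ignores any other bucket-level limits.

```python
MIB = 1024 * 1024

def plan_layout(obj_size: int, objsize_limit: int, chunk_size: int) -> tuple[str, int]:
    """Return (representation, number_of_pieces) for an object of obj_size bytes.

    Hypothetical helper mirroring the documented rule; sizes are in bytes.
    """
    if obj_size < 0 or objsize_limit < 0 or chunk_size <= 0:
        raise ValueError("sizes must be non-negative and chunk_size must be positive")
    # objsize_limit == 0 disables chunking: everything is stored as a single file.
    if objsize_limit == 0 or obj_size < objsize_limit:
        return ("monolithic", 1)
    # At or above the limit: split into chunk_size pieces (integer ceiling division).
    n_chunks = max(1, (obj_size + chunk_size - 1) // chunk_size)
    return ("chunked", n_chunks)

if __name__ == "__main__":
    for size in (10 * MIB, 50 * MIB, 100 * MIB):
        print(size // MIB, "MiB ->", plan_layout(size, 50 * MIB, 16 * MIB))
```

With a 50MiB limit and 16MiB chunks, an object of exactly 50MiB falls on the "at or above" side of the rule and is therefore chunked.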

Usage:

```console
$ ais rechunk BUCKET [--chunk-size SIZE] [--objsize-limit SIZE] [--prefix PREFIX]
```

Flags:

  • --chunk-size SIZE - Size of each chunk (e.g., 16MiB, 20mb). Optional: if omitted, uses the bucket’s current chunk_size
  • --objsize-limit SIZE - Object size threshold for chunking (e.g., 50MiB, 100mb); objects >= this size will be chunked. Optional: if omitted, uses the bucket’s current objsize_limit
  • --prefix PREFIX - Only rechunk objects with the specified prefix (can also be embedded in the bucket URI)
  • --wait - Wait for the job to complete before returning
  • --wait-timeout DURATION - Maximum time to wait (e.g., 5m, 1h)
  • --yes, -y - Assume ‘yes’ to all prompts (skip confirmation)

Note: If either size argument is missing, you will be prompted to confirm using the bucket’s current configuration.

Examples:

Rechunk using the bucket’s existing chunk configuration (prompts for confirmation):

```console
$ ais rechunk ais://mybucket
Rechunk configuration:
  chunk_size: 16MiB
  objsize_limit: 50MiB
Proceed with these values? [Y/N]: y
Started "rechunk" xaction "rechunk[aBc123]": ais://mybucket. To monitor, run 'ais show job aBc123'
```

Rechunk with one explicit flag and one from bucket (prompts for confirmation):

```console
$ ais rechunk ais://mybucket --chunk-size 32MiB
Rechunk configuration:
  chunk_size: 32MiB
  objsize_limit: 50MiB
Proceed with these values? [Y/N]: y
Started "rechunk" xaction "rechunk[dEf456]": ais://mybucket. To monitor, run 'ais show job dEf456'
```

Rechunk all objects with both flags explicitly provided (no prompt):

```console
$ ais rechunk ais://mybucket --chunk-size 16MiB --objsize-limit 50MiB
Started "rechunk" xaction "rechunk[gHi789]": ais://mybucket. To monitor, run 'ais show job gHi789'
```

Rechunk only objects with a specific prefix using embedded prefix in the URI:

```console
$ ais rechunk ais://mybucket/images/ --chunk-size 16MiB --objsize-limit 50MiB
Started "rechunk" xaction "rechunk[mNo345]": ais://mybucket (prefix: "images/"). To monitor, run 'ais show job mNo345'
```

Disable chunking and restore all objects as monolithic:

```console
$ ais rechunk ais://mybucket --chunk-size 16MiB --objsize-limit 0
```

Wait for the rechunk job to complete:

```console
$ ais rechunk ais://mybucket --chunk-size 16MiB --objsize-limit 50MiB --wait
Done.
```

Note: Regardless of objsize_limit value (even when disabled), objects exceeding the bucket’s maxMonolithicSize configuration will be automatically chunked at that size limit for performance and storage management reasons.

See also: bucket properties, ais bucket props

Stop job

Stop a single job or multiple jobs.

```console
$ ais stop --help
NAME:
   ais stop - (alias for "job stop") terminate a single batch job or multiple jobs, e.g.:
     - 'stop tco-cysbohAGL' - terminate a given (multi-object copy/transform) job identified by its unique ID;
     - 'stop copy-listrange' - terminate all multi-object copies;
     - 'stop copy-objects' - same as above (using display name);
     - 'stop list' - stop all list-objects jobs;
     - 'stop ls' - same as above;
     - 'stop prefetch-listrange' - stop all prefetch jobs;
     - 'stop prefetch' - same as above;
     - 'stop g731 --force' - forcefully abort global rebalance g731 (advanced usage only);
     - 'stop --all' - terminate all running jobs
     press <TAB-TAB> to select, '--help' for more options.

USAGE:
   ais stop [NAME] [JOB_ID] [NODE_ID] [BUCKET] [command options]

OPTIONS:
   --all           all running jobs
   --regex value   regular expression to select jobs by name, kind, or description, e.g.: --regex "ec|mirror|elect"
   --force, -f     force execution of the command (caution: advanced usage only)
   --yes, -y       assume 'yes' to all questions
   --help, -h      show help
```

Examples stopping a single job:

  • ais stop download JOB_ID
  • ais stop JOB_ID
  • ais stop dsort JOB_ID

Examples stopping multiple jobs:

  • ais stop download --all # stop all downloads
  • ais stop copy-bucket ais://abc --all # stop all copy-bucket jobs where the destination bucket is ais://abc
  • ais stop resilver t[rt2erGhbr] # ask target t[rt2erGhbr] to stop resilvering

and more.

Note: job stop download|dsort have slightly different options. Please see their respective documentation for details.

More Examples

Stop cluster-wide LRU

Stops currently running LRU eviction.

```console
$ ais stop lru
Stopped LRU eviction.
```

Show job

ais show job [NAME] [JOB_ID] [NODE_ID] [BUCKET] [command options]

```console
$ ais show job --help
NAME:
   ais show job - Show running and/or finished jobs:
     archive blob-download cleanup copy-bucket copy-objects delete-objects
     download dsort ec-bucket ec-get ec-put ec-resp
     elect-primary etl-bucket etl-inline etl-objects evict-objects evict-remote-bucket
     list lru-eviction mirror prefetch-objects promote-files put-copies
     rebalance rechunk rename-bucket resilver summary warm-up-metadata
   (use any of these names with 'ais show job' command, or try shortcuts: "evict", "prefetch", "copy", "delete", "ec")
   e.g.:
     - show job prefetch-listrange - show all running prefetch jobs;
     - show job prefetch - same as above;
     - show job prefetch --top 5 - show 5 most recent prefetch jobs;
     - show job tco-cysbohAGL - show a given (multi-object copy/transform) job identified by its unique ID;
     - show job copy-listrange - show all running multi-object copies;
     - show job copy-objects - same as above (using display name);
     - show job copy - show all copying jobs including both bucket-to-bucket and multi-object;
     - show job copy-objects --all - show both running and already finished (or stopped) multi-object copies;
     - show job copy-objects --all --top 10 - show 10 most recent multi-object copy jobs;
     - show job rechunk - show all running rechunk jobs;
     - show job ec - show all erasure-coding;
     - show job list - show all running list-objects jobs;
     - show job ls - same as above;
     - show job ls --refresh 10 - same as above with periodic _refreshing_ every 10 seconds;
     - show job ls --refresh 10 --count 4 - same as above but only for the first four 10-seconds intervals;
     - show job prefetch --refresh 1m - show all running prefetch jobs at 1 minute intervals (until Ctrl-C);
     - show job evict - all running bucket and/or data evicting jobs;
     - show job --all - show absolutely all jobs, running and finished.

USAGE:
   ais show job [NAME] [JOB_ID] [NODE_ID] [BUCKET] [command options]

OPTIONS:
   --all              Include all jobs: running, finished, and aborted
   --count value      Used together with '--refresh' to limit the number of generated reports, e.g.:
                      '--refresh 10 --count 5' - run 5 times with 10s interval (default: 0)
   --date-time        Override the default hh:mm:ss (hours, minutes, seconds) time format - include calendar date as well
   --json, -j         JSON input/output
   --log value        Filename to log metrics (statistics)
   --no-headers, -H   Display tables without headers
   --progress         Show progress bar(s) and progress of execution in real time
   --refresh value    Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
                      valid time units: ns, us (or µs), ms, s (default), m, h
   --regex value      Regular expression to select jobs by name, kind, or description, e.g.: --regex "ec|mirror|elect"
   --top value        Show top N most recent jobs (e.g., --top 5 to show the 5 most recent jobs)
   --units value      Show statistics and/or parse command-line specified sizes using one of the following units of measurement:
                      iec - IEC format, e.g.: KiB, MiB, GiB (default)
                      si  - SI (metric) format, e.g.: KB, MB, GB
                      raw - do not convert to (or from) human-readable format
   --verbose, -v      Show extended statistics
   --help, -h         Show help
```

You can show jobs by any combination of the optional (filtering) arguments: NAME, JOB_ID, etc.

Use --all option to include finished (or aborted) jobs.
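To make the "any combination" semantics concrete, here is a minimal Python sketch of such a filter over in-memory job records. It is an illustration only: the record fields (`node`, `id`, `kind`, `bucket`, `state`) and the function `select_jobs` are hypothetical names loosely modeled on the table columns shown by ais show job, not an actual AIStore API.

```python
def select_jobs(jobs, name=None, job_id=None, node_id=None, bucket=None,
                include_finished=False):
    """Keep jobs that match every filter that was actually provided.

    A filter left as None matches anything; include_finished plays the
    role of the --all option (hypothetical mapping).
    """
    wanted = {"kind": name, "id": job_id, "node": node_id, "bucket": bucket}
    selected = []
    for job in jobs:
        # By default only running jobs are shown.
        if not include_finished and job["state"] != "Running":
            continue
        if all(value is None or job[field] == value
               for field, value in wanted.items()):
            selected.append(job)
    return selected

# Hypothetical records, shaped after the tabular output in this document.
JOBS = [
    {"node": "ZXzt8081", "id": "LIM_Ua6LV", "kind": "list",
     "bucket": "ais://job-top-test", "state": "Running"},
    {"node": "zXZXt8084", "id": "FXjl0NWGOU", "kind": "ec-put",
     "bucket": "ais://TESTAISBUCKET-ec-mpaths", "state": "Aborted"},
]

if __name__ == "__main__":
    print(select_jobs(JOBS, name="list"))
    print(select_jobs(JOBS, include_finished=True))
```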

As usual, press <TAB-TAB> to select and see --help for details.

job show download|dsort have slightly different options. Please see their respective documentation for details.

Show extended statistics

All jobs show the number of processed objects (column OBJECTS) and the total size of the data (column BYTES). Both values are cumulative over the job’s entire lifetime.

Certain kinds of supported jobs provide extended statistics, including:

Show EC Encoding Statistics

The output contains a few extra columns:

  • ERRORS - the total number of objects EC failed to encode
  • QUEUE - the average length of the working queue: the average number of objects waiting in the queue when a new EC encode request is received. Values close to 0 mean that every object was processed immediately after the request was received
  • AVG TIME - the average total processing time for an object: from the moment the object is put to the working queue and to the moment the last encoded slice is sent to another target
  • ENC TIME - the average amount of time spent on encoding an object.

The extended statistics may hint at the likely bottleneck:

  • high values in QUEUE - EC is congested and does not have time to process all incoming requests
  • low values in QUEUE and ENC TIME but high values in AVG TIME may mean that the network is slow and a lot of time is spent sending the encoded slices
  • low values in QUEUE, with ENC TIME close to AVG TIME, may mean that the local hardware (drives or CPUs) is overloaded.
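The three rules of thumb above can be written down as a small decision function. The Python sketch below is illustrative only: the function name `ec_bottleneck_hint` and all threshold values (`queue_high`, `time_high`, `close_ratio`) are assumptions made for the example, since the document does not define what counts as "high", "low", or "close". Times are taken in seconds.

```python
def ec_bottleneck_hint(queue: float, avg_time: float, enc_time: float, *,
                       queue_high: float = 1.0,   # assumed cutoff for a "high" QUEUE
                       time_high: float = 1.0,    # assumed cutoff (s) for a "high" AVG TIME
                       close_ratio: float = 0.8   # ENC TIME / AVG TIME considered "close"
                       ) -> str:
    """Map QUEUE, AVG TIME, and ENC TIME to a coarse hint about the bottleneck."""
    if queue >= queue_high:
        # Objects pile up in the working queue: EC cannot keep up with requests.
        return "ec-congested"
    if avg_time < time_high:
        # Queue is short and objects are processed quickly: nothing stands out.
        return "none"
    if enc_time >= close_ratio * avg_time:
        # Most of the processing time is spent encoding: drives or CPUs.
        return "local-hardware"
    # Encoding is fast but total time is high: time goes to sending slices.
    return "network"

if __name__ == "__main__":
    print(ec_bottleneck_hint(queue=5.0, avg_time=2.0, enc_time=0.1))
    print(ec_bottleneck_hint(queue=0.0, avg_time=2.0, enc_time=0.1))
    print(ec_bottleneck_hint(queue=0.0, avg_time=2.0, enc_time=1.9))
```

In practice the cutoffs would need to be tuned per cluster; the point of the sketch is the order in which the columns are consulted.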

Show EC Restoring Statistics

Show information about EC restore requests.

The output contains a few extra columns:

  • ERRORS - the total number of objects EC failed to restore
  • QUEUE - the average length of the working queue: the average number of objects waiting in the queue when a new EC encode request is received. Values close to 0 mean that every object was processed immediately after the request was received
  • AVG TIME - the average total processing time for an object: from the moment the object is put to the working queue and to the moment the last encoded slice is sent to another target

Examples using --top flag

Show 2 most recent list jobs:

```console
$ ais show job list --top 2
list[LIM_Ua6LV] (ctl: ais://job-top-test, props:name,size,cached)
NODE       ID          KIND   BUCKET               OBJECTS   BYTES   START      END   STATE
ZXzt8081   LIM_Ua6LV   list   ais://job-top-test   6         36B     22:29:39   -     Running
list[q9MIgaNLO] (ctl: ais://job-top-test, props:name,size,cached)
NODE       ID          KIND   BUCKET               OBJECTS   BYTES   START      END   STATE
ZXzt8081   q9MIgaNLO   list   ais://job-top-test   6         36B     22:29:39   -     Running
```

Show 1 most recent job of any type:

```console
$ ais show job --top 1
list[LIM_Ua6LV] (ctl: ais://job-top-test, props:name,size,cached)
NODE       ID          KIND   BUCKET               OBJECTS   BYTES   START      END   STATE
ZXzt8081   LIM_Ua6LV   list   ais://job-top-test   6         36B     22:29:39   -     Running
```

Show 2 most recent list jobs including finished ones:

```console
$ ais show job list --all --top 2
list[LIM_Ua6LV] (ctl: ais://job-top-test, props:name,size,cached)
NODE       ID          KIND   BUCKET               OBJECTS   BYTES   START      END   STATE
ZXzt8081   LIM_Ua6LV   list   ais://job-top-test   6         36B     22:29:39   -     Running
list[q9MIgaNLO] (ctl: ais://job-top-test, props:name,size,cached)
NODE       ID          KIND   BUCKET               OBJECTS   BYTES   START      END   STATE
ZXzt8081   q9MIgaNLO   list   ais://job-top-test   6         36B     22:29:39   -     Running
```
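Conceptually, selecting the "N most recent" jobs amounts to ordering by start time and keeping the first N entries. The Python sketch below shows that idea under stated assumptions: the `start` field and the helper `top_recent` are hypothetical, and how the CLI itself orders jobs with identical start times (as in the output above) is not specified in this document.

```python
from datetime import datetime

def top_recent(jobs, n):
    """Return the n jobs with the latest start time, newest first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # sorted() is stable, so jobs with equal start times keep their input order.
    ordered = sorted(jobs, key=lambda job: job["start"], reverse=True)
    return ordered[:n]

# Hypothetical records; only the fields needed for ordering are included.
SAMPLE = [
    {"id": "a", "start": datetime(2024, 1, 1, 22, 29, 10)},
    {"id": "b", "start": datetime(2024, 1, 1, 22, 29, 39)},
    {"id": "c", "start": datetime(2024, 1, 1, 22, 28, 0)},
]

if __name__ == "__main__":
    print([job["id"] for job in top_recent(SAMPLE, 2)])
```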

Options

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| --json | bool | Output details in JSON format | false |
| --all | bool | If set, additionally displays old, finished xactions | false |
| --active | bool | If set, displays only running xactions | false |
| --verbose -v | bool | If set, displays all xaction statistics including extended ones. If the number of xactions to display is greater than one, the flag is ignored. | false |

Certain extended actions have additional CLI. In particular, rebalance stats can also be displayed using the following command:

ais show rebalance

Display details about the most recent rebalance xaction.

| Flag | Type | Description | Default |
| --- | --- | --- | --- |
| --refresh | duration | Refresh interval - time duration between reports. The usual unit suffixes are supported and include m (for minutes), s (seconds), ms (milliseconds). Ctrl-C to stop monitoring. | |
| --all | bool | If set, show all rebalance xactions | false |
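As an illustration of how a --refresh style value with a unit suffix could be interpreted, here is a small Python sketch. It is not the parser used by the CLI: `parse_duration` is a hypothetical helper, it accepts only a single number with one optional suffix (no compound values), and it treats a bare number as seconds, following the "s (default)" note in the help text elsewhere in this document.

```python
import re

# Seconds per unit, for the suffixes listed in the help text.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ns|us|µs|ms|s|m|h)?\s*$")

def parse_duration(text: str) -> float:
    """Convert strings like '10', '500ms', '1m', or '2h' to seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    # A missing suffix falls back to seconds.
    return float(value) * _UNIT_SECONDS[unit or "s"]

if __name__ == "__main__":
    for sample in ("10", "1m", "500ms", "2h"):
        print(sample, "->", parse_duration(sample), "seconds")
```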

Output of this command differs from the generic xaction output.

Examples

Default compact tabular view:

```console
$ ais show job --all
NODE        ID           KIND     BUCKET                    OBJECTS   BYTES     START            END              STATE
zXZXt8084   FXjl0NWGOU   ec-put   TESTAISBUCKET-ec-mpaths   5         4.56MiB   12-02 13:04:50   12-02 13:04:50   Aborted
```

Verbose tabular view:

```console
$ ais show job FXjl0NWGOU --verbose
PROPERTY              VALUE
.aborted              true
.bck                  ais://TESTAISBUCKET-ec-mpaths
.end                  12-02 13:04:50
.id                   FXjl0NWGOU
.kind                 ec-put
.start                12-02 13:04:50
ec.delete.err.n       0
ec.delete.n           0
ec.delete.time        0s
ec.encode.err.n       0
ec.encode.n           5
ec.encode.size        4.56MiB
ec.encode.time        16.964552ms
ec.obj.process.time   17.142239ms
ec.queue.len.n        0
in.obj.n              0
in.obj.size           0
is_idle               true
loc.obj.n             5
loc.obj.size          4.56MiB
out.obj.n             0
out.obj.size          0
```

Wait for job

ais wait [NAME] [JOB_ID] [NODE_ID] [BUCKET]

Wait for the specified job to finish.

job wait download|dsort have slightly different options. Please see their respective documentation for details.

Options

```console
$ ais wait --help

NAME:
   ais wait - (alias for "job wait") wait for a specific batch job to complete (press <TAB-TAB> to select, '--help' for more options)

USAGE:
   ais wait [NAME] [JOB_ID] [NODE_ID] [BUCKET] [command options]

OPTIONS:
   --progress        Show progress bar(s) and progress of execution in real time
   --refresh value   Time interval for continuous monitoring; can be also used to update progress bar (at a given interval);
                     valid time units: ns, us (or µs), ms, s (default), m, h
   --timeout value   Maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
                     valid time units: ns, us (or µs), ms, s (default), m, h
   --help, -h        Show help
```
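The interplay between --refresh and --timeout can be pictured as a polling loop with an optional deadline. The Python sketch below is a generic illustration under stated assumptions, not the CLI's implementation: `wait_for` and its `is_done` callback are hypothetical, and a real client would replace the callback with an actual job-status query.

```python
import time

def wait_for(is_done, refresh=1.0, timeout=None):
    """Poll is_done() every `refresh` seconds.

    Returns True once is_done() reports completion, or False if `timeout`
    seconds elapse first. With timeout=None the loop waits indefinitely,
    mirroring "wait forever or until Ctrl-C".
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(refresh)

if __name__ == "__main__":
    # Simulated job that completes on the third status check.
    checks = iter([False, False, True])
    print(wait_for(lambda: next(checks), refresh=0.01, timeout=5))
```

Using a monotonic clock for the deadline keeps the timeout unaffected by wall-clock adjustments while the loop is running.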

Distributed Sort

ais start dsort

Run dSort. Further reference for this command can be found here.

Downloader

ais start download or ais job start download

Run the AIS Downloader. Further reference for this command can be found here.