CLI Reference for ETLs


This section documents ETL management operations with ais etl.

As with global rebalance, dSort, and download, all ETL management commands can also be executed via ais job and ais show—the commands that, by definition, support all AIS xactions, including AIS-ETL.

In the ais etl namespace, the commands include:

$ ais etl <TAB-TAB>

init   show   view-logs   start   stop   rm   object   bucket

For background on AIS-ETL, getting started, working examples, and tutorials, please refer to the AIS-ETL documentation.

Table of Contents

Getting Started

Initializing an ETL

ETL Management

ETL Lifecycle Operations

Data Transformation


Commands

Top-level ETL commands include init, stop, show, and more:

$ ais etl -h
NAME:
   ais etl - Manage and execute custom ETL (Extract, Transform, Load) jobs

USAGE:
   ais etl command [arguments...] [command options]

COMMANDS:
   init       Initialize ETL using a runtime spec or full Kubernetes Pod spec YAML file (local or remote).
              Examples:
              - 'ais etl init -f my-etl.yaml' deploy ETL from a local YAML file;
              - 'ais etl init -f https://example.com/etl.yaml' deploy ETL from a remote YAML file;
              - 'ais etl init -f multi-etl.yaml' deploy multiple ETLs from a single file (separated by '---');
              - 'ais etl init -f spec.yaml --name my-custom-etl' override ETL name from command line;
              - 'ais etl init -f spec.yaml --comm-type hpull' override communication type;
              - 'ais etl init -f spec.yaml --object-timeout 30s' set custom object transformation timeout;
              - 'ais etl init --spec <file|URL>' deploy ETL jobs from a local spec file, remote URL, or multi-ETL YAML.

              Additional Info:
              - You may define multiple ETLs in a single spec file using YAML document separators ('---').
              - CLI flags like '--name' or '--comm-type' can override values in the spec, but not when multiple ETLs are defined.

   show       Show ETL(s).
              Examples:
              - 'ais etl show' list all ETL jobs with their status and details;
              - 'ais etl show my-etl' show detailed specification for a specific ETL job;
              - 'ais etl show my-etl another-etl' show detailed specifications for multiple ETL jobs;
              - 'ais etl show errors my-etl' show transformation errors for inline object transformations;
              - 'ais etl show errors my-etl job-123' show errors for a specific offline (bucket-to-bucket) transform job.

   view-logs  View ETL logs.
              Examples:
              - 'ais etl view-logs my-etl' show logs from all target nodes for the specified ETL;
              - 'ais etl view-logs my-etl target-001' show logs from a specific target node;
              - 'ais etl view-logs data-converter target-002' view logs from target-002 for data-converter ETL.

   start      Start ETL.
              Examples:
              - 'ais etl start my-etl' start the specified ETL (transitions from stopped to running state);
              - 'ais etl start my-etl another-etl' start multiple ETL jobs by name;
              - 'ais etl start -f spec.yaml' start ETL jobs defined in a local YAML file;
              - 'ais etl start -f https://example.com/etl.yaml' start ETL jobs defined in a remote YAML file;
              - 'ais etl start -f multi-etl.yaml' start all ETL jobs defined in a multi-ETL file;
              - 'ais etl start --spec <file|URL>' start ETL jobs from a local spec file, remote URL, or multi-ETL YAML.

   stop       Stop ETL. Also aborts related offline jobs and can be used to terminate ETLs stuck in 'initializing' state.
              Examples:
              - 'ais etl stop my-etl' stop the specified ETL (transitions from running to stopped state);
              - 'ais etl stop my-etl another-etl' stop multiple ETL jobs by name;
              - 'ais etl stop --all' stop all running ETL jobs;
              - 'ais etl stop -f spec.yaml' stop ETL jobs defined in a local YAML file;
              - 'ais etl stop -f https://example.com/etl.yaml' stop ETL jobs defined in a remote YAML file;
              - 'ais etl stop stuck-etl' terminate ETL that is stuck in 'initializing' state;
              - 'ais etl stop --spec <file|URL>' stop ETL jobs from a local spec file, remote URL, or multi-ETL YAML.

   rm         Remove ETL.
              Examples:
              - 'ais etl rm my-etl' remove (delete) the specified ETL;
              - 'ais etl rm my-etl another-etl' remove multiple ETL jobs by name;
              - 'ais etl rm --all' remove all ETL jobs;
              - 'ais etl rm -f spec.yaml' remove ETL jobs defined in a local YAML file;
              - 'ais etl rm -f https://example.com/etl.yaml' remove ETL jobs defined in a remote YAML file;
              - 'ais etl rm running-etl' remove ETL that is currently running (will be stopped first);
              - 'ais etl rm --spec <file|URL>' remove ETL jobs from a local spec file, remote URL, or multi-ETL YAML.
              NOTE: If an ETL is in 'running' state, it will be stopped automatically before removal.

   object     Transform an object.
              Examples:
              - 'ais etl object my-etl ais://src/image.jpg /tmp/output.jpg' transform object and save to file;
              - 'ais etl object my-etl ais://src/data.json -' transform and output to stdout;
              - 'ais etl object my-etl ais://src/doc.pdf /dev/null' transform and discard output;
              - 'ais etl object my-etl cp ais://src/image.jpg ais://dst/' transform and copy to another bucket;
              - 'ais etl object my-etl ais://src/data.xml output.json --args "format=json"' transform with custom arguments.

   bucket     Transform entire bucket or selected objects (to select, use '--list', '--template', or '--prefix').
              Examples:
              - 'ais etl bucket my-etl ais://src ais://dst' transform all objects from source to destination bucket;
              - 'ais etl bucket my-etl ais://src ais://dst --prefix images/' transform objects with prefix 'images/';
              - 'ais etl bucket my-etl ais://src ais://dst --template "shard-{0001..0999}.tar"' transform objects matching the template;
              - 'ais etl bucket my-etl s3://remote-src ais://dst --all' transform all objects including non-cached ones;
              - 'ais etl bucket my-etl ais://src ais://dst --dry-run' preview transformation without executing;
              - 'ais etl bucket my-etl ais://src ais://dst --num-workers 8' use 8 concurrent workers for transformation;
              - 'ais etl bucket my-etl ais://src ais://dst --prepend processed/' add prefix to transformed object names.

OPTIONS:
   --help, -h  Show help

Additionally, use --help with any specific command to display its usage and options.

Initializing an ETL

AIStore provides two ways to initialize an ETL using the CLI:


1. Using an ETL Runtime Spec (Recommended)

This method uses a YAML file that defines how your ETL should be initialized and run.

Key Fields in the Spec

| Field | Description | Default |
| --- | --- | --- |
| `name` | Unique name for the ETL. See naming rules | Required |
| `runtime.image` | Docker image for the ETL container | Required |
| `runtime.command` | (Optional) Override the container's default ENTRYPOINT with custom command and arguments | None |
| `communication` | (Optional) Communication method between AIS and the ETL container | `hpush://` |
| `argument` | (Optional) Argument passing method: `""` (default) or `"fqn"` (mounts host filesystem) | `""` |
| `init_timeout` | (Optional) Max time to wait for ETL to become ready | `5m` |
| `obj_timeout` | (Optional) Max time to process a single object | `45s` |
| `support_direct_put` | (Optional) Enable direct put optimization for offline transforms | `false` |

Sample ETL Spec

name: hello-world-etl
runtime:
  image: aistorage/transformer_hello_world:latest
  # Optional: Override the container entrypoint
  # command: ["uvicorn", "fastapi_server:fastapi_app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

communication: hpush://
argument: fqn
init_timeout: 5m
obj_timeout: 45s
support_direct_put: true

CLI Usage

# From a local file
$ ais etl init -f spec.yaml

# From a remote URL
$ ais etl init -f <URL>

# Override values from the spec
$ ais etl init -f <URL> \
    --name=ETL_NAME \
    --comm-type=COMMUNICATION_TYPE \
    --init-timeout=TIMEOUT \
    --obj-timeout=TIMEOUT

Note: CLI parameters take precedence over the spec file.


2. Using a Full Kubernetes Pod Spec (Advanced)

Use this option if you need full control over the ETL container’s deployment—such as advanced init containers, health checks, or if you’re not using the AIS ETL framework.

Example Pod Spec

# pod_spec.yaml
apiVersion: v1
kind: Pod
metadata:
  name: etl-echo
  annotations:
    communication_type: "hpush://"
    wait_timeout: "5m"
spec:
  containers:
    - name: server
      image: aistorage/transformer_md5:latest
      ports: [{ name: default, containerPort: 8000 }]
      command: ["uvicorn", "fastapi_server:fastapi_app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4", "--log-level", "info", "--ws-max-size", "17179869184", "--ws-ping-interval", "0", "--ws-ping-timeout", "86400"]
      readinessProbe:
        httpGet: { path: /health, port: default }

CLI Usage

# Initialize ETL from a Pod spec
$ ais etl init -f pod_spec.yaml --name transformer-md5

Additional Notes

  • You can define multiple ETLs in a single YAML file by separating them with the standard YAML document separator ---.

    Example:

    name: hello-world-etl
    runtime:
      image: aistorage/transformer_hello_world:latest
    ---
    name: md5-etl
    runtime:
      image: aistorage/transformer_md5:latest
  • You may override fields in the spec using CLI flags such as --name, --comm-type, etc.

    However, if your YAML file contains multiple ETL definitions, override flags cannot be used and will result in an error.

    In such cases, you should either:

    • Remove the override flags and apply the full multi-ETL spec as-is, or
    • Split the YAML file into individual files and initialize each ETL separately.
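
Splitting at the `---` separators can be scripted; a minimal sketch (file names and spec contents are illustrative):

```shell
# Create a sample multi-ETL spec (contents are illustrative)
cat > multi-etl.yaml <<'EOF'
name: hello-world-etl
runtime:
  image: aistorage/transformer_hello_world:latest
---
name: md5-etl
runtime:
  image: aistorage/transformer_md5:latest
EOF

# Split at each '---' document separator into etl-1.yaml, etl-2.yaml, ...
awk 'BEGIN { n = 1 } /^---$/ { n++; next } { print > ("etl-" n ".yaml") }' multi-etl.yaml

# Each part can then be initialized on its own, with per-ETL overrides, e.g.:
#   ais etl init -f etl-1.yaml
#   ais etl init -f etl-2.yaml --comm-type hpull
```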

Listing ETLs

To view all currently initialized ETLs in the AIStore cluster, use either of the following commands:

$ ais etl show

or the equivalent:

$ ais job show etl

This will display all available ETLs along with their current status (initializing, running, stopped, etc.).


View ETL Specification

To view detailed information about one or more ETL jobs and their configuration, use:

$ ais etl show <ETL_NAME> [<ETL_NAME> ...]

This command displays detailed attributes of each ETL, including:

  • ETL Name
  • Communication Type
  • Runtime Configuration
    • Container image
    • Command
    • Environment variables
  • ETL Source (Full Pod specification, if applicable)

Note: You can also use the alias ais show etl <ETL_NAME> [<ETL_NAME> ...] for the same functionality.


View ETL Errors

Use this command to view errors encountered during ETL processing—either during inline transformations or offline (bucket-to-bucket) jobs.

Inline ETL Errors

To list errors from inline object transformations:

$ ais etl show errors <ETL_NAME>

Example Output:

OBJECT                ECODE   ERROR
ais://non-exist-obj   404     object not found

Offline ETL (Bucket-to-Bucket) Errors

To list errors from a specific offline ETL job, include the job ID:

$ ais etl show errors <ETL_NAME> <OFFLINE-JOB-ID>

Example Output:

OBJECT             ECODE   ERROR
ais://test-src/7   500     ETL error: <your-custom-error>
ais://test-src/8   500     ETL error: <your-custom-error>
ais://test-src/6   500     ETL error: <your-custom-error>

Here, <your-custom-error> refers to the error raised from within your custom transform function (e.g., in Python).


View ETL Logs

Use the following command to view logs for a specific ETL container:

$ ais etl view-logs <ETL_NAME> [TARGET_ID]

  • <ETL_NAME>: Name of the ETL.
  • [TARGET_ID] (optional): Retrieve logs from a specific target node. If omitted, logs from all targets will be aggregated.

Stop ETL

Stops a running ETL and tears down its underlying Kubernetes resources.

$ ais etl stop <ETL_NAME> [<ETL_NAME> ...]

  • Frees up system resources without deleting the ETL definition.
  • ETL can be restarted later without reinitialization.

You can also stop ETLs from a specification file:

$ ais etl stop -f <spec-file.yaml>   # Local file with one or more ETL specs
$ ais etl stop -f <URL>              # Remote spec file over HTTP(S)
  • Supports multi-ETL YAML files separated by ---.

More info: ETL Pod Lifecycle


Start ETL

Restarts a previously stopped ETL by recreating its associated containers on each target.

$ ais etl start <ETL_NAME> [<ETL_NAME> ...]

  • Useful when resuming work after a manual or error-triggered stop.
  • Retains all original configuration and transformation logic.

You can also start ETLs from a specification file:

$ ais etl start -f <spec-file.yaml>   # Local file with one or more ETL specs
$ ais etl start -f <URL>              # Remote spec file over HTTP(S)
  • Supports multi-ETL YAML files separated by ---.

More info: ETL Pod Lifecycle


Remove (Delete) ETL

Remove (delete) ETL jobs.

$ ais etl rm <ETL_NAME> [<ETL_NAME> ...]

  • Permanently deletes the ETL definition from the cluster.
  • If an ETL is in 'running' state, it is stopped automatically before removal.

You can also remove ETLs from a specification file:

$ ais etl rm -f <spec-file.yaml>   # Local file with one or more ETL specs
$ ais etl rm -f <URL>              # Remote spec file over HTTP(S)
  • Supports multi-ETL YAML files separated by ---.

More info: ETL Pod Lifecycle


Inline Transformation

Use inline transformation to process an object on-the-fly with a registered ETL. The transformed output is streamed directly to the client.

$ ais etl object <ETL_NAME> <BUCKET/OBJECT_NAME> <OUTPUT>

Examples

Transform an object and print to STDOUT

$ ais etl object transformer-md5 ais://shards/shard-0.tar -

Output:

393c6706efb128fbc442d3f7d084a426

Transform an object and save the output to a file

$ ais etl object transformer-md5 ais://shards/shard-0.tar output.txt
$ cat output.txt

Output:

393c6706efb128fbc442d3f7d084a426

Transform an object using ETL arguments

Use runtime arguments for customizable transformations. The argument is passed as a query parameter (etl_args) and must be handled by the ETL web server.

$ ais etl object transformer-hash-with-args ais://shards/shard-0.tar - --args=123

Output:

4af87d32ee1fb306
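
On the server side, the runtime argument arrives as the `etl_args` query parameter on each transform request, and your ETL web server is responsible for parsing it. A minimal sketch of that parsing logic in plain Python (the `transform` handler and seeded-hash behavior are illustrative, not part of the AIS ETL framework API):

```python
import hashlib
from urllib.parse import parse_qs, urlparse

def transform(request_url: str, payload: bytes) -> bytes:
    """Hash the payload, seeded with the optional 'etl_args' query parameter."""
    query = parse_qs(urlparse(request_url).query)
    # '--args=123' on the CLI side shows up as '?etl_args=123' on the request URL
    seed = query.get("etl_args", [""])[0]
    digest = hashlib.sha256(seed.encode() + payload).hexdigest()
    return digest[:16].encode()  # truncated hex digest, as in the example output
```

In a real deployment this logic lives inside the ETL container's web server, where the incoming request URL (or the framework's request object) carries the query string.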

Learn more: Inline ETL Transformation


Single-Object Transformation

For operations on selected objects, use ais object and its subcommands.

In particular, notice two highlighted subcommands:

$ ais object <TAB-TAB>
get   put   *cp*   *etl*   set-custom   prefetch   show   cat
ls    promote   archive   concat   rm   evict   mv

To transform or copy a single object, you can use ais object etl and ais object cp interchangeably, or their respective aliases, as shown below.

Examples

Copy and transform to a destination object (same or different bucket)

$ ais etl object transformer-md5 cp ais://src/image.jpg ais://dst/image-md5.txt

This command applies the ETL to the source object and stores the transformed result at the destination location.

  • <ETL_NAME> is the name of the registered ETL
  • cp indicates copy-and-transform
  • <SOURCE_OBJECT> is the full AIS URL of the object to transform
  • <DESTINATION> is either a specific object or a destination bucket (preserving source name)

For details and performance, see technical blog: Single-Object Transformation.


Offline Transformation

Use offline transformation to process entire buckets or a selected set of objects. The result is saved in a new destination bucket.

$ ais etl bucket <ETL_NAME> <SRC_BUCKET> <DST_BUCKET>

Here’s the command’s help as of v3.30:

$ ais etl bucket --help
NAME:
   ais etl bucket - Transform entire bucket or selected objects (to select, use '--list', '--template', or '--prefix').
   Examples:
   - 'ais etl bucket my-etl ais://src ais://dst' transform all objects from source to destination bucket;
   - 'ais etl bucket my-etl ais://src ais://dst --prefix images/' transform objects with prefix 'images/';
   - 'ais etl bucket my-etl ais://src ais://dst --template "shard-{0001..0999}.tar"' transform objects matching the template;
   - 'ais etl bucket my-etl s3://remote-src ais://dst --all' transform all objects including non-cached ones;
   - 'ais etl bucket my-etl ais://src ais://dst --dry-run' preview transformation without executing;
   - 'ais etl bucket my-etl ais://src ais://dst --num-workers 8' use 8 concurrent workers for transformation;
   - 'ais etl bucket my-etl ais://src ais://dst --prepend processed/' add prefix to transformed object names.

USAGE:
   ais etl bucket ETL_NAME SRC_BUCKET[/OBJECT_NAME_or_TEMPLATE] DST_BUCKET [command options]

OPTIONS:
   --all           Transform all objects from a remote bucket including those that are not present (not cached) in cluster
   --cont-on-err   Keep running archiving xaction (job) in presence of errors in any given multi-object transaction
   --dry-run       Show total size of new objects without really creating them
   --ext           Mapping from old to new extensions of transformed objects' names
   --force, -f     Force execution of the command (caution: advanced usage only)
   --list          Comma-separated list of object or file names, e.g.:
                   --list 'o1,o2,o3'
                   --list "abc/1.tar, abc/1.cls, abc/1.jpeg"
                   or, when listing files and/or directories:
                   --list "/home/docs, /home/abc/1.tar, /home/abc/1.jpeg"
   --num-workers   Number of concurrent workers; if omitted or zero defaults to a number of target mountpaths (disks);
                   use (-1) to indicate single-threaded serial execution (i.e., no workers);
                   any positive value will be adjusted _not_ to exceed the number of target CPUs
   --prefix        Select virtual directories or objects with names starting with the specified prefix, e.g.:
                   '--prefix a/b/c' - matches names 'a/b/c/d', 'a/b/cdef', and similar;
                   '--prefix a/b/c/' - only matches objects from the virtual directory a/b/c/
   --prepend       Prefix to prepend to every object name during operation (copy or transform), e.g.:
                   --prepend=abc - prefix all object names with "abc"
                   --prepend=abc/ - use "abc" as a virtual directory (note trailing filepath separator)
                   - during 'copy', this flag applies to copied objects
                   - during 'transform', this flag applies to transformed objects
   --template      Template to match object or file names; may contain prefix (that could be empty) with zero or more ranges
                   (with optional steps and gaps), e.g.:
                   --template ""                                     # (an empty or '*' template matches everything)
                   --template 'dir/subdir/'
                   --template 'shard-{1000..9999}.tar'
                   --template "prefix-{0010..0013..2}-gap-{1..2}-suffix"
                   and similarly, when specifying files and directories:
                   --template '/home/dir/subdir/'
                   --template "/abc/prefix-{0010..9999..2}-suffix"
   --timeout       Maximum time to wait for a job to finish; if omitted: wait forever or until Ctrl-C;
                   valid time units: ns, us (or µs), ms, s (default), m, h
   --wait          Wait for an asynchronous operation to finish (optionally, use '--timeout' to limit the waiting time)
   --help, -h      Show help

Available Flags

| Flag | Description |
| --- | --- |
| `--list` | Comma-separated list of object names (`obj1,obj2`). |
| `--template` | Template pattern for object names (`obj-{000..100}.tar`). |
| `--ext` | Extension transformation map (`{jpg:txt}`). |
| `--prefix` | Prefix to apply to output object names. |
| `--wait` | Block until transformation is complete. |
| `--requests-timeout` | Per-object timeout for transformation. |
| `--dry-run` | Simulate transformation without modifying cluster state. |
| `--num-workers` | Number of concurrent workers to use during transformation. |

Examples

Transform an entire bucket

$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket
$ ais wait xaction <XACTION_ID>

Transform a subset of objects using a template

$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --template "shard-{10..12}.tar"
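
The brace-range template behaves like a numeric range with optional zero-padding and step. A quick way to see which names a template selects, reproduced in plain Python (independent of AIS):

```python
# "shard-{10..12}.tar" expands to consecutive names:
shards = [f"shard-{i}.tar" for i in range(10, 13)]
print(shards)   # ['shard-10.tar', 'shard-11.tar', 'shard-12.tar']

# Ranges may also carry zero-padding and a step, e.g. "prefix-{0010..0013..2}-suffix":
stepped = [f"prefix-{i:04d}-suffix" for i in range(10, 14, 2)]
print(stepped)  # ['prefix-0010-suffix', 'prefix-0012-suffix']
```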

Apply extension mapping and add a prefix

$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --ext="{in1:out1,in2:out2}" --prefix="etl-" --wait

Perform a dry-run to preview changes

$ ais etl bucket transformer-md5 ais://src_bucket ais://dst_bucket --dry-run --wait

Output:

[DRY RUN] No modifications on the cluster
2 objects (20MiB) would have been put into bucket ais://dst_bucket

Learn more: Offline ETL Transformation