CLI Reference for ETLs
This section documents ETL management operations with ais etl.
As with global rebalance, dSort, and download, all ETL management commands can also be executed via
ais job and ais show, the commands that, by definition, support all AIS xactions, including AIS-ETL.
In the ais etl namespace, the commands include:
For background on AIS-ETL, getting started, working examples, and tutorials, please refer to:
Table of Contents
Getting Started
Initializing an ETL
ETL Management
ETL Lifecycle Operations
Data Transformation
Commands
Top-level ETL commands include init, stop, show, and more:
Additionally, use --help to display help for any specific command.
Initializing an ETL
AIStore provides two ways to initialize an ETL using the CLI:
1. Using a Runtime ETL Specification (Recommended)
This method uses a YAML file that defines how your ETL should be initialized and run.
Key Fields in the Spec
Sample ETL Spec
CLI Usage
Note: CLI parameters take precedence over the spec file.
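The precedence rule in the note can be pictured as a simple merge. The sketch below is only an illustration, not the CLI's actual implementation; the keys name and comm_type are hypothetical stand-ins for the --name and --comm-type flags, and the values are placeholders.

```python
def effective_spec(spec: dict, cli_overrides: dict) -> dict:
    """Return the settings that would apply when CLI flags override a spec file.

    Flags the user did not pass are represented as None and are ignored,
    so the value from the spec file is kept for them.
    """
    merged = dict(spec)  # start from the spec file values
    merged.update({k: v for k, v in cli_overrides.items() if v is not None})
    return merged


# Hypothetical field names and placeholder values, for illustration only.
spec_from_file = {"name": "name-from-spec", "comm_type": "comm-from-spec"}
flags_from_cli = {"name": "name-from-cli", "comm_type": None}

result = effective_spec(spec_from_file, flags_from_cli)
print(result)
```

Here only the name is overridden, because that is the only flag that was passed; the communication type still comes from the spec file.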
2. Using a Full Kubernetes Pod Spec (Advanced)
Use this option if you need full control over the ETL container's deployment (for example, advanced init containers or health checks), or if you are not using the AIS ETL framework.
Example Pod Spec
CLI Usage
Additional Notes
- You can define multiple ETLs in a single YAML file by separating them with the standard YAML document separator ---. Example:
- You may override fields in the spec using CLI flags such as --name, --comm-type, etc. However, if your YAML file contains multiple ETL definitions, override flags cannot be used and will result in an error.
In such cases, you should either:
- Remove the override flags and apply the full multi-ETL spec as-is, or
- Split the YAML file into individual files and initialize each ETL separately:
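Splitting a multi-ETL file can be scripted. The following is a minimal Python sketch that uses only the standard library and assumes every separator is a line consisting solely of ---; forms such as a separator followed by a comment are not handled. The output naming scheme (stem plus index) is an arbitrary choice made for this example.

```python
from pathlib import Path


def split_yaml_documents(text: str) -> list[str]:
    """Split multi-document YAML text on lines that consist solely of '---'."""
    docs, current = [], []

    def flush() -> None:
        # Keep only documents that contain at least one non-blank line.
        if any(line.strip() for line in current):
            docs.append("\n".join(current).strip("\n") + "\n")
        current.clear()

    for line in text.splitlines():
        if line.rstrip() == "---":  # unindented document separator
            flush()
        else:
            current.append(line)
    flush()
    return docs


def write_split_specs(spec_path: Path, out_dir: Path) -> list[Path]:
    """Write each document of spec_path to its own file in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, doc in enumerate(split_yaml_documents(spec_path.read_text()), start=1):
        path = out_dir / f"{spec_path.stem}-{i}.yaml"
        path.write_text(doc)
        paths.append(path)
    return paths
```

Each resulting file then holds a single ETL definition, so override flags can be applied to it individually.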
Listing ETLs
To view all currently initialized ETLs in the AIStore cluster, use either of the following commands:
or the equivalent:
This will display all available ETLs along with their current status (initializing, running, stopped, etc.).
View ETL Specification
To view detailed information about one or more ETL jobs and their configuration, use:
This command displays detailed attributes of each ETL, including:
- ETL Name
- Communication Type
- Runtime Configuration
- Container image
- Command
- Environment variables
- ETL Source (Full Pod specification, if applicable)
Note: You can also use the alias ais show etl <ETL_NAME> [<ETL_NAME> ...] for the same functionality.
View ETL Errors
Use this command to view errors encountered during ETL processing—either during inline transformations or offline (bucket-to-bucket) jobs.
Inline ETL Errors
To list errors from inline object transformations:
Example Output:
Offline ETL (Bucket-to-Bucket) Errors
To list errors from a specific offline ETL job, include the job ID:
Example Output:
Here, <your-custom-error> refers to the error raised from within your custom transform function (e.g., in Python).
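To make the origin of that message concrete, here is a minimal, hypothetical Python transform that raises a custom error. It is plain Python and deliberately does not show how a function is registered with AIS-ETL; the names TransformError and transform are assumptions made for this sketch.

```python
class TransformError(Exception):
    """Hypothetical error type raised by the custom transform below."""


def transform(data: bytes) -> bytes:
    """Upper-case the payload, rejecting empty objects with a custom error."""
    if not data:
        # The text of this exception is the kind of message that would
        # appear in place of <your-custom-error>.
        raise TransformError("empty object payload")
    return data.upper()
```

A descriptive message here pays off later, since it is what you will read when listing ETL errors.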
View ETL Logs
Use the following command to view logs for a specific ETL container:
- <ETL_NAME>: Name of the ETL.
- [TARGET_ID] (optional): Retrieve logs from a specific target node. If omitted, logs from all targets will be aggregated.
Stop ETL
Stops a running ETL and tears down its underlying Kubernetes resources.
- Frees up system resources without deleting the ETL definition.
- ETL can be restarted later without reinitialization.
You can also stop ETLs from a specification file:
- Supports multi-ETL YAML files separated by ---.
More info: ETL Pod Lifecycle
Start ETL
Restarts a previously stopped ETL by recreating its associated containers on each target.
- Useful when resuming work after a manual or error-triggered stop.
- Retains all original configuration and transformation logic.
You can also start ETLs from a specification file:
- Supports multi-ETL YAML files separated by ---.
More info: ETL Pod Lifecycle
Remove (Delete) ETL
Remove (delete) ETL jobs.
- Deletes the ETL definition along with its underlying Kubernetes resources.
- Unlike a stopped ETL, a removed ETL cannot be restarted; it must be initialized again before reuse.
You can also remove ETLs from a specification file:
- Supports multi-ETL YAML files separated by ---.
More info: ETL Pod Lifecycle
Inline Transformation
Use inline transformation to process an object on-the-fly with a registered ETL. The transformed output is streamed directly to the client.
Examples
Transform an object and print to STDOUT
Output:
Transform an object and save the output to a file
Output:
Transform an object using ETL arguments
Use runtime arguments for customizable transformations. The argument is passed as a query parameter (etl_args) and must be handled by the ETL web server.
Output:
Learn more: Inline ETL Transformation
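Because the runtime argument travels as the etl_args query parameter, the ETL web server has to read it from the request URL. The sketch below shows one way to encode and decode such a parameter with Python's standard library. Only the parameter name etl_args comes from the text above; the host, path, and argument format are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def with_etl_args(url: str, args: str) -> str:
    """Append an URL-encoded etl_args query parameter to url."""
    separator = "&" if urlsplit(url).query else "?"
    return f"{url}{separator}{urlencode({'etl_args': args})}"


def read_etl_args(request_url: str, default: str = "") -> str:
    """Server side: extract etl_args from the request URL, or return default."""
    query = parse_qs(urlsplit(request_url).query)
    return query.get("etl_args", [default])[0]


# Hypothetical endpoint and argument value, for illustration only.
request_url = with_etl_args("http://etl.example/object", "mode=upper case")
print(request_url)
print(read_etl_args(request_url))
```

Returning a default when the parameter is absent lets the same ETL serve requests that carry no arguments.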
Single-Object Transformation
For operations on selected objects, use ais object and its subcommands.
In particular, notice two highlighted subcommands:
To transform or copy a single object, you can interchangeably use ais object etl (or ais object cp), or
their respective aliases, as shown below.
Examples
Copy and transform to a destination object (same or different bucket)
This command applies the ETL to the source object and stores the transformed result at the destination location.
- <ETL_NAME> is the name of the registered ETL
- cp indicates copy-and-transform
- <SOURCE_OBJECT> is the full AIS URL of the object to transform
- <DESTINATION> is either a specific object or a destination bucket (preserving source name)
For details and performance, see the technical blog: Single-Object Transformation.
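The destination rule (a specific object, or a bucket that preserves the source name) can be illustrated with a short Python sketch. This is a model of the documented behavior, not the CLI's code; it assumes URLs of the form scheme://bucket/object, and the bucket and object names are made up.

```python
def resolve_destination(source: str, destination: str) -> str:
    """Return the full destination object URL for a copy-and-transform.

    If destination names only a bucket, the source object name is preserved.
    """
    # Source: drop the scheme, then separate the bucket from the object name.
    _, _, src_rest = source.partition("://")
    _, _, src_name = src_rest.partition("/")

    scheme, _, dst_rest = destination.partition("://")
    dst_bucket, _, dst_name = dst_rest.partition("/")

    if not dst_name:  # bucket only, e.g. "ais://dst" or "ais://dst/"
        return f"{scheme}://{dst_bucket}/{src_name}"
    return destination  # explicit destination object


# Hypothetical bucket and object names.
print(resolve_destination("ais://src/dir/a.txt", "ais://dst"))
print(resolve_destination("ais://src/dir/a.txt", "ais://dst/b.txt"))
```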
Offline Transformation
Use offline transformation to process entire buckets or a selected set of objects. The result is saved in a new destination bucket.
Here’s the command’s help as of v3.30:
Available Flags
Examples
Transform an entire bucket
Transform a subset of objects using a template
Apply extension mapping and add a prefix
Perform a dry-run to preview changes
Output:
Learn more: Offline ETL Transformation