AIStore Observability: CLI

The CLI is the fastest way to interrogate an AIS cluster from a terminal. This page is a jump-table to the handful of commands every SRE or developer uses when triaging performance or capacity issues. For full syntax, run any command with `--help` or see the separate CLI reference.

Installation

There are several ways to install the AIS CLI:

Using the installation script (recommended):

./scripts/install_from_binaries.sh --help

This script installs aisloader and the CLI from the latest or a previous GitHub release and enables CLI auto-completions.

Alternatively, follow the quick-start instructions.
For a detailed introduction (including installation) and usage, see the CLI Overview.

After installation, configure your AIS endpoint via the ais config cli command or environment variables:

## HTTP
export AIS_ENDPOINT=http://your-ais-cluster-endpoint:port

## or HTTPS
export AIS_ENDPOINT=https://your-ais-cluster-endpoint:port
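Before running any of the commands below, a script can fail fast if the endpoint is missing or malformed. The following is a minimal sketch; `check_ais_endpoint` is a hypothetical helper (not part of the AIS CLI), and it only checks for an `http://` or `https://` prefix, not connectivity.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the AIS CLI):
# succeed only if AIS_ENDPOINT is set and starts with http:// or https://.
check_ais_endpoint() {
  case "${AIS_ENDPOINT:-}" in
    http://?* | https://?*)
      return 0 ;;
    "")
      echo "AIS_ENDPOINT is not set" >&2
      return 1 ;;
    *)
      echo "AIS_ENDPOINT must start with http:// or https://" >&2
      return 1 ;;
  esac
}

# Usage:
#   check_ais_endpoint && ais show cluster
```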

Cluster Status

Question	Command	Typical flags
Nodes and their respective health? Any alerts? Out of space? Out of memory?	`ais show cluster`	`--refresh 1m`
How much space is left?	`ais storage summary`	`--cached`, `--units`, `--prefix`, `--refresh`
Are any mountpaths down?	`ais storage mountpath`	`--fshc` (to run filesystem health checker), `--rescan-disks`

# Get summary of cluster membership, capacity, and health
ais show cluster

# As always, this (and all other) command's options are available via `--help`
ais show cluster --help
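`ais show cluster` also reports per-target capacity in the CAP USED(%) column (see the example output below), which is handy for a quick out-of-space check in scripts. The sketch below assumes the fixed-width, space-aligned layout shown in that example and a header cell spelled exactly `CAP USED(%)`; the function name and the default 80% threshold are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<target> <used%>" for every target whose
# CAP USED(%) exceeds a threshold (default 80). Reads `ais show cluster`
# output on stdin and assumes fixed-width, space-aligned columns.
targets_over_capacity() {
  local threshold="${1:-80}"
  awk -v max="$threshold" '
    # TARGET header: remember where the CAP USED(%) column starts
    /^TARGET[ \t]/ { pos = index($0, "CAP USED(%)"); next }
    # PROXY header: proxies have no capacity column, so stop matching rows
    /^PROXY[ \t]/  { pos = 0; next }
    # Target rows look like t[...]; awk converts "91%  ..." to the number 91
    pos > 0 && /^t\[/ {
      used = substr($0, pos) + 0
      if (used > max + 0) print $1, used "%"
    }
  '
}

# Usage:
#   ais show cluster | targets_over_capacity 85
```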

Example: Node-level Alerts

$ ais show cluster

PROXY            MEM AVAIL  SYS CPU(%)  UPTIME      STATUS  ALERT
p[KKFpNjqo][P]   127.77GiB  32%         108h30m40s  online  tls-cert-will-soon-expire
...

TARGET           MEM AVAIL  CAP USED(%)     CAP AVAIL   SYS CPU(%)  UPTIME      STATUS  ALERT
t[pDztYhhb]      98.02GiB   16%             960.824GiB  61%         108h30m1s   online  tls-cert-will-soon-expire
...
...
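To surface only the nodes that currently report an alert (for example, to feed an alerting pipeline), the ALERT column can be extracted from the same output. This is a sketch under the assumption that the layout matches the example above, with each table's header row starting with PROXY or TARGET; `show_node_alerts` is a hypothetical name. Locating the column by its header offset, rather than by field count, avoids miscounting headers that contain spaces, such as MEM AVAIL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<node> <alert>" for every node whose ALERT cell
# is non-empty. Reads `ais show cluster` output on stdin and assumes the
# fixed-width, space-aligned layout shown in the example above.
show_node_alerts() {
  awk '
    # PROXY/TARGET header rows: record where the ALERT column starts
    /^(PROXY|TARGET)[ \t]/ { pos = index($0, "ALERT"); next }
    # Node rows start with p[...] or t[...]; take everything from the ALERT offset
    pos > 0 && /^[pt]\[/ {
      alert = substr($0, pos)
      gsub(/^[ \t]+|[ \t]+$/, "", alert)   # trim surrounding whitespace
      if (alert != "") print $1, alert
    }
  '
}

# Usage (non-zero exit if any node reports an alert):
#   alerts=$(ais show cluster | show_node_alerts)
#   [ -z "$alerts" ] || { echo "$alerts"; exit 1; }
```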

Node Alerts

AIStore node states are categorized into three severity levels:

Red Alerts - Critical issues requiring immediate attention:
- OOS - Out of space condition
- OOM - Out of memory condition
- OOCPU - Out of CPU resources
- DiskFault - Disk failures detected
- NoMountpaths - No available mountpaths
- NumGoroutines - Excessive number of goroutines
- CertificateExpired - TLS certificate has expired
- CertificateInvalid - TLS certificate is invalid

Warning Alerts - Potential issues that may require attention:
- Rebalancing - Rebalance operation in progress
- RebalanceInterrupted - Rebalance was interrupted
- Resilvering - Resilvering operation in progress
- ResilverInterrupted - Resilver was interrupted
- NodeRestarted - Node was restarted (e.g., power cycle or crash)
- MaintenanceMode - Node is in maintenance mode
- LowCapacity - Low storage capacity (OOS possible soon)
- LowMemory - Low memory condition (OOM possible soon)
- LowCPU - Low CPU availability
- CertWillSoonExpire - TLS certificate will expire soon
- KeepAliveErrors - Recent keep-alive errors detected

Information States - Normal operational states:
- ClusterStarted - Cluster has started (primary) or node has joined cluster
- NodeStarted - Node has started (may not have joined cluster yet)
- VoteInProgress - Voting process is in progress
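When wiring these states into scripts, it can help to map a state name to its severity level. The helper below is hypothetical and simply encodes the three lists above; it assumes names are spelled as on this page, which may differ from the spelling the CLI prints in the ALERT column (for example, `tls-cert-will-soon-expire`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a node-state name, spelled as in the lists above,
# to its severity level (red, warning, or info). Unknown names print
# "unknown" and return 1.
alert_severity() {
  case "$1" in
    OOS | OOM | OOCPU | DiskFault | NoMountpaths | NumGoroutines | \
    CertificateExpired | CertificateInvalid)
      echo red ;;
    Rebalancing | RebalanceInterrupted | Resilvering | ResilverInterrupted | \
    NodeRestarted | MaintenanceMode | LowCapacity | LowMemory | LowCPU | \
    CertWillSoonExpire | KeepAliveErrors)
      echo warning ;;
    ClusterStarted | NodeStarted | VoteInProgress)
      echo info ;;
    *)
      echo unknown
      return 1 ;;
  esac
}

# Usage:
#   [ "$(alert_severity OOS)" = red ] && echo "page the on-call"
```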

Node state flags are also exposed via Prometheus metrics; for details, see Node Alerts in the AIStore Prometheus docs.

Live Performance Monitoring

`ais performance` (alias `ais show performance`) exposes five sub-commands; the two most commonly used are `throughput` and `latency`.

# 30-second rolling throughput for all targets
$ ais performance throughput --refresh 30

# 10-second latency slice, filter to GET operations
$ ais performance latency --refresh 10 --regex "get"

Key Flags

Flag	Meaning
`--refresh <dur>`	Continuous mode: print output every <dur>
`--count <n>`	Stop after n refreshes
`--regex <re>`	Show only columns matching the regexp
`--no-headers`	Suppress table headers
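The flags combine naturally for a bounded, scriptable capture. The sketch below (hypothetical function name and default output file) prints throughput every 10 seconds, stops after six refreshes, drops table headers, and keeps a copy on disk via `tee`; it assumes `ais performance throughput` accepts `--no-headers`, per the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: capture a bounded throughput sample.
#   --refresh 10s : print every 10 seconds
#   --count 6     : stop after 6 refreshes (about one minute)
#   --no-headers  : omit table headers for easier post-processing
# Output goes to the terminal and to the given file (default: throughput.txt).
capture_throughput() {
  local out="${1:-throughput.txt}"
  ais performance throughput --refresh 10s --count 6 --no-headers | tee "$out"
}

# Usage:
#   capture_throughput "/tmp/ais-throughput-$(date +%s).txt"
```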

See cli-performance.md for sub‑command specifics.

Log Management

Task	Command
Tail a given node’s log	`ais log show NODE_ID --refresh DURATION` (see `--help` for all options)
Download all logs for a support bundle	`ais cluster download-logs`
Rotate logs on one node	`ais advanced rotate-logs <NODE_ID>`

For more details on log configuration and analysis, see Observability: Logs.

Common Command Examples

Here are some frequently used command combinations for everyday operations:

# Daily capacity & health snapshot
ais show cluster && ais storage summary

# Watch GET latency for a single target (node ID quoted so the shell does not glob the brackets)
ais performance latency "t[EkMt8081]" --refresh 30 --regex "get(\(t\)|cold)"

# Verify no misplaced objects in GCS buckets (non-recursive)
ais scrub gs --nr --refresh 20s --count 3

Flags such as --refresh <duration>, --count <n>, --regex <re>, --no-headers, and --units are accepted by most monitoring commands; see --help for the definitive list.

Best Practices

- Regular Health Checks: Run `ais show cluster` and `ais storage summary` daily to verify cluster health and remaining capacity.
- Performance Baselines: Establish baseline performance with `ais show performance` after initial deployment.
- Monitoring Script: Create a shell script with key monitoring commands for daily checks.
- Alert Integration: Pipe CLI output to monitoring systems for automated alerting.
- Log Collection: Integrate with a Kubernetes monitoring stack or, at a minimum, use `ais cluster download-logs`.
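A minimal sketch of the daily monitoring script suggested above, assuming the AIS CLI is on `PATH` and `AIS_ENDPOINT` is already configured. The function name and directory layout are illustrative; each day's snapshot is saved under a dated directory so that later runs can be compared against earlier ones (for example, with `diff`).

```shell
#!/usr/bin/env bash
# Hypothetical daily-check helper: save `ais show cluster` and
# `ais storage summary` output under <base>/<YYYY-MM-DD>/ and print that path.
daily_check() {
  local dir
  dir="${1:-ais-daily}/$(date +%Y-%m-%d)"
  mkdir -p "$dir" || return 1
  ais show cluster    > "$dir/cluster.txt" || return 1
  ais storage summary > "$dir/storage.txt" || return 1
  echo "$dir"
}

# Usage (e.g., from cron), then compare two days:
#   daily_check /var/log/ais-daily
#   diff -r /var/log/ais-daily/2024-01-01 /var/log/ais-daily/2024-01-02
```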

Troubleshooting Common Issues

Issue	CLI Command	What to Look For
Node experiencing problems or went offline	`ais show cluster`	Check the ALERT column (example above)
Disk failures	`ais storage mountpath`	Look for disabled or detached mountpaths
Performance degradation	`ais performance --refresh 30s`	Compare against baseline numbers
Failed operations	`ais log show --severity error`	Common error patterns
Network issues	`ais status network`	High latency or timeout errors

CLI Resources

Document	Description
Overview	Introduction to AIS observability
Logs	Configuring, accessing, and utilizing AIS logs
Prometheus	Configuring Prometheus with AIS
Metrics Reference	Complete metrics catalog
Grafana	Visualizing AIS metrics with Grafana
Kubernetes	Working with Kubernetes monitoring stacks
