AIStore Observability: CLI

The CLI is the fastest way to interrogate an AIS cluster from a terminal. This page is a jump-table to the handful of commands every SRE or developer uses when triaging performance or capacity issues. For full syntax, pass <kbd>--help</kbd> to any command or see the separate CLI reference.

Installation

There are several ways to install the AIS CLI:

  1. Use the installation script (recommended):

```shell
./scripts/install_from_binaries.sh --help
```

This script installs aisloader and the CLI from the latest or a previous GitHub release and enables CLI auto-completions.

  2. Follow the quick-start instructions.

  3. For a detailed introduction (including installation) and usage, see the CLI Overview.

After installation, configure your AIS endpoint via the ais config cli command or the AIS_ENDPOINT environment variable:

```shell
# HTTP
export AIS_ENDPOINT=http://your-ais-cluster-endpoint:port

# or HTTPS
export AIS_ENDPOINT=https://your-ais-cluster-endpoint:port
```
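Before running the commands below, it can help to confirm that AIS_ENDPOINT is set and reachable. The following is a minimal sketch, not part of the AIS tooling: the function name check_ais_endpoint is hypothetical, and it assumes that curl is installed and that the cluster answers its HTTP health endpoint, /v1/health, with a 2xx status when healthy.

```shell
# Hypothetical helper: sanity-check AIS_ENDPOINT before using the CLI.
# Assumption: a healthy cluster answers GET <endpoint>/v1/health with HTTP 2xx.
check_ais_endpoint() {
  local ep="${AIS_ENDPOINT:-}"
  # Require an explicit http:// or https:// scheme, as in the examples above
  case "$ep" in
    http://*|https://*) ;;
    *) echo "AIS_ENDPOINT is unset or lacks an http(s):// scheme" >&2; return 2 ;;
  esac
  # -f: fail on HTTP errors; -sS: quiet, but still print errors; cap the wait at 5s
  if curl -fsS --max-time 5 -o /dev/null "${ep%/}/v1/health"; then
    echo "reachable: $ep"
  else
    echo "unreachable: $ep" >&2
    return 1
  fi
}
```

It returns 0 when the endpoint responds, 1 when it does not, and 2 when the variable itself is malformed. For HTTPS clusters with self-signed certificates, curl will also need the cluster's CA bundle (for example via --cacert).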

Cluster Status

| Question | Command | Typical flags |
|---|---|---|
| Nodes and their respective health? Any alerts? Out of space? Out of memory? | ais show cluster | --refresh 1m |
| How much space is left? | ais storage summary | --cached, --units, --prefix, --refresh |
| Are any mountpaths down? | ais storage mountpath | --fshc (to run the filesystem health checker), --rescan-disks |

```shell
# Get a summary of cluster membership, capacity, and health
ais show cluster

# As always, the options of this (and every other) command are available via `--help`
ais show cluster --help
```

Example: Node-level Alerts

```console
$ ais show cluster

PROXY            MEM AVAIL   SYS CPU(%)   UPTIME       STATUS   ALERT
p[KKFpNjqo][P]   127.77GiB   32%          108h30m40s   online   tls-cert-will-soon-expire
...

TARGET        MEM AVAIL   CAP USED(%)   CAP AVAIL    SYS CPU(%)   UPTIME      STATUS   ALERT
t[pDztYhhb]   98.02GiB    16%           960.824GiB   61%          108h30m1s   online   tls-cert-will-soon-expire
...
```

Node Alerts

AIStore node states are categorized into three severity levels:

  1. Red Alerts - Critical issues requiring immediate attention:

    • OOS - Out of space condition
    • OOM - Out of memory condition
    • OOCPU - Out of CPU resources
    • DiskFault - Disk failures detected
    • NoMountpaths - No available mountpaths
    • NumGoroutines - Excessive number of goroutines
    • CertificateExpired - TLS certificate has expired
    • CertificateInvalid - TLS certificate is invalid
  2. Warning Alerts - Potential issues that may require attention:

    • Rebalancing - Rebalance operation in progress
    • RebalanceInterrupted - Rebalance was interrupted
    • Resilvering - Resilvering operation in progress
    • ResilverInterrupted - Resilver was interrupted
    • NodeRestarted - Node was restarted (powercycle, crash)
    • MaintenanceMode - Node is in maintenance mode
    • LowCapacity - Low storage capacity (OOS possible soon)
    • LowMemory - Low memory condition (OOM possible soon)
    • LowCPU - Low CPU availability
    • CertWillSoonExpire - TLS certificate will expire soon
    • KeepAliveErrors - Recent keep-alive errors detected
  3. Information States - Normal operational states:

    • ClusterStarted - Cluster has started (primary) or node has joined cluster
    • NodeStarted - Node has started (may not have joined cluster yet)
    • VoteInProgress - Voting process is in progress
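To act on these states from a script, one option is to filter the node rows printed by ais show cluster for alert strings. This is a hedged sketch: scan_alerts is a hypothetical helper, and the alert names listed above are not necessarily the exact strings printed in the ALERT column (the example output above shows tls-cert-will-soon-expire), so treat any patterns you pass as assumptions to check against what your CLI version prints. It also assumes node rows begin with p[ (proxies) or t[ (targets), as in that example.

```shell
# Hypothetical helper: print node rows that mention any of the given alert patterns.
# Reads `ais show cluster` output on stdin; exit status follows grep
# (0 = at least one matching row, 1 = none, 2 = usage error).
scan_alerts() {
  if [ "$#" -eq 0 ]; then
    echo "usage: scan_alerts PATTERN..." >&2
    return 2
  fi
  local pat
  # Join all patterns into a single extended-regex alternation: p1|p2|...
  pat="$(IFS='|'; echo "$*")"
  # Keep only node rows (p[...] or t[...]), then match patterns case-insensitively
  grep -E '^[pt]\[' | grep -Ei -- "$pat"
}
```

For example, ais show cluster | scan_alerts 'cert-will-soon-expire' 'expired' prints the affected nodes and exits 0 if any are found. Because matching is by substring, prefer distinctive patterns over very short ones.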

Node state flags are also exposed via Prometheus metrics - for details, see:

Live Performance Monitoring

ais performance (alias ais show performance) exposes five subcommands; the two most frequently used are throughput and latency.

```shell
# Throughput for all targets, refreshed every 30 seconds
ais performance throughput --refresh 30s

# Latency refreshed every 10 seconds, filtered to GET operations
ais performance latency --refresh 10s --regex "get"
```

Key Flags

| Flag | Meaning |
|---|---|
| --refresh <dur> | Continuous mode; reprints the output every <dur> |
| --count <n> | Stops after n refreshes |
| --regex <re> | Shows only columns matching the regular expression |
| --no-headers | Suppresses table headers |

See cli-performance.md for subcommand specifics.
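Combining --refresh with --count turns a continuous view into a bounded sample that exits on its own, which is convenient in scripts and cron jobs. A minimal sketch, assuming ais is on PATH; sample_latency and its argument order are hypothetical, and it uses only the flags from the table above plus the optional node ID shown in the examples on this page.

```shell
# Hypothetical helper: take a fixed number of latency samples, then exit.
# Usage: sample_latency [NODE_ID] [INTERVAL] [COUNT] [REGEX]
sample_latency() {
  local node="${1:-}" interval="${2:-10s}" count="${3:-3}" regex="${4:-get}"
  local args=(performance latency)
  # An optional node ID (e.g. t[EkMt8081]) restricts output to that node
  if [ -n "$node" ]; then
    args+=("$node")
  fi
  args+=(--refresh "$interval" --count "$count" --regex "$regex")
  ais "${args[@]}"
}
```

For example, sample_latency 't[EkMt8081]' 30s 2 prints two 30-second latency samples for that target. Quote node IDs so the shell does not treat the brackets as a glob.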

Log Management

| Task | Command |
|---|---|
| Tail a given node's log | ais log show NODE_ID --refresh DURATION (see --help for all options) |
| Download all logs for a support bundle | ais cluster download-logs |
| Rotate logs on one node | ais advanced rotate-logs <NODE_ID> |

For more details on log configuration and analysis, see Observability: Logs.

Common Command Examples

Here are some frequently used command combinations for everyday operations:

```shell
# Daily capacity & health snapshot
ais show cluster && ais storage summary

# Watch GET latency for a single target
ais performance latency t[EkMt8081] --refresh 30s --regex "get(\(t\)|cold)"

# Verify there are no misplaced objects in GCS buckets (non-recursive)
ais scrub gs --nr --refresh 20s --count 3
```

Flags such as --refresh <duration>, --count <n>, --regex <re>, --no-headers, and --units are accepted by most monitoring commands; see --help for the definitive list.

Best Practices

  • Regular Health Checks: Run ais show cluster and ais storage summary daily to check cluster health and remaining capacity
  • Performance Baselines: Establish baseline performance with ais show performance after initial deployment
  • Monitoring Script: Create a shell script with the key monitoring commands for daily checks
  • Alert Integration: Pipe CLI output to monitoring systems for automated alerting
  • Log Collection: To collect logs, integrate with a Kubernetes monitoring stack or, at a minimum, use ais cluster download-logs
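The Monitoring Script and Alert Integration items above can be combined into a small, cron-friendly script. A minimal sketch, assuming ais is on PATH and AIS_ENDPOINT is already configured; daily_snapshot and SNAPSHOT_DIR are hypothetical names. It saves the output of the two daily commands to a timestamped file and returns non-zero if either command fails, which a scheduler or wrapper can turn into an alert.

```shell
# Hypothetical snapshot location; override by exporting SNAPSHOT_DIR
SNAPSHOT_DIR="${SNAPSHOT_DIR:-./ais-snapshots}"

# Hypothetical helper: save a daily health/capacity snapshot and print its path.
daily_snapshot() {
  local ts out rc=0
  ts="$(date +%Y%m%d-%H%M%S)"
  out="$SNAPSHOT_DIR/snapshot-$ts.txt"
  mkdir -p "$SNAPSHOT_DIR" || return 1
  {
    echo "== ais show cluster ($ts)"
    ais show cluster || rc=1
    echo
    echo "== ais storage summary ($ts)"
    ais storage summary || rc=1
  } >"$out" 2>&1
  echo "$out"
  # Non-zero status lets cron or a wrapper raise an alert when a command fails
  return "$rc"
}
```

Calling daily_snapshot from a cron job keeps a dated history of cluster state that can be diffed when investigating a regression.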

Troubleshooting Common Issues

| Issue | CLI Command | What to Look For |
|---|---|---|
| Node experiencing problems or went offline | ais show cluster | Check the ALERT column (example above) |
| Disk failures | ais storage mountpath | Look for disabled or detached mountpaths |
| Performance degradation | ais performance --refresh 30s | Compare against baseline numbers |
| Failed operations | ais log show --severity error | Recurring error patterns |
| Network issues | ais status network | High latency or timeout errors |

CLI Resources

| Document | Description |
|---|---|
| Overview | Introduction to AIS observability |
| Logs | Configuring, accessing, and utilizing AIS logs |
| Prometheus | Configuring Prometheus with AIS |
| Metrics Reference | Complete metrics catalog |
| Grafana | Visualizing AIS metrics with Grafana |
| Kubernetes | Working with Kubernetes monitoring stacks |