For clean Markdown of any page, append .md to the page URL. For a complete documentation index, see https://docs.nvidia.com/aistore/llms.txt. For full documentation content, see https://docs.nvidia.com/aistore/llms-full.txt.

# AIStore Observability: CLI

> The CLI is the fastest way to interrogate an AIS cluster from a terminal. This page is a jump‑table to the handful of commands every SRE or developer uses when triaging performance or capacity issues. For full syntax hit &lt;kbd>--help&lt;/kbd> on any command or see the separate [CLI reference](/aistore/cli).

## Table of Contents
- [Installation](#installation)
- [Cluster Status](#cluster-status)
  - [Example: Node-level Alerts](#example-node-level-alerts)
- [Node Alerts](#node-alerts)
- [Live Performance Monitoring](#live-performance-monitoring)
  - [Key Flags](#key-flags)
- [Log Management](#log-management)
- [Common Command Examples](#common-command-examples)
- [Best Practices](#best-practices)
- [Troubleshooting Common Issues](#troubleshooting-common-issues)
- [CLI Resources](#cli-resources)
- [Related Documentation](#related-documentation)

## Installation

There are several ways to install AIS CLI:

1. Using the installation script (recommended):

```console
./scripts/install_from_binaries.sh --help
```

This script installs [aisloader](/aistore/aisloader) and [CLI](/aistore/cli) from the latest or previous GitHub [release](https://github.com/NVIDIA/aistore/releases) and enables CLI auto-completions.

2. Follow the [quick-start](/aistore/getting_started#quick-start) instructions.

3. For detailed introduction (including installation) and usage, see the [CLI Overview](/aistore/cli).

After installation, configure your AIS endpoint via the `ais config cli` command or environment variables:

```console
## HTTP
export AIS_ENDPOINT=http://your-ais-cluster-endpoint:port

## or HTTPS
export AIS_ENDPOINT=https://your-ais-cluster-endpoint:port
```

## Cluster Status

| Question                            | Command                 | Typical flags   |
| ----------------------------------- | ----------------------- | --------------- |
| Nodes and their respective health? Any alerts? Out of space? Out of memory? | `ais show cluster`  | `--refresh 1m` |
| How much space is left?             | `ais storage summary`   | `--cached`, `--units`, `--prefix`, `--refresh`  |
| Are any mountpaths down?            | `ais storage mountpath` | `--fshc` (to run filesystem health checker), `--rescan-disks` |

```console
# Get summary of cluster membership, capacity, and health
ais show cluster

# As always, this (and all other) command's options are available via `--help`
ais show cluster --help
```

### Example: Node-level Alerts

```console
$ ais show cluster

PROXY            MEM AVAIL  SYS CPU(%)  UPTIME      STATUS  ALERT
p[KKFpNjqo][P]   127.77GiB  32%         108h30m40s  online  **tls-cert-will-soon-expire**
...

TARGET           MEM AVAIL  CAP USED(%)     CAP AVAIL   SYS CPU(%)  UPTIME      STATUS  ALERT
t[pDztYhhb]      98.02GiB   16%             960.824GiB  61%         108h30m1s   online  **tls-cert-will-soon-expire**
...
...
```

## Node Alerts

AIStore node states are categorized into three severity levels:

1. **Red Alerts** - Critical issues requiring immediate attention:
   - `OOS` - Out of space condition
   - `OOM` - Out of memory condition
   - `OOCPU` - Out of CPU resources
   - `DiskFault` - Disk failures detected
   - `NoMountpaths` - No available mountpaths
   - `NumGoroutines` - Excessive number of goroutines
   - `CertificateExpired` - TLS certificate has expired
   - `CertificateInvalid` - TLS certificate is invalid

2. **Warning Alerts** - Potential issues that may require attention:
   - `Rebalancing` - Rebalance operation in progress
   - `RebalanceInterrupted` - Rebalance was interrupted
   - `Resilvering` - Resilvering operation in progress
   - `ResilverInterrupted` - Resilver was interrupted
   - `NodeRestarted` - Node was restarted (powercycle, crash)
   - `MaintenanceMode` - Node is in maintenance mode
   - `LowCapacity` - Low storage capacity (OOS possible soon)
   - `LowMemory` - Low memory condition (OOM possible soon)
   - `LowCPU` - Low CPU availability
   - `CertWillSoonExpire` - TLS certificate will expire soon
   - `KeepAliveErrors` - Recent keep-alive errors detected

3. **Information States** - Normal operational states:
   - `ClusterStarted` - Cluster has started (primary) or node has joined cluster
   - `NodeStarted` - Node has started (may not have joined cluster yet)
   - `VoteInProgress` - Voting process is in progress

Node state flags are also exposed via Prometheus metrics - for details, see:

* [Node Alerts in AIStore Prometheus docs](/aistore/monitoring-prometheus#node-alerts).

## Live Performance Monitoring

`ais performance` (alias `ais show performance`) exposes five sub‑commands. The two most used are **throughput** and **latency**.

```console
# 30‑second rolling throughput for all targets
$ ais performance throughput --refresh 30

# 10‑second latency slice, filter to GET operations
$ ais performance latency --refresh 10 --regex "get"
```

### Key Flags

| Flag              | Meaning                               |
| ----------------- | ------------------------------------- |
| `--refresh <dur>` | Continuous mode; prints every *dur*   |
| `--count <n>`     | Stop after *n* refreshes              |
| `--regex <re>`    | Show only columns matching the regexp |
| `--no‑headers`    | Suppress table headers                |

> See [`cli-performance.md`](/aistore/cli/performance) for sub‑command specifics.

## Log Management

| Task                                   | Command                                     |
| -------------------------------------- | ------------------------------------------- |
| Tail a given node's log                | `ais log show --refresh DURATION --help`    |
| Download all logs for a support bundle | `ais cluster download-logs`                 |
| Rotate logs on one node                | `ais advanced rotate-logs <NODE_ID>`        |

For more details on log configuration and analysis, see [Observability: Logs](/aistore/monitoring-logs).

## Common Command Examples

Here are some frequently used command combinations for everyday operations:

```console
# Daily capacity & health snapshot
ais show cluster && ais storage summary

# Watch GET latency for a single target
ais performance latency t[EkMt8081] --refresh 30 --regex "get(\(t\)|cold)"

# Verify no misplaced objects in GCS buckets (non‑recursive)
ais scrub gs --nr --refresh 20s --count 3
```

> Flags such as `--refresh <duration>`, `--count <n>`, `--regex <re>`, `--no-headers`, and `--units` are accepted by most monitoring commands; see `--help` for the definitive list.

## Best Practices

- **Regular Health Checks**: Run `ais show cluster` and `ais storage summary` daily to ensure cluster health and capacity
- **Performance Baselines**: Establish baseline performance with `ais performance show` after initial deployment
- **Monitoring Script**: Create a shell script with key monitoring commands for daily checks
- **Alert Integration**: Pipe CLI output to monitoring systems for automated alerting
- **Log Collection**: To collect logs, integrate with a Kubernetes monitoring stack or (at least) use `ais cluster download-logs`

## Troubleshooting Common Issues

| Issue | CLI Command | What to Look For |
|-------|------------|------------------|
| Node experiencing problems or went offline | `ais show cluster` | Check the ALERT column (example above) |
| Disk failures | `ais storage mountpath` | Look for disabled or detached mountpaths |
| Performance degradation | `ais performance --refresh 30s` | Compare against baseline numbers |
| Failed operations | `ais log show --severity error` | Common error patterns |
| Network issues | `ais status network` | High latency or timeout errors |

## CLI Resources

- [`ais help`](/aistore/cli/help)
- [Reference guide](https://github.com/NVIDIA/aistore/blob/main/docs/cli.md#cli-reference)
- [Monitoring](/aistore/cli/show)
  - [`ais show cluster`](/aistore/cli/show)
  - [`ais show performance`](/aistore/cli/show)
  - [`ais show job`](/aistore/cli/show)
  - [`ais show config`](/aistore/cli/show)
- [Cluster and node management](/aistore/cli/cluster)
- [Mountpath (disk) management](/aistore/cli/storage)
- [Attach, detach, and monitor remote clusters](/aistore/cli/cluster)
- [Start, stop, and monitor downloads](/aistore/cli/download)
- [Distributed shuffle](/aistore/cli/dsort)
- [User account and access management](/aistore/cli/auth)
- [Jobs](/aistore/cli/job)
- [AIS CLI Reference](/aistore/cli)

## Related Documentation

| Document | Description |
|----------|-------------|
| [Overview](/aistore/monitoring-overview) | Introduction to AIS observability |
| [Logs](/aistore/monitoring-logs) | Configuring, accessing, and utilizing AIS logs |
| [Prometheus](/aistore/monitoring-prometheus) | Configuring Prometheus with AIS |
| [Metrics Reference](/aistore/monitoring-metrics) | Complete metrics catalog |
| [Grafana](/aistore/monitoring-grafana) | Visualizing AIS metrics with Grafana |
| [Kubernetes](/aistore/monitoring-kubernetes) | Working with Kubernetes monitoring stacks |