
Copyright © 2026, NVIDIA Corporation.

AIStore Observability: CLI

The CLI is the fastest way to interrogate an AIS cluster from a terminal. This page is a jump table to the handful of commands that SREs and developers use when triaging performance or capacity issues. For full syntax, run any command with --help or see the separate CLI reference.

Table of Contents

  • Installation
  • Cluster Status
    • Example: Node-level Alerts
  • Node Alerts
  • Live Performance Monitoring
    • Key Flags
  • Log Management
  • Common Command Examples
  • Best Practices
  • Troubleshooting Common Issues
  • CLI Resources
  • Related Documentation

Installation

There are several ways to install AIS CLI:

  1. Using the installation script (recommended):

    ./scripts/install_from_binaries.sh --help

This script installs aisloader and the CLI from the latest or previous GitHub release and enables CLI auto-completions.

  2. Follow the quick-start instructions.

  3. For a detailed introduction (including installation) and usage, see the CLI Overview.

After installation, configure your AIS endpoint via the ais config cli command or environment variables:

    ## HTTP
    export AIS_ENDPOINT=http://your-ais-cluster-endpoint:port

    ## or HTTPS
    export AIS_ENDPOINT=https://your-ais-cluster-endpoint:port
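A quick sanity check on the endpoint can save a confusing first error. The following is a minimal sketch, not part of the AIS CLI: the function name check_ais_endpoint is hypothetical, and it only checks that the value is set and starts with http:// or https://. It does not test whether the cluster is reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an AIS command): validate the shape of AIS_ENDPOINT.
# Uses the first argument if given, otherwise the AIS_ENDPOINT environment variable.
check_ais_endpoint() {
  local ep="${1:-${AIS_ENDPOINT:-}}"
  if [ -z "$ep" ]; then
    echo "AIS_ENDPOINT is not set" >&2
    return 1
  fi
  case "$ep" in
    http://?*|https://?*)
      echo "endpoint looks OK: $ep" ;;
    *)
      echo "expected http://... or https://..., got: $ep" >&2
      return 1 ;;
  esac
}
```

Calling check_ais_endpoint with no arguments checks the exported variable; passing a URL checks that value instead.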

Cluster Status

| Question | Command | Typical flags |
| --- | --- | --- |
| Nodes and their respective health? Any alerts? Out of space? Out of memory? | ais show cluster | --refresh 1m |
| How much space is left? | ais storage summary | --cached, --units, --prefix, --refresh |
| Are any mountpaths down? | ais storage mountpath | --fshc (to run filesystem health checker), --rescan-disks |

    # Get summary of cluster membership, capacity, and health
    ais show cluster

    # As always, this (and all other) command's options are available via `--help`
    ais show cluster --help

Example: Node-level Alerts

    $ ais show cluster

    PROXY            MEM AVAIL   SYS CPU(%)   UPTIME       STATUS   ALERT
    p[KKFpNjqo][P]   127.77GiB   32%          108h30m40s   online   **tls-cert-will-soon-expire**
    ...

    TARGET        MEM AVAIL   CAP USED(%)   CAP AVAIL    SYS CPU(%)   UPTIME      STATUS   ALERT
    t[pDztYhhb]   98.02GiB    16%           960.824GiB   61%          108h30m1s   online   **tls-cert-will-soon-expire**
    ...
    ...
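Output like the example above can be filtered down to just the nodes that carry an alert. This is a rough sketch under stated assumptions, not a documented AIS feature: the function name nodes_with_alerts is hypothetical, node rows are assumed to start with p[ or t[, the STATUS value is assumed to be the single token "online", and any text after it is treated as the alert. Nodes in other states are not handled here.

```shell
#!/usr/bin/env bash
# Sketch: print "<node>: <alert>" for each node row with a non-empty ALERT column.
# Assumptions (not confirmed by the AIS docs): rows start with "p[" or "t[",
# STATUS is the single token "online", and the alert text follows it.
nodes_with_alerts() {
  awk '
    /^[pt]\[/ {
      alert = ""
      for (i = 1; i <= NF; i++) {
        if ($i == "online") {
          for (j = i + 1; j <= NF; j++) alert = alert (alert == "" ? "" : " ") $j
          break
        }
      }
      gsub(/\*/, "", alert)   # drop bold markers such as **...**, if present
      if (alert != "") print $1 ": " alert
    }
  '
}

# Typical use (requires a configured AIS CLI):
#   ais show cluster | nodes_with_alerts
```

The function reads standard input, so it works equally well on a saved copy of the output.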

Node Alerts

AIStore node states are categorized into three severity levels:

  1. Red Alerts - Critical issues requiring immediate attention:

    • OOS - Out of space condition
    • OOM - Out of memory condition
    • OOCPU - Out of CPU resources
    • DiskFault - Disk failures detected
    • NoMountpaths - No available mountpaths
    • NumGoroutines - Excessive number of goroutines
    • CertificateExpired - TLS certificate has expired
    • CertificateInvalid - TLS certificate is invalid
  2. Warning Alerts - Potential issues that may require attention:

    • Rebalancing - Rebalance operation in progress
    • RebalanceInterrupted - Rebalance was interrupted
    • Resilvering - Resilvering operation in progress
    • ResilverInterrupted - Resilver was interrupted
    • NodeRestarted - Node was restarted (powercycle, crash)
    • MaintenanceMode - Node is in maintenance mode
    • LowCapacity - Low storage capacity (OOS possible soon)
    • LowMemory - Low memory condition (OOM possible soon)
    • LowCPU - Low CPU availability
    • CertWillSoonExpire - TLS certificate will expire soon
    • KeepAliveErrors - Recent keep-alive errors detected
  3. Information States - Normal operational states:

    • ClusterStarted - Cluster has started (primary) or node has joined cluster
    • NodeStarted - Node has started (may not have joined cluster yet)
    • VoteInProgress - Voting process is in progress

Node state flags are also exposed via Prometheus metrics - for details, see:

  • Node Alerts in AIStore Prometheus docs.
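When scripting around these states, the three severity levels can be encoded as a simple lookup. The sketch below is illustrative only: alert_severity is a hypothetical helper, and it matches the state names exactly as they are listed on this page. The CLI may print alerts in a different form (for example, tls-cert-will-soon-expire in the output shown earlier), and that mapping is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a node state name (as listed on this page) to its severity.
# Prints red, warning, or info; prints "unknown" and returns 1 for anything else.
alert_severity() {
  case "$1" in
    OOS|OOM|OOCPU|DiskFault|NoMountpaths|NumGoroutines|CertificateExpired|CertificateInvalid)
      echo red ;;
    Rebalancing|RebalanceInterrupted|Resilvering|ResilverInterrupted|NodeRestarted|MaintenanceMode|LowCapacity|LowMemory|LowCPU|CertWillSoonExpire|KeepAliveErrors)
      echo warning ;;
    ClusterStarted|NodeStarted|VoteInProgress)
      echo info ;;
    *)
      echo unknown
      return 1 ;;
  esac
}
```

For instance, alert_severity LowCapacity prints warning, which a wrapper script could use to decide whether to page someone or only log the event.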

Live Performance Monitoring

ais performance (alias ais show performance) exposes five sub-commands. The two most used are throughput and latency.

    # 30-second rolling throughput for all targets
    $ ais performance throughput --refresh 30

    # 10-second latency slice, filter to GET operations
    $ ais performance latency --refresh 10 --regex "get"

Key Flags

| Flag | Meaning |
| --- | --- |
| --refresh <dur> | Continuous mode; prints every dur |
| --count <n> | Stop after n refreshes |
| --regex <re> | Show only columns matching the regexp |
| --no-headers | Suppress table headers |

See cli-performance.md for sub‑command specifics.

Log Management

| Task | Command |
| --- | --- |
| Tail a given node’s log | ais log show --refresh DURATION --help |
| Download all logs for a support bundle | ais cluster download-logs |
| Rotate logs on one node | ais advanced rotate-logs <NODE_ID> |

For more details on log configuration and analysis, see Observability: Logs.

Common Command Examples

Here are some frequently used command combinations for everyday operations:

    # Daily capacity & health snapshot
    ais show cluster && ais storage summary

    # Watch GET latency for a single target
    ais performance latency t[EkMt8081] --refresh 30 --regex "get(\(t\)|cold)"

    # Verify no misplaced objects in GCS buckets (non-recursive)
    ais scrub gs --nr --refresh 20s --count 3

Flags such as --refresh <duration>, --count <n>, --regex <re>, --no-headers, and --units are accepted by most monitoring commands; see --help for the definitive list.

Best Practices

  • Regular Health Checks: Run ais show cluster and ais storage summary daily to ensure cluster health and capacity
  • Performance Baselines: Establish baseline performance with ais show performance after initial deployment
  • Monitoring Script: Create a shell script with key monitoring commands for daily checks
  • Alert Integration: Pipe CLI output to monitoring systems for automated alerting
  • Log Collection: To collect logs, integrate with a Kubernetes monitoring stack or (at least) use ais cluster download-logs
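The daily check and monitoring-script suggestions above could be combined along these lines. This is a minimal sketch under stated assumptions rather than a recommended tool: daily_snapshot, the AIS_BIN variable, and the default log file name are all hypothetical, and the only AIS commands used are the two named in the list.

```shell
#!/usr/bin/env bash
# Sketch of a daily health snapshot (hypothetical script, not shipped with AIS).
# AIS_BIN lets you point at a specific ais binary; it defaults to "ais" on PATH.
AIS_BIN="${AIS_BIN:-ais}"

# Append a timestamped cluster and capacity snapshot to a log file.
# $1: output file (assumed default: ais-health-YYYYMMDD.log in the current directory).
daily_snapshot() {
  local out="${1:-ais-health-$(date +%Y%m%d).log}"
  {
    echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" &&
    "$AIS_BIN" show cluster &&
    "$AIS_BIN" storage summary
  } >>"$out" 2>&1 || { echo "snapshot failed, see $out" >&2; return 1; }
  echo "snapshot appended to $out"
}

# Example (requires a configured AIS CLI):
#   daily_snapshot /var/log/ais/health.log
```

Running it from cron or a systemd timer gives a simple history to compare against when something looks off; the non-zero return on failure makes it easy to hook into whatever alerting is already in place.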

Troubleshooting Common Issues

| Issue | CLI Command | What to Look For |
| --- | --- | --- |
| Node experiencing problems or went offline | ais show cluster | Check the ALERT column (example above) |
| Disk failures | ais storage mountpath | Look for disabled or detached mountpaths |
| Performance degradation | ais performance --refresh 30s | Compare against baseline numbers |
| Failed operations | ais log show --severity error | Common error patterns |
| Network issues | ais status network | High latency or timeout errors |

CLI Resources

  • ais help
  • Reference guide
  • Monitoring
    • ais show cluster
    • ais show performance
    • ais show job
    • ais show config
  • Cluster and node management
  • Mountpath (disk) management
  • Attach, detach, and monitor remote clusters
  • Start, stop, and monitor downloads
  • Distributed shuffle
  • User account and access management
  • Jobs
  • AIS CLI Reference

Related Documentation

| Document | Description |
| --- | --- |
| Overview | Introduction to AIS observability |
| Logs | Configuring, accessing, and utilizing AIS logs |
| Prometheus | Configuring Prometheus with AIS |
| Metrics Reference | Complete metrics catalog |
| Grafana | Visualizing AIS metrics with Grafana |
| Kubernetes | Working with Kubernetes monitoring stacks |