AIStore Observability: Logs

View as Markdown

AIStore (AIS) provides comprehensive logging that captures system operations, performance metrics, and error conditions.

Scope. How to configure, collect, and read AIS logs.

AIS logs are the cluster’s ground truth: every proxy or target writes a chronological stream of events, warnings, and periodic performance snapshots. Well‑rotated logs let operators:

  • Reconstruct incidents (root‑cause analysis)
  • Correlate client symptoms with internal state changes
  • Spot long‑running jobs without polling the control plane

Table of Contents

Configuring Logging

$# Show the cluster‑wide logging section (current values)
$ais config cluster log
KeyPurposeTypical prod value
levelInfo verbosity 0-5 (3 = normal, 4/5 = chatty, <3 disables info). W and E are always logged.3
modulesSpace‑separated list of modules whose info lines are forced to level 5 (e.g., ec space). Use none to clear the override.none
max_sizeRotate when a single file exceeds this size32MiB
max_totalUpper bound for the entire directory (oldest files deleted first)1GiB
flush_timeHow often each daemon flushes its in‑memory buffer10s
stats_timeInterval for automatic performance snapshots60s
to_stderrDuplicate log lines to stderr (handy for systemd / kubectl logs)false

Show current values:

$ais config cluster log # cluster‑wide
$ais config node NODE_ID log # single node (effective)

The new value propagates to every node within a second.

Example (Development Defaults)

1"log": {
2 "level": "3",
3 "modules": "none",
4 "max_size": "4MiB",
5 "max_total": "128MiB",
6 "flush_time": "1m",
7 "stats_time": "1m",
8 "to_stderr": false
9}

Example (Production Configuration)

In production environments, settings are typically adjusted for higher retention and less frequent statistics collection:

$ ais config cluster log
PROPERTY VALUE
log.level 3
log.max_size 4MiB
log.max_total 512MiB
log.flush_time 1m
log.stats_time 3m
log.to_stderr false

At startup, AIS logs some of these settings:

I 19:28:24.774518 config:2143 log.dir: "/var/log/ais"; l4.proto: tcp; pub port: 51080; verbosity: 3
I 19:28:24.774523 config:2145 config: "/etc/ais/.ais.conf"; stats_time: 10s; authentication: false; backends: [aws]

Severity & Verbosity

AIS prepends every line with a severity prefix and—in the case of informational messages—an internal numeric level.

Severity Prefixes

PrefixMeaningPrinted when
EError – unrecoverable / user‑visibleAlways
WWarning – succeeded but suspiciousAlways
IInformationalOnly if allowed by level

Numeric Levels for I Lines

LevelTypical use (examples)
5Hot‑path trace, request headers, per‑part
4Verbose progress, retries, caching stats
3Startup, shutdown, xaction summaries (default)
2‑0Progressively quieter; at ≤2 almost silent

Tip. Temporarily crank a node:

$ais config node set TARGET log.level 4
$ais config cluster log.modules ec xs # focus on EC & batch jobs (xactions)

Per‑module Overrides (log.modules)

log.modules lets you boost just a subset of subsystems to level 5 without flooding the whole cluster.

$# Elevate erasure‑coding (ec) and xaction scheduler (xs):
$ais config cluster log.modules ec xs
$
$# Revert to normal
$ais config cluster log.modules none

Log Format and Structure

AIS logs follow a consistent format:

I 2025‑05‑19 13:42:17.791884 cpu:60 Reducing GOMAXPROCS (prev=256) to 32
│ │ │ │ └─ message
│ │ │ └─ Go file:line inside AIS source
│ │ └─ timestamp (µs precision)
│ └─ severity prefix
└─ 'I' for INFO

Common prefixes:

  • config: – effective runtime configuration
  • x-<n>: – extended (batch) action lifecycle
  • nvmeXnY: – per‑disk I/O snapshot
  • kvstats: – cluster‑wide key‑value metrics (see below)

Log File Layout & Rotation

EnvironmentWhere logs appearNotes
Bare‑metal/var/log/ais/<node>.logOne file per daemon (proxy / target)
Kubernetescontainer stdout (kubectl logs)Collected by CRI‑O / containerd

File names include the node ID plus a sequence number (target‑A43c.log.3). Rotation is triggered by max_size; retention is enforced by max_total.

AIS implements automatic log rotation as indicated by the header:

Rotated at 2025/05/14 21:00:38, host ais-target-13, go1.24.3 for linux/amd64

When logs are rotated, new log files are created and old ones are typically compressed or archived according to the retention policy.

Accessing Logs

Via CLI

The AIS CLI provides commands to view and collect logs:

$# View logs from a specific node
$ais log show [NODE_ID]
$
$# Filter logs by severity
$ais log show [NODE_ID] --severity error
$
$# Collect logs from all nodes
$ais log get --help

Directly in Kubernetes

In Kubernetes deployments, access logs using kubectl:

$kubectl logs -n ais ais-proxy-15
$kubectl logs -n ais ais-target-13

Common Log Patterns

Startup Sequence

The startup sequence provides important information about the AIS node configuration:

Started up at 2025/05/13 19:28:24, host ais-proxy-15, go1.24.3 for linux/amd64
W 19:28:24.774364 config:1506 control and data share the same intra-cluster network: ais-proxy-15.ais-proxy.ais.svc.cluster.local
I 19:28:24.774518 config:2143 log.dir: "/var/log/ais"; l4.proto: tcp; pub port: 51080; verbosity: 3
I 19:28:24.774523 config:2145 config: "/etc/ais/.ais.conf"; stats_time: 10s; authentication: false; backends: [aws]
I 19:28:24.774540 daemon:311 Version 3.28.2ec8b22, build 2025-05-13T19:20:12+0000, CPUs(32, runtime=256), containerized

Operation Logs

AIS logs details about operations such as list, put, get:

I 21:00:43.063816 base:211 x-list[ApJcaebM5]-ais://yodas-21:00:03.062994-00:00:00.000000 finished
I 21:00:44.482430 base:211 x-list[J1qpgaWbxG]-ais://yodas-21:00:04.481894-00:00:00.000000 finished

Performance Metrics

AIS regularly logs performance metrics in two formats:

  1. Disk-specific performance:
I 21:00:48.784074 nvme3n1: 54MiB/s, 119KiB, 0B/s, 0B, 26%
I 21:00:48.784078 nvme9n1: 41MiB/s, 119KiB, 0B/s, 0B, 20%
I 21:00:48.784080 nvme10n1: 38MiB/s, 112KiB, 0B/s, 0B, 18%
  1. Comprehensive key-value statistics (at regular intervals defined by stats_time):
I 18:06:18.785011 {aws.head.n:114227,aws.head.ns.total:16799090100532,del.n:109,err.get.n:296108,err.ren.n:1,etl.offline.n:1136785,etl.offline.ns.total:73240613148414,get.bps:219094016,get.n:1006219,get.ns:425545477639,get.ns.total:3041645043551487925,get.redir.ns:3262104,get.size:126049578974910,lcache.evicted.n:1211669,lcache.flush.cold.n:491970,lst.n:8824,put.n:286,put.ns.total:1706491834824,put.size:101717912588,ren.n:103,state.flags:32774,stream.in.n:441,stream.in.size:96717189120,stream.out.n:446,stream.out.size:84119388160,disk.nvme7n1.read.bps:35240346...}

Kubernetes-specific Information

In Kubernetes deployments, AIS logs include pod and cluster-specific details:

I 19:28:24.786264 k8s:93 Pod info: name ais-proxy-15 ,namespace ais ,node 10.49.41.55 ,hostname ais-proxy-15 ,host_network false
I 19:28:24.786281 k8s:101 ais-ais-state (&PersistentVolumeClaimVolumeSource{ClaimName:ais-ais-state-ais-proxy-15,ReadOnly:false,})
I 19:28:24.786304 k8s:103 config-template
I 19:28:24.786306 k8s:103 config-mount

Key Performance Metrics

The key-value statistics contain valuable operational metrics:

Key / patternDescription
get.nNumber of GET operations
put.nNumber of PUT operations
get.size, put.sizeCumulative bytes, GET and PUT respectively
get.bpsBytes per second for GET operations
aws.<name>.ns.totalCumulative latency against a cloud backend (AWS S3, in this example)
aws.head.nNumber of HEAD requests to AWS S3
err.get.nNumber of GET errors
disk.<device>.read.bpsRead throughput for specific disk
disk.<device>.utilDevice utilization percentage

Troubleshooting Checklist

  1. Scan for E & W lines around the timeframe.
  2. Look for spikes in err.<n>.n counters.
  3. Watch disk util > 80% or sustained read.bps plateaus.
  4. Temporarily raise log.level or log.modules on a single node to capture more detail.

For advanced log analysis, consider forwarding logs to external systems for aggregation and visualization.

Operational Tips

  • Keep log.level=3 in production; raise to 4 or 5 only while debugging. Lower to 2 or below if you truly need silence.
  • Raise stats_time (≥ 60s) if logs get noisy on busy systems.
  • Ship rotated logs off‑host weekly.
  • Always attach ais cluster download-logs tarball to GitHub issues.
DocumentDescription
OverviewIntroduction to AIS observability
CLICommand-line monitoring tools
PrometheusConfiguring Prometheus with AIS
Metrics ReferenceComplete metrics catalog
GrafanaVisualizing AIS metrics with Grafana
KubernetesWorking with Kubernetes monitoring stacks