This document describes common AIStore (AIS) failure modes and actionable recovery steps. It is intended for operators and developers troubleshooting clusters that fail to start, fail to join, or exhibit integrity errors.
In most cases, the AIS CLI is the first and best tool to use.
Note: Some example paths in this document may reflect local dev deployments. In production, cluster-wide metadata is stored in the node’s config directory, while BMD and VMD (bucket and volume metadata, respectively) live at the root of each mountpath. See the xmeta tool README for more details and examples.
AIS provides extensive CLI tab-completion and discovery.
Start with:
or explore available subcommands interactively:
Example output:
At any time there is exactly one primary proxy. If needed, you can change it administratively:
AIS integrity errors fall into two distinct categories:
Cluster Integrity Errors (cie#)
Inconsistent or conflicting cluster-wide metadata (Smap, BMD, etc.)
Storage Integrity Errors (sie#)
Inconsistent, missing, or invalid mountpath metadata on a target
Understanding which category you are dealing with is critical: CIE errors are cluster-scoped; SIE errors are target-scoped.
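As a triage aid, the distinction above can be sketched as a small log classifier. This is a minimal illustration, not part of AIS: it assumes integrity errors carry a tag of the form `cie#<number>` or `sie#<number>` somewhere in the log line, and the function name `classify_integrity_error` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: integrity errors are tagged "cie#<number>" or "sie#<number>".
_TAG = re.compile(r"\b(cie|sie)#(\d+)\b", re.IGNORECASE)

# Scope of each category, as described in the text above.
SCOPE = {"cie": "cluster", "sie": "target"}


def classify_integrity_error(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (category, code, scope) for the first integrity tag in `line`.

    Returns None when the line carries no cie#/sie# tag.
    """
    match = _TAG.search(line)
    if match is None:
        return None
    category = match.group(1).lower()
    return category, int(match.group(2)), SCOPE[category]
```

Feeding a target or proxy log through this function line by line quickly shows whether to look at cluster-wide metadata (CIE) or at a single target's mountpaths (SIE).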
Cluster Integrity Errors are raised when a node attempts to join or operate in a cluster with incompatible cluster-wide metadata.
Example:
These errors usually indicate that a node:
Recovery often involves carefully cleaning obsolete metadata:
This must be done with extreme caution. Removing the wrong metadata can permanently orphan data.
CIE recovery is intentionally conservative and usually requires manual inspection and understanding of cluster history.
Storage Integrity Errors relate to mountpaths attached to a storage target. Each target maintains Volume Metadata (VMD) describing its mountpaths, their filesystems, and the target’s persistent Node ID.
Example:
VMD is persisted and replicated across all mountpaths of a target
Each mountpath records:
VMD validation happens at target startup, before runtime checks (FSHC)
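Because VMD is replicated across all mountpaths, a quick offline comparison of the replicas can help before deciding which one to inspect. The sketch below is not how AIS validates VMD. It only groups mountpaths by the SHA-256 of their replica, under two assumptions: that the replica sits at the mountpath root under the name `.ais.vmd`, and that healthy replicas are byte-identical. The function name `group_vmd_replicas` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Assumption: on-disk name of the VMD replica at the root of each mountpath.
VMD_NAME = ".ais.vmd"


def group_vmd_replicas(mountpaths: Iterable[str]) -> Dict[str, List[str]]:
    """Group mountpaths by the SHA-256 digest of their VMD replica.

    Mountpaths without a replica file are collected under the key "missing".
    The files are only read, never modified.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for mp in mountpaths:
        replica = Path(mp) / VMD_NAME
        if replica.is_file():
            digest = hashlib.sha256(replica.read_bytes()).hexdigest()
        else:
            digest = "missing"
        groups[digest].append(str(mp))
    return dict(groups)
```

A result with a single digest and no `missing` entry means all replicas agree; anything else indicates which mountpaths deserve a closer look with xmeta.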
Target fails during startup with an error similar to:
This commonly occurs after:
In this state:
This recovery method is safe, explicit, and reversible.
Identify the failed mountpath (in this example, /ais/nvme7n1)
Use the xmeta tool to disable the failed mountpath in a selected VMD replica:
Restart the target (or the cluster)
Verify:
or:
The target will now restart in a degraded but safe state, with /ais/nvme7n1 disabled.
Before troubleshooting that involves inspecting or modifying any on-disk metadata:
For VMD in particular, at minimum back up each mountpath’s metadata and keep the archive somewhere off the node.
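The backup step can be sketched as follows. This is a minimal example under stated assumptions: the metadata file names (`.ais.vmd`, `.ais.bmd`) are assumed to be the on-disk names at the mountpath root, and `backup_mountpath_metadata` is a hypothetical helper, not an AIS tool. Adjust the names to match what is actually present on your mountpaths.

```python
import tarfile
import time
from pathlib import Path
from typing import Iterable

# Assumption: metadata files live at the mountpath root under these names.
METADATA_NAMES = (".ais.vmd", ".ais.bmd")


def backup_mountpath_metadata(mountpaths: Iterable[str], dest_dir: str) -> Path:
    """Archive the metadata files found at the root of each mountpath.

    Creates a timestamped .tar.gz in `dest_dir` and returns its path.
    Source files are only read; missing files are skipped.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"ais-metadata-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for mp in mountpaths:
            for name in METADATA_NAMES:
                src = Path(mp) / name
                if src.is_file():
                    # Keep the mountpath in the member name so that
                    # replicas from different mountpaths stay distinguishable.
                    tar.add(str(src), arcname=str(src).lstrip("/"))
    return archive
```

After the archive is written, copy it to another host or to external storage so that it survives a failure of the node being repaired.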
Additionally:
Do not use ignore-missing-mountpath unless you fully understand the implications.
xmeta is a power tool: indispensable for recovery, dangerous if misused.
If unsure, stop and inspect the existing metadata before proceeding, and consider backing it up as well.