First, etymology:
“Resilvering” originally referred to restoring the reflective silver backing of a glass mirror. In modern storage systems the word “resilver” is commonly used to mean “rebuild data redundancy / restore intended layout after a device or topology change” (you’ll see it prominently in ZFS/OpenZFS documentation and tooling).
In AIStore, resilvering (or simply “resilver”) is the mechanism for redistributing objects to their correct locations after volume changes within a storage target.
When mountpaths are attached, detached, enabled, or disabled, objects may no longer reside at their proper HRW locations. Resilver walks all objects and relocates them as needed to restore data placement and redundancy under the current configuration.
HRW: a variant of consistent hashing based on the rendezvous (highest-random-weight) algorithm by Thaler and Ravishankar.
AIStore uses the same conceptual model:
At the cluster level, global rebalance distributes objects across targets using HRW.
At the (local) node level, resilver distributes objects across mountpaths using the same HRW algorithm.
This symmetry is intentional. The same reasoning applies at both levels:
The difference is scope and mechanics: rebalance moves data over the network across targets, while resilver moves data locally across mountpaths on the same machine.
This document is structured as follows:
Resilver is AIStore's mechanism for restoring correct object placement and redundancy on any given AIS target. The system guarantees that objects end up:
Resilver walks the target’s data and performs the minimum work needed to get back to a consistent state:
Resilver is a local process: it never moves data between targets and never requires cluster-wide coordination.
Resilver runs on demand (via ais storage resilver) and automatically in response to mountpath lifecycle events:
These events change the set of available mountpaths and therefore change HRW placement decisions. Resilver starts immediately to reconcile existing data with the new volume topology.
Operationally, disable/detach reduce the set of available mountpaths. Resilver’s job is to restore the target’s intended placement and redundancy using only the currently-available mountpaths.
Resilver is also preemptible. If a second mountpath event occurs while a resilver is running, the current run is aborted and a new one starts using the updated configuration. This ensures that work is never completed based on stale assumptions.
If a resilver is interrupted — by another mountpath event, a restart, or an abort — AIStore resumes the work later. Resilver is convergent by design: as long as mountpath configuration eventually stabilizes, object placement will converge to the correct state.
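The preemption and convergence behavior described above can be illustrated with a small simulation. This is a minimal sketch, not AIStore's implementation: the SHA-256 based `hrw_mountpath` helper, the `resilver_pass` function, and its `max_moves` parameter (used here only to simulate an interruption) are all hypothetical names introduced for illustration, and object locations are modeled as a plain dictionary.

```python
import hashlib

def hrw_mountpath(name, mountpaths):
    """Pick the mountpath with the highest hash-derived weight (illustrative hash)."""
    def score(mp):
        digest = hashlib.sha256(f"{mp}\x00{name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(mountpaths, key=lambda mp: (score(mp), mp))

def resilver_pass(locations, mountpaths, max_moves=None):
    """Relocate misplaced objects in place and return how many were moved.

    max_moves is a hypothetical knob that stops the pass early,
    standing in for an abort or a restart.
    """
    moved = 0
    for name in sorted(locations):
        if max_moves is not None and moved >= max_moves:
            break  # simulated interruption
        target = hrw_mountpath(name, mountpaths)
        if locations[name] != target:
            locations[name] = target
            moved += 1
    return moved

old = ["/ais/disk1", "/ais/disk2", "/ais/disk3"]
locations = {f"obj-{i}": hrw_mountpath(f"obj-{i}", old) for i in range(500)}

# Event 1: disk4 is attached; the run is interrupted after 10 relocations.
after_attach = old + ["/ais/disk4"]
first = resilver_pass(locations, after_attach, max_moves=10)

# Event 2: disk1 is detached; a new run starts with the updated mountpath set.
after_detach = ["/ais/disk2", "/ais/disk3", "/ais/disk4"]
second = resilver_pass(locations, after_detach)

# A further run finds nothing to do: placement has converged.
third = resilver_pass(locations, after_detach)
```

Because every pass recomputes placement from the current mountpath set, it does not matter how far the interrupted run got: the final state depends only on the final configuration.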
You can also trigger resilver manually using the CLI, for example after recovering from disk failures or interrupted maintenance.
Resilver relies on HRW (Highest Random Weight) to determine where objects belong.
For a given object name and a given set of available mountpaths, HRW deterministically selects a single mountpath. Every component in the system computes the same answer independently; no shared state or coordination is required.
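As an illustration of this selection step, here is a minimal rendezvous-hashing sketch. It assumes SHA-256 as the hash function and a simple string concatenation of mountpath and object name; both are choices made for this example and are not taken from AIStore's code. The function names and mountpath strings are likewise hypothetical.

```python
import hashlib

def hrw_score(object_name: str, mountpath: str) -> int:
    """Pseudo-random weight for one (object, mountpath) pair."""
    digest = hashlib.sha256(f"{mountpath}\x00{object_name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def hrw_mountpath(object_name: str, mountpaths: list[str]) -> str:
    """Return the mountpath with the highest weight for this object."""
    if not mountpaths:
        raise ValueError("no available mountpaths")
    # Ties are broken by the mountpath string, so input order never matters.
    return max(mountpaths, key=lambda mp: (hrw_score(object_name, mp), mp))

mountpaths = ["/ais/disk1", "/ais/disk2", "/ais/disk3"]
owner = hrw_mountpath("bucket/shard-0001.tar", mountpaths)
```

Any caller that knows the object name and the same set of mountpaths arrives at the same `owner`, which is why no lookup table or coordination is needed.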
When mountpaths are added or removed, HRW placement changes only for a subset of objects. Resilver identifies those objects and relocates them. Objects whose HRW placement does not change are left untouched.
An object is considered misplaced if it does not reside on its HRW mountpath under the current configuration.
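The definition above translates directly into a check. The following sketch assumes an illustrative SHA-256 based HRW helper and models current object locations as a dictionary; `find_misplaced` and the mountpath names are hypothetical. It also demonstrates the property that attaching a mountpath only affects the objects that now belong on the new mountpath.

```python
import hashlib

def hrw_mountpath(name, mountpaths):
    """Pick the mountpath with the highest hash-derived weight (illustrative hash)."""
    def score(mp):
        digest = hashlib.sha256(f"{mp}\x00{name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(mountpaths, key=lambda mp: (score(mp), mp))

def find_misplaced(locations, mountpaths):
    """Map each misplaced object to (current mountpath, HRW mountpath)."""
    misplaced = {}
    for name, current in locations.items():
        target = hrw_mountpath(name, mountpaths)
        if current != target:
            misplaced[name] = (current, target)
    return misplaced

old = ["/ais/disk1", "/ais/disk2", "/ais/disk3"]
new = old + ["/ais/disk4"]

# Start from a correctly placed layout, then attach a fourth mountpath.
locations = {f"obj-{i}": hrw_mountpath(f"obj-{i}", old) for i in range(1000)}
misplaced = find_misplaced(locations, new)
```

Every entry in `misplaced` has `/ais/disk4` as its destination: an object's winner changes only if the new mountpath outscores the previous winner, so objects never shuffle between the pre-existing mountpaths.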
Misplacement is expected and benign. It occurs naturally when mountpaths change, or when resilvering is interrupted. Reads continue to work: AIStore can locate objects regardless of where they physically reside.
Resilver is the mechanism that restores optimal placement and eliminates long-term imbalance.
Buckets may be configured with N-way mirroring. In that case, each object consists of:
When a mountpath is disabled or detached, copies on that mountpath become unavailable. Resilver removes stale metadata entries for those copies and creates replacements on other mountpaths if possible.
If there are fewer available mountpaths than required copies, resilvering creates as many copies as it can and leaves the system in a degraded but consistent state. When more mountpaths become available later, resilvering completes the replication.
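The copy-repair logic described above can be sketched as a small planning function. This is an assumption-laden illustration: `plan_copies` is a hypothetical name, and the rule for choosing which mountpaths receive replacement copies (here, simply the first eligible ones in list order) is arbitrary and not taken from AIStore.

```python
def plan_copies(current_copies, available, want):
    """Return (stale, keep, create) for one mirrored object.

    stale:  recorded copies on mountpaths that are no longer available
    keep:   copies that are still reachable
    create: mountpaths that should receive a replacement copy
    """
    stale = [mp for mp in current_copies if mp not in available]
    keep = [mp for mp in current_copies if mp in available]
    missing = max(0, want - len(keep))
    # Assumption: any available mountpath without a copy is eligible.
    candidates = [mp for mp in available if mp not in keep]
    create = candidates[:missing]
    return stale, keep, create

# 3-way mirror; disk2 and disk3 were disabled, only disk1 and disk4 remain.
stale, keep, create = plan_copies(
    ["/ais/disk1", "/ais/disk2", "/ais/disk3"],
    ["/ais/disk1", "/ais/disk4"],
    want=3,
)
degraded = len(keep) + len(create) < 3

# Later, disk5 becomes available and the third copy can be created.
_, keep_later, create_later = plan_copies(
    ["/ais/disk1", "/ais/disk4"],
    ["/ais/disk1", "/ais/disk4", "/ais/disk5"],
    want=3,
)
```

In the first call the plan yields two copies instead of three, which corresponds to the degraded but consistent state; the second call completes the replication once a third mountpath exists.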
Large objects may be stored as multiple chunks. All chunks are equal in structure and size. None of them is special.
What distinguishes chunk #1 is only its location: it is stored where a non-chunked object of the same name would be stored — at the object’s HRW mountpath. This allows existing lookup logic to locate the object efficiently.
The object itself is defined by its chunk manifest - the metadata that describes all chunks and their placement. Each chunk is placed independently using HRW derived from the object name and chunk index.
During resilvering, chunks are verified independently. A chunked object is considered correct only if all chunks and the manifest are at their correct locations under the current mountpath configuration. If any part is misplaced, the object is repaired as a unit.
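The per-chunk check can be sketched as follows. Several details here are assumptions made for illustration: the SHA-256 based HRW helper, the way the object name and chunk index are combined into a key (`name#index`), and the function names. The manifest check is omitted because the text does not state where the manifest is stored.

```python
import hashlib

def hrw_mountpath(key, mountpaths):
    """Pick the mountpath with the highest hash-derived weight (illustrative hash)."""
    def score(mp):
        digest = hashlib.sha256(f"{mp}\x00{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(mountpaths, key=lambda mp: (score(mp), mp))

def chunk_mountpath(name, index, mountpaths):
    """HRW location of one chunk (chunk indices start at 1)."""
    # Chunk #1 lives where a non-chunked object of the same name would live.
    # For other chunks the key format is an assumption of this sketch.
    key = name if index == 1 else f"{name}#{index}"
    return hrw_mountpath(key, mountpaths)

def misplaced_chunks(name, chunk_locations, mountpaths):
    """Return sorted indices of chunks that are not at their HRW mountpath."""
    return sorted(
        index
        for index, mp in chunk_locations.items()
        if mp != chunk_mountpath(name, index, mountpaths)
    )

mountpaths = ["/ais/disk1", "/ais/disk2", "/ais/disk3"]
name = "bucket/large-object.bin"

# A correctly placed 4-chunk object.
layout = {i: chunk_mountpath(name, i, mountpaths) for i in range(1, 5)}
ok_before = misplaced_chunks(name, layout, mountpaths) == []

# Put chunk 3 on some other mountpath: the whole object now needs repair.
layout[3] = next(mp for mp in mountpaths if mp != layout[3])
bad = misplaced_chunks(name, layout, mountpaths)
```

A non-empty result from `misplaced_chunks` marks the object as incorrect, at which point it would be repaired as a unit rather than chunk by chunk.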
See also: Blob Downloader
Resilver runs as a batch job (or xaction).
Progress is visible through the standard job monitoring command (ais show job), e.g.:
The primary progress metric counts main replicas restored to their HRW locations. This reflects actual repair work performed, not just objects visited.
Objects skipped due to locking contention are tracked separately. Skipped objects are not lost; they are handled in subsequent resilver runs if needed.
Resilver is usually triggered implicitly by mountpath operations, but it can also be started manually:
Mountpath lifecycle commands are the most common trigger:
Additionally, there is an advanced-usage capability to manipulate mountpaths without triggering resilver, which is useful when batching multiple changes:
In ZFS terminology, scrubbing means deep validation: reading every block, verifying checksums, detecting silent corruption, and repairing bit rot. It’s a thorough health check of data integrity at the physical level.
AIStore’s resilver is intentionally much narrower in scope - it focuses exclusively on data placement and redundancy under the current mountpath volume, and it does not verify checksums or read object contents.
This makes resilver fast and topology-focused. It runs after mountpath changes (attach/detach/enable/disable) to restore correct layout, not to detect corruption.
AIS provides APIs and configuration to validate checksums during normal I/O operations (reads, writes, copies). Full end-to-end validation - the equivalent of ZFS scrub - would combine resilver’s placement checks with explicit checksum verification of all objects. Such functionality could be added in the future.
For now, AIStore separates concerns clearly: