Global rebalance is the cluster-wide process of moving objects to their proper storage targets after a topology change.
In AIStore, object placement is not arbitrary and is not tracked via a central directory. Instead, every object has a uniquely defined destination computed from the current cluster state. When that state changes, some objects acquire a new correct location. Global rebalance is the decentralized background process that brings the cluster into agreement with the new placement.
Global rebalance is one of the core AIS mechanisms that makes it possible to grow or shrink a cluster, return nodes to service, and perform maintenance without taking the cluster offline.
AIStore uses a variant of highest random weight (HRW), also known as rendezvous hashing, to determine object placement.
At the cluster level, the destination target for an object is uniquely determined by three inputs: the bucket, the object name, and the current cluster map.
This is the key to understanding rebalance.
When the cluster map changes, the HRW result may change as well. For some subset of objects, a different target becomes the correct owner. Those are the objects that must be moved.
AIS uses the same general idea internally as well. Within a storage target, object placement across local disks is also HRW-based, except that instead of the cluster map AIS uses the set of local mountpaths.
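A minimal sketch of rendezvous (HRW) selection, assuming that an object is identified by a single bucket-qualified name string and each candidate by its ID. The FNV-1a hash and the 64-bit mixing step are illustrative choices, not the hash AIS actually uses, and all names and paths are hypothetical. The same `hrw` function picks a target from a list of target IDs or a mountpath from a list of local mountpaths; only the candidate set differs.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv64 hashes a string with 64-bit FNV-1a (an illustrative choice).
func fnv64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // hash.Hash writes never fail
	return h.Sum64()
}

// mix64 is the MurmurHash3 64-bit finalizer: it scrambles bits so that
// similar inputs produce unrelated weights.
func mix64(x uint64) uint64 {
	x ^= x >> 33
	x *= 0xff51afd7ed558ccd
	x ^= x >> 33
	x *= 0xc4ceb9fe1a85ec53
	x ^= x >> 33
	return x
}

// hrw returns the candidate with the highest weight for the given object.
// Candidates can be target IDs (cluster level) or mountpaths (inside a
// target); the selection rule is the same.
func hrw(uname string, candidates []string) string {
	digest := fnv64(uname)
	var best string
	var bestW uint64
	for i, c := range candidates {
		w := mix64(fnv64(c) ^ digest)
		if i == 0 || w > bestW {
			best, bestW = c, w
		}
	}
	return best
}

func main() {
	uname := "images/cat-0001.jpg"           // hypothetical bucket/object name
	targets := []string{"t1", "t2", "t3"}    // hypothetical target IDs
	mpaths := []string{"/mnt/d1", "/mnt/d2"} // hypothetical mountpaths

	fmt.Println("target:   ", hrw(uname, targets))
	fmt.Println("mountpath:", hrw(uname, mpaths))
}
```

Because each weight depends only on the object name and one candidate ID, removing a candidate that is not the current owner never changes the owner; only objects whose winner is added or removed get a new location.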
Global rebalance is triggered by changes that affect cluster-wide target placement.
Typical examples include:
A useful rule of thumb: if a topology change can alter the destination for stored objects, it can trigger global rebalance.
When a single target is added to or removed from a cluster of N targets, the fraction of objects that move is typically on the order of 1/N, though the exact amount always depends on the topology and the current object distribution.
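The 1/N estimate follows from the HRW property: adding a target changes the owner only of those objects for which the new target wins. The simulation below is a rough sketch under assumed conditions (an illustrative hash, synthetic object names, a uniform distribution), not a measurement of AIS; target IDs and object names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv64 and mix64 form an illustrative hash, not the one AIS uses.
func fnv64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func mix64(x uint64) uint64 {
	x ^= x >> 33
	x *= 0xff51afd7ed558ccd
	x ^= x >> 33
	x *= 0xc4ceb9fe1a85ec53
	x ^= x >> 33
	return x
}

// hrw picks the highest-weight target for an object name.
func hrw(name string, targets []string) string {
	d := fnv64(name)
	best, bestW := "", uint64(0)
	for i, t := range targets {
		if w := mix64(fnv64(t) ^ d); i == 0 || w > bestW {
			best, bestW = t, w
		}
	}
	return best
}

// movedFraction places numObjs synthetic objects on n targets, adds one more
// target, and returns the fraction of objects whose owner changed, plus
// whether every moved object went to the newly added target.
func movedFraction(n, numObjs int) (float64, bool) {
	targets := make([]string, n)
	for i := range targets {
		targets[i] = fmt.Sprintf("t%d", i)
	}
	added := fmt.Sprintf("t%d", n)
	grown := append(append([]string{}, targets...), added)

	moved, allToAdded := 0, true
	for i := 0; i < numObjs; i++ {
		name := fmt.Sprintf("bucket/obj-%d", i)
		if after := hrw(name, grown); after != hrw(name, targets) {
			moved++
			allToAdded = allToAdded && after == added
		}
	}
	return float64(moved) / float64(numObjs), allToAdded
}

func main() {
	frac, ok := movedFraction(10, 100000)
	fmt.Printf("moved %.3f of objects (1/11 = %.3f); all to new target: %v\n", frac, 1.0/11, ok)
}
```

Removing a target is the mirror image: only that target's objects, roughly 1/N of the total, get new owners.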
For additional background on maintenance, shutdown, and decommission workflows, see Node lifecycle: maintenance, shutdown, decommission.
Global rebalance is fully decentralized.
At a high level:
The steps above describe regular, data-moving rebalance. Cleanup mode, described below, reuses the rebalance lifecycle but does not migrate object payloads.
A few points are worth emphasizing:
This design keeps rebalancing scalable and avoids turning the primary into a data path bottleneck.
Global rebalance does not require cluster downtime.
During migration, incoming reads may target objects that have not yet moved, or are in the middle of being moved. AIS handles this transparently.
In particular, the target that owns an object under the new cluster map can locate a neighbor that still holds the object and retrieve it on demand. This internal flow is often described as get-from-neighbor.
As a result:
Global rebalance is controlled and monitored via:
Operationally, administrators typically care about three things:
Like other long-running AIS activities, rebalance is tracked as a cluster job and can be inspected while in progress.
Note: As elsewhere in the documentation, http://localhost:8080 in the examples below denotes a local playground endpoint and should be understood as a placeholder for an arbitrary AIS endpoint (AIS_ENDPOINT).
Likewise, target names such as t[kOHt8086] are example node identifiers from a test cluster; production deployments will use their own node names, addresses, and ports.
First, populate a bucket:
Verify the object count before rebalance:
Put one target into maintenance. This changes the active target set and starts global rebalance:
Monitor rebalance progress:
In this example, the target placed into maintenance is the one draining data, while the remaining active targets receive it.
Verify the object count after rebalance:
The identical counts before and after rebalance show that the cluster converged to the new placement without losing objects.
Global rebalance normally starts automatically - triggered by a topology change that alters object placement. It can also be started administratively, on demand.
Like a few other AIS xactions, rebalance is directly startable from the CLI:
The rebalance command also supports advanced bucket-scoped operation, along with related synchronization and versioning options:
Bucket scope is an advanced option and should generally be avoided unless you know exactly why a partial, bucket-level rebalance is appropriate. In most cases, the cluster should be allowed to rebalance globally.
For cleanup mode, see the next section.
Rebalance can also run in cleanup mode:
Cleanup mode is an administrative maintenance operation. It reuses the rebalance lifecycle and monitoring machinery, but it does not migrate object payloads between targets.
Versions 4.4 and earlier tracked every migrated object with per-object acknowledgments from destination to source, and used those acknowledgments to delete the source copy once placement was confirmed at the destination. That implicit reclamation mechanism did not scale to clusters and buckets with billions of objects and was removed.
As a result, regular rebalance no longer reclaims source-side copies implicitly. After a topology change converges, the cluster may continue to hold local copies of objects whose proper owner is now a different target. These copies are not lost data and they do not affect correctness, but they do consume local capacity until something reclaims them.
Cleanup mode is the explicit, operator-driven replacement: a separate verified pass - rebalance retracing its own steps - that discovers misplaced local copies and removes only those whose proper owner already has the object.
For broader local-storage hygiene, AIS also provides ais space-cleanup. That tool can remove several classes of local garbage, including corrupted metadata files, zero-size objects when requested, extra local copies, misplaced EC artifacts, local mountpath orphans, and verified migrated-away leftovers. Rebalance’s cleanup mode is narrower: it is the placement-specific, rebalance-lifecycle mode intended for cleaning up source-side copies left after topology changes and regular data-moving rebalance.
Each target walks its local mountpaths and looks for object copies that no longer belong on that target according to the current cluster map. For every local object, AIS recomputes the expected location. If the local target is already the expected owner, the object is skipped.
For a misplaced local copy, AIS contacts the expected owner and requests object properties used to establish identity - size, checksum, version, custom metadata, and ETag. Different byte content means a different version: two copies with the same name but divergent metadata are not the same object.
The local copy is removed only when AIS can verify that the expected owner holds the same version.
In other words, regular rebalance converges placement by moving objects to their proper targets. Cleanup mode converges local storage by removing misplaced copies that are already present at their proper targets.
Cleanup mode is intentionally out-of-band. Regular data-moving rebalance can temporarily create extra local copies while the cluster converges, but tracking every migrated object at runtime would not scale for large clusters and buckets with millions or billions of objects. Cleanup mode therefore performs a separate verified pass: it discovers misplaced local copies from the current on-disk namespace and safely removes only those that are already present at their expected locations.
By default, cleanup mode:
mirror.enabled=true)
Cleanup mode is useful after operational workflows such as maintenance, rolling upgrades, or recovery procedures, where misplaced local copies may remain and an administrator wants to reclaim local capacity without running a full data-moving rebalance.
Cleanup mode can be bucket-scoped and prefix-scoped, similarly to administrative rebalance. It is incompatible with --latest and --sync.
Cleanup mode can be monitored with the usual rebalance commands:
When cleanup mode is running or has completed, reported counters describe objects removed and bytes reclaimed rather than objects sent and received.
The following abbreviated example shows a three-target cluster where t[VCft8081] has been returned from maintenance. Regular rebalance g23 first moves objects according to the updated cluster map. Cleanup rebalance g24 then removes leftover misplaced local copies from the other targets.
Note that t[VCft8081] does not appear in the cleanup output. Having just returned to service under the current cluster map, it holds no misplaced copies - every local object is HRW-correct from its perspective. Only the targets that had to send data during g23 carry misplaced leftovers.
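A toy model of the example above makes the asymmetry concrete. It assumes an illustrative hash, no writes during maintenance, a maintenance target that keeps its data, and regular rebalance that copies objects without deleting them at the source. t0, t1, and t2 are hypothetical IDs, with t1 playing the role of t[VCft8081]; rebalance, misplaced, and scenario are illustrative helpers, not AIS functions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Illustrative HRW helpers (not the hash AIS uses).
func fnv64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func mix64(x uint64) uint64 {
	x ^= x >> 33
	x *= 0xff51afd7ed558ccd
	x ^= x >> 33
	x *= 0xc4ceb9fe1a85ec53
	return x ^ (x >> 33)
}

func hrw(name string, targets []string) string {
	d := fnv64(name)
	best, bestW := "", uint64(0)
	for i, t := range targets {
		if w := mix64(fnv64(t) ^ d); i == 0 || w > bestW {
			best, bestW = t, w
		}
	}
	return best
}

// holdings maps a target ID to the set of object names stored on it.
type holdings map[string]map[string]bool

// rebalance copies each object to its HRW owner under targets and keeps the
// source copy (no implicit reclamation in this model).
func rebalance(h holdings, targets []string) {
	for src, objs := range h {
		for name := range objs {
			if dst := hrw(name, targets); dst != src {
				h[dst][name] = true
			}
		}
	}
}

// misplaced counts, per target, stored objects whose HRW owner is elsewhere.
func misplaced(h holdings, targets []string) map[string]int {
	out := map[string]int{}
	for holder, objs := range h {
		for name := range objs {
			if hrw(name, targets) != holder {
				out[holder]++
			}
		}
	}
	return out
}

// scenario places objects on three targets, puts t1 into maintenance
// (it keeps its data), rebalances, then returns t1 and rebalances again.
func scenario(numObjs int) (holdings, map[string]int) {
	full := []string{"t0", "t1", "t2"}
	h := holdings{"t0": {}, "t1": {}, "t2": {}}
	for i := 0; i < numObjs; i++ {
		name := fmt.Sprintf("bucket/obj-%d", i)
		h[hrw(name, full)][name] = true
	}
	rebalance(h, []string{"t0", "t2"}) // t1 in maintenance: its objects are copied away
	rebalance(h, full)                  // t1 returns: its objects are copied back to it
	return h, misplaced(h, full)
}

func main() {
	_, m := scenario(3000)
	fmt.Println("misplaced per target:", m)
}
```

In this model t1 ends up with no misplaced copies, while t0 and t2 together hold exactly one leftover per object that t1 owns: the copies they received during maintenance and sent back when t1 returned.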
Cleanup-specific rebalance output reports objects removed and bytes reclaimed:
The --force option is valid only with cleanup mode:
Forced cleanup is advanced usage. To explain what it does, recall the default identity check: cleanup removes a misplaced local copy only when the expected owner reports identical metadata (size, checksum, version, ETag, custom metadata).
When local metadata diverges from the expected owner’s metadata, the two copies are not byte-identical - same name, different content. Concretely, this can happen with a raced write, an out-of-band update at the remote backend, or a stale pre-overwrite leftover. By default, cleanup keeps such divergent local copies, because removing one of two non-identical copies is data loss for whoever happens to hold the version that gets deleted.
--force removes them anyway, treating the HRW owner’s copy as authoritative. Use forced cleanup only when you have established that the divergent local copy is the one to discard.
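The default and forced behaviors reduce to a small decision table. The sketch below is hypothetical (shouldRemove is not an AIS function), and it assumes that when the expected owner has no copy at all, the local copy is kept even with --force, since there is no owner copy to treat as authoritative.

```go
package main

import "fmt"

// shouldRemove decides the fate of a misplaced local copy during cleanup.
// ownerHas:  the HRW owner reports holding the object.
// identical: size, checksum, version, ETag, and custom metadata all match.
// force:     cleanup was started with --force.
func shouldRemove(ownerHas, identical, force bool) bool {
	if !ownerHas {
		// Assumption of this sketch: nothing to treat as authoritative,
		// so the local copy is kept.
		return false
	}
	if identical {
		return true // verified duplicate: safe to remove
	}
	return force // divergent content: removed only when forced
}

func main() {
	for _, c := range []struct{ ownerHas, identical, force bool }{
		{true, true, false},
		{true, false, false},
		{true, false, true},
		{false, false, true},
	} {
		fmt.Printf("%+v -> remove=%v\n", c, shouldRemove(c.ownerHas, c.identical, c.force))
	}
}
```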
Two things --force does not do:
dont_cleanup_time) and does not allow cleanup to run concurrently with active rebalance or resilver.
Both rebalance and resilver restore HRW-based placement, but they do so at different scopes.
Global rebalance restores placement across the cluster. When the active target set changes, some objects acquire a new HRW destination under the current cluster map, and AIS moves those objects from one storage node to another.
Resilver restores placement within a single target. When the local set of mountpaths changes, some objects acquire a new local destination on that node, and AIS moves those objects across mountpaths on the same target.
In short, rebalance is cluster-wide and inter-target; resilver is local and intra-target.
For more on resilver, see Resilver.
Rebalancing moves data in the background and competes for I/O, CPU, memory, and network resources.
As a result, during rebalance:
The exact overhead depends on multiple factors, including:
For this reason, administrators often tune rebalance behavior and may temporarily disable automated rebalance while performing planned maintenance or staged upgrades.
Cleanup mode has a different resource profile: it does not migrate object payloads, but it still walks local namespace entries, loads object metadata, performs intra-cluster verification, and removes local files. It should therefore still be treated as a background maintenance operation.