An AIStore cluster will survive the loss of any storage target and any gateway, including the primary gateway (leader). New gateways and targets can join at any time, including while a new leader is being elected. Each node that joins a running cluster is updated with the most current cluster-level metadata. Failover (that is, the election of a new leader) is carried out automatically when the current leader fails. Failback (that is, the administrative selection of the leading gateway, likely the originally designated one) is done manually via the AIStore API.
It is therefore recommended to deploy an AIStore cluster with multiple proxies, aka gateways (the two terms are used interchangeably throughout the source code and this README).
When there are multiple proxies, only one of them acts as the primary, while all the rest are non-primaries. The primary proxy (or simply, the primary) is responsible for serializing updates of the cluster-level metadata, which is also versioned and immutable.
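To illustrate what serializing updates of versioned, immutable metadata means, here is a minimal Go sketch. It assumes a simplified, hypothetical `Smap` type and `owner` helper (these are not AIStore's actual types): the primary takes a lock, modifies a private copy, bumps the version, and publishes the copy, so that previously published snapshots are never changed.

```go
package main

import (
	"fmt"
	"sync"
)

// Smap is a simplified, hypothetical cluster map: an immutable snapshot
// identified by a monotonically increasing version.
type Smap struct {
	Version int64
	Primary string
	Proxies []string
	Targets []string
}

// clone returns a deep copy so that published snapshots are never mutated.
func (m *Smap) clone() *Smap {
	c := *m
	c.Proxies = append([]string(nil), m.Proxies...)
	c.Targets = append([]string(nil), m.Targets...)
	return &c
}

// owner models the primary's role (hypothetical): all updates go through
// a single lock, which serializes them.
type owner struct {
	mu   sync.Mutex
	smap *Smap // current published, read-only snapshot
}

// modify applies fn to a private copy, bumps the version, and publishes
// the copy as the new immutable snapshot.
func (o *owner) modify(fn func(*Smap)) *Smap {
	o.mu.Lock()
	defer o.mu.Unlock()
	next := o.smap.clone()
	fn(next)
	next.Version = o.smap.Version + 1
	o.smap = next
	return next
}

func main() {
	o := &owner{smap: &Smap{Version: 1, Primary: "p1", Proxies: []string{"p1"}}}
	old := o.smap
	cur := o.modify(func(m *Smap) { m.Targets = append(m.Targets, "t1") })
	// The old snapshot is untouched; the new one carries the next version.
	fmt.Println(old.Version, len(old.Targets), cur.Version, len(cur.Targets)) // prints "1 0 2 1"
}
```

Readers that hold the old pointer keep a consistent view, which is one reason immutable, versioned snapshots pair well with a single serializing writer.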
Further:
The proxy's bootstrap sequence begins by executing the following three main steps:
AIS_PRIMARY_EP to figure out whether this proxy must keep starting up as a primary;
The rules that determine whether a starting-up proxy is the primary in the cluster are simple. In fact, they amount to a single switch statement in the namesake function:
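As a purely hypothetical illustration of how such rules can be expressed as a single switch statement, the Go sketch below assumes a precedence order: the `AIS_PRIMARY_EP` environment variable first, then the primary recorded in a locally stored cluster map, then a primary URL from the local configuration. The function `startAsPrimary` and this ordering are assumptions made for illustration, not AIStore's actual logic.

```go
package main

import (
	"fmt"
	"os"
)

// startAsPrimary is a hypothetical stand-in for the role decision.
// self is this proxy's URL; the other arguments are the primary URLs
// obtained from the environment, the local cluster map, and the local
// config (empty string means "not available").
func startAsPrimary(self, envPrimary, smapPrimary, configPrimary string) bool {
	switch {
	case envPrimary != "":
		// Assumed: an explicitly specified primary endpoint takes precedence.
		return envPrimary == self
	case smapPrimary != "":
		// Assumed: otherwise trust the locally stored cluster map.
		return smapPrimary == self
	default:
		// Assumed: fall back to the primary URL from local configuration.
		return configPrimary == self
	}
}

func main() {
	self := "http://10.0.0.1:8080" // hypothetical endpoint
	fmt.Println(startAsPrimary(self, os.Getenv("AIS_PRIMARY_EP"), "", self))
}
```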
Further, the (potentially) primary proxy executes more steps:
If, during any of these steps, the proxy discovers that it must join as a non-primary, it simply does so.
The primary proxy election process is as follows:
An AIStore cluster can be stretched to colocate its redundant gateways with the compute nodes. Such local gateways can be configured as non-electable (see AIStore configuration): they serve only as access points and never take on the responsibility of leading the cluster.
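The election and the non-electable setting can be sketched in Go under two stated assumptions: the next candidate is chosen deterministically by highest-random-weight (rendezvous) hashing over proxy IDs, skipping the failed primary and any non-electable gateways, and the candidate wins with a strict majority of affirmative votes. Because every node computes the same candidate from the same cluster map, nodes can agree on whom to vote for without extra coordination. The names `proxyInfo`, `nextCandidate`, and `wonElection` are hypothetical, and FNV-1a is only a stand-in hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// proxyInfo is a hypothetical view of a gateway in the cluster map.
type proxyInfo struct {
	ID           string
	NonElectable bool // access-point-only gateway, never a leader
}

// hrwWeight derives a deterministic pseudo-random weight from a proxy ID.
func hrwWeight(id string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(id))
	return h.Sum64()
}

// nextCandidate returns the electable proxy with the highest weight,
// skipping the failed primary; ok is false if no proxy qualifies.
func nextCandidate(proxies []proxyInfo, failedID string) (id string, ok bool) {
	var bestW uint64
	for _, p := range proxies {
		if p.ID == failedID || p.NonElectable {
			continue
		}
		if w := hrwWeight(p.ID); !ok || w > bestW {
			id, bestW, ok = p.ID, w, true
		}
	}
	return id, ok
}

// wonElection reports whether yes votes form a strict majority of voters.
func wonElection(yes, voters int) bool {
	return yes > voters/2
}

func main() {
	proxies := []proxyInfo{
		{ID: "p1"}, // the failed primary
		{ID: "p2"},
		{ID: "p3"},
		{ID: "p-local", NonElectable: true}, // colocated with compute nodes
	}
	c, ok := nextCandidate(proxies, "p1")
	fmt.Println(c, ok)
	fmt.Println(wonElection(3, 5), wonElection(2, 4)) // prints "true false"
}
```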
By design, AIStore has no centralized (single point of failure, SPOF) store of shared cluster-level metadata. The metadata consists of versioned objects: the cluster map, buckets (names and properties), and authentication tokens. These objects are consistently replicated across the entire cluster by a component called metasync, which keeps the cluster-level metadata in sync at all times.
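On the receiving side, replication of versioned objects can be sketched as "apply only if newer". The Go sketch below assumes a hypothetical `versionedMeta` type keyed by kind (for example, cluster map, bucket metadata, tokens) and ignores duplicate or stale deliveries, so out-of-order messages cannot roll metadata back. It illustrates the versioning idea only and is not metasync's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// versionedMeta is a hypothetical replicated metadata object.
type versionedMeta struct {
	Kind    string // e.g. "smap", "bmd", "tokens"
	Version int64
}

// receiver models a node's local store of the latest version per kind.
type receiver struct {
	mu     sync.Mutex
	latest map[string]versionedMeta
}

func newReceiver() *receiver {
	return &receiver{latest: make(map[string]versionedMeta)}
}

// apply installs m only if it is newer than the locally held version of
// the same kind; it reports whether m was applied.
func (rc *receiver) apply(m versionedMeta) bool {
	rc.mu.Lock()
	defer rc.mu.Unlock()
	if cur, ok := rc.latest[m.Kind]; ok && cur.Version >= m.Version {
		return false // duplicate or stale delivery
	}
	rc.latest[m.Kind] = m
	return true
}

func main() {
	rc := newReceiver()
	fmt.Println(rc.apply(versionedMeta{Kind: "smap", Version: 3})) // prints "true"
	fmt.Println(rc.apply(versionedMeta{Kind: "smap", Version: 2})) // prints "false"
	fmt.Println(rc.apply(versionedMeta{Kind: "bmd", Version: 1}))  // prints "true"
}
```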
While the control plane handles node membership and consistency of the cluster-level metadata, the data plane has its own resilience mechanisms: