AIStore v3.28 introduces a unified rate-limiting capability that works at both the frontend (client-facing) and backend (cloud-facing) layers. It enables proactive control to prevent hitting limits and reactive handling when limits are encountered — all configurable at both the cluster and bucket levels, with zero performance overhead when disabled.
This text explains how it all fits together.
The original motivation was to gracefully handle rate-limited cloud storage such as Amazon S3, Google Cloud Storage (GCS), and other remote backends.
One common misconception is that integrating with systems that impose their own rate constraints boils down to simply retrying failed requests with exponential backoff.
Not true! In reality, such integrations are always a balancing act: the goal is to minimize retries while running at the maximum allowed speed — which further requires configuration, runtime state, and a few more elements explained below.
Proactive Rate Limiting aims to keep requests within the permitted throughput, so that system-imposed limits (cloud or otherwise) are not hit in the first place. By governing the flow of requests in real time, we reduce the need for retries.
Reactive Rate Limiting comes into play when the external service or remote storage actually enforces a limit (returning 429 or 503).
When that happens, we respond with adaptive backoff and retry logic — but only if the corresponding bucket has its rate-limiting policy enabled. This gives us a self-adjusting mechanism that converges on the maximum permissible speed with minimal overhead.
“AIStore has a unique dual identity: it acts as reliable distributed storage (managing local and remote buckets) while also serving as a fast tiering layer for other systems.”
Bursty (Frontend)
When clients exceed the configured limit, the AIS proxy (gateway) responds with 429 immediately.
Adaptive or Shaping (Backend)
Rate limiting is applied on a per (bucket, verb) basis, where the verbs are: GET, PUT, and DELETE. When remote storage responds with 429 or 503, the AIS target may engage exponential backoff to stay under the cloud provider's limits.
Version 3.28 introduces per-bucket configuration, with inheritable defaults set at the cluster level. Buckets automatically inherit the global settings, but each bucket can override them at creation time or at any point thereafter. For reference, see the updated configuration definitions in cmn/config.go (lines 676–733).
This includes:
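Frontend: each AIS proxy enforces its 1 / nap share of incoming requests, where nap is the current number of active proxies in the cluster.
Backend: each AIS target takes into account that nat targets (the current number of active targets) are accessing the same remote bucket in parallel, and computes its share accordingly.
For a given bucket, configuration may look as follows: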
Although frontend and backend differ in their specific mechanisms (bursty vs. shaping), the underlying logic is unified:
Frontend: Each proxy enforces a configurable rate limit on a per-bucket and per-operation (verb) basis.
Backend: Each target enforces the configured limit for outbound calls. This is wrapped by a dedicated ais/rlbackend layer that shapes traffic to remote AIS clusters or external clouds (e.g., S3, GCS).
Here’s a simplified snippet of logic with (4) inline comments:
AIStore supports numerous batch jobs that read and transform data across buckets. For example, a job might read data from one bucket, apply a user-defined transformation, and then write the results to another bucket. Multiple rate-limiting scenarios can arise:
At first, the permutations may seem too numerous, but in reality it is easy to state a single rule:
Every bucket and operation that a job touches is governed by the corresponding (bucket, verb) rate limiter, which keeps adjusting its runtime state based on the responses from remote storage.
Scenario: You have an S3 bucket with a known rate limit of 3500 requests per second.
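Each target enforces only its share of a bucket's limit, so the 3500 requests per second in this scenario is divided among the nat targets. The Go sketch below shows that arithmetic. It assumes an even split rounded down. perNodeShare is a hypothetical helper, and AIStore's exact rounding may differ.

```go
package main

import "fmt"

// perNodeShare splits a bucket-wide limit evenly across n nodes, rounding
// down so that the sum of all shares never exceeds the limit.
func perNodeShare(limit, n int) int {
	if n <= 0 {
		return limit // guard against division by zero
	}
	return limit / n
}

func main() {
	const s3Limit = 3500 // requests per second, as in the scenario
	for _, nat := range []int{1, 4, 7, 8} {
		share := perNodeShare(s3Limit, nat)
		fmt.Printf("nat=%d: %d req/s per target, %d req/s cluster-wide\n", nat, share, share*nat)
	}
}
```

With 8 targets, each target gets 437 requests per second, so the cluster as a whole stays at 3496, just under the provider's limit. Rounding down gives up a sliver of throughput in exchange for never exceeding the limit.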
Configuration:
“You want to limit the maximum number of client requests to a specific bucket to 20,000 per minute. Further, the cluster in question happens to have 10 AIS gateways (and a load balancer on the front).”
Configuration:
This configures a given bucket to:
Respond with 429 (“Too Many Requests”) if clients exceed these limits.
aisloader run
And then:
Scenario: You are migrating or copying data from GCS to S3 and need to respect both providers’ limits.
Configuration:
When running a copy or transform job between these buckets, AIStore automatically respects both rate limits without requiring any additional configuration.
AIStore provides statistics and Prometheus metrics to monitor all performance-related aspects, including (but not limited to) rate limiting.
Below are two tables — one for GET, another for PUT — that illustrate how the performance monitoring might look for an AIStore cluster under the described rate-limited scenario.
The numbers in these tables can vary widely, depending primarily on the percentage of source data that is already in-cluster, but also on:
In practice, you’d adjust the rate-limit interval and max_tokens (and potentially other AIStore config parameters) to match your workload and performance requirements.
The objective for v3.28 was to maintain linear scalability and high performance while safeguarding against external throttling or internal overload.
The solution features: