AIS in Containerized Environments
In production, AIStore clusters often run in containerized, sometimes constrained environments.
When AIS runs inside a constrained container, it:
- detects containerized execution at startup
- determines whether the environment uses cgroup v2 or cgroup v1
- applies a container-aware effective CPU count
- adjusts GOMAXPROCS accordingly
- reports container-scoped memory rather than host memory
- reports CPU utilization as a smoothed moving average
- tracks CPU throttling separately when cgroup v2 provides it
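The container-aware CPU step can be illustrated with a minimal sketch (not AIS's actual code). Under cgroup v2, the CPU limit lives in cpu.max as "&lt;quota&gt; &lt;period&gt;" in microseconds (or "max" when unlimited), and the effective CPU count is the quota divided by the period, rounded up and capped by the host count:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// effectiveCPUs derives a container-aware CPU count from cgroup-v2
// cpu.max content ("<quota> <period>" in microseconds, or "max <period>"
// when unlimited), capped by the host CPU count.
func effectiveCPUs(cpuMax string, hostCPUs int) int {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 || fields[0] == "max" {
		return hostCPUs // no limit configured
	}
	quota, err1 := strconv.Atoi(fields[0])
	period, err2 := strconv.Atoi(fields[1])
	if err1 != nil || err2 != nil || period <= 0 {
		return hostCPUs
	}
	n := (quota + period - 1) / period // round up
	if n < 1 {
		n = 1
	}
	if n > hostCPUs {
		n = hostCPUs
	}
	return n
}

func main() {
	// e.g., a 4-CPU quota on an 80-CPU host
	fmt.Println(effectiveCPUs("400000 100000", 80)) // 4
	fmt.Println(effectiveCPUs("max 100000", 80))    // 80
}
```

In real code, hostCPUs would come from runtime.NumCPU(), and the resulting count is what GOMAXPROCS gets adjusted to.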
In practice, this means AIS runtime behavior and operator-facing metrics align more closely with the limits the container runtime actually enforces.
Table of Contents
- Startup behavior
- CPU reporting
- Interpreting CPU Signals
- Memory reporting
- Kubernetes note
- When to use ForceContainerCPUMem
- How to validate the behavior
- Fallback behavior
- Current scope and limitations
Startup behavior
AIS initializes CPU and memory accounting in two stages:
- a minimal package init that always has a safe default
- a runtime sys.Init() phase that performs container detection, resolves the cgroup version once, applies the container-aware CPU count, and adjusts GOMAXPROCS
The container-detection step is best-effort. AIS checks for common markers such as /.dockerenv and well-known cgroup tokens including docker, containerd, kubepods, kube, lxc, libpod, and podman.
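The best-effort check can be sketched as follows (an illustration, not AIS's actual code), using the marker tokens listed above:

```go
package main

import (
	"fmt"
	"strings"
)

// containerMarkers are the well-known tokens that commonly appear in
// /proc/self/cgroup inside containers (the same set the text mentions).
var containerMarkers = []string{
	"docker", "containerd", "kubepods", "kube", "lxc", "libpod", "podman",
}

// looksContainerized is best-effort: true if /.dockerenv exists
// (hasDockerEnv) or any known marker appears in /proc/self/cgroup content.
func looksContainerized(procSelfCgroup string, hasDockerEnv bool) bool {
	if hasDockerEnv {
		return true
	}
	for _, m := range containerMarkers {
		if strings.Contains(procSelfCgroup, m) {
			return true
		}
	}
	return false
}

func main() {
	k8s := "0::/kubepods.slice/kubepods-burstable.slice/cri-containerd-abc.scope"
	fmt.Println(looksContainerized(k8s, false))                              // true
	fmt.Println(looksContainerized("0::/user.slice/user-1000.slice", false)) // false
}
```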
At startup, AIS logs the effective runtime context.
Example startup log
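An illustrative startup line, composed from the fields described below (the exact wording and field order in the real AIS log may differ):

```
CPUs(40, runtime=80) container:cgroup-v2
```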
In this example:
- runtime=80 is what the Go runtime sees on the host
- CPUs(40, ...) is the effective CPU count AIS uses after applying cgroup-aware accounting
- container:cgroup-v2 means AIS detected a containerized cgroup-v2 environment
This is expected when a pod or container is allowed to use only a subset of the host’s CPU capacity.
CPU reporting
AIS reports CPU utilization as a smoothed moving average rather than a one-shot instantaneous sample. The goal is to provide a more stable and more useful operational signal.
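A smoothed moving average can be sketched as an exponentially weighted average of periodic utilization samples (a minimal illustration under that assumption, not AIS's actual smoothing code):

```go
package main

import "fmt"

// cpuEMA maintains an exponentially weighted moving average of CPU
// utilization samples; alpha controls how quickly old samples decay.
type cpuEMA struct {
	alpha  float64
	value  float64
	seeded bool
}

// add folds one utilization sample (percent) into the average and
// returns the current smoothed value.
func (e *cpuEMA) add(sample float64) float64 {
	if !e.seeded {
		e.value, e.seeded = sample, true
	} else {
		e.value = e.alpha*sample + (1-e.alpha)*e.value
	}
	return e.value
}

func main() {
	ema := cpuEMA{alpha: 0.3}
	// a one-off 100% spike moves the smoothed value only partway up,
	// which is exactly why the smoothed signal is more stable
	for _, s := range []float64{20, 20, 100, 20} {
		fmt.Printf("%.1f\n", ema.add(s))
	}
}
```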
AIS distinguishes between:
- CPU utilization: how busy the node is
- CPU throttling: how much CPU time the runtime is denying to the container
These are related, but not the same. A node may show moderate utilization while still being throttled. In that case, the problem is not simply “CPU is busy” but “the container is not being allowed to use more CPU.”
ais show cluster
The ais show cluster output includes, in particular:
- SYS CPU(%) by default
- LOAD AVERAGE only with --verbose
- THROTTLED(%) only when at least one node in the displayed proxy or target section reports non-zero cgroup-v2 throttling
SYS CPU(%) is the smoothed node CPU utilization reported by AIS.
THROTTLED(%) appears only in environments where AIS can observe CPU throttling via cgroup v2. It indicates how much CPU time the container is losing due to runtime throttling.
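cgroup v2 exposes the underlying counters in cpu.stat. One plausible way to turn them into a percentage, shown here purely for illustration (AIS's exact formula may differ), is the share of scheduler periods in which the container was throttled:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// throttledPct parses cgroup-v2 cpu.stat content and returns the share of
// enforcement periods in which the container was throttled, in percent.
// This ratio is one plausible definition; AIS's exact formula may differ.
func throttledPct(cpuStat string) float64 {
	vals := map[string]int64{}
	for _, line := range strings.Split(cpuStat, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 {
			if n, err := strconv.ParseInt(fields[1], 10, 64); err == nil {
				vals[fields[0]] = n
			}
		}
	}
	periods := vals["nr_periods"]
	if periods == 0 {
		return 0
	}
	return 100 * float64(vals["nr_throttled"]) / float64(periods)
}

func main() {
	stat := "usage_usec 500000\nnr_periods 1000\nnr_throttled 25\nthrottled_usec 120000"
	fmt.Printf("%.1f%%\n", throttledPct(stat)) // 2.5%
}
```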
Example: AIS/Kubernetes deployment
The following example shows what a healthy AIS 4.4 deployment may look like inside Kubernetes when nodes run under cgroup v2.
A few things are worth noting:
- SYS CPU(%) is the node CPU utilization reported by AIS, not the older load-average view.
- THROTTLED(%) appears only when at least one node in the displayed section reports non-zero cgroup-v2 throttling.
- Small non-zero throttling values are not unusual in busy containerized environments.
- Memory and CPU totals shown by AIS should reflect the container’s effective limits rather than the host’s full physical capacity.
The numbers above are illustrative, but the format and interpretation match AIS 4.4 behavior. To better understand the numbers in the memory columns, see the Two different views of memory section below.
Example: verbose view
Use ais show cluster --verbose to add LOAD AVERAGE to the default 4.4 cluster view. This is useful when you want the traditional 1-, 5-, and 15-minute load numbers alongside the newer AIS CPU metrics.
Interpreting CPU Signals
AIS 4.4 exposes several CPU-related signals, but they answer different questions.
SYS CPU(%) is the primary AIS CPU signal. It is computed from recent CPU usage and reported as a smoothed moving average. THROTTLED(%) is separate: it indicates CPU time the runtime is denying to the container in cgroup-v2 environments. LOAD AVERAGE remains available with --verbose, but it is secondary and should be treated as additional context rather than the main CPU metric.
Memory reporting
Memory reporting also changes based on the detected runtime environment.
On bare metal, AIS reads host memory information from /proc/meminfo.
In containerized Linux environments:
- for cgroup v2, AIS uses memory.max, memory.current, and memory.stat
- for cgroup v1, AIS uses the corresponding v1 memory control files
If the container has no explicit memory limit, AIS falls back to host memory reporting.
This means that memory shown by AIS inside a constrained container should reflect the container’s configured limit rather than the host’s total RAM.
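The no-limit fallback can be sketched as follows (an illustration; in real code hostTotal would come from /proc/meminfo). cgroup v2 writes the literal string "max" into memory.max when no explicit limit is set:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// effectiveMemLimit returns the container memory limit parsed from
// cgroup-v2 memory.max content, falling back to the host total when the
// file contains "max" (no explicit limit) or cannot be parsed.
func effectiveMemLimit(memMax string, hostTotal uint64) uint64 {
	s := strings.TrimSpace(memMax)
	if s == "" || s == "max" {
		return hostTotal
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil || n == 0 || n > hostTotal {
		return hostTotal
	}
	return n
}

func main() {
	const gib = 1 << 30
	fmt.Println(effectiveMemLimit("4294967296\n", 64*gib) / gib) // 4
	fmt.Println(effectiveMemLimit("max\n", 64*gib) / gib)        // 64
}
```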
AIS also treats some auxiliary memory details as best-effort so that missing or unreadable secondary files do not cause runtime failure.
Two different views of memory
MEM USED(%) and MEM AVAIL use different but complementary views of memory. In the current 4.4 implementation, MEM USED(%) is derived from the AIS process resident set size (RSS) as a percentage of the node’s effective memory total, or the container’s memory limit when running under cgroups.
MEM AVAIL, on the other hand, is not simply raw free memory. MEM AVAIL reflects the node’s available memory (ActualFree), which accounts for reclaimable kernel caches - memory that is technically in use (e.g., for kernel’s pagecache) but can be freed under pressure.
As a result, these columns are useful for operational triage, but they are not directly comparable to Prometheus metrics such as container_memory_usage_bytes, which report container-wide usage using different accounting semantics.
In particular, MEM USED(%) will typically be much smaller than what Prometheus metrics such as container_memory_usage_bytes report, and often smaller than what operators see via kubectl top (which is a separate Kubernetes resource-usage view and should be treated as such).
That’s because MEM USED(%) reports only its own process footprint (see next section), while those tools generally reflect broader container memory usage.
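Under those semantics, the MEM USED(%) computation reduces to a simple ratio (a sketch; the function name is illustrative):

```go
package main

import "fmt"

// memUsedPct computes the AIS-process view of memory usage: resident set
// size as a percentage of the effective memory total (the container limit
// when running under cgroups, otherwise host memory).
func memUsedPct(rss, effectiveTotal uint64) float64 {
	if effectiveTotal == 0 {
		return 0
	}
	return 100 * float64(rss) / float64(effectiveTotal)
}

func main() {
	const gib = 1 << 30
	// a 1.5 GiB RSS inside a 16 GiB container limit
	fmt.Printf("%.2f%%\n", memUsedPct(3*gib/2, 16*gib))
}
```

Because the numerator is per-process RSS rather than container-wide usage, this number will generally sit below container_memory_usage_bytes for the same pod.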
About RSS
MEM USED(%) in AIS is based on the process RSS (Resident Set Size) - the portion of a process’s memory that is currently resident in physical RAM. This includes the Go heap, goroutine stacks, the Go runtime itself, loaded code and libraries, and other in-process data structures.
AIS nodes do not allocate custom off-heap buffers (e.g., via mmap), so RSS remains a reasonable approximation of the process's own memory footprint.
RSS is a process-level view of memory, not a container-level one. It explains why MEM USED(%) may remain relatively small even when Prometheus metrics such as container_memory_usage_bytes - or operator-facing tools such as kubectl top - report much larger memory usage for the same pod or container.
The difference is expected: container-level metrics commonly include memory charged to the container beyond the AIS process RSS, including filesystem page cache and other kernel-accounted memory associated with the workload.
In AIS, this distinction is especially visible during heavy I/O: large reads and writes can increase container memory usage through page cache activity even when the AIS process RSS itself changes only modestly.
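On Linux, a process can read its own RSS from the VmRSS line of /proc/self/status, which the kernel reports in kB. A minimal parser (an illustration, not AIS code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVmRSS extracts the resident set size, in bytes, from
// /proc/<pid>/status content, e.g. a line of the form "VmRSS:  123456 kB".
func parseVmRSS(status string) (uint64, bool) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			return 0, false
		}
		kb, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, false
		}
		return kb * 1024, true
	}
	return 0, false
}

func main() {
	status := "Name:\taisnode\nVmRSS:\t  1572864 kB\nThreads:\t80"
	if rss, ok := parseVmRSS(status); ok {
		fmt.Println(rss>>20, "MiB") // 1536 MiB, i.e. 1.5 GiB
	}
}
```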
Kubernetes note
On Kubernetes, cgroup v2 control files often live under a pod-specific subtree rather than directly under /sys/fs/cgroup/.
The exact path depends on the container runtime and cgroup namespace configuration — CRI-O, containerd, systemd-nspawn, and cgroupns=private setups all produce different nested layouts. For example, plain Docker typically mounts the cgroup root directly at /sys/fs/cgroup/, while a Kubernetes pod might see its files under something like /sys/fs/cgroup/kubepods.slice/.../crio-<hash>.scope/.
AIS handles this by reading the process’s own cgroup membership from /proc/self/cgroup at startup and resolving the actual file paths from there, rather than assuming a single hardcoded location. This means the same binary works whether it runs in a flat Docker container, a nested K8s pod, or any other cgroup v2 layout — no configuration required.
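The resolution step can be sketched like this (illustrative; real code handles more edge cases). On cgroup v2 the relevant membership line in /proc/self/cgroup has the form 0::&lt;path&gt;, which is joined onto the /sys/fs/cgroup mount point:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// cgroupV2Dir resolves this process's cgroup-v2 directory from the content
// of /proc/self/cgroup. On cgroup v2 the relevant line is "0::<path>".
func cgroupV2Dir(procSelfCgroup, mountPoint string) (string, bool) {
	for _, line := range strings.Split(procSelfCgroup, "\n") {
		if strings.HasPrefix(line, "0::") {
			return filepath.Join(mountPoint, strings.TrimPrefix(line, "0::")), true
		}
	}
	return "", false
}

func main() {
	// a typical Kubernetes/CRI-O style membership line
	content := "0::/kubepods.slice/kubepods-pod123.slice/crio-abc.scope"
	if dir, ok := cgroupV2Dir(content, "/sys/fs/cgroup"); ok {
		fmt.Println(dir)
	}
}
```

Control files such as cpu.max and memory.max are then read from the resolved directory, whatever the nesting happens to be.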
When to use ForceContainerCPUMem
The ForceContainerCPUMem feature flag can override failed container auto-detection.
In the unlikely event that auto-detection fails, enable ForceContainerCPUMem and restart the node or the entire cluster.
This forces cgroup-based CPU and memory accounting and is intended for unusual deployments.
Use it when:
- AIS is running in a container or pod
- startup logs do not show “container” in the first 10-12 lines
- reported CPU or memory clearly matches the host instead of the container
How to validate the behavior
A simple way to validate container-aware CPU and memory reporting is to compare the same test or deployment on the host and inside a constrained container.
AIS includes a sys package unit test that you can run inside a constrained container to verify container-aware CPU and memory accounting:
In that setup:
- CPU count and memory totals should reflect the container’s limits rather than the host’s capacity
- changing --memory=512m to --memory=4G should change the observed memory totals accordingly
- the same workload should produce different CPU and memory observations on the host versus inside the container
For a running AIS deployment, practical checks are:
- inspect startup logs for container:cgroup-v2 or container:cgroup-v1
- compare CPUs(..., runtime=...) in the startup line
- run ais show cluster
- confirm that SYS CPU(%) is shown by default
- look for THROTTLED(%) in constrained cgroup-v2 environments
Fallback behavior
AIS prefers the most appropriate source for the current environment, but it also degrades gracefully.
In general:
- CPU reporting tries to preserve a usable percentage whenever possible
- memory reporting prefers host stats over failing hard when container-specific files are unavailable
- cgroup probing is done once at startup, not repeatedly during steady-state runtime
This keeps the behavior predictable and avoids repeated source-selection churn while the process is running.
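The one-time source selection can be sketched as a simple preference chain evaluated at startup (names are illustrative):

```go
package main

import "fmt"

type memSource int

const (
	srcCgroupV2 memSource = iota
	srcCgroupV1
	srcHost
)

// pickMemSource chooses the memory-stats source once, at startup:
// cgroup v2 if its control files are usable, else cgroup v1,
// else host /proc/meminfo. It is not re-evaluated at steady state.
func pickMemSource(haveV2, haveV1 bool) memSource {
	switch {
	case haveV2:
		return srcCgroupV2
	case haveV1:
		return srcCgroupV1
	default:
		return srcHost
	}
}

func main() {
	fmt.Println(pickMemSource(true, false) == srcCgroupV2) // true
	fmt.Println(pickMemSource(false, false) == srcHost)    // true
}
```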
Current scope and limitations
Starting with v4.4, AIStore fully supports cgroup v2 and continues to support cgroup v1.
Note that cgroup v1 support is considered deprecated and may be removed in a future release.
Future work may include additional CPU pressure signals and support for non-Linux platforms (such as macOS where Linux cgroups are not available).