In production, AIStore clusters often run in containerized, sometimes constrained environments.
When AIS runs inside a constrained container, it applies container-aware CPU and memory accounting and adjusts GOMAXPROCS accordingly.
In practice, this means AIS runtime behavior and operator-facing metrics align more closely with the limits the container runtime actually enforces.
AIS initializes CPU and memory accounting in two stages. One of them is the sys.Init() phase, which performs container detection, resolves the cgroup version once, applies the container-aware CPU count, and adjusts GOMAXPROCS.
The container-detection step is best-effort. AIS checks for common markers such as /.dockerenv and well-known cgroup tokens including docker, containerd, kubepods, kube, lxc, libpod, and podman.
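The best-effort marker check described above can be sketched as follows. This is an illustration, not the AIS implementation: the function names are hypothetical, and reading the tokens from /proc/self/cgroup is an assumption about where they are looked up.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cgroupTokens lists the container markers named in the text.
var cgroupTokens = []string{"docker", "containerd", "kubepods", "kube", "lxc", "libpod", "podman"}

// hasContainerToken (hypothetical helper) reports whether the given
// cgroup membership text mentions any well-known container token.
func hasContainerToken(cgroupContents string) bool {
	for _, tok := range cgroupTokens {
		if strings.Contains(cgroupContents, tok) {
			return true
		}
	}
	return false
}

// looksContainerized is best-effort: a missing or unreadable file
// simply counts as "no marker found" and never causes a failure.
func looksContainerized() bool {
	if _, err := os.Stat("/.dockerenv"); err == nil {
		return true
	}
	// Assumption: tokens are searched for in /proc/self/cgroup.
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		return false
	}
	return hasContainerToken(string(data))
}

func main() {
	fmt.Println("containerized:", looksContainerized())
}
```

Because the check is a heuristic, it can miss unusual runtimes, which is the case the ForceContainerCPUMem feature flag exists for.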
At startup, AIS logs the effective runtime context.
In this example:
- runtime=80 is what the Go runtime sees on the host
- CPUs(40, ...) is the effective CPU count AIS uses after applying cgroup-aware accounting
- container:cgroup-v2 means AIS detected a containerized cgroup-v2 environment
This is expected when a pod or container is allowed to use only a subset of the host’s CPU capacity.
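To illustrate how an effective CPU count such as 40 can arise on an 80-CPU host, here is a minimal sketch that derives it from the cgroup-v2 cpu.max file (format: "<quota> <period>" or "max <period>"). The function name and the rounding rule (ceiling, capped at the host count) are assumptions made for this example; AIS may compute the value differently.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// effectiveCPUs (hypothetical helper) derives a CPU count from the
// contents of cpu.max. It falls back to hostCPUs when there is no
// quota ("max") or when the contents cannot be parsed.
func effectiveCPUs(cpuMax string, hostCPUs int) int {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 || fields[0] == "max" {
		return hostCPUs
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
		return hostCPUs
	}
	// Assumption: round fractional CPUs up, never exceed the host count.
	n := int(math.Ceil(quota / period))
	if n > hostCPUs {
		return hostCPUs
	}
	return n
}

func main() {
	fmt.Println(effectiveCPUs("4000000 100000", 80)) // 40
	fmt.Println(effectiveCPUs("max 100000", 80))     // 80
}
```

A value obtained this way could then be passed to runtime.GOMAXPROCS so that the Go scheduler does not create more running threads than the container is allowed to use.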
AIS reports CPU utilization as a smoothed moving average rather than a one-shot instantaneous sample. The goal is to provide a more stable and more useful operational signal.
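The effect of smoothing can be shown with a small sketch. The text does not specify which averaging scheme AIS uses, so this example assumes an exponential moving average with a hypothetical weight of 0.5; the type and method names are illustrative only.

```go
package main

import "fmt"

// smoother keeps an exponential moving average of utilization samples.
// Assumption: EMA is used purely for illustration.
type smoother struct {
	alpha  float64 // weight of the newest sample, in (0, 1]
	value  float64 // current smoothed value
	primed bool    // false until the first sample arrives
}

// add folds a new sample into the average and returns the result.
func (s *smoother) add(sample float64) float64 {
	if !s.primed {
		s.value, s.primed = sample, true
		return s.value
	}
	s.value = s.alpha*sample + (1-s.alpha)*s.value
	return s.value
}

func main() {
	s := &smoother{alpha: 0.5}
	// A single 90% spike is damped instead of being reported as-is.
	for _, pct := range []float64{10, 90, 10, 10} {
		fmt.Printf("sample=%.0f smoothed=%.1f\n", pct, s.add(pct))
	}
}
```

With these inputs the smoothed series is 10, 50, 30, 20: the one-sample spike shows up but decays over the following intervals rather than dominating the reported value.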
AIS distinguishes between CPU utilization and CPU throttling.
These are related, but not the same. A node may show moderate utilization while still being throttled. In that case, the problem is not simply “CPU is busy” but “the container is not being allowed to use more CPU.”
The ais show cluster output includes, in particular, SYS CPU(%), THROTTLED(%), and, with --verbose, LOAD AVERAGE.
SYS CPU(%) is the smoothed node CPU utilization reported by AIS.
THROTTLED(%) appears only in environments where AIS can observe CPU throttling via cgroup v2. It indicates how much CPU time the container is losing due to runtime throttling.
The following example shows what a healthy AIS 4.4 deployment may look like inside Kubernetes when nodes run under cgroup v2.
A few things are worth noting:
- SYS CPU(%) is the node CPU utilization reported by AIS, not the older load-average view.
- THROTTLED(%) appears only when at least one node in the displayed section reports non-zero cgroup-v2 throttling.
The numbers above are illustrative, but the format and interpretation match AIS 4.4 behavior. To better understand the memory columns, see the “Two different views of memory” section below.
Use ais show cluster --verbose to add LOAD AVERAGE to the default 4.4 cluster view. This is useful when you want the traditional 1-, 5-, and 15-minute load numbers alongside the newer AIS CPU metrics.
AIS 4.4 exposes several CPU-related signals, but they answer different questions.
SYS CPU(%) is the primary AIS CPU signal. It is computed from recent CPU usage and reported as a smoothed moving average. THROTTLED(%) is separate: it indicates CPU time the runtime is denying to the container in cgroup-v2 environments. LOAD AVERAGE remains available with --verbose, but it is secondary and should be treated as additional context rather than the main CPU metric.
Memory reporting also changes based on the detected runtime environment.
On bare metal, AIS reads host memory information from /proc/meminfo.
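As a rough illustration of the bare-metal path, the sketch below extracts fields from /proc/meminfo content. The helper name is hypothetical and the sample input is made up; which fields AIS actually consumes is not stated here, so MemTotal and MemAvailable are used only as examples.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// meminfoBytes (hypothetical helper) returns the named /proc/meminfo
// field in bytes. It is meant for fields reported in kB, such as
// MemTotal and MemAvailable.
func meminfoBytes(meminfo, key string) (uint64, bool) {
	for _, line := range strings.Split(meminfo, "\n") {
		fields := strings.Fields(line) // e.g. ["MemTotal:", "16384", "kB"]
		if len(fields) >= 2 && fields[0] == key+":" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			return kb * 1024, err == nil
		}
	}
	return 0, false
}

func main() {
	// Made-up sample; on Linux, read the real file with os.ReadFile.
	sample := "MemTotal:       16384 kB\nMemFree:         1024 kB\nMemAvailable:    8192 kB\n"
	total, _ := meminfoBytes(sample, "MemTotal")
	avail, _ := meminfoBytes(sample, "MemAvailable")
	fmt.Println(total, avail) // 16777216 8388608
}
```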
In containerized Linux environments, AIS reads cgroup memory control files, including memory.max, memory.current, and memory.stat.
If the container has no explicit memory limit, AIS falls back to host memory reporting.
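The fallback rule can be sketched as a small function over the contents of memory.max, which holds either a byte count or the literal "max" when no limit is set. The function name is hypothetical, and treating unparsable or larger-than-host values as "no limit" is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// memTotal (hypothetical helper) picks the effective memory total:
// the container limit from memory.max when one is set, otherwise the
// host total.
func memTotal(memoryMax string, hostTotal uint64) uint64 {
	s := strings.TrimSpace(memoryMax)
	if s == "" || s == "max" {
		return hostTotal // no explicit limit
	}
	limit, err := strconv.ParseUint(s, 10, 64)
	// Assumption: unusable values fall back to host reporting.
	if err != nil || limit == 0 || limit > hostTotal {
		return hostTotal
	}
	return limit
}

func main() {
	const host = 64 << 30 // 64 GiB host, for illustration
	fmt.Println(memTotal("536870912\n", host)) // 536870912 (512 MiB limit)
	fmt.Println(memTotal("max\n", host))       // 68719476736 (host total)
}
```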
This means that memory shown by AIS inside a constrained container should reflect the container’s configured limit rather than the host’s total RAM.
AIS also treats some auxiliary memory details as best-effort so that missing or unreadable secondary files do not cause runtime failure.
MEM USED(%) and MEM AVAIL use different but complementary views of memory. In the current 4.4 implementation, MEM USED(%) is derived from the AIS process resident set size (RSS) as a percentage of the node’s effective memory total, or the container’s memory limit when running under cgroups.
MEM AVAIL, on the other hand, is not simply raw free memory. It reflects the node’s available memory (ActualFree), which accounts for reclaimable kernel caches: memory that is technically in use (e.g., for the kernel’s page cache) but can be freed under pressure.
As a result, these columns are useful for operational triage, but they are not directly comparable to Prometheus metrics such as container_memory_usage_bytes, which report container-wide usage using different accounting semantics.
In particular, MEM USED(%) will typically be much smaller than what Prometheus metrics such as container_memory_usage_bytes report, and often smaller than what operators see via kubectl top (which is a separate Kubernetes resource-usage view and should be treated as such).
That’s because MEM USED(%) reports only the AIS process footprint (see next section), while those tools generally reflect broader container memory usage.
MEM USED(%) in AIS is based on the process RSS (Resident Set Size) - the portion of a process’s memory that is currently resident in physical RAM. This includes the Go heap, goroutine stacks, the Go runtime itself, loaded code and libraries, and other in-process data structures.
AIS nodes do not allocate custom off-heap buffers (e.g., via mmap), so RSS remains a reasonable approximation of the process’s own memory footprint.
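The arithmetic behind an RSS-based percentage is simple enough to sketch. This example assumes RSS is taken from the VmRSS line of /proc/self/status, which is one common source on Linux; the helper names are hypothetical and AIS may obtain RSS differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rssBytes extracts VmRSS (reported in kB) from the contents of
// /proc/self/status and returns it in bytes.
func rssBytes(status string) (uint64, bool) {
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			fields := strings.Fields(line) // ["VmRSS:", "262144", "kB"]
			if len(fields) < 2 {
				return 0, false
			}
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			return kb * 1024, err == nil
		}
	}
	return 0, false
}

// memUsedPct (hypothetical helper) expresses process RSS as a
// percentage of the effective memory total (host total or container
// limit).
func memUsedPct(rss, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(rss) / float64(total)
}

func main() {
	// Made-up sample: 256 MiB resident in a 1 GiB container.
	status := "Name:\tais\nVmRSS:\t  262144 kB\n"
	rss, ok := rssBytes(status)
	if !ok {
		fmt.Println("VmRSS not found")
		return
	}
	fmt.Printf("MEM USED: %.1f%%\n", memUsedPct(rss, 1<<30)) // MEM USED: 25.0%
}
```

Note that page cache charged to the container does not appear in this number at all, which is why it can sit well below container-level metrics.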
RSS is a process-level view of memory, not a container-level one. It explains why MEM USED(%) may remain relatively small even when Prometheus metrics such as container_memory_usage_bytes - or operator-facing tools such as kubectl top - report much larger memory usage for the same pod or container.
The difference is expected: container-level metrics commonly include memory charged to the container beyond the AIS process RSS, including filesystem page cache and other kernel-accounted memory associated with the workload.
In AIS, this distinction is especially visible during heavy I/O: large reads and writes can increase container memory usage through page cache activity even when the AIS process RSS itself changes only modestly.
On Kubernetes, cgroup v2 control files often live under a pod-specific subtree rather than directly under /sys/fs/cgroup/.
The exact path depends on the container runtime and cgroup namespace configuration — CRI-O, containerd, systemd-nspawn, and cgroupns=private setups all produce different nested layouts. For example, plain Docker typically mounts the cgroup root directly at /sys/fs/cgroup/, while a Kubernetes pod might see its files under something like /sys/fs/cgroup/kubepods.slice/.../crio-<hash>.scope/.
AIS handles this by reading the process’s own cgroup membership from /proc/self/cgroup at startup and resolving the actual file paths from there, rather than assuming a single hardcoded location. This means the same binary works whether it runs in a flat Docker container, a nested K8s pod, or any other cgroup v2 layout — no configuration required.
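A minimal sketch of that resolution step is shown below. It assumes the standard cgroup-v2 entry format in /proc/self/cgroup ("0::<path>") and a cgroup filesystem mounted at /sys/fs/cgroup. The function name and the sample path are hypothetical, and real AIS code may handle more cases.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cgroupRoot is the conventional cgroup-v2 mount point (assumption).
const cgroupRoot = "/sys/fs/cgroup"

// cgroupV2Dir (hypothetical helper) finds the unified-hierarchy entry
// ("0::<path>") in the contents of /proc/self/cgroup and maps it to a
// directory under cgroupRoot.
func cgroupV2Dir(procSelfCgroup string) (string, bool) {
	for _, line := range strings.Split(procSelfCgroup, "\n") {
		if strings.HasPrefix(line, "0::") {
			rel := strings.TrimSpace(strings.TrimPrefix(line, "0::"))
			return path.Join(cgroupRoot, rel), true
		}
	}
	return "", false
}

func main() {
	// Flat layout: the process sits at the cgroup root.
	flat, _ := cgroupV2Dir("0::/\n")
	fmt.Println(flat) // /sys/fs/cgroup

	// Nested layout with a made-up pod path.
	nested, _ := cgroupV2Dir("0::/kubepods.slice/example-pod/example.scope\n")
	fmt.Println(path.Join(nested, "memory.max"))
	// /sys/fs/cgroup/kubepods.slice/example-pod/example.scope/memory.max
}
```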
The ForceContainerCPUMem feature flag can override failed container auto-detection.
In the unlikely event that auto-detection fails, enable ForceContainerCPUMem and restart the node or the entire cluster.
This forces cgroup-based CPU and memory accounting and is intended for unusual deployments in which AIS runs in a container but auto-detection does not recognize it.
A simple way to validate container-aware CPU and memory reporting is to compare the same test or deployment on the host and inside a constrained container.
AIS includes a sys package unit test that you can run inside a constrained container to verify container-aware CPU and memory accounting:
In that setup:
- Changing --memory=512m to --memory=4G should change the observed memory totals accordingly
For a running AIS deployment, practical checks are:
- container:cgroup-v2 or container:cgroup-v1
- CPUs(..., runtime=...) in the startup line
- ais show cluster:
  - SYS CPU(%) is shown by default
  - THROTTLED(%) in constrained cgroup-v2 environments
AIS prefers the most appropriate source for the current environment, but it also degrades gracefully.
In general, the source is selected once rather than re-evaluated on every read. This keeps the behavior predictable and avoids repeated source-selection churn while the process is running.
Starting with v4.4, AIStore fully supports cgroup v2 and continues to support cgroup v1.
Note that cgroup v1 support is considered deprecated and may be removed in a future release.
Future work may include additional CPU pressure signals and support for non-Linux platforms (such as macOS where Linux cgroups are not available).