AIS in Containerized Environments

In production, AIStore clusters often run in containerized, sometimes constrained environments.

When AIS runs inside a constrained container, it:

  • detects containerized execution at startup
  • determines whether the environment uses cgroup v2 or cgroup v1
  • applies a container-aware effective CPU count
  • adjusts GOMAXPROCS accordingly
  • reports container-scoped memory rather than host memory
  • reports CPU utilization as a smoothed moving average
  • tracks CPU throttling separately when cgroup v2 provides it

In practice, this means AIS runtime behavior and operator-facing metrics align more closely with the limits the container runtime actually enforces.

Startup behavior

AIS initializes CPU and memory accounting in two stages:

  • a minimal package init that always has a safe default
  • a runtime sys.Init() phase that performs container detection, resolves the cgroup version once, applies container-aware CPU count, and adjusts GOMAXPROCS

The container-detection step is best-effort. AIS checks for common markers such as /.dockerenv and well-known cgroup tokens including docker, containerd, kubepods, kube, lxc, libpod, and podman.
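The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the AIS implementation: the function names are hypothetical, and reading the tokens from /proc/self/cgroup with a plain substring match is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Tokens copied from the list above; how AIS matches them is an assumption here.
var cgroupTokens = []string{"docker", "containerd", "kubepods", "kube", "lxc", "libpod", "podman"}

// hasContainerToken (hypothetical helper) reports whether the text of a cgroup
// membership file mentions any well-known container token.
func hasContainerToken(cgroupFile string) bool {
	for _, tok := range cgroupTokens {
		if strings.Contains(cgroupFile, tok) {
			return true
		}
	}
	return false
}

// looksContainerized is best-effort: a missing or unreadable file simply
// counts as "no marker found" rather than an error.
func looksContainerized() bool {
	if _, err := os.Stat("/.dockerenv"); err == nil {
		return true
	}
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		return false
	}
	return hasContainerToken(string(data))
}

func main() {
	fmt.Println("containerized:", looksContainerized())
}
```

Because every check can fail silently, a sketch like this can report a false negative, which is the situation an explicit override flag is meant to cover.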

At startup, AIS logs the effective runtime context.

Example startup log

I 18:55:47.766809 daemon:323 Version 4.4.rc2.bbac8be09, build 2026-04-06T18:49:38+0000, CPUs(40, runtime=80), container:cgroup-v2
I 18:55:47.788269 daemon:265 Node t[QtxHjmTxD], Version 4.4.rc2.bbac8be09, build 2026-04-06T18:49:38+0000, CPUs(40, runtime=80), container:cgroup-v2

In this example:

  • runtime=80 is what the Go runtime sees on the host
  • CPUs(40, ...) is the effective CPU count AIS uses after applying cgroup-aware accounting
  • container:cgroup-v2 means AIS detected a containerized cgroup-v2 environment

This is expected when a pod or container is allowed to use only a subset of the host’s CPU capacity.
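To make the relationship between the two numbers concrete, here is a hedged sketch of how an effective CPU count could be derived from the cgroup v2 cpu.max file and applied to GOMAXPROCS. The helper name, the round-up rule, and the clamp to the host count are assumptions. The quota value in main is hypothetical, chosen only to reproduce the 40-of-80 example.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"strconv"
	"strings"
)

// effectiveCPUs (hypothetical helper) derives a CPU count from the contents of
// a cgroup v2 cpu.max file, which holds "<quota> <period>" in microseconds, or
// "max <period>" when no quota is set. Rounding up and clamping to hostCPUs
// are assumptions of this sketch.
func effectiveCPUs(cpuMax string, hostCPUs int) int {
	fields := strings.Fields(cpuMax)
	if len(fields) != 2 || fields[0] == "max" {
		return hostCPUs
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || quota <= 0 || period <= 0 {
		return hostCPUs
	}
	n := int(math.Ceil(quota / period))
	if n > hostCPUs {
		n = hostCPUs
	}
	return n
}

func main() {
	// Hypothetical quota matching the startup example: 40 CPUs on an 80-CPU host.
	fmt.Printf("CPUs(%d, runtime=%d)\n", effectiveCPUs("4000000 100000", 80), 80)

	// Without a quota the host count is kept and GOMAXPROCS stays unchanged.
	host := runtime.NumCPU()
	runtime.GOMAXPROCS(effectiveCPUs("max 100000", host))
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```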

CPU reporting

AIS reports CPU utilization as a smoothed moving average rather than a one-shot instantaneous sample. The goal is to provide a more stable and more useful operational signal.
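One common way to smooth a utilization signal is an exponential moving average. The sketch below uses that technique purely as an illustration. The source does not state which averaging scheme AIS uses, and the type name and the alpha weight are hypothetical.

```go
package main

import "fmt"

// smoother (hypothetical type) keeps an exponential moving average of samples.
type smoother struct {
	alpha  float64 // weight of the newest sample, 0 < alpha <= 1 (assumed tuning knob)
	value  float64
	primed bool
}

// add folds one sample into the average and returns the smoothed value.
// The first sample seeds the average directly.
func (s *smoother) add(sample float64) float64 {
	if !s.primed {
		s.value, s.primed = sample, true
		return s.value
	}
	s.value = s.alpha*sample + (1-s.alpha)*s.value
	return s.value
}

func main() {
	s := &smoother{alpha: 0.5}
	// A single 100% spike between 20% samples is damped rather than reported as-is.
	for _, pct := range []float64{20, 20, 100, 20, 20} {
		fmt.Printf("sample=%3.0f%% smoothed=%5.1f%%\n", pct, s.add(pct))
	}
}
```

With alpha set to 0.5, the spike shows up as 60% and decays over the next samples, which is the kind of stability a smoothed signal is meant to provide.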

AIS distinguishes between:

  • CPU utilization: how busy the node is
  • CPU throttling: how much CPU time the runtime is denying to the container

These are related, but not the same. A node may show moderate utilization while still being throttled. In that case, the problem is not simply “CPU is busy” but “the container is not being allowed to use more CPU.”

ais show cluster

The ais show cluster output includes, in particular:

  • SYS CPU(%) by default
  • LOAD AVERAGE only with --verbose
  • THROTTLED(%) only when at least one node in the displayed proxy or target section reports non-zero cgroup-v2 throttling

SYS CPU(%) is the smoothed node CPU utilization reported by AIS.

THROTTLED(%) appears only in environments where AIS can observe CPU throttling via cgroup v2. It indicates how much CPU time the container is losing due to runtime throttling.

Example: AIS/Kubernetes deployment

The following example shows what a healthy AIS 4.4 deployment may look like inside Kubernetes when nodes run under cgroup v2.

$ ais show cluster

PROXY                 MEM USED(%)   MEM AVAIL   SYS CPU(%)   UPTIME       K8s POD        STATUS
p[br4ghom6tlsn5][P]   18.75%        1.18TiB     21%          643h3m30s    ais-proxy-15   online
p[ydggldkytams7]      0.97%         1.17TiB     9%           642h59m30s   ais-proxy-0    online
p[j4cchhwdmi5ft]      1.02%         1.17TiB     11%          642h59m50s   ais-proxy-1    online
p[tuimpcvr16mnu]      0.98%         1.19TiB     8%           643h0m10s    ais-proxy-2    online
p[ptzgio2qkk02z]      0.91%         1.20TiB     7%           643h0m20s    ais-proxy-3    online
...
...

TARGET        MEM USED(%)   MEM AVAIL   CAP USED(%)   CAP AVAIL   SYS CPU(%)   THROTTLED(%)   UPTIME       K8s POD        STATUS
t[dysqeHCy]   0.14%         2.30TiB     62%           27.108TiB   34%          0%             685h39m20s   ais-target-0   online
t[bwLnMici]   0.14%         2.36TiB     64%           26.863TiB   41%          0%             685h38m0s    ais-target-1   online
t[tQbKmamu]   0.15%         2.35TiB     63%           27.118TiB   38%          1%             685h37m10s   ais-target-2   online
t[nOtbUodZ]   0.15%         2.32TiB     62%           27.090TiB   47%          0%             685h35m40s   ais-target-3   online
t[fqpzfbfx]   0.16%         2.34TiB     63%           26.721TiB   52%          2%             685h34m10s   ais-target-4   online
t[abRtepsE]   0.14%         2.30TiB     62%           27.587TiB   36%          0%             685h32m40s   ais-target-5   online
...
...

Cluster:
   CLI Endpoint: https://mnk.abc.oci.aistore.nvidia.com:51080
   Proxies: 16 (all electable)
   Targets: 16 (total disks: 191)
   Capacity: used 751.37TiB (64%), available 429.74TiB
   Cluster Map: version 10946, UUID B7nP8-i7RQ, primary p[br4ghom6tlsn5]
   Software: 4.4.rc2.bbac8be09
   Backend: AWS
   Deployment: K8s
   Status: 32 online
   Authentication: enabled

A few things are worth noting:

  • SYS CPU(%) is the node CPU utilization reported by AIS, not the older load-average view.
  • THROTTLED(%) appears only when at least one node in the displayed section reports non-zero cgroup-v2 throttling.
    • Small non-zero throttling values are not unusual in busy containerized environments.
  • Memory and CPU totals shown by AIS should reflect the container’s effective limits rather than the host’s full physical capacity.

The numbers above are illustrative, but the format and interpretation match AIS 4.4 behavior. To interpret the memory columns, see the Two different views of memory section below.

Example: verbose view

Use ais show cluster --verbose to add LOAD AVERAGE to the default 4.4 cluster view. This is useful when you want the traditional 1-, 5-, and 15-minute load numbers alongside the newer AIS CPU metrics.

$ ais show cluster --verbose

PROXY                 MEM USED(%)   MEM AVAIL   SYS CPU(%)   LOAD AVERAGE       UPTIME       K8s POD        STATUS
p[br4ghom6tlsn5][P]   18.75%        1.18TiB     21%          [6.2 6.9 7.2]      643h3m30s    ais-proxy-15   online
p[ydggldkytams7]      0.97%         1.17TiB     9%           [12.4 11.5 11.4]   642h59m30s   ais-proxy-0    online
p[j4cchhwdmi5ft]      1.02%         1.17TiB     11%          [8.8 7.6 7.4]      642h59m50s   ais-proxy-1    online
p[tuimpcvr16mnu]      0.98%         1.19TiB     8%           [6.5 6.9 6.9]      643h0m10s    ais-proxy-2    online
p[ptzgio2qkk02z]      0.91%         1.20TiB     7%           [7.2 7.7 7.8]      643h0m20s    ais-proxy-3    online
...
...

TARGET        MEM USED(%)   MEM AVAIL   CAP USED(%)   CAP AVAIL   SYS CPU(%)   THROTTLED(%)   LOAD AVERAGE       UPTIME       K8s POD        STATUS
t[dysqeHCy]   0.14%         2.30TiB     62%           27.108TiB   34%          0%             [12.4 11.5 11.4]   685h39m20s   ais-target-0   online
t[bwLnMici]   0.14%         2.36TiB     64%           26.863TiB   41%          0%             [8.8 7.9 7.6]      685h38m0s    ais-target-1   online
t[tQbKmamu]   0.15%         2.35TiB     63%           27.118TiB   38%          1%             [7.2 7.7 7.8]      685h37m10s   ais-target-2   online
t[nOtbUodZ]   0.15%         2.32TiB     62%           27.090TiB   47%          0%             [7.4 10.0 10.9]    685h35m40s   ais-target-3   online
t[fqpzfbfx]   0.16%         2.34TiB     63%           26.721TiB   52%          2%             [8.0 9.5 9.8]      685h34m10s   ais-target-4   online
t[abRtepsE]   0.14%         2.30TiB     62%           27.587TiB   36%          0%             [6.1 6.6 6.9]      685h32m40s   ais-target-5   online
...
...

Interpreting CPU signals

AIS 4.4 exposes several CPU-related signals, but they answer different questions.

SYS CPU(%) is the primary AIS CPU signal. It is computed from recent CPU usage and reported as a smoothed moving average. THROTTLED(%) is separate: it indicates CPU time the runtime is denying to the container in cgroup-v2 environments. LOAD AVERAGE remains available with --verbose, but it is secondary and should be treated as additional context rather than the main CPU metric.

| What you see | Likely interpretation | What to check next |
| --- | --- | --- |
| High LOAD AVERAGE, moderate SYS CPU(%) | The node may be under pressure, but not necessarily from sustained CPU execution. Think queueing, contention, or blocked work rather than “CPU fully busy.” | Check whether THROTTLED(%) is non-zero. In containerized environments, CPU starvation can show up even when SYS CPU(%) is only moderate. |
| High SYS CPU(%), moderate LOAD AVERAGE | CPUs are genuinely busy right now. This is the clearest sign of active CPU work. | Treat SYS CPU(%) as the primary signal. Look for sustained values, workload spikes, or competing foreground/background activity. |
| High SYS CPU(%) and high THROTTLED(%) | The node is both busy and being denied CPU by the runtime. | This is usually the strongest signal of CPU pressure in cgroup-v2 environments. Consider increasing CPU limits or reducing contention. |
| Low or moderate SYS CPU(%), non-zero THROTTLED(%) | The container may still be short on CPU availability. Throttling indicates lack of CPU entitlement, not just high usage. | Investigate container CPU limits and runtime throttling before assuming the node is healthy. |
| Low SYS CPU(%), low THROTTLED(%), high LOAD AVERAGE | Pressure may be coming from outside the immediate CPU-utilization path. | Treat load average as context, not proof of CPU saturation. Check for queued work, transient backlog, or other resource contention. |

Memory reporting

Memory reporting also changes based on the detected runtime environment.

On bare metal, AIS reads host memory information from /proc/meminfo.

In containerized Linux environments:

  • for cgroup v2, AIS uses memory.max, memory.current, and memory.stat
  • for cgroup v1, AIS uses the corresponding v1 memory control files

If the container has no explicit memory limit, AIS falls back to host memory reporting.

This means that memory shown by AIS inside a constrained container should reflect the container’s configured limit rather than the host’s total RAM.
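The limit-or-fallback rule can be illustrated with a short sketch. In cgroup v2, memory.max contains either a byte count or the literal string max when no limit is set. The helper name is hypothetical, and treating a limit larger than host memory as "no limit" is an additional assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// memTotal (hypothetical helper) picks the memory total to report. memoryMax is
// the content of the cgroup v2 memory.max file; hostTotal would come from
// /proc/meminfo. The second result tells whether the container limit was used.
func memTotal(memoryMax string, hostTotal uint64) (uint64, bool) {
	s := strings.TrimSpace(memoryMax)
	if s == "max" {
		return hostTotal, false // no explicit limit: fall back to host memory
	}
	limit, err := strconv.ParseUint(s, 10, 64)
	if err != nil || limit == 0 || limit > hostTotal {
		return hostTotal, false // unreadable or implausible limit (assumed rule)
	}
	return limit, true
}

func main() {
	const host = 16 << 30 // pretend the host has 16GiB

	total, fromContainer := memTotal("536870912\n", host) // 512MiB limit
	fmt.Println(total, fromContainer)

	total, fromContainer = memTotal("max\n", host)
	fmt.Println(total, fromContainer)
}
```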

AIS also treats some auxiliary memory details as best-effort so that missing or unreadable secondary files do not cause runtime failure.

Two different views of memory

MEM USED(%) and MEM AVAIL use different but complementary views of memory. In the current 4.4 implementation, MEM USED(%) is the AIS process resident set size (RSS) expressed as a percentage of the node’s effective memory total (the container’s memory limit when one is enforced via cgroups).

MEM AVAIL, on the other hand, is not simply raw free memory. It reflects the node’s available memory (ActualFree), which accounts for reclaimable kernel caches: memory that is technically in use (e.g., for the kernel’s page cache) but can be freed under pressure.

As a result, these columns are useful for operational triage, but they are not directly comparable to Prometheus metrics such as container_memory_usage_bytes, which report container-wide usage using different accounting semantics.

In particular, MEM USED(%) will typically be much smaller than what Prometheus metrics such as container_memory_usage_bytes report, and often smaller than what operators see via kubectl top (which is a separate Kubernetes resource-usage view and should be treated as such).

That’s because MEM USED(%) covers only the AIS process’s own footprint (see the next section), while those tools generally reflect broader container memory usage.

About RSS

MEM USED(%) in AIS is based on the process RSS (resident set size): the portion of a process’s memory that is currently resident in physical RAM. This includes the Go heap, goroutine stacks, the Go runtime itself, loaded code and libraries, and other in-process data structures.

AIS nodes do not allocate custom off-heap buffers (e.g., via mmap), so RSS remains a reasonable approximation of the process’s own memory footprint.

RSS is a process-level view of memory, not a container-level one. This explains why MEM USED(%) may remain relatively small even when Prometheus metrics such as container_memory_usage_bytes, or operator-facing tools such as kubectl top, report much larger memory usage for the same pod or container.

The difference is expected: container-level metrics commonly include memory charged to the container beyond the AIS process RSS, including filesystem page cache and other kernel-accounted memory associated with the workload.

In AIS, this distinction is especially visible during heavy I/O: large reads and writes can increase container memory usage through page cache activity even when the AIS process RSS itself changes only modestly.
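As a rough illustration of the process-level view, the sketch below reads VmRSS from /proc/self/status and expresses it as a percentage of a memory total. How AIS actually obtains RSS is not stated in this document. The helper names and the 4GiB total are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// rssBytes (hypothetical helper) extracts VmRSS, which the kernel reports in
// kB, from the text of a /proc/<pid>/status file.
func rssBytes(status string) (uint64, bool) {
	for _, line := range strings.Split(status, "\n") {
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		f := strings.Fields(strings.TrimPrefix(line, "VmRSS:"))
		if len(f) == 0 {
			return 0, false
		}
		kb, err := strconv.ParseUint(f[0], 10, 64)
		if err != nil {
			return 0, false
		}
		return kb * 1024, true
	}
	return 0, false
}

// memUsedPct expresses process RSS as a percentage of the effective total.
func memUsedPct(rss, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(rss) / float64(total)
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	if rss, ok := rssBytes(string(data)); ok {
		// 4GiB stands in for the container limit; it is a made-up value.
		fmt.Printf("RSS=%d bytes, %.2f%% of 4GiB\n", rss, memUsedPct(rss, 4<<30))
	}
}
```

Page cache charged to the container never appears in this number, which is why a container-level metric for the same pod can be much larger.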

Kubernetes note

On Kubernetes, cgroup v2 control files often live under a pod-specific subtree rather than directly under /sys/fs/cgroup/.

The exact path depends on the container runtime and cgroup namespace configuration: CRI-O, containerd, systemd-nspawn, and cgroupns=private setups all produce different nested layouts. For example, plain Docker typically mounts the cgroup root directly at /sys/fs/cgroup/, while a Kubernetes pod might see its files under something like /sys/fs/cgroup/kubepods.slice/.../crio-<hash>.scope/.

AIS handles this by reading the process’s own cgroup membership from /proc/self/cgroup at startup and resolving the actual file paths from there, rather than assuming a single hardcoded location. As a result, the same binary works whether it runs in a flat Docker container, a nested K8s pod, or any other cgroup v2 layout, with no configuration required.
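A minimal sketch of that resolution step, assuming the standard cgroup v2 format of /proc/self/cgroup (a single line of the form 0::<path>). The function name and the example paths are hypothetical, and real code would also need to handle unreadable files.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cgroupV2Dir (hypothetical helper) maps the content of /proc/self/cgroup to a
// directory under the cgroup mount point. It returns false when no cgroup v2
// entry ("0::<path>") is present, for example on a pure cgroup v1 host.
func cgroupV2Dir(procSelfCgroup, mount string) (string, bool) {
	for _, line := range strings.Split(procSelfCgroup, "\n") {
		parts := strings.SplitN(strings.TrimSpace(line), ":", 3)
		if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
			return path.Join(mount, parts[2]), true
		}
	}
	return "", false
}

func main() {
	// Nested layout with a made-up pod path.
	dir, _ := cgroupV2Dir("0::/kubepods.slice/example-pod.slice/example.scope\n", "/sys/fs/cgroup")
	fmt.Println(path.Join(dir, "memory.max"))

	// Flat layout: the process sits at the root of its cgroup namespace.
	dir, _ = cgroupV2Dir("0::/\n", "/sys/fs/cgroup")
	fmt.Println(path.Join(dir, "memory.max"))
}
```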

When to use ForceContainerCPUMem

The ForceContainerCPUMem feature flag can override failed container auto-detection.

In the unlikely event that auto-detection fails, enable ForceContainerCPUMem and restart the node or the entire cluster.

This forces cgroup-based CPU and memory accounting and is intended for unusual deployments.

Use it when:

  • AIS is running in a container or pod
  • startup logs do not show “container” in the first 10-12 lines
  • reported CPU or memory clearly matches the host instead of the container

How to validate the behavior

A simple way to validate container-aware CPU and memory reporting is to compare the same test or deployment on the host and inside a constrained container.

AIS includes a sys package unit test that you can run inside a constrained container to verify container-aware CPU and memory accounting:

docker run --rm \
  --cpus=1.5 \
  --memory=512m \
  -v "$PWD":/src -w /src \
  -v "$HOME/go/pkg/mod":/go/pkg/mod \
  -v "$HOME/.cache/go-build":/root/.cache/go-build \
  golang:1.26 \
  go test ./sys -run . -v -count=1 2>&1

In that setup:

  • CPU count and memory totals should reflect the container’s limits rather than the host’s capacity
  • changing --memory=512m to --memory=4G should change the observed memory totals accordingly
  • the same workload should produce different CPU and memory observations on host versus inside the container

For a running AIS deployment, practical checks are:

  1. inspect startup logs for container:cgroup-v2 or container:cgroup-v1
  2. compare CPUs(..., runtime=...) in the startup line
  3. run ais show cluster
  4. confirm that SYS CPU(%) is shown by default
  5. look for THROTTLED(%) in constrained cgroup-v2 environments
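The first two checks can be automated by parsing the startup line. The sketch below is based only on the example log shown earlier in this document. The regular expression, type, and function names are assumptions, and the log format on other AIS versions or on bare metal may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// startupInfo (hypothetical type) holds the fields extracted from a startup line.
type startupInfo struct {
	Effective int    // first number in CPUs(...)
	Runtime   int    // the runtime= value
	Container string // "cgroup-v1", "cgroup-v2", or empty when no tag is present
}

// Pattern derived from the example log line; treat it as an assumption.
var startupRe = regexp.MustCompile(`CPUs\((\d+), runtime=(\d+)\)(?:, container:(cgroup-v[12]))?`)

func parseStartup(line string) (startupInfo, bool) {
	m := startupRe.FindStringSubmatch(line)
	if m == nil {
		return startupInfo{}, false
	}
	eff, err1 := strconv.Atoi(m[1])
	rt, err2 := strconv.Atoi(m[2])
	if err1 != nil || err2 != nil {
		return startupInfo{}, false
	}
	return startupInfo{Effective: eff, Runtime: rt, Container: m[3]}, true
}

func main() {
	line := "I 18:55:47.766809 daemon:323 Version 4.4.rc2.bbac8be09, build 2026-04-06T18:49:38+0000, CPUs(40, runtime=80), container:cgroup-v2"
	if info, ok := parseStartup(line); ok {
		fmt.Printf("effective=%d runtime=%d container=%q limited=%v\n",
			info.Effective, info.Runtime, info.Container, info.Effective < info.Runtime)
	}
}
```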

Fallback behavior

AIS prefers the most appropriate source for the current environment, but it also degrades gracefully.

In general:

  • CPU reporting tries to preserve a usable percentage whenever possible
  • memory reporting prefers host stats over failing hard when container-specific files are unavailable
  • cgroup probing is done once at startup, not repeatedly during steady-state runtime

This keeps the behavior predictable and avoids repeated source-selection churn while the process is running.
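In Go, a probe-once policy like this is commonly expressed with sync.Once. The sketch below shows the pattern under stated assumptions: the type and function names are hypothetical, and checking for /sys/fs/cgroup/cgroup.controllers is a widely used heuristic for a cgroup v2 mount, not necessarily what AIS does.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// cgroupProbe (hypothetical type) caches the result of a single detection run.
type cgroupProbe struct {
	once    sync.Once
	version string
}

// resolve runs detect at most once. Later calls return the cached answer, so
// the chosen source cannot flip while the process is running.
func (p *cgroupProbe) resolve(detect func() string) string {
	p.once.Do(func() { p.version = detect() })
	return p.version
}

// detectVersion is a heuristic: the cgroup.controllers file at the mount root
// indicates a unified (v2) hierarchy. Anything else is reported as "other".
func detectVersion() string {
	if _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers"); err == nil {
		return "cgroup-v2"
	}
	return "other"
}

func main() {
	var probe cgroupProbe
	fmt.Println(probe.resolve(detectVersion))
	fmt.Println(probe.resolve(detectVersion)) // cached, no second filesystem check
}
```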

Current scope and limitations

Starting with v4.4, AIStore fully supports cgroup v2 and continues to support cgroup v1.

Note that cgroup v1 support is considered deprecated and may be removed in a future release.

Future work may include additional CPU pressure signals and support for non-Linux platforms (such as macOS, where Linux cgroups are not available).