EFA (RDMA over AWS Fabric) on EKS
This guide covers setting up RDMA over AWS Elastic Fabric Adapter (EFA) on EKS for high-performance disaggregated inference with Dynamo. EFA is the only RDMA fabric available on AWS — InfiniBand and RoCE are not offered. With EFA, Dynamo’s prefill and decode workers transfer KV cache directly between GPUs across nodes via GPU-Direct RDMA, bypassing CPU and TCP/IP stacks.
Without RDMA, disaggregated inference falls back to TCP with severe performance degradation (~98s TTFT vs ~1s with EFA on Llama-3.1-8B at ISL 8000). See the Disaggregated Communication Guide for the transport-layer fundamentals.
Prerequisites
Recommended GPU EC2 instance types with EFA:
This table is not an exhaustive list of all AWS instance types that support EFA. It lists the GPU families most relevant to Dynamo disaggregated inference.
Cluster setup:
- GPU-Direct RDMA enabled on the host — either kernel ≥ 5.12 (DMA-BUF path; default on current AWS EKS AMIs, typically 6.14+) or an older kernel with the `nvidia-peermem` / AWS `efa_nv_peermem` module loaded (legacy peer-memory path; see Step 2 for how to install it).
- EFA-enabled security group — VPC security groups must allow all traffic between EFA-attached ENIs. The standard recommendation is a self-referencing security group rule that allows all protocols within the group. See AWS EFA security group setup.
- EKS node groups created with EFA support — when using `eksctl`, set `efaEnabled: true` on the GPU node group. This attaches the appropriate number of EFA ENIs per instance type.
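As a sketch, the node-group setting might look like the following `eksctl` config fragment (cluster name, region, node-group name, and sizes are illustrative; adjust for your environment):

```yaml
# cluster.yaml -- illustrative eksctl config; names and sizes are assumptions
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: dynamo-efa
  region: us-east-1
managedNodeGroups:
  - name: gpu-efa
    instanceType: p5.48xlarge
    minSize: 2
    maxSize: 2
    efaEnabled: true   # attaches the EFA ENIs supported by the instance type
```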
Overview
EFA setup involves three pieces:
- AWS EFA Kubernetes device plugin — exposes EFA NICs as the `vpc.amazonaws.com/efa` extended resource (host-level setup, Step 1). On modern kernels (≥ 5.12) the DMA-BUF path is used and `efa_nv_peermem` is not required; older kernels need it loaded (Step 2).
- Container image with libfabric + aws-ofi-nccl + Dynamo (Step 3).
- Workload spec that selects the LIBFABRIC NIXL backend, requests EFA resources, and runs privileged (Step 4, Step 5).
Step 1: Install the AWS EFA Kubernetes Device Plugin
The AWS EFA Kubernetes Device Plugin exposes each node’s EFA endpoints as the vpc.amazonaws.com/efa extended resource so pods can request them. AWS publishes two install paths — pick one:
Helm (recommended, from the official aws/eks-charts repo):
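A typical Helm install looks like this (the release name and namespace are conventional choices, not requirements):

```shell
# Add the official eks-charts repo and install the EFA device plugin
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-efa-k8s-device-plugin eks/aws-efa-k8s-device-plugin \
  --namespace kube-system
```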
Or raw manifest (from aws-samples/aws-efa-eks):
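A sketch of the raw-manifest path (the exact file path inside `aws-samples/aws-efa-eks` is an assumption; check the repo for the current manifest location before applying):

```shell
# Manifest path is illustrative -- verify against aws-samples/aws-efa-eks
kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-efa-eks/main/manifest/efa-k8s-device-plugin.yml
```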
Wait for the device plugin pods to start on every EFA-capable node:
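For example (the DaemonSet name assumes the Helm chart defaults; adjust if you used the raw manifest):

```shell
# Wait until the device plugin DaemonSet is fully rolled out
kubectl rollout status daemonset/aws-efa-k8s-device-plugin -n kube-system
kubectl get pods -n kube-system -o wide | grep efa
```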
Verify EFA resources are advertised by each GPU node:
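One way to list the advertised count per node (the custom-columns JSONPath escapes the dots in the resource name):

```shell
# Show allocatable vpc.amazonaws.com/efa per node
kubectl get nodes -o custom-columns='NODE:.metadata.name,EFA:.status.allocatable.vpc\.amazonaws\.com/efa'
```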
Each EFA-capable node should report a non-zero vpc.amazonaws.com/efa count (e.g., 32 on p5.48xlarge, reflecting that instance’s EFA endpoint count). The exact count depends on instance type and how the node group’s ENIs were configured at launch.
Step 2: Verify Host Kernel Modules
Modern AWS GPU AMIs (Amazon Linux 2023, Ubuntu 22.04+, kernel ≥ 5.12) use DMA-BUF for GPU-Direct RDMA and do not require nvidia-peermem or efa_nv_peermem. The default AMIs for p5/p5e/p5en/p6-b200/GB200 ship with kernels in the 6.x line where DMA-BUF is the active path.
To confirm:
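Run these on the node itself (e.g., via SSM or a privileged debug pod):

```shell
# Kernel >= 5.12 means the DMA-BUF path is available
uname -r

# Only relevant on older kernels; absence is fine on >= 5.12
lsmod | grep -E 'nvidia_peermem|efa_nv_peermem' \
  || echo "no peer-memory module loaded (expected on DMA-BUF kernels)"
```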
If you are on an older kernel (< 5.12) and the host doesn’t already have efa_nv_peermem loaded, the simplest path is to switch to an AMI that includes EFA host-level components — the EKS-optimized AL2023 NVIDIA AMI and all Bottlerocket AMIs include them. Otherwise, run aws-efa-installer on the host (via a privileged DaemonSet or baked into a custom AMI). See AWS — Manage EFA devices on Amazon EKS for the full picture.
Step 3: Build a Dynamo EFA Image
Dynamo’s image build is two steps: container/render.py writes a Dockerfile for the chosen framework + target, then docker build consumes it. Passing --make-efa to render.py appends the AWS EFA installer stage from container/templates/aws.Dockerfile, which defines a stage named aws on top of runtime. You must pass --target aws to docker build — without it, docker build stops at the runtime stage and you get an image without EFA. See container/README.md for the full build workflow.
--output-short-filename writes to container/rendered.Dockerfile; omit it to get the long auto-generated filename (e.g., vllm-runtime-cuda12.9-amd64-rendered.Dockerfile) — useful when keeping several rendered Dockerfiles side by side.
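Putting the two steps together, the build might look like this (the `--framework vllm` flag and image tag are illustrative assumptions; only `--make-efa`, `--output-short-filename`, `--target aws`, and the rendered filename come from the workflow described above):

```shell
# 1. Render a Dockerfile with the EFA installer stage appended
python3 container/render.py --framework vllm --make-efa --output-short-filename

# 2. Build, targeting the 'aws' stage so the EFA layers are included
docker build --target aws -f container/rendered.Dockerfile -t dynamo-efa:latest .
```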
See Known Issues below for one case where the default-built image does not produce a working EFA deployment out of the box (GB200 / arm64 64K-page kernels). The symptom looks like a working setup but fails at startup during NIXL memory registration.
Step 4: Configure NIXL Backend
NIXL is the high-level KV transfer API and supports multiple backends. For EFA, the LIBFABRIC backend must be selected. UCX is NIXL’s default backend, and while it has CUDA-IPC / RDMA transports available in the image, in standard pod-to-pod EFA configurations it lands on a slow transport (effectively TCP-speed at ~1–3 GB/s) instead of EFA’s line rate. Empirically, LIBFABRIC is the only backend that reaches full EFA bandwidth on AWS.
Each framework selects the backend differently:
This is a silent-failure path — getting it wrong manifests as ~100 s TTFT instead of a clear error. Always verify at startup that LIBFABRIC is active.
Required EFA environment variables
In addition to backend selection, set these on every worker pod:
Recommended EFA performance tuning
When using FI_EFA_USE_HUGE_PAGE=1, also add hugepages-2Mi: 5120Mi to the pod resource limits.
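As a pod-spec fragment, the pairing looks like this (only `FI_EFA_USE_HUGE_PAGE` and the `hugepages-2Mi` limit come from this guide; the container name and memory limit are placeholders):

```yaml
# Worker pod spec fragment -- illustrative
containers:
  - name: decode-worker
    env:
      - name: FI_EFA_USE_HUGE_PAGE
        value: "1"
    resources:
      limits:
        hugepages-2Mi: 5120Mi
        memory: 64Gi   # hugepages requests require a memory limit as well
```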
Step 5: Pod Resource Requests
Dynamo pods that use EFA must request the resource and run privileged:
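A minimal sketch of the relevant fields (GPU and EFA counts are illustrative; 32 EFA devices matches a p5.48xlarge as noted in Step 1):

```yaml
# Worker pod spec fragment -- resource counts are examples
containers:
  - name: prefill-worker
    securityContext:
      privileged: true            # required for fi_mr_reg on CUDA VRAM
    resources:
      limits:
        nvidia.com/gpu: 8
        vpc.amazonaws.com/efa: 32   # all EFA devices on the instance
```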
privileged: true is required for NIXL to register CUDA VRAM with the EFA NIC via fi_mr_reg. IPC_LOCK alone is insufficient.
Known Issues
One issue currently affects default-built Dynamo EFA images.
Issue 1: libfabric on GB200 fails fi_mr_reg on CUDA VRAM
Known affected platforms: GB200.
Symptom: Worker pod fails at startup with fi_mr_reg returning EFAULT during NIXL initialization. NIXL VRAM registration fails; depending on the framework, the worker either crashes or silently falls back to TCP.
Root cause: The libfabric bundled with the EFA installer (every release up to the currently latest, 1.48.0, ships a libfabric older than 2.5.x) lacks a CUDA branch in the dmabuf-eligibility check in prov/efa/src/efa_mr.c. On x86_64 hosts the legacy ibv_reg_mr path handles CUDA pointers natively, so the bug doesn't surface; on arm64 64K-page kernels (GB200), the legacy path returns EFAULT for CUDA VRAM. Tracked in ofiwg/libfabric#12019.
Upstream status: The bug is resolved in ofiwg/libfabric main and v2.5.x via a more comprehensive rewrite of efa_mr_reg_ibv_mr(). AWS’s aws/libfabric fork has not picked up the upstream rewrite; the latest EFA installer (1.48.0) still ships v2.4.0amzn3.0 with the older code path.
Workarounds:
- Apply the one-line patch to the bundled libfabric. During image build, replace the `aws.Dockerfile` install step with a custom build.
- Replace the bundled libfabric with `ofiwg/libfabric@v2.5.1` (or newer). The upstream rewrite is already present, so no patch is needed. Rebuild `aws-ofi-nccl` against it.
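A sketch of the second workaround as build steps (the configure flags, CUDA path, and install prefix are assumptions; align them with how `aws.Dockerfile` builds the rest of the stack):

```shell
# Build upstream libfabric v2.5.1 and install it over the bundled copy
git clone --depth 1 --branch v2.5.1 https://github.com/ofiwg/libfabric.git
cd libfabric
./autogen.sh
./configure --prefix=/opt/amazon/efa --enable-efa --with-cuda=/usr/local/cuda
make -j"$(nproc)" && make install
# Then rebuild aws-ofi-nccl against this libfabric
```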
Verification
After deployment, confirm EFA is actually being used (not silent TCP fallback):
1. NIXL chose the LIBFABRIC backend (not UCX):
2. The LIBFABRIC plugin is loaded and executing (not just opened):
3. Registered RDMA memory is GPU VRAM, not CPU pinned memory (no CPU bounce):
4. NIXL transfers are happening, none failing (via Prometheus metrics endpoint):
NIXL telemetry is off by default. To enable it, set on each worker:
Then query:
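For example (port 8081 follows the `DYN_SYSTEM_PORT` example below; the metric names are not enumerated here, so grep broadly):

```shell
# Dump NIXL-related metrics from the worker's Prometheus endpoint
curl -s http://localhost:8081/metrics | grep -i nixl
```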
The same metrics with the vllm: prefix are also published to vLLM’s own metrics endpoint (typically DYN_SYSTEM_PORT, e.g. 8081) when vLLM is the frontend.
5. Decode side confirms KV receipt:
Do not use rdma_write_bytes or other /sys/class/infiniband/*/counters/* checks for EFA verification. EFA SRD uses SEND operations at the hardware level, not RDMA READ/WRITE — rdma_write_bytes is always 0 on correctly configured EFA by design. Use the Prometheus + /proc/<pid>/maps methodology above instead.
Common Failure Modes
References
- Disaggregated Communication Guide — transport-layer fundamentals
- RDMA / InfiniBand on AKS — Azure equivalent
- `container/templates/aws.Dockerfile` — EFA installer template
- AWS — Manage EFA devices on Amazon EKS — official EKS-side guide (DRA driver + device plugin)
- AWS EFA documentation — EC2-side EFA overview
- `aws/eks-charts` — `aws-efa-k8s-device-plugin` — Helm chart source
- ofiwg/libfabric#12019 — CUDA dmabuf registration on EFA