# GKE TCPXO Networking Prerequisites
For *-gke-cos-training* recipes, GPUDirect TCPXO enables high-speed inter-node GPU communication on GKE. Without it, NCCL falls back to TCP (~4 GB/s vs ~340 GB/s with TCPXO).
## Infrastructure Prerequisites
GKE clusters must have multi-NIC networking configured before deploying AICR bundles:
- Multi-NIC networking enabled (8 GPU NICs per a3-megagpu-8g node)
- `Network` + `GKENetworkParamSet` CRs configured for the GPU NICs (cluster-specific, not managed by AICR; see the sketch after this list)
- `nccl-tcpxo-installer` DaemonSet on GPU nodes (included in the AICR bundle)
- `nri-device-injector` DaemonSet on GPU nodes (included in the AICR bundle)
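The per-NIC CRs typically look like the following. This is a minimal sketch for a single GPU NIC, assuming GKE's multi-networking CRDs (`Network` with `type: Device` backed by a `GKENetworkParamSet` in `NetDevice` mode); the VPC and subnet names are placeholders for your cluster's actual values.

```yaml
# Sketch only: one Network/GKENetworkParamSet pair per GPU NIC (gpu-nic-0 shown).
# VPC and subnet names are placeholders; repeat for gpu-nic-1 .. gpu-nic-7.
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: gpu-nic-0
spec:
  type: Device
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: gpu-nic-0
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: gpu-nic-0
spec:
  vpc: gpu-nic-0               # placeholder VPC network name
  vpcSubnet: gpu-nic-0-subnet  # placeholder subnet name
  deviceMode: NetDevice
```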
**Important:** The GPU node pool must be provisioned with only the 8 GPU NIC networks (`gpu-nic-0` through `gpu-nic-7`). Do not include a gVNIC additional network: it takes a GPU NIC PCI slot (`0000:06:00.0`), leaving only 7/8 GPUs available for TCPXO.
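For reference, a node pool provisioned this way attaches exactly the eight GPU NIC networks and nothing else. The command below is a sketch: cluster, region, pool, network, and subnet names are placeholders, and other flags your environment needs (reservations, disks, etc.) are omitted.

```bash
# Sketch: provision the GPU node pool with only the 8 GPU NIC networks
# (no additional gVNIC network). All names are placeholders.
gcloud container node-pools create gpu-pool \
  --cluster CLUSTER_NAME --region REGION \
  --machine-type a3-megagpu-8g \
  --accelerator type=nvidia-h100-mega-80gb,count=8 \
  --additional-node-network network=gpu-nic-0,subnetwork=gpu-nic-0-subnet \
  --additional-node-network network=gpu-nic-1,subnetwork=gpu-nic-1-subnet \
  --additional-node-network network=gpu-nic-2,subnetwork=gpu-nic-2-subnet \
  --additional-node-network network=gpu-nic-3,subnetwork=gpu-nic-3-subnet \
  --additional-node-network network=gpu-nic-4,subnetwork=gpu-nic-4-subnet \
  --additional-node-network network=gpu-nic-5,subnetwork=gpu-nic-5-subnet \
  --additional-node-network network=gpu-nic-6,subnetwork=gpu-nic-6-subnet \
  --additional-node-network network=gpu-nic-7,subnetwork=gpu-nic-7-subnet
```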
## Workload Pod Configuration (NRI Profile)
The NRI profile mounts the host’s /sys and /proc/sys into the TCPXO daemon
container, giving it PCI sysfs visibility without hostNetwork. This preserves
pod networking (DNS, network policies, service mesh compatibility).
Key properties (illustrated in the sketch after this list):
- `hostNetwork: false`: workloads get proper pod networking
- `privileged: false`: tcpxo-daemon uses only `NET_ADMIN` and `NET_BIND_SERVICE`
- `/sys` mounted as `/hostsysfs`: provides PCI sysfs visibility for GPU enumeration
- `/proc/sys` mounted as `/hostprocsysfs`: allows kernel network tuning
- NRI annotations inject GPU devices and multi-NIC interfaces
- Requires NRI device injector DaemonSet deployed on GPU nodes
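A minimal sketch of how these properties land in a workload manifest. Image names and resource requests are placeholders, and the NRI annotation keys are injector-specific, so they are only indicated as a comment; the complete, working manifest is the example referenced below.

```yaml
# Sketch only: shows where the key properties live.
apiVersion: v1
kind: Pod
metadata:
  name: tcpxo-workload
  # NRI annotations (injector-specific) inject GPU devices and multi-NIC interfaces.
  annotations: {}
spec:
  hostNetwork: false             # keep normal pod networking (DNS, policies, mesh)
  containers:
    - name: tcpxo-daemon
      image: TCPXO_DAEMON_IMAGE  # placeholder
      securityContext:
        privileged: false
        capabilities:
          add: ["NET_ADMIN", "NET_BIND_SERVICE"]
      volumeMounts:
        - name: sysfs
          mountPath: /hostsysfs      # host PCI sysfs for GPU enumeration
        - name: procsysfs
          mountPath: /hostprocsysfs  # host /proc/sys for kernel network tuning
    # ...plus your training container(s)
  volumes:
    - name: sysfs
      hostPath:
        path: /sys
    - name: procsysfs
      hostPath:
        path: /proc/sys
```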
See `demos/workloads/training/gke-nccl-test-tcpxo.yaml` for a complete 2-node NCCL benchmark example.
## NCCL Plugin Version Matching
The NCCL test container image must match the cluster’s installed TCPXO plugin version. Check with:
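One way to check, assuming the installer DaemonSet keeps its AICR-bundle name `nccl-tcpxo-installer` and is deployed in `kube-system` (adjust the namespace and name to your cluster):

```bash
# The installed TCPXO plugin version is part of the installer's image tag.
kubectl get daemonset nccl-tcpxo-installer -n kube-system -o yaml | grep 'image:'
```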
Update the `nccl-plugin-gpudirecttcpx-dev` image tag in your workload to match.
## Troubleshooting
### RxDM detects 7/8 GPUs
If RxDM reports `Number of GPUs detected 7 is not equal to the actual number of GPUs 8`, check the GPU node pool's additional network configuration:
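For example (cluster, region, and pool names are placeholders):

```bash
# Show the node pool's network configuration, including additional node networks.
gcloud container node-pools describe GPU_POOL_NAME \
  --cluster CLUSTER_NAME --region REGION \
  --format="yaml(networkConfig)"
```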
If a gVNIC network appears in the list, it is taking a GPU NIC PCI slot. Remove the gVNIC from the node pool and reprovision the GPU nodes.
You can also verify the node NIC mapping:
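One way to do this, assuming the node carries the `networking.gke.io/nic-info` annotation that GKE multi-NIC node pools populate:

```bash
# Dump the NIC-to-PCI-address mapping recorded on the node (node name is a placeholder).
kubectl get node GPU_NODE_NAME \
  -o jsonpath='{.metadata.annotations.networking\.gke\.io/nic-info}{"\n"}'
```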
All 8 GPU NIC PCI addresses should be mapped to eth1–eth8. If a gVNIC is present, it typically occupies PCI 0000:06:00.0, displacing the first GPU NIC.
### RxDM detects 0/8 GPUs
If RxDM reports `Number of GPUs detected in the PCI tree 0`, the pod is missing the `/sys` hostPath mount. Ensure `/sys` is mounted as `/hostsysfs` in the tcpxo-daemon container. Without it, the container network namespace hides the host PCI sysfs tree entirely.
## Performance Reference
Validated on GKE 1.35 / a3-megagpu-8g (2 nodes, 16 GPUs):