New Features
Version | Description |
---|---|
23.4.0 | Added support for Kubernetes >= 1.21 and <= 1.27. |
23.4.0 | Added support for NicClusterPolicy update and removal. |
23.4.0 | Added support for OpenShift Container Platform 4.11 and 4.12. |
23.1.0 | |
1.4.0 | Added support for Kubernetes >= 1.21 and <= 1.25. |
1.4.0 | Added support for Ubuntu 22.04. |
1.4.0 | Added support for OpenShift Container Platform 4.11, including the DGX platform. |
1.4.0 | Added Beta support for PKey configuration for IB networks with IB-Kubernetes. |
1.3.0 | Added support for Kubernetes >= 1.17 and <= 1.24. |
1.3.0 | Added the option to use a single namespace to deploy Network Operator components. |
1.3.0 | Added support for automatic MLNX_OFED driver upgrade. |
1.3.0 | Added support for IPoIB CNI. |
1.3.0 | Added support for Air Gap deployment. |
1.2.0 | Added support for OpenShift Container Platform 4.10. |
1.2.0 | Added extended selectors support for SR-IOV Device Plugin resources with the Helm chart. |
1.2.0 | Added WhereAbouts IP reconciler support. |
1.2.0 | Added BlueField-2 NIC support for the SR-IOV operator. |
1.1.0 | Added support for OpenShift Container Platform 4.9. |
1.1.0 | Added support for Network Operator upgrade from v1.0.0. |
1.1.0 | Added support for Kubernetes Pod Security Policy. |
1.1.0 | Added support for Kubernetes >= 1.17 and <= 1.22. |
1.1.0 | Added the ability to propagate the nodeAffinity property from the NicClusterPolicy to Network Operator dependencies. |
1.0.0 | Added Node Feature Discovery, which can be used to mark nodes with NVIDIA SR-IOV NICs. |
1.0.0 | Added support for different networking models: |
1.0.0 | Added Kubernetes cluster scale-up support. |
1.0.0 | Published the Network Operator image on NGC. |
1.0.0 | Added support for Kubernetes >= 1.17 and <= 1.21. |
General Support
Upgrade Notes
Version | Notes |
---|---|
1.3.0 | Manual gradual upgrade is not supported when upgrading to Network Operator v1.3.0. If components are deployed into a single namespace, all pods are dropped and restarted when the old namespace is deleted, which could cause network connectivity issues during the upgrade procedure. |
1.2.0 | |
1.1.0 | N/A |
1.0.0 | N/A |
System Requirements
- RDMA capable hardware: NVIDIA ConnectX-5 NIC, or newer
- NVIDIA GPU and driver supporting GPUDirect, e.g., Quadro RTX 6000/8000, NVIDIA T4, NVIDIA A100, or NVIDIA V100 (GPUDirect only)
- GPU Operator Version 1.10 (required only for GPUDirect)
- Operating System: Ubuntu 20.04, Ubuntu 22.04, OpenShift Container Platform 4.10, OpenShift Container Platform 4.11
- Container runtime: containerd
Tested Network Adapters
The following network adapters have been tested with the Network Operator:
- NVIDIA A100X
- ConnectX-6 Dx
- ConnectX-7
- BlueField-2 NIC Mode
Prerequisites
Component | Version | Notes |
---|---|---|
Kubernetes | >=1.21 and <=1.26 | - |
Helm | v3.5+ | For Helm installation methods, refer to the official Helm website. |
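The version bounds in the Prerequisites table can be checked programmatically before deployment. The following is a minimal sketch, not part of the Network Operator itself: the function names and constants are hypothetical, and the bounds are simply copied from the table above (Kubernetes >= 1.21 and <= 1.26, Helm 3.5 or newer). It compares only the major and minor components of a version string.

```python
import re

# Bounds copied from the Prerequisites table; adjust them if the table changes.
K8S_MIN = (1, 21)
K8S_MAX = (1, 26)
HELM_MIN = (3, 5)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.24.3' or '3.5'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def kubernetes_supported(version: str) -> bool:
    """Return True if the Kubernetes version falls within the listed range."""
    return K8S_MIN <= parse_major_minor(version) <= K8S_MAX


def helm_supported(version: str) -> bool:
    """Return True if the Helm version meets the listed minimum."""
    return parse_major_minor(version) >= HELM_MIN
```

The version strings passed in would typically come from the cluster and the local Helm client; how they are obtained is left out of this sketch.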
Component Versions
The following component versions are deployed by the Network Operator:
Component | Version | Comments |
---|---|---|
Node Feature Discovery | v0.10.1 | Optionally deployed. May already be present in the cluster with proper configuration. |
NVIDIA MLNX_OFED driver container | 5.8-1.0.1.1.2 | - |
nv-peer-mem driver container | 1.1-0 | - |
k8s-rdma-shared-device-plugin | v1.3.2 | - |
sriov-network-device-plugin | v3.5.1 | - |
containernetworking CNI | v0.8.7 | - |
whereabouts CNI | v0.5.2 | - |
multus CNI | v3.8 | - |
IPoIB CNI | v1.1.0 | - |
IB Kubernetes | v1.0.2 | - |
Bug Fixes
Version | Description |
---|---|
1.4.0 | Fixed a cluster scale-up issue. |
1.4.0 | Fixed an issue with IPoIB CNI deployment in OCP. |
1.3.0 | N/A |
1.2.0 | N/A |
1.1.0 | Fixed the Whereabouts IPAM plugin to work with Kubernetes v1.22. |
1.1.0 | Fixed imagePullSecrets for Network Operator. |
1.1.0 | Enabled resource names for HostDeviceNetwork to be accepted both with and without a prefix. |
Known Limitations
Version | Description |
---|---|
23.1.0 | Only a single PKey can be configured per IPoIB workload pod. |
1.4.0 | The operator upgrade procedure does not reflect configuration changes. The RDMA Shared Device Plugin or SR-IOV Device Plugin should be restarted manually in case of configuration changes. |
1.4.0 | Within a single cluster, the RDMA subsystem can be either exclusive or shared. Mixed configuration is not supported. The RDMA Shared Device Plugin requires the shared RDMA subsystem. |
1.3.0 | The MOFED container is not a supported configuration on the DGX platform. Deleting the MOFED container may unload the driver. In this case, the mlx5_core kernel driver must be reloaded manually. Network connectivity could be affected if there are only NVIDIA NICs on the node. |
1.2.0 | N/A |
1.1.0 | NicClusterPolicy update is not supported at the moment. |
1.1.0 | Network Operator is compatible only with NVIDIA GPU Operator v1.9.0 and above. |
1.1.0 | GPUDirect performance may degrade on servers that are not optimized for it. See the official GPUDirect documentation. |
1.1.0 | Persistent NIC configuration for netplan or ifupdown scripts is required for SR-IOV and Shared RDMA interfaces on the host. |
1.1.0 | The Pod Security Policy admission controller must be enabled to use PSP with Network Operator. See Deployment with Pod Security Policy in the Network Operator Documentation for details. |
1.0.0 | Network Operator is compatible only with NVIDIA GPU Operator v1.5.2 and above. |
1.0.0 | Persistent NIC configuration for netplan or ifupdown scripts is required for SR-IOV and Shared RDMA interfaces on the host. |