Date: 2026-01-12

4693674 Description: Performance degradation may occur when running MPI_Alltoall with two processes on systems using ConnectX-8 HCAs.

Workaround: Set the environment variable UCX_CUDA_COPY_BW=9500MBps to restore performance.

Keywords: Performance; MPI: collectives; GPU

Discovered in Version: 2.25

4662365 Description: In certain cases, performance degradation may occur with MPI_Alltoall, MPI_Alltoallv, MPI_Bcast, and MPI_Allgather operations.

Workaround: Set the environment variable UCX_RNDV_THRESH=auto,inter:12kb to restore performance.

Keywords: Performance; MPI; Collectives

Discovered in Version: 2.25

4546016 Description: HCOLL is not supported on GB200 and GB300 systems, and there are no plans for future support.

Workaround: N/A

Keywords: HCOLL; GB200; GB300

Discovered in Version: 2.24

4422894 Description: When sending relatively large amounts of data, the following error may occur: rc_verbs_iface.c:128 send completion with error: remote access error

Workaround: To avoid this issue, exclude the rc_v transport from the list of available transports by setting the environment variable as follows: UCX_TLS=^rc_v
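A minimal sketch of applying this workaround, assuming an Open MPI style mpirun launcher; the application name ./my_app and the process count are hypothetical placeholders. The launch line is only printed, so the snippet can be run safely on any host.

```shell
# Exclude the rc_v transport; the leading ^ negates the transport list.
# Quoted so the value is passed literally by any shell.
export UCX_TLS='^rc_v'

# Hypothetical launch line: ./my_app and -np 2 are placeholders.
# -x forwards the already-exported variable to all ranks.
LAUNCH_CMD="mpirun -np 2 -x UCX_TLS ./my_app"
echo "$LAUNCH_CMD"
```

Passing the variable with -x keeps the setting identical on every rank instead of relying on each node's login environment.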

Keywords: UCX; remote access error

Discovered in Version: 2.23

4177839 Description: When using a bidirectional traffic pattern in setups with 4 NICs, each connected to a separate NUMA node, a 10% degradation in bandwidth performance is observed compared to the full wire speed (FWS).

Workaround: Use a single lane for the RNDV operation instead of the default two lanes by setting the following environment variable: UCX_MAX_RNDV_LANES=1

Keywords: NUMA; BW; FWS

Discovered in Version: 2.22.x

3995982 Description: GPU device variables (obtained from cudaGetSymbolAddress) are not supported for MPI send/receive operations. Passing a pointer to a device variable may lead to a segmentation fault.

Workaround: Copy the contents of the device buffer to a bounce buffer allocated by cudaMalloc, and use that bounce buffer for communication.

Keywords: cudaGetSymbolAddress; cudaMalloc; segmentation fault; bounce buffer

Discovered in Version: 2.21.0

4050321 Description: Significant bandwidth degradation occurs when the Global VA feature is enabled (by setting UCX_GVA_ENABLE=y), due to the failure to utilize PCIe relaxed ordering.

Workaround: Avoid setting UCX_GVA_ENABLE=y to prevent potential bandwidth degradation.

Keywords: Global VA; GVA; ODP

Discovered in Version: 2.21.0

4097336 Description: Enabling HW DCS (by setting UCX_DC_MLX5_TX_POLICY=dcs_hybrid) may cause the application to hang due to an issue with scheduling work on DC initiator QPs.

Workaround: Avoid setting UCX_DC_MLX5_TX_POLICY=dcs_hybrid to prevent potential application hangs.

Keywords: DC; DCS; hang

Discovered in Version: 2.21.0

4139280 Description: Asynchronously allocated CUDA memory may not work correctly with the gdr_copy transport, potentially resulting in an error such as: gdr_copy_md.c:139 UCX ERROR gdr_pin_buffer failed.

Workaround: Set the environment variable UCX_TLS=^gdr_copy to disable the gdr_copy transport.

Keywords: gdr_copy; memory registration; Stream Ordered CUDA Allocator

Discovered in Version: 2.21.0

4026461 Description: UCX atomic operations on Grace CPUs may fail with a Remote Access error.

Workaround: Disable DevX and KSM memory registration by setting UCX_IB_MLX5_DEVX=no.

Keywords: Atomic; Grace

Discovered in Version: 2.20.0

3884209 Description: In certain scenarios, a significant performance degradation can be observed due to excessive memory registrations.

Workaround: Switch back to the legacy protocol implementation by setting UCX_PROTO_ENABLE=n.

Keywords: UCC, Performance

Discovered in Version: 2.19.0

3606732 Description: In some cases, when using CUDA buffers for intra-node transfers, the program may crash in cuda_ipc with the error: Assertion `offset <= key->b_len' failed. This happens due to a conflict between cuda_ipc and gdrcopy memory registration on the same buffer. In other cases, the error message "gdr_map failed" can be printed.

Workaround: N/A

Keywords: gdr_copy, cuda_ipc

Discovered in Version: 2.17.0

3586369 Description: When the UD transport is used explicitly, the MPI or SHMEM job may hang during cleanup or MPI_Finalize while waiting for the UCX endpoint flush operation to complete.

Workaround: Disable adaptive progress optimization by setting the environment variable UCX_ADAPTIVE_PROGRESS=n, or do not select the UD transport explicitly.

Keywords: Hang, UD, Flush

Discovered in Version: 2.17.0

3653404 Description: When a large memory region is registered with ucp_mem_map() and peer failure handling support is enabled on the UCX endpoint, the process may crash with the error "LRU push returned Unsupported operation" while sending a buffer belonging to that region. The issue happens because multi-threaded registration is used for large regions, and it does not work well with peer failure support.

Workaround: Disable multi-threaded registration by setting the environment variable UCX_REG_MT_THRESH=inf.

Keywords: Multi-Threaded, Indirect, Key Registration

Discovered in Version: 2.17.0

3606445 Description: The performance of osu_mbw_mr for some message sizes can be worse than in the previous release. This can happen because the default protocol thresholds differ.

Workaround: Revert to the previous threshold selection logic by setting the environment variable UCX_PROTO_ENABLE=n.

Keywords: Performance, osu_mbw_mr

Discovered in Version: 2.17.0

- Description: For the best performance on a ConnectX-7 NDR400 fabric, set the following parameters with mpirun: mpirun -x UCX_MAX_RNDV_LANES=4 -x UCX_RNDV_THRESH=20k …

Workaround: N/A

Keywords: ConnectX-7; UCX; mpirun

Discovered in Version: 2.11 (UCX 1.13)

- Description: Once the TCP transport detects a “Connection reset by a peer” failure on a connection, it stops sending data, and the MPI/SHMEM application hangs. Error printouts from UCP/UCT can be seen in the log.

Workaround: At small scale, change the "UCX_TLS=tcp" parameter to "UCX_TLS=sm,tcp". At larger scales, this workaround is not applicable.

Keywords: UCX hang

Discovered in Version: 2.9 (UCX 1.11)

- Description: NCCL plugin works only with NCCL v2.8 or higher.

Workaround: Build plugin version v2.0 from the following source: https://github.com/Mellanox/nccl-rdma-sharp-plugins/tree/v2.0.x

Keywords: NCCL Plugin

Discovered in Version: 2.7 (NCCL 2.1)

- Description: UD timeout error may appear.

Workaround: Either set UCX_UD_TIMEOUT=120 (the default is 30 seconds), or disable the UD transport and use DC instead by setting UCX_TLS=dc_x,self,sm.
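The two settings in this workaround can be sketched as follows. This assumes they are alternatives (the original text does not say whether both must be applied together), so only the first is active here and the second is left commented out.

```shell
# Option 1: raise the UD timeout to the value named in the workaround.
export UCX_UD_TIMEOUT=120

# Option 2 (assumed alternative): avoid UD and use DC instead.
# Uncomment to restrict UCX to the DC, self and shared-memory transports.
# export UCX_TLS=dc_x,self,sm

echo "UCX_UD_TIMEOUT=$UCX_UD_TIMEOUT"
```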

Keywords: UD, DC, timeout, UCX

Discovered in Version: 2.7 (UCX 1.9)

- Description: When using GPU memory on an InfiniBand network with GPUDirect enabled but without the gdrcopy library, small-message performance can be low.

Workaround: Use the Rendezvous protocol by setting the UCX_RNDV_THRESH parameter to 0.

Keywords: GPU, GPUDirect, memory

Discovered in Version: 2.6 (UCX 1.8)

3672903/Github 4105 Description: Adaptive Routing is not supported when used with OpenSHMEM applications. (Github issue: https://github.com/openucx/ucx/issues/4105)

Workaround: Enable strong synchronization by adding the -mca spml_ucx_strong_sync 2 parameter to the oshrun command.

Keywords: Adaptive Routing, AR, OpenSHMEM, OSHMEM

Discovered in Version: 2.5 (OpenSHMEM 1.4)

- Description: When UCX requires more shared memory segments than the limit defined in /proc/sys/kernel/shmmni, the following message is printed by UCX: “... total number of segments in the system (%lu) would exceed the limit in /proc/sys/kernel/shmmni (=%lu)... please check shared memory limits by 'ipcs -l'”.

Workaround: Follow the instructions in the error message above and increase the maximum number of shared memory segments in /proc/sys/kernel/shmmni.
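A minimal sketch of inspecting the limit on a Linux host. The new value 8192 in the commented-out command is purely illustrative and not a recommendation from this document; raising the limit requires root privileges, so that line is left as a comment.

```shell
# Path of the system-wide limit on the number of shared memory segments.
SHMMNI_PATH=/proc/sys/kernel/shmmni

if [ -r "$SHMMNI_PATH" ]; then
    CURRENT_SHMMNI=$(cat "$SHMMNI_PATH")
    echo "current kernel.shmmni = $CURRENT_SHMMNI"
else
    echo "$SHMMNI_PATH is not readable on this system"
fi

# To list all shared memory limits, as the error message suggests:
#   ipcs -l
# To raise the limit until the next reboot (root only, illustrative value):
#   sysctl -w kernel.shmmni=8192
```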

Keywords: UCX, memory

Discovered in Version: 2.1 (UCX 1.3)

1162 Description: UCX currently does not support canceling send requests. (Github issue: https://github.com/openucx/ucx/issues/1162)

Workaround: N/A

Keywords: UCX

Discovered in Version: 2.0

- Description: UCX job hangs with SocketDirect/MultiHost/SR-IOV.

Workaround: Set UCX_IB_ADDR_TYPE=ib_global.

Keywords: UCX

- Description: The UCX build embedded in HPC-X is compiled with AVX support and therefore cannot run on hosts without AVX. If AVX is not available, recompile the UCX shipped with HPC-X with the option --with-avx=no.

Workaround: Recompile UCX with AVX disabled: $ ./utils/hpcx_rebuild.sh --rebuild-ucx --ucx-extra-config "--with-avx=no"
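Before rebuilding, it can help to confirm whether a host actually lacks AVX. The following is a sketch for Linux hosts only, assuming the CPU feature flags are exposed through /proc/cpuinfo; the variable name AVX_STATUS is illustrative.

```shell
# Look for the whole-word "avx" flag; -w prevents matching avx2 or avx512f.
if grep -qw avx /proc/cpuinfo 2>/dev/null; then
    AVX_STATUS=present
else
    AVX_STATUS=absent
fi
echo "AVX: $AVX_STATUS"
```

If the result is "absent", the rebuild with --with-avx=no described above applies to that host.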