Bug Fixes History
4025026
Description: Fixed a FETCH_ADD remote access error for ODP regions.
Keywords: Atomic operations; ODP; UCX
Discovered in Release: 2.19.0 (UCX 1.17)
Fixed in Release: 2.21.0
3955117
Description: Fixed an issue where a segmentation fault could occur in applications using the cuda_ipc transport across multiple UCX contexts, due to incorrect handling of the connectivity map data structure.
Keywords: cuda_ipc; segfault
Discovered in Release: 2.19.0
Fixed in Release: 2.21.0
3763160
Description: Fixed an issue where MPI_Init experienced significant delays when there were many files in the /tmp directory. This occurred due to the use of the inotify mechanism for synchronization with a statistics monitoring tool.
Keywords: VFS; MPI_Init
Discovered in Release: 2.17.0
Fixed in Release: 2.21.0
3664432
Description: Fixed an issue where a multi-threaded MPI application using its own lock to synchronize MPI calls could experience crashes or data corruption, even when calling MPI_Init_thread with MPI_THREAD_SERIALIZED mode. The problem was caused by incorrect synchronization of the BlueFlame register.
Keywords: Data corruption; segfault; crash; multi-thread; MPI_THREAD_SERIALIZED
Discovered in Release: Open MPI 4.1
Fixed in Release: 2.21.0
3819771
Description: Fixed an issue where, in certain scenarios, RDMA operations involving CUDA memory could fail with the following error:
Keywords: DMA buffer, memory registration
Discovered in Release: 2.19.0
Fixed in Release: 2.21.0
3653404
Description: When registering a large memory region
Keywords: Multi-Threaded, Indirect, Key Registration
Discovered in Release: 2.17.0
Fixed in Release: 2.18.0
3837556
Description: Fixed UCX to not create an SRQ on RDMA network devices that do not support it. Before this fix, the application could fail with the error message "ibv_create_srq() failed: Operation not supported".
Keywords: SRQ, UCX
Discovered in Release: 2.17.0
Fixed in Release: 2.18.0
3774158
Description: Fixed a failure with the message "Local length error". The issue was caused by some compilers replacing direct assignments with calls to the memmove() function, which led to corruption when writing to IO memory.
Keywords: UCX, Local length error
Discovered in Release: 2.17.0
Fixed in Release: 2.18.0
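The entry above attributes the failure to compilers turning plain assignments into memmove() calls when writing to IO memory. A common, general-purpose way to prevent that transformation is to perform the copy through volatile-qualified pointers with fixed-width stores, because the compiler must emit every volatile access as written. The sketch below is a hypothetical helper (copy_to_io64 is not a UCX function) and is not presented as the actual UCX fix; here it is exercised on ordinary memory only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of UCX: copy n 64-bit words to a
 * destination that may be mapped device (IO) memory.
 * Because dst is volatile-qualified, the compiler must emit each
 * 64-bit store individually and may not fold the loop into a
 * memcpy()/memmove() call, which may use access sizes or orderings
 * that are unsuitable for IO memory. */
void copy_to_io64(volatile uint64_t *dst, const uint64_t *src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i]; /* one 8-byte volatile store per element */
    }
}
```

The trade-off is that volatile stores also block legitimate optimizations, so a helper like this is normally used only on the IO-mapped path.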
3774153
Description: Fixed an issue where, in some cases, a race condition between RDMA_WRITE and shared memory writes could cause MPI to receive invalid data with large messages or with collective operations between ranks on the same node.
Keywords: RDMA_WRITE
Discovered in Release: 2.17.0
Fixed in Release: 2.18.0
3762227
Description: Fixed an issue where the application could crash in the UCX remote key packing procedure after a failed memory registration.
Keywords: UCX, assertion
Discovered in Release: 2.17.1
Fixed in Release: 2.18.0
3748762
Description: Fixed an issue where the application could crash in the UCX remote key packing procedure after a failed memory registration.
Keywords: UCX, assertion
Discovered in Release: 2.17.1
Fixed in Release: 2.18.0
3712109
Description: Fixed a UCC error in PyTorch 23.12 that appeared after upgrading to HPC-X 2.17.0.
Keywords: UCC error, PyTorch, Upgrade
Discovered in Release: 2.17.0
Fixed in Release: 2.18.0
3436244
Description: On rare occasions, a 'group join' request may reach a timeout.
Keywords: NDR Switch, SHARP
Discovered in Version: 2.16
Fixed in Version: 2.16.2
3479712
Description: In virtualized environments, the performance of large messages can drop due to repeated failures to create indirect-atomic key (KSM).
Keywords: Virtualized Environments; Failure; Indirect-atomic Key; KSM
Discovered in Version: 2.15
Fixed in Version: 2.16
3268964
Description: Improved performance in MPI_Bcast on AMD Genoa.
Note: To make use of these improvements, make sure UCC is explicitly enabled using:
Keywords: MPI_Bcast; AMD Genoa; UCC
Discovered in Version: 2.14
Fixed in Version: 2.15
3255925
Description: Fixed an issue where mpi_init created an internal CUDA context on GPU0, which could affect the behavior of CUDA applications.
Keywords: CUDA; MPI
Discovered in Version: 2.13
Fixed in Version: 2.14
3223214
Description: Fixed an issue where the unsigned comparison in shmem_ulong_wait_until() did not work as expected.
Keywords: SHMEM
Discovered in Version: 2.13
Fixed in Version: 2.14
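The entry does not state the root cause, so the following is only a general illustration, under that assumption, of why the signedness of the comparison matters: if an unsigned long is compared as a signed value, any value above LONG_MAX appears negative and a "greater than or equal" test such as SHMEM_CMP_GE gives the wrong answer. Both helper names below are hypothetical and are not part of the OpenSHMEM API.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helpers illustrating a SHMEM_CMP_GE-style test on an
 * unsigned long value. Neither is an OpenSHMEM API function. */

/* Correct: compare as unsigned, as the variable's type requires. */
bool ulong_ge_unsigned(unsigned long value, unsigned long cmp_value)
{
    return value >= cmp_value;
}

/* Faulty pattern: converting to long makes any value above LONG_MAX
 * wrap to a negative number (implementation-defined; this is the
 * behavior of common two's-complement compilers such as GCC and
 * Clang), so large values compare as "less than" small ones. */
bool ulong_ge_signed(unsigned long value, unsigned long cmp_value)
{
    return (long)value >= (long)cmp_value;
}
```

For example, with value = (unsigned long)LONG_MAX + 1, the unsigned helper correctly reports value >= 1, while the signed one does not.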
3261844
Description: Fixed an issue where the TCP transport was used on RDMA-capable setups, which led to lower performance and occasional hangs during mpi_finalize.
Keywords: TCP; RDMA; MPI; performance
Discovered in Version: 2.13
Fixed in Version: 2.13.1 LTS
3139906
Description: Fixed an issue where port counters were not updated for UCX traffic when QPs were created with DevX.
Keywords: UCX; QP; DevX
Discovered in Version: 2.13
Fixed in Version: 2.13.1 LTS
3084053
Description: Fixed an issue where some applications performed worse than with HPC-X v2.10 and earlier.
Keywords: Performance
Discovered in Version: 2.12
Fixed in Version: 2.13
3163697
Description: Fixed an issue where, when the client application used more than 1024 file descriptors (the limit defined by FD_SETSIZE), libsharp could not use any additional file descriptors. libsharp now uses poll() instead of select(), which allows it to use the full range of file descriptors permitted by Linux.
Keywords: File descriptor; libsharp; HCOLL; HPC-X
Discovered in Version: 2.12
Fixed in Version: 2.13
3208615
Description: Fixed a data integrity failure in Broadcast when using a sparse subarray datatype in OMPI with the HCOLL library. The fix uses the true extent of the datatype, which includes any additional padding the datatype may require.
Keywords: OMPI; HCOLL; data integrity
Discovered in Version: 1.12
Fixed in Version: 2.13
4549
Description: Fixed the issue where UCX may have failed to compile with Clang compiler version 9 if
(Github issue: https://github.com/openucx/ucx/issues/4549)
Keywords: Clang compiler, UCX
Discovered in Version: 2.6 (UCX 1.8)
Fixed in Version: 2.11 (UCX 1.13)
-
Description: DevX did not work on architectures without write-combining support, such as some flavors of ARM, producing the following error message:
Keywords: DevX, UCX, ARM
Discovered in Version: 2.8 (UCX 1.10)
Fixed in Version: 2.9 (UCX 1.11)
-
Description: The NVIDIA SHARP library was not available in HPC-X for Community OFED and Inbox OFED.
Keywords: NVIDIA SHARP library
Discovered in Version: 2.0
Fixed in Version: 2.9 (UCX 1.11)
2190337
Description: Fixed an issue where the UCX TCP transport could report connection-refused errors.
Keywords: UCX_TLS, UCX, TCP
Discovered in Version: 2.7 (UCX 1.9)
Fixed in Version: 2.8 (UCX 1.10)
2131893
Description: Fixed the issue where OpenSHMEM or MPI applications may have failed with the following error:
This could happen when running in a heterogeneous environment, for example when different nodes in the job had different types of HCAs or different PCI atomics configurations.
Keywords: OpenSHMEM, UCX, MPI
Discovered in Version: 2.7 (UCX 1.9)
Fixed in Version: 2.8 (UCX 1.10)
2084450
Description: Fixed an issue where the osu_ialltoallw and osu_iallgather benchmarks might not have performed well over RoCE with the ud_x transport for messages of 8192 bytes and larger.
Keywords: osu_ialltoallw, osu_iallgather, ud_x transport, RoCE, UCX
Discovered in Version: 2.6 (UCX 1.8)
Fixed in Version: 2.8 (UCX 1.10)
1886580
Description: Fixed an issue where the error messages below might have been received when running OMPI with ‘direct modex’, i.e., when the following command-line parameters were used:
Error messages:
Keywords: OMPI, pmix, direct modex, full modex
Discovered in Version: 2.5 (OpenMPI 4.0.x)
Fixed in Version: 2.7 (OpenMPI 4.0.x)
4710
Description: Fixed an issue where using UCX with the XPMEM module on kernels 4.10 and above could cause a "Bus error" due to an issue in the XPMEM driver.
(Github issue: https://github.com/openucx/ucx/issues/4710)
Keywords: UCX, XPMEM
Discovered in Version: 2.6 (UCX 1.8)
Fixed in Version: 2.7 (UCX 1.9)
2096036
Description: Fixed the issue where the verifier test may have failed with the following error when using the ud_x transport:
Keywords: ud_x transport, UCX
Discovered in Version: 2.6 (UCX 1.8)
Fixed in Version: 2.7 (UCX 1.9)
2095618
Description: Fixed the issue where the host may have run out of memory when enabling Hardware Tag-Matching.
Keywords: Hardware Tag-Matching, UCX
Discovered in Version: 2.6 (UCX 1.8)
Fixed in Version: 2.7 (UCX 1.9)
3758
Description: Fixed an issue where, when running UCX with the TCP transport on more than 16 hosts with full PPN (processes per node), the following error message might have appeared:
(Github issue: https://github.com/openucx/ucx/issues/3758)
Keywords: TCP, UCX, backlog
Discovered in Version: 2.5 (UCX 1.7)
Fixed in Version: 2.6 (UCX 1.8)
1582208
Description: Fixed the issue where sending data over multiple SHMEM contexts may lead to memory corruption or segmentation fault.
Keywords: OpenSHMEM, segmentation fault
Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)
Fixed in Version: 2.5 (Open MPI v4.0.x, OpenSHMEM v1.4)
2934
Description: Fixed the issue where OpenMPI and OpenSHMEM applications may hang with DC transport.
(Github issue: https://github.com/openucx/ucx/issues/2934)
Keywords: UCX, Open MPI, DC
Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)
Fixed in Version: 2.5 (Open MPI v4.0.x, OpenSHMEM v1.4)
1307243
Description: Fixed the issue where one-sided tests may fail with a segmentation fault.
Keywords: OSC UCX, Open MPI, one-sided
Discovered in Version: 2.1 (Open MPI 3.1.x)
Fixed in Version: 2.5 (Open MPI 4.0.x)
-
Description: Fixed the issue where OpenSHMEM atomic operations AND/OR/XOR for datatypes int32/int64/uint32/uint64 were not implemented, which might have caused build failures.
Keywords: OpenSHMEM atomic, Open MPI
Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)
Fixed in Version: 2.4 (Open MPI v4.0.x, OpenSHMEM v1.4)
2226
Description: Fixed the issue where the following assertion may have failed in certain cases:
Assertion `ep->rx.ooo_pkts.head_sn == neth->psn' failed
(Github issue: https://github.com/openucx/ucx/issues/2226)
Keywords: UCX, assertion
Discovered in Version: 2.1 (UCX 1.3)
Fixed in Version: 2.4 (UCX 1.6)
-
Description: Fixed the issue where zero-length OpenSHMEM collectives might have failed due to incomplete implementation.
Keywords: OpenSHMEM atomic, Open MPI
Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)
Fixed in Version: 2.4 (Open MPI v4.0.x, OpenSHMEM v1.4)
-
Description: Fixed the issue where OSC UCX module was not selected by default on ConnectX-4/ConnectX-5 HCAs.
Keywords: OSC UCX, one-sided, Open MPI
Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)
Fixed in Version: 2.4 (Open MPI v4.0.x, OpenSHMEM v1.4)
-
Description: Fixed the issue where using UCX on ARM hosts may result in hangs due to a known issue in Open MPI when running on ARM.
Keywords: UCX
Discovered in Version: 1.3 (Open MPI 1.8.2)
Fixed in Version: 2.3 (Open MPI 4.0.x)
-
Description: MCA options rmaps_dist_device and rmaps_base_mapping_policy are now functional.
Keywords: Process binding policy, NUMA/HCA locality
Discovered in Version: 2.0 (Open MPI 3.0.0)
Fixed in Version: 2.3 (Open MPI 4.0.x)
2111
Description: Fixed an issue where, when UCX was used in multi-threaded mode, the osu_latency_mt test might have taken a long time to complete.
(Github issue: https://github.com/openucx/ucx/issues/2111)
Keywords: UCX, multi-threaded
Discovered in Version: 2.1 (UCX 1.3)
Fixed in Version: 2.3 (UCX 1.5)
2267
Description: Fixed the issue where the following error message might have appeared when running at the scale of 256 ranks with the RC transport, when UD is used for wireup only:
“Fatal: send completion with error: Endpoint timeout”.
(Github issue: https://github.com/openucx/ucx/issues/2267)
Keywords: UCX
Discovered in Version: 2.1 (UCX 1.3)
Fixed in Version: 2.3 (UCX 1.5)
2702
Description: Fixed an issue where the following error messages may have been printed when using the Hardware Tag Matching feature:
(Github issue: https://github.com/openucx/ucx/issues/2702)
Keywords: Hardware Tag Matching
Discovered in Version: 2.2 (UCX 1.4)
Fixed in Version: 2.3 (UCX 1.5)
2454
Description: Fixed the issue where some one-sided benchmarks may have hung when using “osc ucx”.
For example: osu-micro-benchmarks-5.3.2/osu_get_acc_latency (Latency Test for accumulate with Active/Passive Synchronization).
(Github issue: https://github.com/openucx/ucx/issues/2454)
Keywords: UCX, one_sided
Discovered in Version: 2.2 (UCX 1.4)
Fixed in Version: 2.3 (UCX 1.5)
2670
Description: Fixed an issue where, when the Hardware Tag Matching feature was enabled at large scale, the following error message may have been printed due to the increased threshold for BCOPY messages:
“mpool.c:177 UCX ERROR Failed to allocate memory pool chunk: Out of memory.”
(Github issue: https://github.com/openucx/ucx/issues/2670)
Keywords: Hardware Tag Matching
Discovered in Version: 2.2 (UCX 1.4)
Fixed in Version: 2.3 (UCX 1.5)
1295679
Description: Fixed an issue where the OpenSHMEM group cache had a default limit of 100 entries, which might have caused an OpenSHMEM application to exit with the following message: “group cache overflow on rank xxx: cache_size = 100”.
Keywords: OpenSHMEM, Open MPI
Discovered in Version: 2.1 (Open MPI 3.1.x)
Fixed in Version: 2.2 (Open MPI 3.1.x)
-
Description: Fixed the issue where UCX did not work out-of-the-box with CUDA support.
Keywords: UCX, CUDA
Discovered in Version: 2.2 (UCX 1.4)
Fixed in Version: 2.1 (UCX 1.3)
1926
Description: Fixed an issue where, when multiple transports were used, invalid data was sent out of sync with Hardware Tag Matching traffic.
(Github issue: https://github.com/openucx/ucx/issues/1926)
Keywords: Hardware Tag Matching
Discovered in Version: 2.1 (UCX 1.3)
Fixed in Version: 2.2 (UCX 1.4)
1949
Description: Fixed the issue where Hardware Tag Matching might not have functioned properly with UCX over DC transport.
(Github issue: https://github.com/openucx/ucx/issues/1949)
Keywords: UCX, Hardware Tag Matching, DC transport
Discovered in Version: 2.0
Fixed in Version: 2.1
-
Description: Fixed job data transfer from SD to libsharp.
Keywords: NVIDIA SHARP library
Discovered in Release: 1.9
Fixed in Release: 1.9.7
884482
Description: Fixed internal HCOLL datatype mapping.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
884508
Description: Fixed internal HCOLL datatype lower bound calculation.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
884490
Description: Fixed allgather unpacking issues.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
885009
Description: Fixed an incorrect result in alltoallv.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
882193
Description: Fixed mcast group leak in HCOLL.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
-
Description: Added IN_PLACE support for alltoall, alltoallv, and allgatherv.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
-
Description: Fixed an issue related to multi-threaded MPI_Bcast.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
Salesforce: 316541
Description: Fixed a memory barrier issue in MPI_Barrier on Power PPC systems.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
Salesforce: 316547
Description: Fixed multi-threaded MPI_COMM_DUP and MPI_COMM_SPLIT hanging issues.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
894346
Description: Fixed Quantum Espresso hanging issues.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
898283
Description: Fixed an issue which caused CP2K applications to hang when HCOLL was enabled.
Keywords: HCOLL, FCA
Discovered in Release: 1.7.405
Fixed in Release: 1.7.406
906155
Description: Fixed an issue which caused VASP applications to hang in MPI_Allreduce.
Keywords: HCOLL, FCA
Discovered in Release: 1.6
Fixed in Release: 1.7.406