Date: 2024-10-30

3653404 Description: When registering a large memory region with ucp_mem_map() while peer failure handling support is enabled on the UCX endpoint, the process may crash with the error "LRU push returned Unsupported operation" while sending a buffer belonging to that region. The issue occurs because large regions are registered using multi-threaded registration, which does not work well with peer failure support.

Keywords: Multi-Threaded, Indirect, Key Registration

Discovered in Release: 2.17.0

Fixed in Release: 2.18.0

3837556 Description: Fixed UCX to not create an SRQ on RDMA network devices that do not support it. Before this fix, the application could fail with the error message "ibv_create_srq() failed: Operation not supported".

Keywords: SRQ, UCX

Discovered in Release: 2.17.0

Fixed in Release: 2.18.0

3774158 Description: Fixed a failure with the message "Local length error". The issue was caused by some compilers replacing direct assignments with calls to memmove(), which corrupted data while writing to IO memory.

Keywords: UCX, Local length error

Discovered in Release: 2.17.0

Fixed in Release: 2.18.0
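The memmove() substitution described in 3774158 can be illustrated with a minimal sketch. It assumes that the destination must be written with fixed-width stores; the helper name io_copy64 is hypothetical and is not a UCX function, and the actual UCX fix may differ. Because each volatile access has to be emitted as written, compilers in practice do not fold such a loop into a memmove() or memcpy() call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a UCX function): copy `len` bytes to `dst`
 * using explicit 64-bit volatile stores. Each volatile store must be
 * emitted as written, so the loop is not replaced by a library call.
 */
static void io_copy64(volatile uint64_t *dst, const uint64_t *src, size_t len)
{
    assert(len % sizeof(uint64_t) == 0); /* whole 64-bit words only */

    for (size_t i = 0; i < len / sizeof(uint64_t); ++i) {
        dst[i] = src[i]; /* one 64-bit store per iteration */
    }
}
```

In this sketch an ordinary buffer can stand in for the IO memory region; on real hardware the destination would be a mapped device address.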

3774153 Description: Fixed a race condition that could occur between RDMA_WRITE and shared memory writes, leading to MPI receiving invalid data with large messages or collective operations between ranks on the same node.

Keywords: RDMA_WRITE

Discovered in Release: 2.17.0

Fixed in Release: 2.18.0

3762227 Description: Fixed the issue where the application may crash in the UCX remote key packing procedure after a failed memory registration.

Keywords: UCX, assertion

Discovered in Release: 2.17.1

Fixed in Release: 2.18.0

3712109 Description: Fixed a UCC error in PyTorch 23.12 following the HPC-X 2.17.0 upgrade.

Keywords: UCC error, PyTorch, Upgrade

Discovered in Release: 2.17.0

Fixed in Release: 2.18.0

3436244 Description: On rare occasions, a 'group join' request may reach a timeout.

Keywords: NDR Switch, SHARP

Discovered in Version: 2.16

Fixed in Version: 2.16.2

3479712 Description: In virtualized environments, the performance of large messages can drop due to repeated failures to create indirect-atomic key (KSM).

Keywords: Virtualized Environments; Failure; Indirect-atomic Key; KSM

Discovered in Version: 2.15

Fixed in Version: 2.16

3268964 Description: Improved performance in MPI_Bcast on AMD Genoa. Note: To make use of these improvements, make sure UCC is explicitly enabled using: --mca coll_ucc_enable 1 --mca coll_ucc_priority 99 --mca coll ucc,basic,libnbc --mca coll_ucc_cls basic,hier

Keywords: MPI_Bcast; AMD Genoa; UCC

Discovered in Version: 2.14

Fixed in Version: 2.15

3255925 Description: Fixed the issue where mpi_init was creating an internal CUDA context on GPU0, which could affect the behavior of CUDA applications.

Keywords: CUDA; MPI

Discovered in Version: 2.13

Fixed in Version: 2.14

3223214 Description: Fixed the issue where shmem_ulong_wait_until() unsigned comparison was not working as expected.

Keywords: SHMEM

Discovered in Version: 2.13

Fixed in Version: 2.14
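The kind of mismatch behind 3223214 can be shown with a small sketch in plain C, without any OpenSHMEM calls. The function names are hypothetical and this is not the OpenSHMEM implementation; it only assumes that the same bit pattern is compared once as unsigned and once as signed. The conversion to long is implementation-defined for out-of-range values, and the comments assume the usual two's complement behavior.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if a > b when both are treated as unsigned values. */
static int gt_unsigned(unsigned long a, unsigned long b)
{
    return a > b;
}

/* Returns 1 if a > b when the same bits are reinterpreted as signed.
 * On two's complement platforms, values above LONG_MAX become negative. */
static int gt_signed(unsigned long a, unsigned long b)
{
    return (long)a > (long)b;
}

/* Usage: a value with the top bit set compares differently. */
static void demo_unsigned_vs_signed(void)
{
    unsigned long big = ULONG_MAX;   /* all bits set */

    assert(gt_unsigned(big, 1UL));   /* largest unsigned value > 1 */
    assert(!gt_signed(big, 1UL));    /* same bits read as -1, not > 1 */
}
```

A wait routine for an unsigned type needs the first form; using the second would make a greater-than condition on large values never become true.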

3261844 Description: Fixed the issue where using the TCP transport on an RDMA-capable setup led to lower performance and occasional hangs during mpi_finalize.

Keywords: TCP; RDMA; MPI; performance

Discovered in Version: 2.13

Fixed in Version: 2.13.1 LTS

3139906 Description: Port counters were not updated for UCX traffic when creating QP with DevX.

Keywords: UCX; QP; DevX

Discovered in Version: 2.13

Fixed in Version: 2.13.1 LTS

3084053 Description: Fixed the issue where performance of some applications was lower compared with HPC-X v2.10 and earlier.

Keywords: Performance

Discovered in Version: 2.12

Fixed in Version: 2.13

3163697 Description: Fixed the issue where libsharp could not use any further file descriptors once the client application used more than 1024 of them (the range limit defined by FD_SETSIZE). Using poll() instead of select() enables use of the full range of file descriptors allowed by Linux.

Keywords: File descriptor; libsharp; HCOLL; HPC-X

Discovered in Version: 2.12

Fixed in Version: 2.13
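The difference between select() and poll() mentioned in 3163697 can be sketched as follows. The helpers wait_readable and demo_pipe are hypothetical and are not part of libsharp. The sketch relies only on standard POSIX behavior: poll() receives each descriptor by value in a struct pollfd, so it is not limited by the size of the fd_set bitmap that select() uses.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libsharp): wait until `fd` is readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);
    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0) {
        return rc; /* 0: timeout, -1: error (errno is set) */
    }
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Usage: an empty pipe times out, then becomes readable after a write.
 * Returns 1 if both observations hold, -1 otherwise. */
static int demo_pipe(void)
{
    int fds[2];
    int before, after;

    if (pipe(fds) != 0) {
        return -1;
    }
    before = wait_readable(fds[0], 0);     /* nothing written yet */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    after = wait_readable(fds[0], 1000);   /* one byte is pending */
    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 1 : -1;
}
```

The same helper works unchanged for descriptor numbers of 1024 and above, provided the process resource limit allows opening that many descriptors.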

3208615 Description: Fixed a data integrity failure in Broadcast when using a sparse subarray datatype in OMPI with the hcoll library. The fix uses the TRUE extent of the datatype, which includes any additional padding the datatype may require.

Keywords: OMPI; HCOLL; data integrity

Discovered in Version: 1.12

Fixed in Version: 2.13

4549 Description: Fixed the issue where UCX may have failed to compile with Clang compiler version 9 if --dynamic-list-data flag was used in the compilation. (Github issue: https://github.com/openucx/ucx/issues/4549)

Keywords: Clang compiler, UCX

Discovered in Version: 2.6 (UCX 1.8)

Fixed in Version: 2.11 (UCX 1.13)

- Description: DevX does not work on architectures without "Write combining" support, such as some flavors of ARM, prompting the following error message: UCX ERROR mlx5dv_devx_alloc_uar() failed: Operation not supported

Keywords: DevX, UCX, ARM

Discovered in Version: 2.8 (UCX 1.10)

Fixed in Version: 2.9 (UCX 1.11)

- Description: NVIDIA SHARP library is not available in HPC-X for the Community OFED and Inbox OFED.

Keywords: NVIDIA SHARP library

Discovered in Version: 2.0

Fixed in Version: 2.9 (UCX 1.11)

2190337 Description: Fixed the issue where errors from the UCX TCP transport about refused connection may have appeared.

Keywords: UCX_TLS, UCX, TCP

Discovered in Version: 2.7 (UCX 1.9)

Fixed in Version: 2.8 (UCX 1.10)

2131893 Description: Fixed the issue where OpenSHMEM or MPI applications may have failed with the following error: “Fatal: endpoint reconfiguration not supported yet”. This could happen when running in a heterogeneous environment, such as when different nodes in the job had different types of HCAs or PCI atomics configuration.

Keywords: OpenSHMEM, UCX, MPI

Discovered in Version: 2.7 (UCX 1.9)

Fixed in Version: 2.8 (UCX 1.10)

2084450 Description: Fixed the issue where the osu_ialltoallw and osu_iallgather benchmarks may not have performed well over RoCE with the ud_x transport, starting from message sizes of 8192 bytes.

Keywords: osu_ialltoallw, osu_iallgather, ud_x transport, RoCE, UCX

Discovered in Version: 2.6 (UCX 1.8)

Fixed in Version: 2.8 (UCX 1.10)

1886580 Description: Fixed the issue where the error messages below might have been received when running OMPI with ‘direct modex’, i.e. when the following command line parameters were used: -mca pmix_base_async_modex 1 -mca mpi_add_procs_cutoff 0 -mca pmix_base_collect_data 0

Error messages:

PMIX ERROR: NOT-FOUND in file server/pmix_server_get.c at line 751

PMIX ERROR: NOT-FOUND in file client/pmix_client_get.c at line 334

Keywords: OMPI, pmix, direct modex, full modex

Discovered in Version: 2.5 (OpenMPI 4.0.x)

Fixed in Version: 2.7 (OpenMPI 4.0.x)

4710 Description: Fixed the issue where using UCX with the XPMEM module on kernels 4.10 and above might have caused a "Bus error" due to an issue in the XPMEM driver. (Github issue: https://github.com/openucx/ucx/issues/4710)

Keywords: UCX, XPMEM

Discovered in Version: 2.6 (UCX 1.8)

Fixed in Version: 2.7 (UCX 1.9)

2096036 Description: Fixed the issue where the verifier test may have failed with the following error when using the ud_x transport:

ib_mlx5_log.c:139 Local QP operation on mlx5_0:1/IB (synd 0x2 vend 0x68 hw_synd 0/66)

ib_mlx5_log.c:139 UD QP 0x37161 wqe[368]: SEND --- [rqpn 0x36a01 rlid 93] [inl len 16]

Keywords: ud_x transport, UCX

Discovered in Version: 2.6 (UCX 1.8)

Fixed in Version: 2.7 (UCX 1.9)

2095618 Description: Fixed the issue where the host may have run out of memory when enabling Hardware Tag-Matching.

Keywords: Hardware Tag-Matching, UCX

Discovered in Version: 2.6 (UCX 1.8)

Fixed in Version: 2.7 (UCX 1.9)

3758 Description: Fixed the issue where running UCX with the TCP transport on more than 16 hosts with full PPN (processes per node) might have produced the following error message: sock.c:228 UCX ERROR recv(fd=1377) failed: 104 (Github issue: https://github.com/openucx/ucx/issues/3758)

Keywords: TCP, UCX, backlog

Discovered in Version: 2.5 (UCX 1.7)

Fixed in Version: 2.6 (UCX 1.8)

1582208 Description: Fixed the issue where sending data over multiple SHMEM contexts may lead to memory corruption or segmentation fault.

Keywords: Open SHMEM, segmentation fault

Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)

Fixed in Version: 2.5 (Open MPI v4.0.x, OpenSHMEM v1.4)

2934 Description: Fixed the issue where OpenMPI and OpenSHMEM applications may hang with DC transport. (Github issue: https://github.com/openucx/ucx/issues/2934)

Keywords: UCX, Open MPI, DC

Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)

Fixed in Version: 2.5 (Open MPI v4.0.x, OpenSHMEM v1.4)

1307243 Description: Fixed the issue where one-sided tests may fail with a segmentation fault.

Keywords: OSC UCX, Open MPI, one-sided

Discovered in Version: 2.1 (Open MPI 3.1.x)

Fixed in Version: 2.5 (Open MPI 4.0.x)

- Description: Fixed the issue where OpenSHMEM atomic operations AND/OR/XOR for datatypes int32/int64/uint32/uint64 were not implemented, which might have caused build failures.

Keywords: OpenSHMEM atomic, Open MPI

Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)

Fixed in Version: 2.4 (Open MPI v4.0.x, OpenSHMEM v1.4)

2226 Description: Fixed the issue where the following assertion may have failed in certain cases: Assertion `ep->rx.ooo_pkts.head_sn == neth->psn' failed (Github issue: https://github.com/openucx/ucx/issues/2226)

Keywords: UCX, assertion

Discovered in Version: 2.1 (UCX 1.3)

Fixed in Version: 2.4 (UCX 1.6)

- Description: Fixed the issue where zero-length OpenSHMEM collectives might have failed due to incomplete implementation.

Keywords: OpenSHMEM atomic, Open MPI

Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)

Fixed in Version: 2.4 (Open MPI v4.0.x, OpenSHMEM v1.4)

- Description: Fixed the issue where OSC UCX module was not selected by default on ConnectX-4/ConnectX-5 HCAs.

Keywords: OSC UCX, one-sided, Open MPI

Discovered in Version: 2.3 (Open MPI v4.0.x, OpenSHMEM v1.4)

Fixed in Version: 2.4 (Open MPI v4.0.x, OpenSHMEM v1.4)

- Description: Fixed the issue where using UCX on ARM hosts may result in hangs due to a known issue in Open MPI when running on ARM.

Keywords: UCX

Discovered in Version: 1.3 (Open MPI 1.8.2)

Fixed in Version: 2.3 (Open MPI 4.0.x)

- Description: MCA options rmaps_dist_device and rmaps_base_mapping_policy are now functional.

Keywords: Process binding policy, NUMA/HCA locality

Discovered in Version: 2.0 (Open MPI 3.0.0)

Fixed in Version: 2.3 (Open MPI 4.0.x)

2111 Description: Fixed the issue where the osu_latency_mt test might have taken a long time to complete when UCX was used in multi-threaded mode. (Github issue: https://github.com/openucx/ucx/issues/2111)

Keywords: UCX, multi-threaded

Discovered in Version: 2.1 (UCX 1.3)

Fixed in Version: 2.3 (UCX 1.5)

2267 Description: Fixed the issue where the following error message might have appeared when running at the scale of 256 ranks with the RC transport, when UD is used for wireup only: “Fatal: send completion with error: Endpoint timeout”. (Github issue: https://github.com/openucx/ucx/issues/2267)

Keywords: UCX

Discovered in Version: 2.1 (UCX 1.3)

Fixed in Version: 2.3 (UCX 1.5)

2702 Description: Fixed the issue where the following error messages may have been printed when using the Hardware Tag Matching feature: “rcache.c:481 UCX WARN failed to register region 0xdec25a0 [0x2b7139ae0020..0x2b7139ae2020]: Input/output error”

“ucp_mm.c:105 UCX ERROR failed to register address 0x2b7139ae0020 length 8192 on md[1]=ib/mlx5_0: Input/output error”

“ucp_request.c:259 UCX ERROR failed to register user buffer datatype 0x20 address 0x2b7139ae0020 len 8192: Input/output error” (Github issue: https://github.com/openucx/ucx/issues/2702)

Keywords: Hardware Tag Matching

Discovered in Version: 2.2 (UCX 1.4)

Fixed in Version: 2.3 (UCX 1.5)

2454 Description: Fixed the issue where some one-sided benchmarks may have hung when using “osc ucx”. For example: osu-micro-benchmarks-5.3.2/osu_get_acc_latency (Latency Test for accumulate with Active/Passive Synchronization). (Github issue: https://github.com/openucx/ucx/issues/2454)

Keywords: UCX, one_sided

Discovered in Version: 2.2 (UCX 1.4)

Fixed in Version: 2.3 (UCX 1.5)

2670 Description: Fixed the issue where enabling the Hardware Tag Matching feature on a large scale may have printed the following error message, due to the increased threshold for BCOPY messages: “mpool.c:177 UCX ERROR Failed to allocate memory pool chunk: Out of memory.” (Github issue: https://github.com/openucx/ucx/issues/2670)

Keywords: Hardware Tag Matching

Discovered in Version: 2.2 (UCX 1.4)

Fixed in Version: 2.3 (UCX 1.5)

1295679 Description: Fixed the issue where the OpenSHMEM group cache had a default limit of 100 entries, which might have resulted in an OpenSHMEM application exiting with the following message: “group cache overflow on rank xxx: cache_size = 100”.

Keywords: OpenSHMEM, Open MPI

Discovered in Version: 2.1 (Open MPI 3.1.x)

Fixed in Version: 2.2 (Open MPI 3.1.x)

- Description: Fixed the issue where UCX did not work out-of-the-box with CUDA support.

Keywords: UCX, CUDA

Discovered in Version: 2.2 (UCX 1.4)

Fixed in Version: 2.1 (UCX 1.3)

1926 Description: Fixed the issue where, when using multiple transports, invalid data was sent out-of-sync with Hardware Tag Matching traffic. (Github issue: https://github.com/openucx/ucx/issues/1926)

Keywords: Hardware Tag Matching

Discovered in Version: 2.1 (UCX 1.3)

Fixed in Version: 2.2 (UCX 1.4)

1949 Description: Fixed the issue where Hardware Tag Matching might not have functioned properly with UCX over DC transport. (Github issue: https://github.com/openucx/ucx/issues/1949)

Keywords: UCX, Hardware Tag Matching, DC transport

Discovered in Version: 2.0

Fixed in Version: 2.1

- Description: Fixed job data transfer from SD to libsharp.

Keywords: NVIDIA SHARP library

Discovered in Release: 1.9

Fixed in Release: 1.9.7

884482 Description: Fixed internal HCOLL datatype mapping.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

884508 Description: Fixed internal HCOLL datatype lower bound calculation.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

884490 Description: Fixed allgather unpacking issues.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

885009 Description: Fixed wrong answer in alltoallv.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

882193 Description: Fixed mcast group leak in HCOLL.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

- Description: Added IN_PLACE support for alltoall, alltoallv, and allgatherv.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

- Description: Fixed an issue related to multi-threaded MPI_Bcast.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

Salesforce: 316541 Description: Fixed a memory barrier issue in MPI_Barrier on Power PPC systems.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

Salesforce: 316547 Description: Fixed multi-threaded MPI_COMM_DUP and MPI_COMM_SPLIT hanging issues.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

894346 Description: Fixed Quantum Espresso hanging issues.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

898283 Description: Fixed an issue which caused CP2K applications to hang when HCOLL was enabled.

Keywords: HCOLL, FCA

Discovered in Release: 1.7.405

Fixed in Release: 1.7.406

906155 Description: Fixed an issue which caused VASP applications to hang in MPI_Allreduce.

Keywords: HCOLL, FCA

Discovered in Release: 1.6