NVIDIA HPC-X Software Toolkit Rev 2.21.0

Bug Fixes in this Version

Internal Reference Number

Issue

4088373

Description: Fixed an issue where the application could fail with the error: symbol lookup error: libuct_cuda_gdrcopy.so.0: undefined symbol: gdr_get_info_v2, caused by a version mismatch in the gdrcopy library.

Keywords: gdrcopy

Discovered in Release: 2.20.0

Fixed in Release: 2.21.0
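
To check whether an installed gdrcopy library exports the symbol named in the error above, a small probe along the following lines may help. This is a minimal sketch, not part of HPC-X: the helper name has_symbol is hypothetical, and the library name "libgdrapi.so" is an assumption about how gdrcopy is installed on the system, so adjust it to the actual path.

```python
import ctypes
import ctypes.util


def has_symbol(library, symbol):
    """Return True if `library` can be loaded and exports `symbol`."""
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # The library was not found or could not be loaded on this system.
        return False
    # ctypes resolves symbols on attribute access and raises AttributeError
    # when the symbol is missing, so hasattr() acts as the lookup.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # "libgdrapi.so" is an assumed name for the gdrcopy user-space library.
    print(has_symbol("libgdrapi.so", "gdr_get_info_v2"))
```

A False result means either that the library could not be loaded or that it does not export gdr_get_info_v2. The latter would be consistent with the version mismatch described in this entry.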

4025026

Description: Fixed a FETCH_ADD remote access error for ODP regions.

Keywords: Atomic operations; ODP; UCX

Discovered in Release: 2.19.0 (UCX 1.17)

Fixed in Release: 2.21.0

3955117

Description: Fixed an issue where a segmentation fault could occur in applications using the cuda_ipc transport across multiple UCX contexts, due to incorrect handling of the connectivity map data structure.

Keywords: cuda_ipc; segfault

Discovered in Release: 2.19.0

Fixed in Release: 2.21.0

3763160

Description: Fixed an issue where MPI_Init experienced significant delays when there were many files in the /tmp directory. This occurred due to the use of the inotify mechanism for synchronization with a statistics monitoring tool.

Keywords: VFS; MPI_Init

Discovered in Release: 2.17.0

Fixed in Release: 2.21.0

3664432

Description: Fixed an issue where a multi-threaded MPI application using its own lock to synchronize MPI calls could experience crashes or data corruption, even when calling MPI_Init_thread with MPI_THREAD_SERIALIZED mode. The problem was caused by incorrect synchronization of the BlueFlame register.

Keywords: Data corruption; segfault; crash; multi-thread; MPI_THREAD_SERIALIZED

Discovered in Release: Open MPI 4.1

Fixed in Release: 2.21.0
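
The usage pattern affected by this issue, in which the application serializes its own MPI calls with a lock, can be sketched as follows. This is an illustration only: fake_mpi_call is a hypothetical stand-in for a real MPI call, no MPI library is involved, and the sketch does not reproduce the BlueFlame bug. It shows only that, with an application-level lock, at most one thread is inside the "MPI" call at any time, which is what MPI_THREAD_SERIALIZED requires of the caller.

```python
import threading

mpi_lock = threading.Lock()  # the application's own lock around MPI calls
active = 0                   # threads currently inside the stand-in call
max_active = 0               # highest concurrency ever observed


def fake_mpi_call():
    """Hypothetical stand-in for an MPI call; tracks concurrent callers."""
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    active -= 1


def worker(iterations):
    for _ in range(iterations):
        # Every call is made while holding the lock, so calls never overlap.
        with mpi_lock:
            fake_mpi_call()


def run(num_threads=4, iterations=1000):
    threads = [
        threading.Thread(target=worker, args=(iterations,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active


if __name__ == "__main__":
    print(run())
```

An application following this pattern was using the library within the MPI_THREAD_SERIALIZED contract, which is why the crashes and data corruption described in this entry were a defect in the library rather than in the application.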

3819771

Description: Fixed an issue where, in certain scenarios, RDMA operations involving CUDA memory could fail with the following error: UCX ERROR ibv_reg_dmabuf_mr(address=0xfff939e00000, length=16, access=0xf) failed: Invalid argument.

Keywords: DMA buffer; memory registration; ibv_reg_dmabuf_mr

Discovered in Release: 2.19.0

Fixed in Release: 2.21.0

© Copyright 2024, NVIDIA. Last updated on Jan 21, 2025.