Bug Fixes in this Version
| Ref # | Description | Keywords | Discovered in Release |
| --- | --- | --- | --- |
| 3712109 | Fixed a UCC error in PyTorch 23.12 introduced by the HPC-X 2.17.0 upgrade. | UCC error, PyTorch, Upgrade | 2.17.0 |
| 3653404 | Fixed an issue where, when a large memory region was registered with ucp_mem_map() and peer failure handling support was enabled on the UCX endpoint, the process could crash with the error "LRU push returned Unsupported operation" while sending a buffer belonging to that region. The crash occurred because multi-threaded registration is used for large regions, and it does not work well with peer failure support. | Multi-Threaded, Indirect, Key Registration | 2.17.0 |
| 3837556 | Fixed UCX to not create an SRQ on RDMA network devices that do not support it. Before this fix, the application could fail with the error message "ibv_create_srq() failed: Operation not supported". | SRQ, UCX | 2.18.0 |
| 3774158 | Fixed a failure with the message "Local length error". The failure was caused by some compilers replacing direct assignments with a call to memmove(), which corrupted data written to IO memory. | UCX, Local length error | 2.18.0 |
| 3774153 | Fixed a race condition that could occur in some cases between RDMA_WRITE and a shared memory write, causing MPI to receive invalid data with large messages or with collective operations between ranks on the same node. | RDMA_WRITE | 2.18.0 |
| 3762227 | Fixed an issue where the application could crash in the UCX remote key packing procedure after a failed memory registration. | UCX, assertion | 2.17.1 |
| 3748762 | Fixed an issue where the application could crash in the UCX remote key packing procedure after a failed memory registration. | UCX, assertion | 2.17.1 |
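
For readers wondering how the compiler behavior described in 3774158 can arise, the following is a minimal sketch of one common way to keep word-sized stores to IO memory from being merged into a memmove() call. It is not taken from the UCX sources and does not claim to be the actual fix. The helper name `io_copy64` is hypothetical, and the sketch assumes the length is a multiple of 8 bytes and that both pointers are 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a UCX function.
 * Copies `len` bytes (assumed to be a multiple of 8) to `dst`, which stands
 * in for a mapping of IO memory. Because `dst` is volatile-qualified, the
 * compiler must emit each 64-bit store as a separate access and cannot
 * replace the loop with a library call such as memmove().
 */
static void io_copy64(volatile uint64_t *dst, const uint64_t *src, size_t len)
{
    size_t i;

    for (i = 0; i < len / sizeof(uint64_t); ++i) {
        dst[i] = src[i]; /* one volatile 64-bit store per word */
    }
}
```

A caller would pass the mapped IO address as `dst` and an ordinary buffer as `src`, for example `io_copy64(io_ptr, descriptor, sizeof(descriptor))`, where `io_ptr` and `descriptor` are placeholder names. Ordering against other accesses may still need explicit memory barriers on some platforms, which this sketch does not cover.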