NCCL Release 2.8.4
These are the release notes for NCCL 2.8.4. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
Key Features and Enhancements
- Added support for Zhaoxin CPUs.
Known Issues
Send/receive operations have a number of limitations:
- Using send/receive operations in combination to launch work on multiple GPUs from a single process can fail or hang if the GPUs process different amounts of data. Setting NCCL_LAUNCH_MODE=PARALLEL can work around the issue, but can also cause other problems. For more information, see the NCCL User Guide section Troubleshooting > Known Issues > Concurrency Between NCCL and CUDA calls.
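As a minimal sketch of the workaround named above, the variable can be exported in the shell that launches the application. This assumes NCCL picks up NCCL_LAUNCH_MODE from the process environment at startup; the launch command `./my_nccl_app` is a hypothetical placeholder, not part of NCCL.

```shell
# Assumption: NCCL reads NCCL_LAUNCH_MODE from the environment at startup,
# so it must be set before the application is launched.
export NCCL_LAUNCH_MODE=PARALLEL

# Confirm the value the application will inherit.
echo "NCCL_LAUNCH_MODE=${NCCL_LAUNCH_MODE}"

# Hypothetical launch command; replace with your own binary.
# ./my_nccl_app
```

Because this setting can cause other problems, it is probably best applied only to the affected job rather than set globally.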
Fixed Issues
- Fixed a hang for some imbalanced send/recv operations (alltoallv).