DGL Release 24.11
This DGL container release is intended for use on the NVIDIA® Hopper Architecture GPU, NVIDIA H100, the NVIDIA® Ampere Architecture GPU, NVIDIA A100, and the associated NVIDIA CUDA® 12 and NVIDIA cuDNN 9 libraries.
Contents of the DGL container
This container image contains the complete source of the version of DGL in /opt/dgl/dgl-source. It is pre-built and installed as a system Python module.
The container includes the following:
- DGL 2.5
- RAPIDS 24.10
- WholeGraph 24.08 with NVSHMEM support. WholeGraph is part of the NVIDIA RAPIDS suite and provides an underlying graph storage structure that enhances GNN training, optimized for NVIDIA hardware.
- NVIDIA CUDA® 12.6.3
- NVIDIA cuBLAS 12.6.4.1
- NVIDIA cuDNN 9.5.1.17
- NVIDIA NCCL 2.23.4
- Apex
- rdma-core 39.0
- NVIDIA HPC-X 2.21
- OpenMPI 4.1.7
- GDRCopy 2.4.1
- TensorBoard 2.12.0
- Nsight Compute 2024.3.2.3
- Nsight Systems 2024.6.1.90
- NVIDIA TensorRT™ 10.6.0.26
- Torch-TensorRT 2.6.0.a0
- NVIDIA DALI® 1.43
- MAGMA 2.6.2
- JupyterLab 2.3.2 including Jupyter-TensorBoard
- PyTorch quantization wheel v2.1.2
- TransformerEngine v1.12
- NVSHMEM 2.10.1
GPU Requirements
Release 24.11 supports CUDA compute capability 6.0 and later. This corresponds to GPUs in the NVIDIA Pascal, NVIDIA Volta™, NVIDIA Turing™, NVIDIA Ampere architecture, and NVIDIA Hopper™ architecture families. For a list of GPUs to which this compute capability corresponds, see CUDA GPUs. For additional support details, see Deep Learning Frameworks Support Matrix.
Key Features and Enhancements
This DGL release includes the following key features and enhancements.
- Starting with the 24.07 release, the DGL container supports distributed in-memory sampling and feature gathering with WholeGraph. It can be easily integrated with DGL GraphBolt dataloader, enabling out-of-the-box distributed GNN training.
- DGL GraphBolt does not depend on the deprecated torchdata package anymore.
- DGL GraphBolt changes the CUDA memory allocation configuration to reduce memory footprint.
Announcements
Volta GPU compute architecture support will be discontinued in the 25.01 release.
NVIDIA DGL Container Versions
The following table shows the versions of Ubuntu, CUDA, DGL, and PyTorch supported in each NVIDIA container for DGL. For older container versions, refer to the Frameworks Support Matrix.
Container Version | Ubuntu | CUDA Toolkit | DGL | PyTorch |
---|---|---|---|---|
24.11 | 24.04 | NVIDIA CUDA 12.6.3 | 2.5 | 24.11 |
24.09 | 22.04 | NVIDIA CUDA 12.6.1 | 2.4 | 24.09 |
24.07 | 22.04 | NVIDIA CUDA 12.5.1 | 2.3 | 24.07 |
24.05 | 22.04 | NVIDIA CUDA 12.4.1 | 2.2 | 24.05 |
24.04 | 22.04 | NVIDIA CUDA 12.4.1 | 2.1+e1f7738 | 24.04 |
24.03 | 22.04 | NVIDIA CUDA 12.4.0.41 | 2.1+7c51cd16 | 24.03 |
24.01 | 22.04 | NVIDIA CUDA 12.3.2 | 1.2+c660f5c | 24.01 |
23.11 | 22.04 | NVIDIA CUDA 12.3.0 | 1.1.2 | 23.11 |
23.09 | 22.04 | NVIDIA CUDA 12.2.1 | 1.1.2 | 23.09 |
23.07 | 22.04 | NVIDIA CUDA 12.1.1 | 1.1.1 | 23.07 |
Known Issues
- When CPU sampling is enabled (`use_uva=False` and `num_workers>0`), the DGL sampling process initializes a CUDA context (issue-6561), which can result in a segmentation fault with the CUDA driver in this container.
- Tensors that are used as node features must be contiguous and cannot be views of other tensors when the `use_uva` flag is set to `True` in the `dgl.dataloading.DataLoader` class. Attempting to use a graph with non-contiguous or view tensors for `edata` or `ndata` raises a `DGLError`.
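The contiguity requirement above can be checked and satisfied before features are assigned to a graph. A minimal sketch using a plain PyTorch tensor (the variable names are illustrative; whether a tensor is a view depends on how it was produced):

```python
import torch

# A hypothetical feature matrix; transposing returns a *view* that is
# not laid out contiguously in memory.
feat = torch.arange(12, dtype=torch.float32).reshape(3, 4)
view = feat.t()
print(view.is_contiguous())   # False: this would trigger a DGLError with use_uva=True

# .contiguous() materializes a densely laid-out copy that is safe to
# assign to g.ndata / g.edata before constructing the DataLoader.
safe = view.contiguous()
print(safe.is_contiguous())   # True
```

Calling `.contiguous()` on an already-contiguous tensor is a no-op, so applying it defensively to all feature tensors costs little.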