Grace Hopper Superchip
In 2023, Parabricks announced full optimization and support for the NVIDIA Grace Hopper Superchip. The NVIDIA GH200 Grace Hopper Superchip combines the NVIDIA Grace CPU and Hopper GPU architectures over NVIDIA NVLink-C2C to deliver a coherent CPU+GPU memory model for accelerated Artificial Intelligence (AI) and High Performance Computing (HPC) applications. For genomic data analysis, this integration lets researchers run complex analyses faster and more efficiently. The NVIDIA Grace Hopper Superchip is the first true heterogeneous accelerated platform for HPC and AI workloads. It accelerates applications with the strengths of both GPUs and CPUs while providing a simple, productive, and portable programming model.
Feature | Description |
---|---|
Grace CPU cores | Up to 72 |
CPU LPDDR5X bandwidth | Up to 500 GB/s |
GPU HBM bandwidth | 4 TB/s (HBM3), 4.9 TB/s (HBM3e) |
NVLink-C2C bandwidth | 900 GB/s total, 450 GB/s per direction |
CPU LPDDR5X capacity | Up to 480 GB |
GPU HBM capacity | 96 GB (HBM3), 144 GB (HBM3e) |
PCIe Gen 5 lanes | 64x |
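To put the bandwidth figures in perspective, the following sketch estimates how long one full HBM3 capacity (96 GB) would take to cross each link at the peak rates listed in the table. The helper name `transfer_seconds` and the choice of payload are illustrative assumptions. The results are theoretical lower bounds, not measured Parabricks timings.

```shell
#!/bin/sh
# Sketch: lower-bound transfer time = payload (GB) / peak bandwidth (GB/s).
# Bandwidths are the peak values from the table above; real transfers are slower.

# transfer_seconds is a hypothetical helper: $1 = payload in GB, $2 = bandwidth in GB/s.
transfer_seconds() {
  awk -v gb="$1" -v bw="$2" 'BEGIN { printf "%.3f\n", gb / bw }'
}

PAYLOAD_GB=96   # assumed example payload: one full HBM3 capacity

echo "NVLink-C2C, one direction (450 GB/s): $(transfer_seconds "$PAYLOAD_GB" 450) s"
echo "CPU LPDDR5X (500 GB/s):               $(transfer_seconds "$PAYLOAD_GB" 500) s"
echo "GPU HBM3 (4000 GB/s):                 $(transfer_seconds "$PAYLOAD_GB" 4000) s"
```

Even at these idealized rates, the CPU-GPU link runs close to LPDDR5X speed, which is what makes the coherent memory model practical for data that does not fit in HBM.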
Running on the NVIDIA Grace Hopper Superchip gives genomic pipelines even greater acceleration and throughput.
All tools in the Parabricks suite leverage the strengths of the NVIDIA GH200 Grace Hopper Superchip to maximize performance:
Vectorized Instructions: Parabricks tools utilize Grace CPU-specific vectorized instructions to accelerate computational tasks.
NVLink-C2C Interconnect: The high-bandwidth NVLink-C2C link minimizes latency for GPU-CPU data transfers, optimizing hybrid workflows.
Higher TDP and Dynamic Power Sharing: The GH200's elevated Thermal Design Power (TDP) supports sustained performance under heavy workloads. Power is dynamically allocated between the Grace CPU and Hopper GPU based on real-time workload demands.
High CPU Core Density: The Arm-based Grace CPU provides up to 72 cores per GPU, reducing CPU bottlenecks in hybrid workloads.
Example Performance: The deepvariant_germline tool processed a 30X Illumina dataset 1.4x faster on an NVIDIA GH200 Grace Hopper Superchip (480GB unified memory) than on a system with one NVIDIA H100 NVL GPU.
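Before benchmarking, it can help to confirm which CPU architecture, core count, and GPU the host actually exposes. The sketch below uses only standard utilities (`uname`, `getconf`, and `nvidia-smi` when the NVIDIA driver is installed). The `arch_label` helper and its labels are illustrative assumptions, not part of Parabricks.

```shell
#!/bin/sh
# Sketch: report the CPU architecture, online core count, and GPU seen by the host.

# arch_label is a hypothetical helper that maps a `uname -m` string to a readable label.
arch_label() {
  case "$1" in
    aarch64|arm64) echo "Arm (e.g. Grace)" ;;
    x86_64|amd64)  echo "x86-64" ;;
    *)             echo "unknown ($1)" ;;
  esac
}

echo "CPU architecture: $(arch_label "$(uname -m)")"
echo "Online CPU cores: $(getconf _NPROCESSORS_ONLN)"

# nvidia-smi ships with the NVIDIA driver; skip the GPU query if it is absent.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "nvidia-smi not found; GPU details unavailable"
fi
```

On a Grace Hopper node, the architecture line should report Arm and the core count should be up to 72 per Superchip.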
All tools and pipelines from Parabricks 4.5.0-1 are optimized and supported on the NVIDIA Grace Hopper Superchip. For usage details, see the Tool Reference.
For the best performance of Parabricks tools on the NVIDIA Grace CPU, see the Grace CPU benchmarking guide. It provides recommendations and best practices specific to the NVIDIA Grace CPU to help you get the best possible performance from your system.
Parabricks is available as a multi-architecture container, so the same Docker command works on both NVIDIA Grace Hopper Superchip and x86 nodes:
$ docker pull nvcr.io/nvidia/clara/clara-parabricks:4.5.0-1
Then follow the Tutorials.
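As a rough sketch of what a first run might look like once the image is pulled, the script below assembles a `deepvariant_germline` invocation that is identical on Grace Hopper and x86 hosts. The input and output file names, the single `/workdir` mount, the `run` helper, and the `RUN` variable are all assumptions for illustration. By default the script only prints the command; consult the Tool Reference for the authoritative options.

```shell
#!/bin/sh
# Sketch: run deepvariant_germline from the multi-architecture Parabricks image.
# File names below are placeholders (assumptions), not files shipped with Parabricks.
set -eu

IMAGE="nvcr.io/nvidia/clara/clara-parabricks:4.5.0-1"
DATA_DIR="${DATA_DIR:-$PWD}"      # host directory holding inputs and outputs
REF="${REF:-reference.fasta}"     # placeholder reference genome
FQ1="${FQ1:-sample_1.fq.gz}"      # placeholder paired-end reads
FQ2="${FQ2:-sample_2.fq.gz}"

# run is a hypothetical helper: it executes the command only when RUN=1,
# otherwise it prints the command so it can be reviewed first.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

run docker run --rm --gpus all \
  --volume "${DATA_DIR}:/workdir" --workdir /workdir \
  "${IMAGE}" \
  pbrun deepvariant_germline \
  --ref "/workdir/${REF}" \
  --in-fq "/workdir/${FQ1}" "/workdir/${FQ2}" \
  --out-bam /workdir/sample.bam \
  --out-variants /workdir/sample.vcf
```

Saved under a name of your choice, the script prints the command when invoked with `sh` and runs it when `RUN=1` is set in the environment. Because the image is multi-architecture, Docker selects the matching variant for the host automatically.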
For any questions or support, please visit the NVIDIA Parabricks Community. Join a vibrant community of researchers and experts to exchange ideas, seek assistance, and stay updated on the latest developments in genomic data analysis.
To learn more about the NVIDIA Grace Hopper Superchip please visit here.