
Grace Hopper Superchip

In 2023, Parabricks announced full optimization and support for the NVIDIA GH200 Grace Hopper Superchip. The GH200 combines the NVIDIA Grace CPU and Hopper GPU architectures over NVIDIA NVLink-C2C to deliver a coherent CPU+GPU memory model for accelerated Artificial Intelligence (AI) and High Performance Computing (HPC) applications. For genomic data analysis, this integration lets researchers run complex analyses faster and more efficiently. The NVIDIA Grace Hopper Superchip is the first true heterogeneous accelerated platform for HPC and AI workloads: it accelerates applications with the strengths of both GPUs and CPUs while keeping a simple programming model that supports performance, portability, and productivity.

Key Features of the NVIDIA GH200 Grace Hopper Superchip

Feature	Description
Grace CPU cores	Up to 72
CPU LPDDR5X bandwidth	Up to 500 GB/s
GPU HBM bandwidth	4 TB/s (HBM3), 4.9 TB/s (HBM3e)
NVLink-C2C bandwidth	900 GB/s total, 450 GB/s per direction
CPU LPDDR5X capacity	Up to 480 GB
GPU HBM capacity	96 GB (HBM3), 144 GB (HBM3e)
PCIe Gen 5 lanes	64
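
To put the bandwidth figures in the table into perspective, the following back-of-the-envelope sketch computes ideal transfer times for a buffer the size of the HBM3 capacity (96 GB). The helper name `transfer_seconds` is hypothetical, the calculation assumes decimal units (1 TB/s = 1000 GB/s), and it uses the peak rates from the table; sustained rates in practice are typically lower.

```shell
#!/bin/sh
# transfer_seconds SIZE_GB BANDWIDTH_GB_PER_S
# Ideal time in seconds to move SIZE_GB at the given peak bandwidth.
# Hypothetical helper for illustration; not part of Parabricks.
transfer_seconds() {
    awk -v size="$1" -v bw="$2" 'BEGIN { printf "%.3f\n", size / bw }'
}

# 96 GB one way across NVLink-C2C at 450 GB/s per direction.
transfer_seconds 96 450     # prints 0.213

# The same 96 GB read from HBM3 at 4 TB/s (taken as 4000 GB/s).
transfer_seconds 96 4000    # prints 0.024
```

Even at peak rates, crossing NVLink-C2C takes roughly nine times longer than reading the same data from HBM3, which is why keeping hot data resident in GPU memory still matters on a coherent-memory system.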

By harnessing the immense computational capabilities of the NVIDIA Grace Hopper Superchip, users can experience even greater acceleration and throughput for their genomic pipelines.

Optimizations and Performance on GH200

All tools in the Parabricks suite leverage the strengths of the NVIDIA GH200 Grace Hopper Superchip to maximize performance:

Vectorized Instructions: Parabricks tools utilize Grace CPU-specific vectorized instructions to accelerate computational tasks.
NVLink-C2C Interconnect: The high-bandwidth NVLink-C2C link minimizes latency for GPU-CPU data transfers, optimizing hybrid workflows.
Higher TDP and Dynamic Power Sharing: The GH200's elevated Thermal Design Power (TDP) supports sustained performance under heavy workloads. Power is dynamically allocated between the Grace CPU and Hopper GPU based on real-time workload demands.
High CPU Core Density: The Arm-based Grace CPU provides up to 72 cores per GPU, reducing CPU bottlenecks in hybrid workloads.

Example Performance: The deepvariant_germline tool processed a 30X Illumina dataset 1.4x faster on an NVIDIA GH200 Grace Hopper Superchip (480GB unified memory) compared to a system with one NVIDIA H100 NVL GPU.
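
As a quick way to read a speedup figure such as the 1.4x above, this sketch converts a speedup factor into the corresponding reduction in wall-clock time. It assumes that "1.4x faster" means the baseline runtime divided by the GH200 runtime equals 1.4; the function name `runtime_reduction_pct` is hypothetical and no measured runtimes are implied.

```shell
#!/bin/sh
# runtime_reduction_pct SPEEDUP
# Percentage of wall-clock time saved for a given speedup factor,
# assuming speedup = baseline_runtime / new_runtime.
runtime_reduction_pct() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", (1 - 1 / s) * 100 }'
}

runtime_reduction_pct 1.4   # prints 28.6
```

Under that assumption, a 1.4x speedup corresponds to a run that finishes in about 71% of the baseline time.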

Documentation

All tools and pipelines from Parabricks 4.5.1-1 are now optimized and supported on the NVIDIA Grace Hopper Superchip. For usage details, see the Tool Reference.

Performance Tuning

To achieve optimal performance for all Parabricks tools on the NVIDIA Grace CPU, see the Grace CPU benchmarking guide. It gives recommendations and best practices specific to the NVIDIA Grace CPU to help you get the best possible performance from your particular system.

Getting Started

Parabricks is available as a multi-architecture container, so the same Docker command works on both NVIDIA Grace Hopper Superchip and x86 nodes:
```
$ docker pull nvcr.io/nvidia/clara/clara-parabricks:4.5.1-1
```
After pulling the image, follow the Tutorials.
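
Because the container is multi-architecture, Docker selects the image variant that matches the host. The sketch below is one way to check which variant applies on a given node; it is an illustration, not an official Parabricks procedure. The function name `docker_platform_for` is hypothetical, and the script assumes that Arm-based Grace hosts report `aarch64` from `uname -m` while x86 nodes report `x86_64`.

```shell
#!/bin/sh
# Map a `uname -m` machine name to the platform string Docker uses.
# Hypothetical helper for illustration only.
docker_platform_for() {
    case "$1" in
        aarch64|arm64) echo "linux/arm64" ;;
        x86_64|amd64)  echo "linux/amd64" ;;
        *)             echo "unknown" ;;
    esac
}

IMAGE="nvcr.io/nvidia/clara/clara-parabricks:4.5.1-1"
HOST_ARCH="$(uname -m)"
echo "Host architecture: ${HOST_ARCH} (Docker platform: $(docker_platform_for "${HOST_ARCH}"))"

# If Docker is installed and the image has already been pulled,
# report which OS/architecture variant is stored locally.
if command -v docker >/dev/null 2>&1 && docker image inspect "${IMAGE}" >/dev/null 2>&1; then
    docker image inspect --format '{{.Os}}/{{.Architecture}}' "${IMAGE}"
fi
```

On a Grace Hopper node the two reported platforms would be expected to agree on `linux/arm64`; a mismatch suggests the image was pulled for a different platform.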
For any questions or support, please visit the NVIDIA Parabricks Community. Join a vibrant community of researchers and experts to exchange ideas, seek assistance, and stay updated on the latest developments in genomic data analysis.
To learn more about the NVIDIA Grace Hopper Superchip please visit here.
