User Guide (24.04.01)

The qualification tool analyzes Spark event logs generated by CPU-based Spark applications to determine which applications are good candidates for migration to GPU.

The tool analyzes a CPU event log and extracts various metrics to help determine how the workload would run on GPU. It then uses data from historical queries and benchmarks to estimate speed-ups at the individual operator level and calculate how the workload would accelerate on GPU. Estimates of GPU duration are available for different environments and are based on benchmarks that were run in the applicable environments. The table below lists the cluster information used to run the benchmarks.
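As a simplified illustration of the operator-level estimation described above, the overall estimated GPU duration can be computed by scaling each operator's CPU time by its estimated speed-up factor. The operator names, durations, and speed-up factors below are made-up example values, not the tool's actual benchmark data:

```python
# Illustrative sketch only: aggregating per-operator speed-up estimates.
# All numbers here are hypothetical, not values produced by the tool.

def estimate_gpu_duration(operators):
    """Scale each operator's CPU time (ms) by its estimated GPU speed-up."""
    return sum(cpu_ms / speedup for _, cpu_ms, speedup in operators)

ops = [
    # (operator, CPU task time in ms, estimated GPU speed-up factor)
    ("FilterExec",         4_000, 2.5),
    ("HashAggregateExec", 10_000, 4.0),
    ("SortMergeJoinExec",  6_000, 3.0),
]

cpu_total = sum(cpu_ms for _, cpu_ms, _ in ops)
gpu_total = estimate_gpu_duration(ops)
print(cpu_total, round(gpu_total))  # 20000 6100
```

In this toy example the workload's overall estimated speed-up (about 3.3x) falls between the per-operator factors, weighted by how much time each operator contributes.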

The tool combines the estimation along with other relevant heuristics to qualify workloads for migration to GPU. In addition to generating the qualified workload list, the tool provides two outputs to assist in the migration to GPU:

  • Optimized Spark configs for GPU: the tool calculates a set of configurations that impact the performance of Apache Spark apps executing on GPU. Those calculations can leverage cluster information (e.g. memory, cores, Spark default configurations) as well as information processed from the application event logs. Note that the tool recommends settings for the application assuming that the job will be able to use all of the cluster resources (CPU and GPU) while it is running.

  • Recommended GPU cluster shape (for CSPs only): the tool generates a recommended instance type and count, along with GPU information, to be used for the migration.
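As an illustration of the first output above, a recommended-config snippet might resemble the following. The property names are real Spark and RAPIDS Accelerator settings, but the values shown are made-up examples, not actual tool recommendations:

```properties
spark.executor.cores=16
spark.executor.memory=32g
spark.sql.shuffle.partitions=200
spark.rapids.sql.concurrentGpuTasks=2
```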

This tool is intended to give users a starting point and does not guarantee that the queries or applications with the highest recommendation will be accelerated the most. Currently, it generates its report based on the amount of time spent in tasks of SQL DataFrame operations. Note that the qualification tool estimates assume that the application is run on a dedicated cluster where it can use all of the available Spark resources.
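For intuition, the signal described above (share of task time spent in SQL DataFrame operations) can be sketched as a simple ranking. The application names and task timings here are hypothetical, not tool output:

```python
# Illustrative only: rank applications by the fraction of task time
# spent in SQL DataFrame operations. All data below is hypothetical.
apps = {
    "etl_daily":  {"sql_df_ms": 45_000, "total_ms": 60_000},
    "report_job": {"sql_df_ms":  5_000, "total_ms": 50_000},
}

def sql_fraction(metrics):
    """Fraction of all task time spent in SQL DataFrame tasks."""
    return metrics["sql_df_ms"] / metrics["total_ms"]

ranked = sorted(apps, key=lambda a: sql_fraction(apps[a]), reverse=True)
print(ranked)  # ['etl_daily', 'report_job']
```

An application spending most of its task time in SQL DataFrame operations ranks higher as a migration candidate than one dominated by other work.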

The Qualification tool can be run as a command-line interface via a pip package for CSP environments (Google Dataproc, AWS EMR, Databricks-AWS, and Databricks-Azure) in addition to on-prem environments.

For more information on running the Qualification tool from the pip-package, visit the quick start guide.

Cluster information for the ETL benchmarks used for the estimates. Note that all benchmarks were run using the NDS benchmark at SF3K (3 TB).


                     CPU Cluster          GPU Cluster
On-prem              8x 128-core          8x 128-core + 8x A100 40 GB
Dataproc (T4)        4x n1-standard-32    4x n1-standard-32 + 8x T4 16 GB
Dataproc (L4)        8x n1-standard-16    8x g2-standard-16
EMR                  8x m5d.8xlarge       4x g4dn.12xlarge
Databricks AWS       8x m6gd.8xlarge      8x g5.8xlarge
Databricks Azure     8x E8ds_v4           8x NC8as_T4_v3
© Copyright 2024, NVIDIA. Last updated on Jun 12, 2024.