Overview

The qualification tool analyzes Spark events generated from CPU-based Spark applications to help quantify the expected acceleration of migrating a Spark application or query to GPU.

The tool first analyzes the CPU event log and determines which operators are likely to run on the GPU. It then uses estimates from historical queries and benchmarks to assign a speed-up to each individual operator, which indicates how much that operator would accelerate on GPU for the specific query or application. Finally, it calculates an Estimated GPU App Duration by adding up the accelerated operator durations together with the durations of work that could not run on GPU, either because the operators are unsupported or because the work is not SQL/DataFrame.
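The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: the `OperatorTiming` class, the function names, the `CustomUdfExec` operator, and all durations and speed-up factors are hypothetical values made up for the example. Treating the overall speed-up as the ratio of CPU duration to estimated GPU duration is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OperatorTiming:
    """Hypothetical record for one operator found in a CPU event log."""
    name: str
    cpu_duration_ms: float
    supported_on_gpu: bool
    speedup_factor: float = 1.0  # assumed per-operator estimate, not real data


def estimated_gpu_app_duration(operators: List[OperatorTiming],
                               non_sql_duration_ms: float) -> float:
    """Sum accelerated durations plus time that stays on the CPU."""
    total = non_sql_duration_ms  # non-SQL/DataFrame time is not accelerated
    for op in operators:
        if op.supported_on_gpu:
            # Supported operators shrink by their estimated speed-up.
            total += op.cpu_duration_ms / op.speedup_factor
        else:
            # Unsupported operators keep their CPU duration.
            total += op.cpu_duration_ms
    return total


def estimated_speedup(cpu_duration_ms: float, gpu_duration_ms: float) -> float:
    """Assumed definition: CPU app duration over estimated GPU app duration."""
    return cpu_duration_ms / gpu_duration_ms


# Made-up example values, for illustration only.
ops = [
    OperatorTiming("HashAggregate", 60000.0, True, 3.0),
    OperatorTiming("Project", 20000.0, True, 2.0),
    OperatorTiming("CustomUdfExec", 10000.0, False),
]
non_sql_ms = 10000.0

cpu_total_ms = sum(op.cpu_duration_ms for op in ops) + non_sql_ms
gpu_total_ms = estimated_gpu_app_duration(ops, non_sql_ms)
speedup = estimated_speedup(cpu_total_ms, gpu_total_ms)
```

In this sketch, the unsupported operator and the non-SQL time limit the overall gain: even with large per-operator speed-ups, the estimated app duration can never drop below the time that remains on the CPU.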

This tool is intended to give users a starting point and does not guarantee that the queries or applications with the highest recommendation will be accelerated the most. Currently, it bases its report on the amount of time spent in tasks of SQL/DataFrame operations. Note that the qualification tool's estimates assume that the application runs on a dedicated cluster where it can use all of the available Spark resources.

The estimations for GPU duration are available for different environments and are based on benchmarks run in the applicable environments. The following table lists the cluster information used to run the benchmarks.

In addition to GPU estimates, the tool optionally provides optimized RAPIDS configurations based on the worker’s information (see Auto-Tuner Support).

Cluster information for the ETL benchmarks used for the estimate. Note that all benchmarks were run using the NDS benchmark at SF3K (3 TB).

Environment        CPU Cluster          GPU Cluster
On-prem            8x 128-core          8x 128-core + 8x A100 40 GB
Dataproc (T4)      4x n1-standard-32    4x n1-standard-32 + 8x T4 16 GB
Dataproc (L4)      8x n1-standard-16    8x g2-standard-16
EMR                8x m5d.8xlarge       4x g4dn.12xlarge
Databricks AWS     8x m6gd.8xlarge      8x g5.8xlarge
Databricks Azure   8x E8ds_v4           8x NC8as_T4_v3

Important

Estimates provided by the qualification tool are based on the currently supported “SparkPlan” or “Executor Nodes” used in the application. The tool does not currently handle all of the expressions or datatypes used. Please refer to the Understanding Execs report section and the Supported operators guide to check that the types and expressions you are using are supported.

The Auto-Tuner aims to optimize Apache Spark applications by recommending a set of configurations that tune the performance of the RAPIDS Accelerator.

Currently, the Auto-Tuner calculates a set of configurations that impact the performance of Apache Spark apps executing on GPU. Those calculations can leverage cluster information (e.g. memory, cores, Spark default configurations) as well as information processed from the application event logs. Note that the recommended settings also assume that the job will be able to use all of the cluster resources (CPU and GPU) while it is running.
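To make the idea of deriving settings from cluster information concrete, here is a small sketch. It does not reproduce the Auto-Tuner's actual rules, which are not described here. The function `recommend_gpu_settings` is hypothetical, and the heuristic (one executor per GPU, with the worker's cores split evenly between executors) is an assumption chosen for illustration. Only the Spark property names are real.

```python
from typing import Dict


def recommend_gpu_settings(worker_cores: int, gpus_per_worker: int) -> Dict[str, str]:
    """Hypothetical heuristic: one executor per GPU, cores split evenly.

    Like the recommendations described above, this assumes the job can use
    all of the worker's CPU cores and GPUs while it runs.
    """
    if worker_cores <= 0 or gpus_per_worker <= 0:
        raise ValueError("worker_cores and gpus_per_worker must be positive")

    # Divide the worker's cores among its GPUs, keeping at least one core.
    executor_cores = max(1, worker_cores // gpus_per_worker)

    return {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.resource.gpu.amount": "1",
        # Let every core of the executor schedule a task on the shared GPU.
        "spark.task.resource.gpu.amount": str(1.0 / executor_cores),
    }


# Made-up worker shape: 32 cores and 2 GPUs per worker node.
recommendations = recommend_gpu_settings(worker_cores=32, gpus_per_worker=2)
for key, value in recommendations.items():
    print(f"--conf {key}={value}")
```

A real recommendation would also draw on memory sizes, Spark defaults, and metrics from the event logs, as the paragraph above notes. This sketch looks only at core and GPU counts.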

RAPIDS Accelerator for Apache Spark CLI tool

The CLI tool is the simplest way to run the Qualification tool. It runs the Qualification tool standalone on Spark event logs and is available as a user tool command via a pip package, for CSP environments (Google Dataproc, AWS EMR, Databricks-AWS, and Databricks-Azure) as well as on-prem.

The tool outputs the applications recommended for acceleration, along with estimated speed-up and cost-saving metrics. Additionally, it provides a set of tuning recommendations specifically tailored for Spark applications running on GPU clusters, as part of the default output from the Auto-Tuner feature. For more information on running the Qualification tool from the pip package, visit the quick start guide.

Java API

The Java API can be used in other environments that are not supported by the CLI tool.

The Java API allows the Qualification tool to run in three different ways:

  1. As a standalone tool on the Spark event logs after the application(s) have run,

  2. Integrated into a running Spark application using explicit API calls, and

  3. As a Spark listener that outputs results on a per-SQL-query basis.

© Copyright 2024, NVIDIA. Last updated on Apr 23, 2024.