Currently, the Auto-Tuner calculates a set of configurations that affect the performance of Apache Spark applications running on GPUs. The calculations can draw on cluster information (for example, memory, cores, and Spark default configurations) as well as information extracted from the application event logs. The tool recommends settings on the assumption that the job can use all of the cluster resources (CPU and GPU) while it runs. Where both sources supply a value for the same property, the value loaded from the application logs takes precedence over the default configuration.
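
The precedence rule can be pictured as a simple merge of two property maps. The following is a minimal sketch, not the Auto-Tuner's actual implementation: the function name effective_inputs and all property values are hypothetical and chosen only for illustration.

```python
# Hypothetical inputs for illustration; not taken from a real cluster or event log.
cluster_defaults = {
    "spark.executor.cores": "8",
    "spark.executor.memory": "16g",
    "spark.sql.shuffle.partitions": "200",
}
event_log_values = {
    "spark.executor.memory": "32g",
    "spark.sql.shuffle.partitions": "400",
}


def effective_inputs(defaults, from_logs):
    """Merge both sources; on a key collision the event-log value wins."""
    merged = dict(defaults)   # start from the default configuration
    merged.update(from_logs)  # values from the application logs override
    return merged


inputs = effective_inputs(cluster_defaults, event_log_values)
for key in sorted(inputs):
    print(key, "=", inputs[key])
```

In this sketch, spark.executor.cores keeps its default because the event log does not set it, while the other two properties take the values recorded in the log.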

The recommendations span several categories, ordered from most to least impactful:

  • RAPIDS plugin & GPU resources (required for GPU execution): spark.plugins (must include com.nvidia.spark.SQLPlugin), spark.rapids.sql.enabled, spark.executor.resource.gpu.amount, spark.task.resource.gpu.amount, and spark.shuffle.manager (RAPIDS Shuffle Manager).

  • Executor sizing: spark.executor.cores, spark.executor.instances, spark.executor.memory, spark.executor.memoryOverhead.

  • GPU runtime: spark.rapids.sql.concurrentGpuTasks, spark.rapids.memory.pinnedPool.size, spark.rapids.sql.batchSizeBytes.

  • Shuffle and AQE: spark.sql.shuffle.partitions, spark.sql.files.maxPartitionBytes, spark.sql.adaptive.advisoryPartitionSizeInBytes, spark.sql.adaptive.coalescePartitions.parallelismFirst.

  • Dynamic allocation: spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors, and spark.dynamicAllocation.maxExecutors, sized against the CPU-to-GPU core ratio. When recommended, the Auto-Tuner enforces minExecutors <= initialExecutors <= maxExecutors.

  • Platform-specific plugins: additional recommendations may be emitted for EMR (JVM options that disable Transparent Huge Pages) and Delta Lake (GPU-accelerated Delta write via spark.rapids.sql.format.delta.write.enabled, plus version-compatibility and support comments).

The Auto-Tuner also tunes secondary properties such as Kryo serialization settings, multi-threaded reader/writer threads, RAPIDS file cache, data locality wait, and platform compatibility flags where applicable.
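
Two of the constraints listed above are easy to verify on any set of recommended properties: spark.plugins must include com.nvidia.spark.SQLPlugin, and the dynamic allocation bounds must satisfy minExecutors <= initialExecutors <= maxExecutors. The following is a minimal sketch of such a check, assuming the recommendations have been loaded into a dictionary of strings. The helper check_recommendations and the sample values are hypothetical and are not part of the tool.

```python
REQUIRED_PLUGIN = "com.nvidia.spark.SQLPlugin"
DYN_ALLOC_KEYS = [
    "spark.dynamicAllocation.minExecutors",
    "spark.dynamicAllocation.initialExecutors",
    "spark.dynamicAllocation.maxExecutors",
]


def check_recommendations(conf):
    """Return a list of problems found in a dict of Spark properties."""
    problems = []

    # spark.plugins is a comma-separated list of plugin class names.
    raw_plugins = conf.get("spark.plugins", "")
    plugins = [p.strip() for p in raw_plugins.split(",") if p.strip()]
    if REQUIRED_PLUGIN not in plugins:
        problems.append("spark.plugins does not include " + REQUIRED_PLUGIN)

    # Check the ordering only when all three bounds are present.
    if all(key in conf for key in DYN_ALLOC_KEYS):
        low, initial, high = (int(conf[key]) for key in DYN_ALLOC_KEYS)
        if not (low <= initial <= high):
            problems.append(
                "expected minExecutors <= initialExecutors <= maxExecutors, "
                "got %d, %d, %d" % (low, initial, high)
            )
    return problems


# Hypothetical sample values, for illustration only.
sample = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.initialExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "8",
}
print(check_recommendations(sample))
```

The check reports problems rather than rewriting values, because how the Auto-Tuner itself adjusts out-of-order bounds is not described here.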

Note

Auto-Tuner limitations:

  • All worker nodes in the cluster are assumed to be homogeneous.

To run the Auto-Tuner, enable the auto-tuner flag. Optionally, provide target cluster information using --target-cluster-info <FILE_PATH> to specify the GPU worker node configuration for generating optimized recommendations. The file path can be local or remote (for example, HDFS).

If the --target-cluster-info argument isn’t supplied, the Auto-Tuner uses platform-specific default worker instance types for tuning recommendations. See AutoTuner Configuration for details on default instance types, supported platforms, and how to customize Auto-Tuner behavior.