AutoTuner Configuration#

The RAPIDS Accelerator tools include an AutoTuner module that automatically generates optimized Spark configuration recommendations for GPU clusters. The AutoTuner can be customized through two types of YAML configuration files to better match your specific cluster and workload requirements.

These configuration options are available for both the Qualification and Profiling tools, and can be used with either the Tools JAR or the Tools CLI.

Target Cluster Information#

The --target-cluster-info argument provides a platform-aware way to specify cluster configuration. It accepts simplified cluster information such as instance types, which the tool uses to automatically determine system specifications.

Usage#

  • When using Tools JAR: --target-cluster-info /path/to/targetCluster.yaml

  • When using Tools CLI: --target_cluster_info /path/to/targetCluster.yaml

Examples#

Additional sample target cluster configuration files are available in the targetClusterInfo samples directory of the GitHub repository.

Example 1: CSP configuration with instance type#
# Simple CSP configuration using instance type.
# The tool will automatically determine system specifications based
# on the instance type.
workerInfo:
  instanceType: g2-standard-24

Dataproc n1-standard instances support 1, 2, or 4 GPUs, so the GPU count must be specified explicitly in the configuration:

Example 2: Dataproc n1-standard with explicit GPU count#
workerInfo:
  instanceType: n1-standard-16
  gpu:
    count: 1

Example 3: OnPrem configuration with custom Spark properties#
# OnPrem configuration with explicit resource specifications
# and Spark property controls.
workerInfo:
  cpuCores: 8
  memoryGB: 40
  gpu:
    count: 1
    name: l4
sparkProperties:
  # Enforced properties override AutoTuner recommendations
  enforced:
    spark.rapids.sql.concurrentGpuTasks: 2
    spark.executor.cores: 8
  # Properties preserved from the source application
  preserve:
    - spark.sql.shuffle.partitions
    - spark.sql.files.maxPartitionBytes
  # Properties excluded from AutoTuner recommendations
  exclude:
    - spark.rapids.shuffle.multiThreaded.reader.threads
    - spark.rapids.shuffle.multiThreaded.writer.threads
    - spark.rapids.sql.multiThreadedRead.numThreads

Note

The sparkProperties section is optional but allows fine-grained control over which properties are enforced, preserved from the source cluster, or excluded from tuning recommendations.
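For instance, a minimal file might enforce a single property while letting the AutoTuner recommend everything else. The sketch below assumes that the sparkProperties section can also be paired with a CSP-style workerInfo (as in Example 1) and that each subsection (enforced, preserve, exclude) can be supplied on its own; the property value shown is illustrative only.

# Sketch: CSP instance type combined with an enforced-only
# sparkProperties section. Assumes the subsections are individually
# optional; the partition count is an illustrative value.
workerInfo:
  instanceType: g2-standard-24
sparkProperties:
  enforced:
    spark.sql.shuffle.partitions: 400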

Instance Types by Platform#

The following table shows the default instance types used when --target-cluster-info is not provided, as well as the supported instance types that can be specified in the target cluster configuration file:

Default and Supported Instance Types#

| Platform | Default Instance Type | Supported Instance Types |
|---|---|---|
| EMR | g6.4xlarge | G6 series: g6.xlarge, g6.2xlarge, g6.4xlarge, g6.8xlarge, g6.12xlarge, g6.16xlarge |
| Databricks AWS | g5.8xlarge | G5 series: g5.xlarge, g5.2xlarge, g5.4xlarge, g5.8xlarge, g5.12xlarge, g5.16xlarge |
| Databricks Azure | Standard_NC8as_T4_v3 | Standard_NC*as_T4_v3 series: Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3 |
| Dataproc | g2-standard-16 | g2-standard series and n1-standard series with GPU attachments. For n1-standard instances, GPU count must be specified explicitly (see Example 2) |
| Dataproc-GKE | g2-standard-16 | g2-standard series and n1-standard series with GPU attachments. For n1-standard instances, GPU count must be specified explicitly (see Example 2) |
| Dataproc-Serverless | g2-standard-16 | g2-standard series |
| OnPrem | 16 cores with L4 GPU | Any configuration using cpuCores, memoryGB, and gpu properties (see Example 3) |

Note

Support for additional instance types is planned for future releases. In the meantime, for unsupported instance types on CSP platforms, you can use the OnPrem configuration format by specifying cpuCores, memoryGB, and gpu properties directly (see Example 3).
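For example, a worker running on an instance type the tool does not yet recognize could be described directly with explicit resources, following the OnPrem format from Example 3. The CPU, memory, and GPU count values below are hypothetical placeholders, not recommendations:

# Hypothetical worker specification for an unsupported instance type:
# resources are declared directly instead of via instanceType.
# The values below are placeholders; substitute your actual hardware.
workerInfo:
  cpuCores: 32
  memoryGB: 128
  gpu:
    count: 2
    name: l4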

Custom Tuning Configurations#

The --tuning-configs argument allows you to override default AutoTuner tuning parameters. The AutoTuner uses a set of predefined constants for calculations such as memory allocation, GPU task concurrency, and partition sizing. You can customize these values to match your specific workload requirements.

Usage#

  • When using Tools JAR: --tuning-configs /path/to/custom.yaml

  • When using Tools CLI: --tuning_configs /path/to/custom.yaml

Example#

The default tuning configuration parameters and their descriptions are available in the tuningConfigs.yaml file. For more examples, see the customTuningConfigs.yaml file in the GitHub repository.

Example custom tuning configuration file#
# Custom tuning configurations override default AutoTuner parameters.
# Only specify parameters that need to be changed from defaults.
# Description and usedBy fields are optional.
default:
  - name: CONC_GPU_TASKS
    max: 1
  - name: HEAP_PER_CORE
    default: 0.8g

Note

When providing custom tuning configurations, you only need to specify the parameters you want to override. All other parameters will use their default values.
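For instance, a file that only caps GPU task concurrency could be as small as the sketch below, which reuses the CONC_GPU_TASKS parameter from the example above; every other tuning parameter keeps its default:

# Minimal override sketch: cap concurrent GPU tasks only.
# All other tuning parameters retain their default values.
default:
  - name: CONC_GPU_TASKS
    max: 1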