Spark3 GPU Configuration Guide on Yarn 3.2.1

The following files are recommended to be configured to enable GPU scheduling on Yarn 3.2.1 and later.

GPU resource discovery script - /usr/lib/spark/scripts/gpu/getGpusResources.sh:

mkdir -p /usr/lib/spark/scripts/gpu/
cd /usr/lib/spark/scripts/gpu/
wget https://raw.githubusercontent.com/apache/spark/master/examples/src/main/scripts/getGpusResources.sh
chmod -R a+rwx /usr/lib/spark/scripts/gpu/
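
When run on a GPU node, the script prints the GPU addresses in the JSON format Spark expects. A quick sanity check (example output shown for a node with two GPUs):

/usr/lib/spark/scripts/gpu/getGpusResources.sh
# Example output on a two-GPU node:
# {"name": "gpu", "addresses":["0","1"]}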

Spark config - /etc/spark/conf/spark-defaults.conf:

spark.rapids.sql.concurrentGpuTasks=2
spark.executor.resource.gpu.amount=1
spark.executor.cores=8
spark.task.cpus=1
spark.task.resource.gpu.amount=0.125
spark.rapids.memory.pinnedPool.size=2G
spark.executor.memoryOverhead=2G
spark.plugins=com.nvidia.spark.SQLPlugin
spark.executor.extraJavaOptions='-Dai.rapids.cudf.prefer-pinned=true'
# This must match the location of the discovery script
spark.executor.resource.gpu.discoveryScript=/usr/lib/spark/scripts/gpu/getGpusResources.sh
spark.sql.files.maxPartitionBytes=512m
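
Note that spark.task.resource.gpu.amount=0.125 is 1 GPU divided by spark.executor.cores=8, so up to 8 tasks share each executor's GPU. The same settings can also be passed per job; a minimal spark-submit sketch, where the RAPIDS jar path and application name are placeholders to adjust for your install:

# Jar path and application below are placeholders for your deployment
spark-submit \
  --master yarn \
  --jars /path/to/rapids-4-spark.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.125 \
  --conf spark.executor.resource.gpu.discoveryScript=/usr/lib/spark/scripts/gpu/getGpusResources.sh \
  app.py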

Yarn Scheduler config - /etc/hadoop/conf/capacity-scheduler.xml. The DominantResourceCalculator is required so that the scheduler accounts for GPU requests; the default calculator considers memory only:

1<configuration>
2  <property>
3    <name>yarn.scheduler.capacity.resource-calculator</name>
4    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
5  </property>
6</configuration>

Yarn config - /etc/hadoop/conf/yarn-site.xml:

 1<configuration>
 2  <property>
 3    <name>yarn.nodemanager.resource-plugins</name>
 4    <value>yarn.io/gpu</value>
 5  </property>
 6  <property>
 7     <name>yarn.resource-types</name>
 8     <value>yarn.io/gpu</value>
 9  </property>
10  <property>
11     <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
12     <value>auto</value>
13  </property>
14  <property>
15     <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
16     <value>/usr/bin</value>
17  </property>
18  <property>
19     <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
20     <value>true</value>
21  </property>
22  <property>
23     <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
24     <value>/sys/fs/cgroup</value>
25  </property>
26  <property>
27     <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
28     <value>yarn</value>
29  </property>
30  <property>
31    <name>yarn.nodemanager.container-executor.class</name>
32    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
33  </property>
34  <property>
35    <name>yarn.nodemanager.linux-container-executor.group</name>
36    <value>yarn</value>
37  </property>
38</configuration>

/etc/hadoop/conf/container-executor.cfg - use yarn as the service account:

yarn.nodemanager.linux-container-executor.group=yarn

#--Original container-executor.cfg content--

[gpu]
module.enabled=true
[cgroups]
root=/sys/fs/cgroup
yarn-hierarchy=yarn
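
The LinuxContainerExecutor additionally requires the container-executor binary to be owned by root with the yarn group and the setuid bit set; a quick check, where the binary path is an assumption that varies by distribution:

# Path varies by distribution; /usr/lib/hadoop-yarn/bin is an assumption
ls -l /usr/lib/hadoop-yarn/bin/container-executor
# Expect something like: ---Sr-s--- 1 root yarn ... container-executor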

The node manager local directories need to be accessible to all users; run the following in bash:

chmod -R a+rwx /sys/fs/cgroup/cpu,cpuacct
chmod -R a+rwx /sys/fs/cgroup/devices
local_dirs=$(bdconfig get_property_value \
    --configuration_file /etc/hadoop/conf/yarn-site.xml \
    --name yarn.nodemanager.local-dirs 2>/dev/null)
mod_local_dirs=${local_dirs//,/ }
chmod -R a+rwx ${mod_local_dirs}
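
bdconfig may not be available outside of Google Cloud Dataproc images; one alternative is to read the property straight out of yarn-site.xml, for example with xmllint (a sketch, assuming libxml2-utils is installed):

# Extract yarn.nodemanager.local-dirs directly from yarn-site.xml
local_dirs=$(xmllint --xpath \
    'string(//property[name="yarn.nodemanager.local-dirs"]/value)' \
    /etc/hadoop/conf/yarn-site.xml)
chmod -R a+rwx ${local_dirs//,/ }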

Finally, restart the node manager and resource manager services:

On all workers:

sudo systemctl restart hadoop-yarn-nodemanager.service

On all masters:

sudo systemctl restart hadoop-yarn-resourcemanager.service
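
After the restart, GPUs should be advertised as a node resource; one way to verify from any node with the Hadoop client installed:

# yarn.io/gpu should appear under each GPU node's resources
# (-showDetails is available in Hadoop 3.x)
yarn node -list -showDetails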

Note

If cgroup is mounted on tmpfs and a node is rebooted, the cgroup directory permissions are reverted. Please check the cgroup documentation for your platform for more details.

Below is one example of how this can be handled:

Update the cgroup permissions:

chmod -R a+rwx /sys/fs/cgroup/cpu,cpuacct
chmod -R a+rwx /sys/fs/cgroup/devices

Alternatively, the operation can be added to a systemd service:

Create the mountCgroup service and script:

sudo bash -c "cat >/etc/systemd/system/mountCgroup.service" <<EOF
[Unit]
Description=startup
[Service]
ExecStart=/etc/mountCgroup.sh
Type=oneshot
[Install]
WantedBy=multi-user.target
EOF

sudo bash -c "cat >/etc/mountCgroup.sh" <<EOF
#!/bin/sh
chmod -R a+rwx /sys/fs/cgroup/cpu,cpuacct
chmod -R a+rwx /sys/fs/cgroup/devices
EOF

sudo chmod 644 /etc/systemd/system/mountCgroup.service
sudo chmod 755 /etc/mountCgroup.sh

Then start the mountCgroup service:

sudo systemctl enable mountCgroup.service
sudo systemctl start mountCgroup.service
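
To confirm the unit ran and the permissions took effect, a quick sanity check:

systemctl status mountCgroup.service
ls -ld /sys/fs/cgroup/devices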