Spark3 GPU Configuration Guide on Yarn 3.2.1

The following files are recommended to be configured to enable GPU scheduling on Yarn 3.2.1 and later.

GPU resource discovery script - /usr/lib/spark/scripts/gpu/getGpusResources.sh:

mkdir -p /usr/lib/spark/scripts/gpu/
cd /usr/lib/spark/scripts/gpu/
wget https://raw.githubusercontent.com/apache/spark/master/examples/src/main/scripts/getGpusResources.sh
chmod -R a+rwx /usr/lib/spark/scripts/gpu/
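
When run on a GPU node, the script prints the GPU addresses in the JSON format Spark expects. A quick sanity check (example output shown for a node with two GPUs):

/usr/lib/spark/scripts/gpu/getGpusResources.sh
# Example output on a two-GPU node:
# {"name": "gpu", "addresses":["0","1"]}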

Spark config - /etc/spark/conf/spark-defaults.conf:

spark.rapids.sql.concurrentGpuTasks=2
spark.executor.resource.gpu.amount=1
spark.executor.cores=8
spark.task.cpus=1
spark.task.resource.gpu.amount=0.125
spark.rapids.memory.pinnedPool.size=2G
spark.executor.memoryOverhead=2G
spark.plugins=com.nvidia.spark.SQLPlugin
spark.executor.extraJavaOptions='-Dai.rapids.cudf.prefer-pinned=true'
# This must match the location of the discovery script
spark.executor.resource.gpu.discoveryScript=/usr/lib/spark/scripts/gpu/getGpusResources.sh
spark.sql.files.maxPartitionBytes=512m
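
Note that spark.task.resource.gpu.amount=0.125 is 1 GPU divided by spark.executor.cores=8, so up to 8 tasks share each executor's GPU. The same settings can also be passed per job; a minimal spark-submit sketch, where the RAPIDS jar path and application name are placeholders to adjust for your install:

# Jar path and application below are placeholders for your deployment
spark-submit \
  --master yarn \
  --jars /path/to/rapids-4-spark.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.125 \
  --conf spark.executor.resource.gpu.discoveryScript=/usr/lib/spark/scripts/gpu/getGpusResources.sh \
  app.py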

Yarn Scheduler config - /etc/hadoop/conf/capacity-scheduler.xml. The DominantResourceCalculator is required so that the scheduler accounts for GPU requests; the default calculator considers memory only:

1<configuration>
2  <property>
3    <name>yarn.scheduler.capacity.resource-calculator</name>
4    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
5  </property>
6</configuration>

Yarn config - /etc/hadoop/conf/yarn-site.xml:

 1<configuration>
 2  <property>
 3    <name>yarn.nodemanager.resource-plugins</name>
 4    <value>yarn.io/gpu</value>
 5  </property>
 6  <property>
 7     <name>yarn.resource-types</name>
 8     <value>yarn.io/gpu</value>
 9  </property>
10  <property>
11     <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
12     <value>auto</value>
13  </property>
14  <property>
15     <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
16     <value>/usr/bin</value>
17  </property>
18  <property>
19     <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
20     <value>true</value>
21  </property>
22  <property>
23     <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
24     <value>/sys/fs/cgroup</value>
25  </property>
26  <property>
27     <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
28     <value>yarn</value>
29  </property>
30  <property>
31    <name>yarn.nodemanager.container-executor.class</name>
32    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
33  </property>
34  <property>
35    <name>yarn.nodemanager.linux-container-executor.group</name>
36    <value>yarn</value>
37  </property>
38</configuration>

/etc/hadoop/conf/container-executor.cfg - use yarn as the service account:

yarn.nodemanager.linux-container-executor.group=yarn

#--Original container-executor.cfg content--

[gpu]
module.enabled=true
[cgroups]
root=/sys/fs/cgroup
yarn-hierarchy=yarn
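
The LinuxContainerExecutor additionally requires the container-executor binary to be owned by root with the yarn group and the setuid bit set; a quick check, where the binary path is an assumption that varies by distribution:

# Path varies by distribution; /usr/lib/hadoop-yarn/bin is an assumption
ls -l /usr/lib/hadoop-yarn/bin/container-executor
# Expect something like: ---Sr-s--- 1 root yarn ... container-executor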

The node manager local directories need to be accessible to all users; run the following in bash:

chmod -R a+rwx /sys/fs/cgroup/cpu,cpuacct
chmod -R a+rwx /sys/fs/cgroup/devices
local_dirs=$(bdconfig get_property_value \
    --configuration_file /etc/hadoop/conf/yarn-site.xml \
    --name yarn.nodemanager.local-dirs 2>/dev/null)
mod_local_dirs=${local_dirs//,/ }
chmod -R a+rwx ${mod_local_dirs}
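
bdconfig may not be available outside of Google Cloud Dataproc images; one alternative is to read the property straight out of yarn-site.xml, for example with xmllint (a sketch, assuming libxml2-utils is installed):

# Extract yarn.nodemanager.local-dirs directly from yarn-site.xml
local_dirs=$(xmllint --xpath \
    'string(//property[name="yarn.nodemanager.local-dirs"]/value)' \
    /etc/hadoop/conf/yarn-site.xml)
chmod -R a+rwx ${local_dirs//,/ }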

Finally, restart the node manager and resource manager services:

On all workers:

sudo systemctl restart hadoop-yarn-nodemanager.service

On all masters:

sudo systemctl restart hadoop-yarn-resourcemanager.service
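
After the restart, GPUs should be advertised as a node resource; one way to verify from any node with the Hadoop client installed:

# yarn.io/gpu should appear under each GPU node's resources
# (-showDetails is available in Hadoop 3.x)
yarn node -list -showDetails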

Note

If cgroup is mounted on tmpfs and a node is rebooted, the cgroup directory permissions are reverted. Please check the cgroup documentation for your platform for more details.

Below is one example of how this can be handled:

Update the cgroup permissions:

chmod -R a+rwx /sys/fs/cgroup/cpu,cpuacct
chmod -R a+rwx /sys/fs/cgroup/devices

Alternatively, the operation can be added to a systemd service:

Create the mountCgroup service and script:

sudo bash -c "cat >/etc/systemd/system/mountCgroup.service" <<EOF
[Unit]
Description=startup
[Service]
ExecStart=/etc/mountCgroup.sh
Type=oneshot
[Install]
WantedBy=multi-user.target
EOF

sudo bash -c "cat >/etc/mountCgroup.sh" <<EOF
#!/bin/sh
chmod -R a+rwx /sys/fs/cgroup/cpu,cpuacct
chmod -R a+rwx /sys/fs/cgroup/devices
EOF

sudo chmod 644 /etc/systemd/system/mountCgroup.service
sudo chmod 755 /etc/mountCgroup.sh

Then start the mountCgroup service:

sudo systemctl enable mountCgroup.service
sudo systemctl start mountCgroup.service
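
To confirm the unit ran and the permissions took effect, a quick sanity check:

systemctl status mountCgroup.service
ls -ld /sys/fs/cgroup/devices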