Spark 3 GPU Configuration Guide on YARN 3.2.1
The following files should be configured to enable GPU scheduling on YARN 3.2.1 and later.
GPU resource discovery script - /usr/lib/spark/scripts/gpu/getGpusResources.sh:
mkdir -p /usr/lib/spark/scripts/gpu/
cd /usr/lib/spark/scripts/gpu/
wget https://raw.githubusercontent.com/apache/spark/master/examples/src/main/scripts/getGpusResources.sh
chmod a+rwx -R /usr/lib/spark/scripts/gpu/
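To sanity-check the script, run it on a node with GPUs; it prints the visible GPU indices as JSON in the format Spark expects from a discovery script (the addresses vary with the hardware):
/usr/lib/spark/scripts/gpu/getGpusResources.sh
# Example output on a node with two GPUs:
# {"name": "gpu", "addresses":["0","1"]}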
Spark config - /etc/spark/conf/spark-defaults.conf:
spark.rapids.sql.concurrentGpuTasks=2
spark.executor.resource.gpu.amount=1
spark.executor.cores=8
spark.task.cpus=1
spark.task.resource.gpu.amount=0.125
spark.rapids.memory.pinnedPool.size=2G
spark.executor.memoryOverhead=2G
spark.plugins=com.nvidia.spark.SQLPlugin
spark.executor.extraJavaOptions='-Dai.rapids.cudf.prefer-pinned=true'
# This must match the location of the discovery script installed above
spark.executor.resource.gpu.discoveryScript=/usr/lib/spark/scripts/gpu/getGpusResources.sh
spark.sql.files.maxPartitionBytes=512m
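With these settings each executor gets 8 cores and 1 GPU, and spark.task.resource.gpu.amount=0.125 (1 GPU / 8 cores) lets all 8 concurrent task slots share the executor's single GPU, while spark.rapids.sql.concurrentGpuTasks caps how many of them occupy the GPU at once. The pinned memory pool is allocated off-heap, which is why spark.executor.memoryOverhead is raised to cover it. The same values can also be supplied per job; a minimal sketch (the class and jar names are placeholders):
spark-submit \
  --master yarn \
  --conf spark.executor.cores=8 \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.125 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --class com.example.MyApp \
  my-app.jar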
YARN scheduler config - /etc/hadoop/conf/capacity-scheduler.xml:
<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
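By default the capacity scheduler sizes containers on memory alone; the DominantResourceCalculator makes it account for vCores and extended resources such as yarn.io/gpu. After the ResourceManager restart described at the end of this guide, the setting can be confirmed through the ResourceManager's /conf servlet (the host below is a placeholder; 8088 is the default web UI port):
# Prints a match only if the DominantResourceCalculator is in the live config.
curl -s 'http://<rm-host>:8088/conf' | grep -o DominantResourceCalculator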
YARN config - /etc/hadoop/conf/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.resource-plugins</name>
    <value>yarn.io/gpu</value>
  </property>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
    <value>auto</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
    <value>/usr/bin</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
    <value>/sys/fs/cgroup</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>yarn</value>
  </property>
</configuration>
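After the NodeManagers are restarted at the end of this guide, each GPU node should advertise yarn.io/gpu alongside memory and vCores. One way to confirm this from the command line (the -showDetails flag is available in Hadoop 3.x):
# Look for yarn.io/gpu in each node's reported total resources.
yarn node -list -showDetails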
Container executor config - /etc/hadoop/conf/container-executor.cfg - use yarn as the service account:
yarn.nodemanager.linux-container-executor.group=yarn
#--Original container-executor.cfg Content--
[gpu]
module.enabled=true
[cgroups]
root=/sys/fs/cgroup
yarn-hierarchy=yarn
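YARN's LinuxContainerExecutor additionally requires the setuid container-executor binary to be owned by root with the group configured above (yarn) and mode 6050. The path below is an assumption; it varies by distribution:
# Should show: ---Sr-s--- 1 root yarn ... container-executor
ls -l /usr/lib/hadoop-yarn/bin/container-executor
# Fix the ownership/mode if needed:
# sudo chown root:yarn /usr/lib/hadoop-yarn/bin/container-executor
# sudo chmod 6050 /usr/lib/hadoop-yarn/bin/container-executor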
The cgroup directories and NodeManager local directories need to be writable by all users; run the following in bash:
chmod a+rwx -R /sys/fs/cgroup/cpu,cpuacct
chmod a+rwx -R /sys/fs/cgroup/devices
local_dirs=$(bdconfig get_property_value \
--configuration_file /etc/hadoop/conf/yarn-site.xml \
--name yarn.nodemanager.local-dirs 2>/dev/null)
mod_local_dirs=${local_dirs//\,/ }
chmod a+rwx -R ${mod_local_dirs}
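Note that bdconfig ships with Google Cloud Dataproc. On other distributions the same property can be read with xmllint instead (a sketch, assuming the value is set explicitly in yarn-site.xml rather than inherited from a default):
local_dirs=$(xmllint --xpath \
  "//property[name='yarn.nodemanager.local-dirs']/value/text()" \
  /etc/hadoop/conf/yarn-site.xml)
chmod a+rwx -R ${local_dirs//,/ }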
Finally, restart the NodeManager and ResourceManager services:
On all workers:
sudo systemctl restart hadoop-yarn-nodemanager.service
On all masters:
sudo systemctl restart hadoop-yarn-resourcemanager.service
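Once the services are back up, a quick end-to-end smoke test is to run a one-task job on YARN and print the GPU address handed to the task (a sketch; it assumes the spark-defaults.conf settings above are in effect):
spark-shell --master yarn <<'EOF'
import org.apache.spark.TaskContext
// Each task looks up the GPU address that YARN/Spark assigned to it.
val gpus = sc.range(0, 1).map { _ =>
  TaskContext.get().resources()("gpu").addresses.mkString(",")
}.collect()
println(s"GPU addresses seen by the task: ${gpus.mkString(";")}")
EOF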
If the cgroup filesystem is mounted on tmpfs and a node is rebooted, the cgroup directory permissions are reverted. Check the cgroup documentation for your platform for more details.
Below is one example of how this can be handled:
Update the cgroup permissions:
chmod a+rwx -R /sys/fs/cgroup/cpu,cpuacct
chmod a+rwx -R /sys/fs/cgroup/devices
Alternatively, the operation can be added to a systemd unit so it runs at boot.
Create the mountCgroup service and script:
sudo bash -c "cat >/etc/systemd/system/mountCgroup.service" <<EOF
[Unit]
Description=startup
[Service]
ExecStart=/etc/mountCgroup.sh
Type=oneshot
[Install]
WantedBy=multi-user.target
EOF
sudo bash -c "cat >/etc/mountCgroup.sh" <<EOF
#!/bin/sh
chmod a+rwx -R /sys/fs/cgroup/cpu,cpuacct
chmod a+rwx -R /sys/fs/cgroup/devices
EOF
sudo chmod 644 /etc/systemd/system/mountCgroup.service
sudo chmod 755 /etc/mountCgroup.sh
Then start the mountCgroup service:
sudo systemctl enable mountCgroup.service
sudo systemctl start mountCgroup.service
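After a reboot, the service status and the directory modes can be checked to confirm the permissions were reapplied:
systemctl status mountCgroup.service
# Both directories should show mode drwxrwxrwx:
ls -ld /sys/fs/cgroup/devices /sys/fs/cgroup/cpu,cpuacct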