Creating an Application Container
Application Container Dockerfile
The nvidia-driver-daemonset houses an nvidia-fs-sidecar container, which loads the NVIDIA GPU driver and GDS kernel modules on the host; these modules can then be accessed by privileged containers running in other pods. To launch your own application container and use GDS, the user space libraries must be installed in your application container. The easiest way to do this is to use a CUDA container image as the base image in the application container Dockerfile:
FROM nvcr.io/nvidia/cuda:11.7.1-devel-ubuntu20.04
RUN apt-get update && apt-get install -y libcufile-dev
If the full CUDA base container image is not desired, an Ubuntu base image can be used instead, and the following commands can be added to the Dockerfile to install the CUDA toolkit and libcufile user space libraries without installing CUDA in its entirety:
FROM ubuntu:20.04
# wget is not included in the base Ubuntu image, so install it before fetching the CUDA keyring
RUN apt-get update && apt-get install -y wget && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb && \
    dpkg -i cuda-keyring_1.0-1_all.deb && \
    apt-get update && \
    apt-get -y install cuda-toolkit-<major>-<minor> libcufile-dev
Where <major> and <minor> correspond to the CUDA version. For example, CUDA 11.8 would use cuda-toolkit-11-8 (major = 11, minor = 8).
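Once the Dockerfile is written, the application image can be built with Docker. A minimal sketch, assuming the Dockerfile is in the current directory and that gds-app and 1.0 are placeholders for your image name and version:

sudo docker build -t gds-app:1.0 .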
Kubernetes Pod YAML File
After the application container is built, a YAML file must be created to describe the specification of the pod that will be launched in Kubernetes. If your application container image is hosted locally with Docker, run the following commands to create a private Docker registry and push the image to it so that Kubernetes can access it:
sudo docker run -d -p 5000:5000 --restart=always --name registry registry:2
sudo docker tag <your image name>:<your image version> localhost:5000/<your image name>:<your image version>
sudo docker push localhost:5000/<your image name>:<your image version>
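The image field in the pod specification below should then reference the registry copy rather than the local Docker tag, for example (image name and version are placeholders):

image: localhost:5000/<your image name>:<your image version>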
The following example gives the minimum requirements for the pod YAML (aside from the name, image location, and command to run):
apiVersion: v1
kind: Pod
metadata:
  name: gds-application
spec:
  hostNetwork: true
  hostIPC: true
  containers:
  - name: gds-application
    image: <your application image>
    imagePullPolicy: Always
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    securityContext:
      privileged: true
    volumeMounts:
    - name: udev
      mountPath: /run/udev
    - name: kernel-config
      mountPath: /sys/kernel/config
    - name: dev
      mountPath: /run/dev
    - name: sys
      mountPath: /sys
    - name: results
      mountPath: /results
    - name: lib
      mountPath: /lib/modules
  volumes:
  - name: udev
    hostPath:
      path: /run/udev
  - name: kernel-config
    hostPath:
      path: /sys/kernel/config
  - name: dev
    hostPath:
      path: /run/dev
  - name: sys
    hostPath:
      path: /sys
  - name: results
    hostPath:
      path: /results
  - name: lib
    hostPath:
      path: /lib/modules
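With the YAML file saved, the pod can be created and inspected with kubectl. A minimal sketch, assuming the specification above is saved as gds-application.yaml:

kubectl apply -f gds-application.yaml
kubectl get pod gds-application
kubectl exec -it gds-application -- /bin/bash

Inside the container, if the gds-tools package was installed in the image, the gdscheck utility (typically at /usr/local/cuda/gds/tools/gdscheck, run with the -p flag) can be used to confirm that cuFile sees the GDS kernel modules loaded by the driver daemonset.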