Creating an Application Container

Application Container Dockerfile

The nvidia-driver-daemonset houses an nvidia-fs-sidecar container that loads the NVIDIA GPU driver and GDS kernel modules onto the host, where they can then be accessed by privileged containers running in other pods. To launch your own application container and use GDS, the user space libraries (libcufile) must be installed in that container. The easiest way to do this is to use a CUDA container image as the base image in the application container Dockerfile:

FROM nvcr.io/nvidia/cuda:11.7.1-devel-ubuntu20.04
RUN apt-get update && apt-get install -y libcufile-dev
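
The image can then be built and given a quick sanity check to confirm that the cuFile library and headers are present. The image tag below is arbitrary, and the search path assumes the default CUDA package layout under /usr/local/cuda; adjust both to match your environment:

# Build a test image from the Dockerfile above.
sudo docker build -t gds-app:test .

# List any cuFile libraries and headers installed under the default CUDA prefix.
sudo docker run --rm gds-app:test bash -c "find /usr/local/cuda* -name '*cufile*'"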

If the full CUDA base container image is not desired, an Ubuntu base image can be used instead. Add the following commands to the Dockerfile to install the CUDA toolkit and the libcufile user space libraries without installing CUDA in its entirety:

FROM ubuntu:20.04
RUN apt-get update && apt-get install -y wget && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb && \
    dpkg -i cuda-keyring_1.0-1_all.deb && \
    apt-get update && \
    apt-get -y install cuda-toolkit-<major>-<minor> libcufile-dev

Here, <major> and <minor> correspond to the CUDA version; for example, CUDA 11.8 uses major = 11 and minor = 8, so the package name becomes cuda-toolkit-11-8.
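
Whichever base image is used, build and tag the resulting image. The placeholder name and version below are arbitrary, but they are reused in the registry commands in the next section:

# Build and tag the application image from the directory containing the Dockerfile.
sudo docker build -t <your image name>:<your image version> .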

Kubernetes Pod YAML File

After the application container image is built, a YAML file must be created that describes the specification of the pod to be launched in Kubernetes. If your application container image is only hosted locally with Docker, you must run the following commands to create a private Docker registry and push the image to it so that Kubernetes can pull it:

sudo docker run -d -p 5000:5000 --restart=always --name registry registry:2
sudo docker tag <your image name>:<your image version> localhost:5000/<your image name>:<your image version>
sudo docker push localhost:5000/<your image name>:<your image version>
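
You can confirm that the push succeeded by querying the registry's v2 HTTP API; the _catalog endpoint lists the repositories the registry holds:

# List repositories stored in the local registry started above.
curl http://localhost:5000/v2/_catalog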

The following YAML example gives the minimum requirements of what must be in the pod specification (aside from the pod name, image location, and command to run). If the image was pushed to the local registry above, reference it as localhost:5000/<your image name>:<your image version> in the image field:

apiVersion: v1
kind: Pod
metadata:
  name: gds-application
spec:
  hostNetwork: true
  hostIPC: true
  containers:
  - name: gds-application
    image: <your application image>
    imagePullPolicy: Always
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    securityContext:
      privileged: true
    volumeMounts:
    - name: udev
      mountPath: /run/udev
    - name: kernel-config
      mountPath: /sys/kernel/config
    - name: dev
      mountPath: /run/dev
    - name: sys
      mountPath: /sys
    - name: results
      mountPath: /results
    - name: lib
      mountPath: /lib/modules
  volumes:
  - name: udev
    hostPath:
      path: /run/udev
  - name: kernel-config
    hostPath:
      path: /sys/kernel/config
  - name: dev
    hostPath:
      path: /run/dev
  - name: sys
    hostPath:
      path: /sys
  - name: results
    hostPath:
      path: /results
  - name: lib
    hostPath:
      path: /lib/modules
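
Once the YAML file is saved (as gds-application.yaml here, a file name chosen purely for illustration), the pod can be created and inspected with kubectl:

# Create the pod and check that it reaches the Running state.
kubectl apply -f gds-application.yaml
kubectl get pod gds-application

# Open a shell inside the running application container.
kubectl exec -it gds-application -- /bin/bash

Inside the container, if the GDS tools (the gds-tools package for your CUDA version) are also installed in the image, running /usr/local/cuda/gds/tools/gdscheck -p is a convenient way to report whether cuFile and GDS are usable; the exact path depends on how CUDA was installed in the image.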