Creating an Application Container

Application Container Dockerfile

The nvidia-driver-daemonset houses an nvidia-fs-sidecar container that loads the NVIDIA GPU driver and GDS kernel modules on the host, where privileged containers running in other pods can then access them. To use GDS from your own application container, the GDS user space libraries must be installed in that container. The simplest way to do this is to use a CUDA container image as the base image in the application container Dockerfile:

FROM nvcr.io/nvidia/cuda:11.7.1-devel-ubuntu20.04
RUN  apt-get update && apt-get install -y libcufile-dev

If the full CUDA base container image is not desired, an Ubuntu base image can be used and the following commands can be added to the Dockerfile to install the CUDA toolkit and libcufile user space libraries without installing CUDA in its entirety:

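FROM ubuntu:20.04
# The ubuntu base image does not ship wget, so install it (and CA certificates) first
RUN  apt-get update && apt-get install -y wget ca-certificates && \
        wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb && \
        dpkg -i cuda-keyring_1.0-1_all.deb && \
        apt-get update && \
        apt-get -y install cuda-toolkit-<major>-<minor> libcufile-dev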

Here, <major> and <minor> correspond to the CUDA version. For example, CUDA 11.8 has major = 11 and minor = 8.
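As a small illustration of this naming rule, the following shell sketch derives the package name from a version string. The function name cuda_toolkit_pkg is hypothetical and introduced here only for illustration. It assumes the version is written as <major>.<minor>, optionally followed by a patch number.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA tooling): map a CUDA version
# string such as "11.8" to the package name cuda-toolkit-<major>-<minor>.
cuda_toolkit_pkg() {
    case "$1" in
        [0-9]*.[0-9]*) ;;   # looks like <major>.<minor>[.<patch>]
        *) echo "expected <major>.<minor>, got '$1'" >&2; return 1 ;;
    esac
    _major="${1%%.*}"       # everything before the first dot
    _rest="${1#*.}"         # everything after the first dot
    _minor="${_rest%%.*}"   # drop a trailing patch component, if any
    printf 'cuda-toolkit-%s-%s\n' "$_major" "$_minor"
}

cuda_toolkit_pkg 11.8   # prints cuda-toolkit-11-8
```

The printed name is what would replace cuda-toolkit-<major>-<minor> in the apt-get install line.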

Kubernetes Pod YAML File

After the application container is built, create a YAML file that describes the specification of the pod to be launched in Kubernetes. If your application container image is hosted only locally with Docker, you must run the following commands to create a private Docker registry and push the image to it so that Kubernetes can access it:

sudo docker run -d -p 5000:5000 --restart=always --name registry registry:2
sudo docker tag <your image name>:<your image version> localhost:5000/<your image name>:<your image version>
sudo docker push localhost:5000/<your image name>:<your image version>
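For a concrete picture, the sketch below wraps the tag and push steps in a function. The image name gds-app, the version 1.0, and the helper names registry_ref and push_to_local_registry are hypothetical. The registry address localhost:5000 matches the docker run command above.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of Docker.
REGISTRY="${REGISTRY:-localhost:5000}"   # address of the registry started above

# Print the registry-qualified reference for <name> <version>.
registry_ref() {
    printf '%s/%s:%s\n' "$REGISTRY" "$1" "$2"
}

# Tag a local image for the registry and push it there.
push_to_local_registry() {
    _ref="$(registry_ref "$1" "$2")"
    sudo docker tag "$1:$2" "$_ref" || return 1
    sudo docker push "$_ref"
}

registry_ref gds-app 1.0   # prints localhost:5000/gds-app:1.0
# To perform the actual push: push_to_local_registry gds-app 1.0
```

The printed reference is the value that would typically go in the image field of the pod specification.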

The following YAML example shows the minimum required contents of the pod specification. The name, image location, and command to run are placeholders to replace with your own:

apiVersion: v1
kind: Pod
metadata:
  name: gds-application
spec:
  hostNetwork: true
  hostIPC: true
  containers:
  - name: gds-application
    image: <your application image>
    imagePullPolicy: Always
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    securityContext:
      privileged: true
    # All mounts go in a single volumeMounts list; a repeated key would be
    # rejected or silently overwritten by the YAML parser.
    volumeMounts:
    - name: udev
      mountPath: /run/udev
    - name: kernel-config
      mountPath: /sys/kernel/config
    - name: dev
      mountPath: /run/dev
    - name: sys
      mountPath: /sys
    - name: results
      mountPath: /results
    - name: lib
      mountPath: /lib/modules
  # Each volume exposes the host path of the same name to the container.
  volumes:
  - name: udev
    hostPath:
      path: /run/udev
  - name: kernel-config
    hostPath:
      path: /sys/kernel/config
  - name: dev
    hostPath:
      path: /run/dev
  - name: sys
    hostPath:
      path: /sys
  - name: results
    hostPath:
      path: /results
  - name: lib
    hostPath:
      path: /lib/modules
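
Once the specification is saved to a file, the pod can be launched and checked with standard kubectl commands. The following is a minimal sketch, not a prescribed procedure. The file name gds-application.yaml and the function name deploy_gds_pod are assumptions. The final check assumes that the GDS kernel module is listed as nvidia_fs in /proc/modules, which a container shares with the host kernel.

```shell
#!/bin/sh
# Hypothetical wrapper; the pod name matches metadata.name in the example above.
deploy_gds_pod() {
    _manifest="${1:-gds-application.yaml}"   # assumed file name

    # Create the pod from the specification.
    kubectl apply -f "$_manifest" || return 1

    # Block until the pod reports Ready, or give up after two minutes.
    kubectl wait --for=condition=Ready pod/gds-application --timeout=120s || return 1

    # Check that the GDS kernel module is visible from inside the container.
    kubectl exec gds-application -- grep nvidia_fs /proc/modules
}

# Usage: deploy_gds_pod gds-application.yaml
```

If the last command prints nothing and returns a non-zero status, the module is not loaded on the host. In that case, the nvidia-driver-daemonset pods are the first place to look.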