Creating an Application Container

The nvidia-driver-daemonset houses an nvidia-fs-sidecar container, which loads the NVIDIA GPU driver and the GDS kernel modules on the host; privileged containers running in other pods can then access them. To launch your own application container and use GDS, the user-space libraries must be installed in your application container. The easiest way to do this is to use a CUDA container image as the base image in the application container's Dockerfile:

FROM nvcr.io/nvidia/cuda:11.7.1-devel-ubuntu20.04
RUN apt-get update && apt-get install -y libcufile-dev

If the full CUDA base container image is not desired, you can start from an Ubuntu base image instead and add the following commands to the Dockerfile to install the CUDA toolkit and the libcufile user-space libraries without installing CUDA in its entirety:

FROM ubuntu:20.04
RUN apt-get update && apt-get install -y wget && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb && \
    dpkg -i cuda-keyring_1.0-1_all.deb && \
    apt-get update && \
    apt-get -y install cuda-toolkit-<major>-<minor> libcufile-dev

Note that the base ubuntu:20.04 image does not ship with wget, so it must be installed before fetching the keyring package.

Here <major> and <minor> correspond to the CUDA version. For example, CUDA 11.8 has major = 11 and minor = 8, so the package name is cuda-toolkit-11-8.
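As a small sketch, the package name can be derived from a CUDA version string with shell parameter expansion (the version "11.8" below is only an example; substitute your own):

```shell
# Derive the apt package name from a CUDA version string.
cuda_version="11.8"           # example version; substitute your own
major="${cuda_version%%.*}"   # text before the first dot -> 11
minor="${cuda_version#*.}"    # text after the first dot  -> 8
pkg="cuda-toolkit-${major}-${minor}"
echo "$pkg"                   # -> cuda-toolkit-11-8
```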

After the application container image is built, you must create a YAML file describing the specification of the pod that will be launched in Kubernetes. If your application container image is hosted locally with Docker, run the following commands to create a private Docker registry and push the image to it so that Kubernetes can access it:

sudo docker run -d -p 5000:5000 --restart=always --name registry registry:2
sudo docker tag <your image name>:<your image version> localhost:5000/<your image name>:<your image version>
sudo docker push localhost:5000/<your image name>:<your image version>
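As a concrete sketch with a hypothetical image name and version, the fully qualified reference produced by these commands, which is also what the pod spec's image: field must use, is composed as follows:

```shell
# Hypothetical values; substitute your own image name and version.
registry="localhost:5000"
image_name="gds-app"
image_version="1.0"

# The reference passed to `docker tag`/`docker push` and used in the
# pod spec's `image:` field.
image_ref="${registry}/${image_name}:${image_version}"
echo "$image_ref"   # -> localhost:5000/gds-app:1.0
```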

The following example shows the minimum that the YAML must contain (beyond the pod name, image location, and command to run):

apiVersion: v1
kind: Pod
metadata:
  name: gds-application
spec:
  hostNetwork: true
  hostIPC: true
  containers:
  - name: gds-application
    image: <your application image>
    imagePullPolicy: Always
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
    securityContext:
      privileged: true
    volumeMounts:
    - name: udev
      mountPath: /run/udev
    - name: kernel-config
      mountPath: /sys/kernel/config
    - name: dev
      mountPath: /run/dev
    - name: sys
      mountPath: /sys
    - name: results
      mountPath: /results
    - name: lib
      mountPath: /lib/modules
  volumes:
  - name: udev
    hostPath:
      path: /run/udev
  - name: kernel-config
    hostPath:
      path: /sys/kernel/config
  - name: dev
    hostPath:
      path: /run/dev
  - name: sys
    hostPath:
      path: /sys
  - name: results
    hostPath:
      path: /results
  - name: lib
    hostPath:
      path: /lib/modules
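Once the YAML is saved, the pod can be launched and GDS support checked from inside it. The sketch below only composes and prints the commands, since running them requires access to a live cluster; the filename gds-application.yaml is a hypothetical choice, and the gdscheck path assumes the CUDA toolkit's GDS tools are installed in the image at their default location:

```shell
pod="gds-application"            # pod name from the YAML above
manifest="gds-application.yaml"  # hypothetical filename for the YAML

# Launch the pod, wait for it to become Ready, then probe GDS support
# from inside the container with gdscheck.
launch_cmd="kubectl apply -f ${manifest}"
wait_cmd="kubectl wait --for=condition=Ready pod/${pod} --timeout=120s"
check_cmd="kubectl exec -it ${pod} -- /usr/local/cuda/gds/tools/gdscheck -p"
printf '%s\n' "$launch_cmd" "$wait_cmd" "$check_cmd"
```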

© Copyright 2024, NVIDIA. Last updated on Apr 2, 2024.