Architecture Overview
The NVIDIA container stack is architected to support any container runtime in the ecosystem. The components of the stack include:
- The NVIDIA Container Runtime (nvidia-container-runtime)
- The NVIDIA Container Runtime Hook (nvidia-container-toolkit / nvidia-container-runtime-hook)
- The NVIDIA Container Library and CLI (libnvidia-container1, nvidia-container-cli)
The components of the NVIDIA container stack are packaged as the NVIDIA Container Toolkit.
How these components are used depends on the container runtime. For docker or containerd, the NVIDIA Container Runtime (nvidia-container-runtime) is configured as an OCI-compliant runtime, with container creation flowing through each of the components listed above in turn.
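For reference, registering the runtime with Docker amounts to an entry like the following in /etc/docker/daemon.json (a sketch of the typical layout; the exact path to the runtime binary may differ by system):

# Show the runtime registration in the Docker daemon configuration.
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}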
For cri-o and lxc, the flow is simpler: these runtimes invoke the lower-level components directly, so the NVIDIA Container Runtime component is not required.
Let’s take a brief look at each of the components in the NVIDIA container stack, starting with the lowest-level component and working up.
Components and Packages
The main packages of the NVIDIA Container Toolkit are:
- nvidia-container-toolkit
- nvidia-container-toolkit-base
- libnvidia-container-tools
- libnvidia-container1
The dependencies between these packages are shown below:
├─ nvidia-container-toolkit (version)
│ ├─ libnvidia-container-tools (>= version)
│ └─ nvidia-container-toolkit-base (version)
│
├─ libnvidia-container-tools (version)
│ └─ libnvidia-container1 (>= version)
└─ libnvidia-container1 (version)
where version represents the NVIDIA Container Toolkit version.
Note
In the past the nvidia-docker2 and nvidia-container-runtime packages were also discussed as part of the NVIDIA container stack. These packages should be considered deprecated, as their functionality has been merged into the nvidia-container-toolkit package. The packages may still be available to introduce dependencies on nvidia-container-toolkit and to ensure that older workflows continue to function. For more information on these packages, see the documentation archive for versions older than v1.12.0.
The NVIDIA Container Library and CLI
These components are packaged as the libnvidia-container-tools and libnvidia-container1 packages, respectively.
These components provide a library and a simple CLI utility to automatically configure GNU/Linux containers leveraging NVIDIA GPUs. The implementation relies on kernel primitives and is designed to be agnostic of the container runtime.
libnvidia-container provides a well-defined API and a wrapper CLI (called nvidia-container-cli) that different runtimes can invoke to inject NVIDIA GPU support into their containers.
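As a quick illustration, the CLI can be invoked directly to inspect what would be made available to containers. A minimal sketch (run as root on a host with the NVIDIA driver installed; output varies by system):

# Report driver and device information detected by libnvidia-container.
$ nvidia-container-cli info
# List the driver components (binaries, libraries, device nodes) that
# would be injected into a container.
$ nvidia-container-cli list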
The NVIDIA Container Runtime Hook
This component is included in the nvidia-container-toolkit package.
It includes an executable that implements the interface required by a runC prestart hook. This executable is invoked by runC after a container has been created, but before it has been started, and is given access to the config.json associated with the container. It then takes information contained in the config.json and uses it to invoke the nvidia-container-cli CLI with an appropriate set of flags, one of the most important being which specific GPU devices should be injected into the container.
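For illustration, after injection the hook appears in the container's config.json roughly as follows (a sketch; the installed path and arguments can vary):

# Inspect the prestart hooks in a container's OCI spec (requires jq).
$ jq .hooks.prestart config.json
[
  {
    "path": "/usr/bin/nvidia-container-runtime-hook",
    "args": ["nvidia-container-runtime-hook", "prestart"]
  }
]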
The NVIDIA Container Runtime
This component is included in the nvidia-container-toolkit-base package.
This component used to be a complete fork of runC with NVIDIA-specific code injected into it. Since 2019, it has been a thin wrapper around the native runC installed on the host system: nvidia-container-runtime takes a runC spec as input, injects the NVIDIA Container Runtime Hook as a prestart hook into it, and then calls out to the native runC, passing it the modified spec with that hook set.
For versions of the NVIDIA Container Runtime from v1.12.0, this runtime also performs additional modifications to the OCI runtime spec to inject specific devices and mounts not handled by the NVIDIA Container CLI.
It’s important to note that this component is not necessarily specific to docker (but it is specific to runC).
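Putting this together, once docker is configured with the NVIDIA runtime, a GPU container can be started as follows (a sketch; the CUDA image tag is only an example):

# Run a container with the NVIDIA runtime; NVIDIA_VISIBLE_DEVICES is read
# by the hook to decide which GPUs to inject.
$ docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all \
    nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi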
The NVIDIA Container Toolkit CLI
This component is included in the nvidia-container-toolkit-base package.
This component is a CLI that includes a number of utilities for interacting with the NVIDIA Container Toolkit. This functionality includes configuring runtimes such as docker for use with the NVIDIA Container Toolkit, or generating Container Device Interface (CDI) specifications.
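For example, the nvidia-ctk binary shipped with this package can perform both tasks (a sketch; flags are taken from recent releases and may differ by version):

# Register the NVIDIA runtime with the docker daemon configuration.
$ sudo nvidia-ctk runtime configure --runtime=docker
# Generate a CDI specification describing the GPUs on this host.
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml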
Which package should I use then?
Installing the nvidia-container-toolkit package is sufficient for all use cases. This package is continuously being enhanced with additional functionality and tools that simplify working with containers and NVIDIA devices.
To use Kubernetes with Docker, you need to configure the Docker daemon.json to include a reference to the NVIDIA Container Runtime and set this runtime as the default. The NVIDIA Container Toolkit contains a utility to update this file, as highlighted in the docker-specific installation instructions.
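A sketch of that workflow using the toolkit's utility (--set-as-default marks the NVIDIA runtime as the daemon's default, which the Kubernetes-with-Docker case requires):

# Update /etc/docker/daemon.json, make the NVIDIA runtime the default,
# and restart the daemon to pick up the change.
$ sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
$ sudo systemctl restart docker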
See Installing the NVIDIA Container Toolkit for information on installing the NVIDIA Container Toolkit on various Linux distributions.
Package Repository
The packages for the various components listed above are available in the gh-pages branch of the GitHub repositories of these projects. This is particularly useful for air-gapped deployments that may want access to the actual packages (.deb and .rpm) to support offline installs.
For the different components:
- nvidia-container-toolkit: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/
- libnvidia-container: https://github.com/NVIDIA/libnvidia-container/tree/gh-pages/
Note
As of the release of version 1.6.0 of the NVIDIA Container Toolkit, the packages for all components are published to the libnvidia-container repository (https://nvidia.github.io/libnvidia-container/) listed above. For older package versions, please see the documentation archives.
Releases of the software are also hosted on the experimental branch of the repository and are graduated to stable after testing and validation. To get access to the latest experimental features of the NVIDIA Container Toolkit, you may need to add the experimental branch to the apt or yum repository listing. The installation instructions include information on how to add these repository listings for the package manager.
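As an illustration on an apt-based system, enabling the experimental packages typically amounts to uncommenting the corresponding entries in the repository list added during installation (a sketch; the file path assumes the standard installation layout):

# Enable the experimental entries in the repository list, then refresh
# the package index.
$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ sudo apt-get update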