Customize Triton Container#
Two Docker images are available from NVIDIA GPU Cloud (NGC) that make it possible to easily construct customized versions of Triton. By customizing Triton you can significantly reduce the size of the Triton image by removing functionality that you don’t require.
Currently, customization is limited as described below, but future releases will increase the amount of customization that is available. It is also possible to build Triton from source for more precise customization.
Use the compose.py script#
The compose.py script can be found in the server repository. Simply clone the repository and run compose.py to create a custom container.
Note: The version of the created container depends on the branch that was cloned. For example, branch r23.02 should be used to create an image based on the NGC 23.02 Triton release.
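As a minimal sketch of the clone step, the helper below derives the release branch name from an NGC version and prints the matching git clone command instead of running it. The function names are hypothetical, the r<version> branch naming is inferred from the r23.02 example, and the repository URL is an assumption based on the public triton-inference-server/server project on GitHub.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Triton repository.

# Map an NGC release version (e.g. 23.02) to its release branch name.
# Assumes the r<version> naming seen in the r23.02 example.
release_branch() {
    printf 'r%s\n' "$1"
}

# Print (do not execute) the command that clones a single release branch.
# The repository URL is an assumption; adjust it for a fork or mirror.
clone_command() {
    printf 'git clone --branch %s --single-branch %s\n' \
        "$(release_branch "$1")" \
        "https://github.com/triton-inference-server/server.git"
}

clone_command 23.02
```

Printing the command first makes it easy to confirm that the branch matches the intended NGC release before anything is downloaded.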
compose.py provides --backend and --repoagent options that allow you to specify which backends and repository agents to include in the custom image. For example, the following creates a new Docker image that contains only the TensorFlow 1 and TensorFlow 2 backends and the checksum repository agent:
python3 compose.py --backend tensorflow1 --backend tensorflow2 --repoagent checksum
This command creates an image named tritonserver locally. You can access the container with:
$ docker run -it tritonserver:latest
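When the set of backends varies between builds, it can help to assemble the compose.py command line from lists. The sketch below uses a hypothetical compose_command helper that only prints the command and relies on nothing beyond the --backend and --repoagent flags shown above.

```shell
#!/bin/sh
# Hypothetical helper; prints a compose.py command line instead of running it.

# Usage: compose_command "<backends>" "<repoagents>"
# Each argument is a space-separated list of names.
compose_command() {
    cmd="python3 compose.py"
    # $1 and $2 are left unquoted on purpose so the lists split on spaces.
    for backend in $1; do
        cmd="$cmd --backend $backend"
    done
    for agent in $2; do
        cmd="$cmd --repoagent $agent"
    done
    printf '%s\n' "$cmd"
}

compose_command "tensorflow1 tensorflow2" "checksum"
```

Called with the two TensorFlow backends and the checksum agent, the helper reproduces the example command from this section.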
Note: If compose.py is run on release versions r23.02 and earlier, the resulting container will have DCGM version 2.2.3 installed. This may result in different GPU statistics reporting behavior.
Compose a specific version of Triton#
compose.py requires two containers: a min container, which is the base the composed container is built from, and a full container, from which the script extracts components. The version of the min and full containers is determined by the branch of Triton that compose.py is on.
For example, running
python3 compose.py --backend tensorflow1 --repoagent checksum
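on branch r23.02 pulls the min container nvcr.io/nvidia/tritonserver:23.02-py3-min and the full container nvcr.io/nvidia/tritonserver:23.02-py3.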
Alternatively, users can specify the version of the Triton container to pull from any branch by either:
1. Adding the flag --container-version <container version>:
python3 compose.py --backend tensorflow1 --repoagent checksum --container-version 23.02
2. Specifying --image min,<min container image name> --image full,<full container image name>. The user is responsible for specifying compatible min and full containers:
python3 compose.py --backend tensorflow1 --repoagent checksum --image min,nvcr.io/nvidia/tritonserver:23.02-py3-min --image full,nvcr.io/nvidia/tritonserver:23.02-py3
Methods 1 and 2 result in the same composed container. Furthermore, the --image flag overrides the --container-version flag when both are specified.
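To illustrate how the two methods relate, the sketch below expands a release version into the explicit --image flags of the second method. The helper names are hypothetical, and the image naming pattern is an assumption inferred from the 23.02 image names used in this section; it may not hold for every release.

```shell
#!/bin/sh
# Hypothetical helpers; the image naming pattern is inferred from the
# 23.02 image names used in this section.

# Print the min or full image name for a given release version.
triton_image() {
    case "$1" in
        min)  printf 'nvcr.io/nvidia/tritonserver:%s-py3-min\n' "$2" ;;
        full) printf 'nvcr.io/nvidia/tritonserver:%s-py3\n' "$2" ;;
        *)    printf 'unknown image kind: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Print the explicit --image flags for a release version.
image_flags() {
    flags="--image min,$(triton_image min "$1")"
    flags="$flags --image full,$(triton_image full "$1")"
    printf '%s\n' "$flags"
}

image_flags 23.02
```

Deriving both names from one version string keeps the min and full images on the same release, which is the compatibility requirement the explicit --image form leaves to the user.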
CPU-only container composition#
CPU-only containers are not yet available for customization. Please see the build documentation for instructions to build a full CPU-only container. When including TensorFlow or PyTorch backends in the composed container, an additional gpu-min container is needed, since this container provides the CUDA stubs and runtime dependencies that are not provided in the CPU-only min container.
Build it yourself#
If you would like to do yourself what compose.py does under the hood, you can run compose.py with the --dry-run option and then modify the Dockerfile.compose file to suit your needs.
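The dry-run workflow might be scripted roughly as follows. This is a sketch under assumptions: the run wrapper and the RUN variable are hypothetical, the backend and agent choices are only examples, and it assumes that --dry-run leaves a Dockerfile.compose in the current directory without building an image. Commands are printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the dry-run workflow. Commands are printed, not executed,
# unless RUN=1 is set in the environment.

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# 1. Generate Dockerfile.compose (assumed not to build the image).
run python3 compose.py --backend tensorflow1 --repoagent checksum --dry-run

# 2. Edit Dockerfile.compose by hand, then build it with Docker.
run docker build -t tritonserver_custom -f Dockerfile.compose .
```

Keeping the two steps separate leaves room to review and edit the Dockerfile before any image is built.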
Triton with Unsupported and Custom Backends#
You can create and build your own Triton backend. The result of that build should be a directory containing your backend shared library and any additional files required by the backend. Assuming your backend is called “mybackend” and that the directory is “./mybackend”, adding the following to the Dockerfile that compose.py created will create a Triton image that contains all the supported Triton backends plus your custom backend.
COPY ./mybackend /opt/tritonserver/backends/mybackend
You also need to install any additional dependencies required by your backend as part of the Dockerfile. Then use Docker to create the image.
$ docker build -t tritonserver_custom -f Dockerfile.compose .
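If the COPY line is added by a script rather than by hand, a small helper can keep the edit repeatable. The sketch below is hypothetical: add_backend_copy is not part of Triton, and it assumes that appending the line at the end of the Dockerfile is acceptable for your build. It is demonstrated on a scratch file standing in for Dockerfile.compose.

```shell
#!/bin/sh
# Hypothetical helper: append a COPY line for a custom backend to a
# Dockerfile, skipping the append if the exact line is already present.

add_backend_copy() {
    dockerfile=$1   # e.g. Dockerfile.compose
    name=$2         # backend name, e.g. mybackend
    line="COPY ./$name /opt/tritonserver/backends/$name"
    # -x matches whole lines only, -F treats the line as a fixed string.
    if ! grep -qxF "$line" "$dockerfile"; then
        printf '%s\n' "$line" >> "$dockerfile"
    fi
}

# Demonstration on a scratch file standing in for Dockerfile.compose.
demo=$(mktemp)
printf 'FROM scratch\n' > "$demo"
add_backend_copy "$demo" mybackend
add_backend_copy "$demo" mybackend   # second call adds nothing
cat "$demo"
rm -f "$demo"
```

Because the helper checks for the exact line first, it is safe to call again after compose.py regenerates the file or from a repeated CI step.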