..
  # Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
  # modification, are permitted provided that the following conditions
  # are met:
  #  * Redistributions of source code must retain the above copyright
  #    notice, this list of conditions and the following disclaimer.
  #  * Redistributions in binary form must reproduce the above copyright
  #    notice, this list of conditions and the following disclaimer in the
  #    documentation and/or other materials provided with the distribution.
  #  * Neither the name of NVIDIA CORPORATION nor the names of its
  #    contributors may be used to endorse or promote products derived
  #    from this software without specific prior written permission.
  #
  # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
  # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  # PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
  # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
  # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
  # PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  # PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
  # OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _section-building:

Building
========

The TensorRT Inference Server can be built in two ways:

* Build using Docker and the TensorFlow and PyTorch containers from `NVIDIA GPU Cloud (NGC) `_. Before building you must install Docker and nvidia-docker and login to the NGC registry by following the instructions in :ref:`section-installing-prebuilt-containers`.

* Build using CMake and the dependencies (for example, the TensorFlow or TensorRT libraries) that you build or install yourself.

.. _section-building-the-server-with-docker:

Building the Server with Docker
-------------------------------

To build a release version of the TensorRT Inference Server container, change directory to the root of the repo and checkout the release version of the branch that you want to build (or the master branch if you want to build the under-development version)::

  $ git checkout r19.07

Then use docker to build::

  $ docker build --pull -t tensorrtserver .

Incremental Builds with Docker
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For typical development you will want to run the *build* container with your local repo's source files mounted so that your local changes can be incrementally built. This is done by first building the *tensorrtserver_build* container::

  $ docker build --pull -t tensorrtserver_build --target trtserver_build .

By mounting /path/to/tensorrtserver/src into the container at /workspace/src, changes to your local repo will be reflected in the container::

  $ nvidia-docker run -it --rm -v/path/to/tensorrtserver/src:/workspace/src tensorrtserver_build

Within the container you can perform an incremental server build with::

  # cd /workspace/builddir
  # make -j16 trtis

The first build attempt may fail with a "protobuf_generate_cpp" error. If that happens, simply run the make command again and the build should succeed.

When the build completes, the binary, libraries and headers can be found in trtis/install. To overwrite the existing versions::

  # cp trtis/install/bin/trtserver /opt/tensorrtserver/bin/.
  # cp trtis/install/lib/libtrtserver.so /opt/tensorrtserver/lib/.
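If you iterate on the server frequently, the rebuild-and-install steps can be chained into a single command inside the build container. This is only a convenience sketch of the commands shown above; it assumes the same /workspace/builddir build directory and the default install location, and because the commands are joined with && the chain stops at the first failure (so if the known "protobuf_generate_cpp" failure occurs, just re-run it)::

  # cd /workspace/builddir
  # make -j16 trtis && \
      cp trtis/install/bin/trtserver /opt/tensorrtserver/bin/. && \
      cp trtis/install/lib/libtrtserver.so /opt/tensorrtserver/lib/.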
You can reconfigure the build by running *cmake* as described in :ref:`section-building-the-server-with-cmake`.

.. _section-building-the-server-with-cmake:

Building the Server with CMake
------------------------------

To build a release version of the TensorRT Inference Server with CMake, change directory to the root of the repo and checkout the release version of the branch that you want to build (or the master branch if you want to build the under-development version)::

  $ git checkout r19.07

Next you must build or install each framework backend you want to enable in the inference server, configure the inference server to enable the desired features, and finally build the server.

.. _section-cmake-dependencies:

Dependencies
^^^^^^^^^^^^

To include GPU support in the inference server you must install the necessary CUDA libraries. Similarly, to include support for a particular framework backend, you must build the appropriate libraries for that framework and make them available to the inference server build. In general, the Dockerfile build steps show how each of these frameworks can be built for use in the inference server.

CUDA, cuBLAS, cuDNN
...................

For the inference server to support NVIDIA GPUs you must install CUDA, cuBLAS and cuDNN. These libraries must be installed on system include and library paths so that they are available for the CMake build. The version of the libraries used in the Dockerfile build can be found in the `Framework Containers Support Matrix `_.

For a given version of the inference server you can attempt to build with non-supported versions of the libraries, but you may have build or execution issues since non-supported versions are not tested.

Once you have CUDA, cuBLAS and cuDNN installed you can enable GPUs with the CMake option -DTRTIS_ENABLE_GPU=ON as described below.
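If you are unsure whether these libraries are visible on the system library path, one quick sanity check on a typical Linux install is to query the dynamic linker cache. This is only an illustrative check; the library names shown are the usual SONAMEs and the versions reported will depend on your system::

  $ ldconfig -p | grep -E 'libcudart|libcublas|libcudnn'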
TensorRT
........

The TensorRT headers and libraries must be installed on system include and library paths so that they are available for the CMake build. The version of TensorRT used in the Dockerfile build can be found in the `Framework Containers Support Matrix `_.

For a given version of the inference server you can attempt to build with non-supported versions of TensorRT, but you may have build or execution issues since non-supported versions are not tested.

Once you have TensorRT installed you can enable the TensorRT backend in the inference server with the CMake option -DTRTIS_ENABLE_TENSORRT=ON as described below. You must also specify -DTRTIS_ENABLE_GPU=ON because TensorRT requires GPU support.

TensorFlow
..........

The version of TensorFlow used in the Dockerfile build can be found in the `Framework Containers Support Matrix `_. The trtserver_tf section of the Dockerfile shows how to build the required TensorFlow library from the `NGC `_ TensorFlow container. You can build and install a different version of the TensorFlow library but you must build with the equivalent options indicated by the patches used in the Dockerfile, and you must include tensorflow_backend_tf.cc and tensorflow_backend_tf.h. The patch to tensorflow/BUILD and the build options shown in nvbuildopts cause the TensorFlow backend to be built into a single library, libtensorflow_cc.so, that includes all the functionality required by the inference server.

Once you have the TensorFlow library built and installed you can enable the TensorFlow backend in the inference server with the CMake option -DTRTIS_ENABLE_TENSORFLOW=ON as described below.

You can install the TensorFlow library in a system library path or you can specify the path with the CMake option TRTIS_EXTRA_LIB_PATHS. Multiple paths can be specified by separating them with a semicolon, for example, -DTRTIS_EXTRA_LIB_PATHS="/path/a;/path/b".

ONNX Runtime
............

The version of the ONNX Runtime used in the Dockerfile build can be found in the trtserver_onnx section of the Dockerfile. That section also details the steps that can be used to build the backend. You can attempt to build a different version of the ONNX Runtime or use a different build process, but you may have build or execution issues. Your build should produce the ONNX Runtime library, libonnxruntime.so.

You can enable the ONNX Runtime backend in the inference server with the CMake option -DTRTIS_ENABLE_ONNXRUNTIME=ON as described below.

You can install the library in a system library path or you can specify the path with the CMake option TRTIS_EXTRA_LIB_PATHS. Multiple paths can be specified by separating them with a semicolon, for example, -DTRTIS_EXTRA_LIB_PATHS="/path/a;/path/b". You must also provide the path to the ONNX Runtime headers using the -DTRTIS_ONNXRUNTIME_INCLUDE_PATHS option. Multiple paths can be specified by separating them with a semicolon.

PyTorch and Caffe2
..................

The version of PyTorch and Caffe2 used in the Dockerfile build can be found in the `Framework Containers Support Matrix `_. The trtserver_caffe2 section of the Dockerfile shows how to build the required PyTorch and Caffe2 libraries from the `NGC `_ PyTorch container. You can build and install a different version of the libraries, but if you want to enable the Caffe2 backend you must include netdef_backend_c2.cc and netdef_backend_c2.h in the build, as shown in the Dockerfile.

Once you have the libraries built and installed you can enable the PyTorch backend in the inference server with the CMake option -DTRTIS_ENABLE_PYTORCH=ON and the Caffe2 backend with -DTRTIS_ENABLE_CAFFE2=ON as described below.

You can install the PyTorch library, libtorch.so, and all the required Caffe2 libraries (see Dockerfile) in a system library path or you can specify the path with the CMake option TRTIS_EXTRA_LIB_PATHS. Multiple paths can be specified by separating them with a semicolon, for example, -DTRTIS_EXTRA_LIB_PATHS="/path/a;/path/b". For the PyTorch backend you must also provide the path to the PyTorch headers using the -DTRTIS_PYTORCH_INCLUDE_PATHS option. Multiple paths can be specified by separating them with a semicolon.
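Putting these dependency options together, a configure invocation (the full set of options is described in the next section) that enables GPU support plus the TensorFlow, ONNX Runtime and PyTorch backends might look like the following sketch. The /opt/... install locations are hypothetical placeholders for wherever you actually installed the backend libraries and headers::

  $ cmake -DTRTIS_ENABLE_GPU=ON \
          -DTRTIS_ENABLE_TENSORFLOW=ON \
          -DTRTIS_ENABLE_ONNXRUNTIME=ON \
          -DTRTIS_ENABLE_PYTORCH=ON \
          -DTRTIS_EXTRA_LIB_PATHS="/opt/tensorflow/lib;/opt/onnxruntime/lib;/opt/pytorch/lib" \
          -DTRTIS_ONNXRUNTIME_INCLUDE_PATHS="/opt/onnxruntime/include" \
          -DTRTIS_PYTORCH_INCLUDE_PATHS="/opt/pytorch/include" \
          ../build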
Configure Inference Server
^^^^^^^^^^^^^^^^^^^^^^^^^^

Use cmake to configure the TensorRT Inference Server::

  $ mkdir builddir
  $ cd builddir
  $ cmake -D ... -D ../build

The following options are used to enable and disable the different backends. To enable a backend set the corresponding option to ON, for example -DTRTIS_ENABLE_TENSORRT=ON. To disable a backend set the corresponding option to OFF, for example -DTRTIS_ENABLE_TENSORRT=OFF. By default no backends are enabled. See the section on :ref:`section-cmake-dependencies` for information on additional requirements for enabling a backend.

* **TRTIS_ENABLE_TENSORRT**: Use -DTRTIS_ENABLE_TENSORRT=ON to enable the TensorRT backend. The TensorRT libraries must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

* **TRTIS_ENABLE_TENSORFLOW**: Use -DTRTIS_ENABLE_TENSORFLOW=ON to enable the TensorFlow backend. The TensorFlow library libtensorflow_cc.so must be built as described above and must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

* **TRTIS_ENABLE_ONNXRUNTIME**: Use -DTRTIS_ENABLE_ONNXRUNTIME=ON to enable the OnnxRuntime backend. The library libonnxruntime.so must be built as described above and must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

* **TRTIS_ENABLE_PYTORCH**: Use -DTRTIS_ENABLE_PYTORCH=ON to enable the PyTorch backend. The library libtorch.so must be built as described above and must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

* **TRTIS_ENABLE_CAFFE2**: Use -DTRTIS_ENABLE_CAFFE2=ON to enable the Caffe2 backend. The library libcaffe2.so and all the other required libraries must be built as described above and must be on your library path or you must add the path to TRTIS_EXTRA_LIB_PATHS.

* **TRTIS_ENABLE_CUSTOM**: Use -DTRTIS_ENABLE_CUSTOM=ON to enable support for custom backends. See :ref:`section-building-a-custom-backend` for information on how to build a custom backend.

These additional options may be specified:

* **TRTIS_ENABLE_METRICS**: By default the inference server reports :ref:`Prometheus metrics` on an HTTP endpoint. Use -DTRTIS_ENABLE_METRICS=OFF to disable.

* **TRTIS_ENABLE_GPU**: By default the inference server supports NVIDIA GPUs. Use -DTRTIS_ENABLE_GPU=OFF to disable GPU support. When GPUs are disabled the inference server will :ref:`run models on CPU when possible`.

Build Inference Server
^^^^^^^^^^^^^^^^^^^^^^

After configuring, build the inference server with make::

  $ cd builddir
  $ make -j16 trtis

The first build attempt may fail with a "protobuf_generate_cpp" error. If that happens, simply run the make command again and the build should succeed.

When the build completes, the binary, libraries and headers can be found in trtis/install.

.. _section-building-a-custom-backend:

Building A Custom Backend
-------------------------

The source repository contains several example custom backends in the `src/custom directory `_. These custom backends are built using CMake::

  $ mkdir builddir
  $ cd builddir
  $ cmake ../build
  $ make -j16 trtis-custom-backends

The first build attempt may fail with a "protobuf_generate_cpp" error. If that happens, simply run the make command again and the build should succeed.

When the build completes, the custom backend libraries can be found in trtis-custom-backends/install.

A custom backend is not built into the inference server. Instead it is built as a separate shared library that the inference server dynamically loads when the model repository contains a model that uses that custom backend. There are a couple of ways you can build your custom backend into a shared library, as described in the following sections.

Build Using CMake
^^^^^^^^^^^^^^^^^

One way to build your own custom backend is to use the inference server's CMake build. Simply copy and modify one of the existing example custom backends and then build your backend using CMake, as sketched below. You can then use the resulting shared library in your model repository as described in :ref:`section-custom-backends`.
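A minimal sketch of that copy-and-modify workflow, using the *param* example as the starting point and a hypothetical backend name *my_backend* (you will also need to register the new directory in the CMake build in the same way the existing example backends are registered)::

  $ cp -r src/custom/param src/custom/my_backend
  $ # edit src/custom/my_backend and the CMake files that list the example backends
  $ cd builddir
  $ make -j16 trtis-custom-backends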
Build Using Custom Backend SDK
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The custom backend SDK includes all the header files you need to build your custom backend as well as a static library which provides all the model configuration and protobuf utility functions you will need. You can either build the custom backend SDK yourself using Dockerfile.custombackend::

  docker build -t tensorrtserver_cbe -f Dockerfile.custombackend .

Or you can download a pre-built version of the SDK from the `GitHub release page `_ corresponding to the release you are interested in. The custom backend SDK is found in the "Assets" section of the release page in a tar file named after the version of the release and the OS, for example, v1.2.0_ubuntu1604.custombackend.tar.gz.

Once you have the SDK you can use the include/ directory and static library when you compile your custom backend source code. For example, the SDK includes the source for the *param* custom backend in src/param.cc. You can create a custom backend from that source using the following command::

  g++ -fpic -shared -std=c++11 -o libparam.so custom-backend-sdk/src/param.cc -Icustom-backend-sdk/include custom-backend-sdk/lib/libcustombackend.a

Building the Client Libraries and Examples
------------------------------------------

The provided Dockerfile.client and CMake support can be used to build the client libraries and examples.

.. include:: client.rst
   :start-after: build-client-begin-marker-do-not-remove
   :end-before: build-client-end-marker-do-not-remove

Building the Documentation
--------------------------

The inference server documentation is found in the docs/ directory and is based on `Sphinx `_. `Doxygen `_ integrated with `Exhale `_ is used for C++ API documentation.

To build the docs install the required dependencies::

  $ apt-get update
  $ apt-get install -y --no-install-recommends doxygen
  $ pip install --upgrade sphinx sphinx-rtd-theme nbsphinx exhale

To get the Python client library API docs the TensorRT Inference Server Python package must be installed::

  $ pip install --upgrade tensorrtserver-*.whl

Then use Sphinx to build the documentation into the build/html directory::

  $ cd docs
  $ make clean html
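If you want to preview the generated pages locally, any static file server will do. For example, using Python's built-in HTTP server (not part of the inference server tooling) and then browsing to http://localhost:8000::

  $ cd build/html
  $ python3 -m http.server 8000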