..
  # Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
  # modification, are permitted provided that the following conditions
  # are met:
  #  * Redistributions of source code must retain the above copyright
  #    notice, this list of conditions and the following disclaimer.
  #  * Redistributions in binary form must reproduce the above copyright
  #    notice, this list of conditions and the following disclaimer in the
  #    documentation and/or other materials provided with the distribution.
  #  * Neither the name of NVIDIA CORPORATION nor the names of its
  #    contributors may be used to endorse or promote products derived
  #    from this software without specific prior written permission.
  #
  # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
  # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  # PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
  # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
  # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
  # PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  # PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
  # OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _section-running-triton:

Running Triton
==============

For best performance the Triton Inference Server should be run on a
system that contains Docker, nvidia-docker, CUDA and one or more
supported GPUs, as explained in
:ref:`section-running-triton-with-gpu`. Triton can also be run on
non-CUDA, non-GPU systems as described in
:ref:`section-running-triton-without-gpu`. If you :ref:`build Triton
outside of Docker `, you can then run Triton without Docker, as
explained in :ref:`section-running-triton-without-docker`.

.. _section-example-model-repository:

Example Model Repository
------------------------

Before running Triton, you must first set up a model repository
containing the models that the server will make available for
inferencing. An example model repository contains a Caffe2 ResNet50
model, a TensorFlow Inception model, and an ONNX densenet model. The
example repository also contains two simple TensorFlow GraphDef
models that are used by the :ref:`example client applications `.
These models are provided in the `docs/examples/model_repository `_
directory.

Before using the example model repository you must fetch any missing
model definition files from their public model zoos. Be sure to
checkout the release version of the branch that corresponds to the
server you are using (or the master branch if you are using a server
build from master)::

  $ git checkout r20.09
  $ cd docs/examples
  $ ./fetch_models.sh

An example ensemble model repository is also provided in the
`docs/examples/ensemble_model_repository `_ directory. It contains a
custom image preprocess model, Caffe2 ResNet50, and an ensemble model
that are used by the :ref:`ensemble example `. Before using the
example ensemble model repository, in addition to fetching public
model definition files as mentioned above, you must build the custom
backend for the custom image preprocess model (see
:ref:`section-building-a-custom-backend` for instructions).
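Whichever repository you use, each model lives in its own
subdirectory that holds a config.pbtxt model configuration and one or
more numbered version subdirectories containing the model definition
file; see :ref:`section-model-repository` for the full layout
rules. As a rough sketch, with illustrative model and file names, a
populated repository looks like this::

  model_repository/
    densenet_onnx/
      config.pbtxt
      1/
        model.onnx
    preprocess_resnet50_ensemble/
      config.pbtxt
      1/        <- empty version directory (see the note below)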
Also note that although ensemble models are fully specified in their
model configuration, empty version directories are required for them
to be recognized as valid model directories::

  $ cd docs/examples
  $ mkdir -p ensemble_model_repository/preprocess_resnet50_ensemble/1

.. _section-running-triton-with-gpu:

Running Triton On A System With A GPU
-------------------------------------

Before running Triton you must first set up a model repository
containing the models that you want to be available for
inferencing. Section :ref:`section-model-repository` describes how to
create your own model repository. You can also follow the steps above
in :ref:`section-example-model-repository` to set up an example model
repository.

Assuming the model repository is available in
/path/to/model/repository, the following command runs the container
you pulled from NGC or built locally::

  $ docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/path/to/model/repository:/models <docker image> tritonserver --model-repository=/models

Where *<docker image>* will be something like
**nvcr.io/nvidia/tritonserver:20.09-py3** if you :ref:`pulled the
container from the NGC registry `, or **tritonserver** if you
:ref:`built it from source `.

The docker -v option maps /path/to/model/repository on the host into
the container at /models, and the -\\-model-repository option to
Triton is used to point to /models as the model repository.

The -p flags expose the container ports where Triton listens for HTTP
requests (port 8000), listens for GRPC requests (port 8001), and
reports Prometheus metrics (port 8002). You may also want to use the
-\\-shm-size and -\\-ulimit flags to improve the server's performance
depending on how you are using Triton. For more information on the
Prometheus metrics provided by Triton see :ref:`section-metrics`.
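If needed, a variant of the docker run command above can raise the
container's shared-memory size and resource limits. The -\\-shm-size
and -\\-ulimit values shown here are illustrative starting points
only, not required settings; tune them for your workload::

  $ docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/path/to/model/repository:/models <docker image> tritonserver --model-repository=/models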
.. _section-running-triton-without-gpu:

Running Triton On A System Without A GPU
----------------------------------------

On a system without GPUs, Triton should be run without using the
-\\-gpus flag to docker, but is otherwise identical to what is
described in :ref:`section-running-triton-with-gpu`::

  $ docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/path/to/model/repository:/models <docker image> tritonserver --model-repository=/models

Because the -\\-gpus flag is not used, a GPU is not available and
Triton will therefore be unable to load any model configuration that
requires a GPU or that specifies a GPU instance by an
:ref:`instance-group ` configuration.

.. _section-running-triton-without-docker:

Running Triton Without Docker
-----------------------------

After :ref:`building Triton outside of Docker `, the *tritonserver*
executable will be in builddir/server/install/bin and the required
shared libraries will be in builddir/server/install/lib.

The *tritonserver* executable and libraries are configured to be
installed and executed from the /opt/tritonserver directory, so copy
builddir/server/install/* to /opt/tritonserver/. Then execute
*tritonserver* with the desired arguments::

  $ /opt/tritonserver/bin/tritonserver --model-repository=/models

.. _section-checking-inference-server-status:

Checking Triton Status
----------------------

The simplest way to verify that the inference server is running and
ready to perform inference is to use the server *ready* API to query
the server's status. From the host system use curl to access the HTTP
endpoint that indicates server status. For example::

  $ curl -v localhost:8000/v2/health/ready
  ...
  < HTTP/1.1 200 OK
  < Content-Length: 0
  < Content-Type: text/plain

The HTTP request returns status 200 if Triton is ready and a non-200
status if it is not ready.

Triton provides several additional ready, status and metadata
APIs. See :ref:`section-http-grpc-protocol` for more information on
the HTTP/REST and GRPC protocols that are supported by Triton.
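As a further illustration, other status and metadata queries follow
the same URL pattern on port 8000. The authoritative list of
endpoints is given by the HTTP/REST protocol referenced above; the
commands below assume the standard v2 server-metadata and model-ready
endpoints and a model named *simple* from the example repository, so
adjust the model name to match your repository::

  $ curl localhost:8000/v2                      # server metadata
  $ curl localhost:8000/v2/models/simple/ready  # readiness of one model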