..
  # Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
  #
  # Redistribution and use in source and binary forms, with or without
  # modification, are permitted provided that the following conditions
  # are met:
  #  * Redistributions of source code must retain the above copyright
  #    notice, this list of conditions and the following disclaimer.
  #  * Redistributions in binary form must reproduce the above copyright
  #    notice, this list of conditions and the following disclaimer in the
  #    documentation and/or other materials provided with the distribution.
  #  * Neither the name of NVIDIA CORPORATION nor the names of its
  #    contributors may be used to endorse or promote products derived
  #    from this software without specific prior written permission.
  #
  # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
  # EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  # PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
  # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
  # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
  # PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  # PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
  # OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _section-client-examples:

Client Examples
===============

After you have Triton :ref:`running <section-running-triton>` you can
send inference and other requests to it using the :ref:`HTTP/REST or
GRPC protocols ` from your *client* application. To simplify
communication with Triton, the Triton project provides C++ and Python
:ref:`client libraries <section-client-libraries>`, and several
example applications that show how to use these libraries:

* C++ and Python versions of *image\_client*, an example application
  that uses the C++ or Python client library to execute image
  classification models on Triton. See
  :ref:`section-image-classification-example`.

* Several simple `C++ examples `_ show how to use the C++ library to
  communicate with Triton to perform inferencing and other tasks. The
  C++ examples demonstrating the HTTP/REST client are named with a
  *simple\_http\_* prefix and the examples demonstrating the GRPC
  client are named with a *simple\_grpc\_* prefix.

* Several simple `Python examples `_ show how to use the Python
  library to communicate with Triton to perform inferencing and other
  tasks (a minimal usage sketch follows this list). The Python
  examples demonstrating the HTTP/REST client are named with a
  *simple\_http\_* prefix and the examples demonstrating the GRPC
  client are named with a *simple\_grpc\_* prefix.

* A couple of Python examples that communicate with Triton using a
  Python GRPC API generated by the `protoc compiler `_.
  *grpc\_client.py* shows basic API usage. *grpc\_image\_client.py*
  is functionally equivalent to *image\_client* but uses a generated
  GRPC client stub to communicate with Triton.

* The `protoc compiler `_ can generate a GRPC API in many programming
  languages. Examples for languages other than Python can be found in
  subdirectories of src/clients.
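To give a feel for the Python library's API before looking at the
individual examples, the following is a minimal sketch of an
inference request. It is illustrative only: the model name *simple*
and the tensor names INPUT0/OUTPUT0 are placeholders, and the import
is spelled for recent client wheels (older wheels expose the same
classes under the *tritonhttpclient* module name)::

  import numpy as np
  import tritonclient.http as httpclient  # older wheels: import tritonhttpclient

  # Connect to Triton's HTTP/REST endpoint (default port 8000).
  client = httpclient.InferenceServerClient(url="localhost:8000")

  # Describe the input tensor and attach data from a numpy array.
  input0 = httpclient.InferInput("INPUT0", [1, 16], "INT32")
  input0.set_data_from_numpy(np.arange(16, dtype=np.int32).reshape(1, 16))

  # Request the output tensor and run the inference.
  output0 = httpclient.InferRequestedOutput("OUTPUT0")
  response = client.infer("simple", inputs=[input0], outputs=[output0])
  print(response.as_numpy("OUTPUT0"))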
.. _section-getting-the-client-examples:

Getting the Client Examples
---------------------------

The provided Dockerfile.client and CMake support can be used to build
the examples. Alternatively, the pre-built examples can be downloaded
from GitHub, or a pre-built Docker image containing the client
libraries can be pulled from `NVIDIA GPU Cloud (NGC)
<https://ngc.nvidia.com>`_.

Build Using Dockerfile
^^^^^^^^^^^^^^^^^^^^^^

To build the examples using Docker follow the description in
:ref:`section-client-libraries-build-using-dockerfile`. After the
build completes, the tritonserver_client docker image contains the
built client examples and is configured with all the dependencies
required to run those examples within the container.

The easiest way to try the examples described in the following
sections is to run the client image with -\\-net=host so that the
client examples can access Triton running in its own container. To
use system shared memory you must run the client and server images
with -\\-ipc=host so that Triton can access the system shared memory
in the client container; additionally, to create system shared memory
regions larger than 64MB, the -\\-shm-size flag is needed when
running the client image. To use CUDA shared memory you must pass the
appropriate Docker arguments when running the client image (see
:ref:`section-running-triton` for more information about running
Triton)::

  $ docker run -it --rm --net=host tritonserver_client

In the tritonserver_client image you can find the example executables
in /workspace/install/bin, and the Python examples in
/workspace/install/python.

Build Using CMake
^^^^^^^^^^^^^^^^^

To build the examples using CMake follow the description in
:ref:`section-client-libraries-build-using-cmake`.

Ubuntu 18.04
............

When the build completes, the examples can be found in
client/install. To use the examples, add the path to the client
library, by default
/path/to/tritonserver/repo/build/client/install/lib, to the
LD_LIBRARY_PATH environment variable. You must also install the
client Python packages and the other packages required by the
examples::

  $ pip3 install --upgrade client/install/python/triton*.whl numpy pillow

Windows 10
..........

When the build completes, the examples can be found in
client/install. The C++ client examples are not generated because
those examples have not yet been ported to Windows, but you can use
the Python examples to verify that the build succeeded. To use the
Python examples, install the Python wheels::

  > pip3 install --upgrade client/install/python/triton*.whl numpy pillow

Download From GitHub
^^^^^^^^^^^^^^^^^^^^

To download the examples follow the description in
:ref:`section-client-libraries-download-from-github`. To use the C++
examples you must install some dependencies. For Ubuntu 18.04::

  $ apt-get update
  $ apt-get install curl libcurl4-openssl-dev

The Python examples additionally require that you install the wheel
files and some other dependencies::

  $ apt-get install python3 python3-pip
  $ pip3 install --user --upgrade python/triton*.whl numpy pillow

The C++ image\_client example uses OpenCV for image manipulation, so
for that example you must also install the following::

  $ apt-get install libopencv-dev libopencv-core-dev

Download Docker Image From NGC
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To download the Docker image follow the description in
:ref:`section-client-libraries-download-from-ngc`. The docker image
contains the built client examples and is configured with all the
dependencies required to run those examples within the container.

As with the image built from Dockerfile.client, run this image with
-\\-net=host so that the client examples can access Triton running in
its own container, with -\\-ipc=host (and -\\-shm-size for regions
larger than 64MB) to use system shared memory, and with the
appropriate Docker arguments to use CUDA shared memory (see
:ref:`section-running-triton` for more information about running
Triton)::

  $ docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-clientsdk

where <xx.yy> is the version of the clientsdk container that you
pulled from NGC. In the image you can find the example executables in
/workspace/install/bin, and the Python examples in
/workspace/install/python.

.. _section-simple-examples:

Simple Example Applications
---------------------------

This section describes several of the simple example applications and
the features that they illustrate.

String Datatype
^^^^^^^^^^^^^^^

Some frameworks support tensors where each element in the tensor is a
string (see :ref:`section-datatypes` for information on supported
datatypes). String tensors are demonstrated in the C++ example
applications `simple\_http\_string\_infer\_client.cc and
simple\_grpc\_string\_infer\_client.cc `_ and in the Python example
applications `simple\_http\_string\_infer\_client.py and
simple\_grpc\_string\_infer\_client.py `_.
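As a sketch of what a string-tensor request looks like with the
Python library: string elements travel as the BYTES datatype and are
attached from a numpy array with dtype=object. The model and tensor
names below are placeholders, not the names used by the string
examples::

  import numpy as np
  import tritonclient.http as httpclient  # module name may differ by release

  client = httpclient.InferenceServerClient(url="localhost:8000")

  # Each element of a BYTES tensor is an arbitrary byte string, so the
  # input is built from a numpy array with dtype=object.
  strings = np.array([["one", "two", "three", "four"]], dtype=object)
  input0 = httpclient.InferInput("INPUT0", list(strings.shape), "BYTES")
  input0.set_data_from_numpy(strings)

  # "string_model" is a hypothetical model with one BYTES input/output.
  response = client.infer("string_model", inputs=[input0])
  print(response.as_numpy("OUTPUT0"))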
System Shared Memory
^^^^^^^^^^^^^^^^^^^^

Using system shared memory to communicate tensors between the client
library and Triton can significantly improve performance in some
cases. Its use is demonstrated in the C++ example applications
`simple\_http\_shm\_client.cc and simple\_grpc\_shm\_client.cc `_ and
in the Python example applications `simple\_http\_shm\_client.py and
simple\_grpc\_shm\_client.py `_.

Python does not have a standard way of allocating and accessing
shared memory, so a simple `system shared memory module `_ is
provided as an example. It can be used with the Python client library
to create, set and destroy system shared memory.
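The workflow is: create a region, copy the input data into it,
register it with Triton, and point the input at the region instead of
attaching the data directly. A minimal sketch, assuming the utility
module is available as tritonclient.utils.shared_memory and reusing
the placeholder model/tensor names from the earlier sketches::

  import numpy as np
  import tritonclient.http as httpclient
  import tritonclient.utils.shared_memory as shm  # location may differ by release

  client = httpclient.InferenceServerClient(url="localhost:8000")

  data = np.arange(16, dtype=np.int32).reshape(1, 16)
  byte_size = data.size * data.itemsize

  # Create a system shared memory region keyed by "/input_simple",
  # copy the input into it, and register it with Triton.
  handle = shm.create_shared_memory_region("input_data", "/input_simple", byte_size)
  shm.set_shared_memory_region(handle, [data])
  client.register_system_shared_memory("input_data", "/input_simple", byte_size)

  # The input references the region instead of carrying the data.
  input0 = httpclient.InferInput("INPUT0", [1, 16], "INT32")
  input0.set_shared_memory("input_data", byte_size)
  response = client.infer("simple", inputs=[input0])

  # Clean up: unregister with Triton and destroy the local region.
  client.unregister_system_shared_memory("input_data")
  shm.destroy_shared_memory_region(handle)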
CUDA Shared Memory
^^^^^^^^^^^^^^^^^^

Using CUDA shared memory to communicate tensors between the client
library and Triton can significantly improve performance in some
cases. Its use is demonstrated in the C++ example applications
`simple\_http\_cudashm\_client.cc and
simple\_grpc\_cudashm\_client.cc `_ and in the Python example
applications `simple\_http\_cudashm\_client.py and
simple\_grpc\_cudashm\_client.py `_.

Python does not have a standard way of allocating and accessing
shared memory, so a simple `CUDA shared memory module `_ is provided
as an example. It can be used with the Python client library to
create, set and destroy CUDA shared memory.
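The flow mirrors the system shared memory sketch above, except that
the region lives on a GPU, so registration passes a raw CUDA IPC
handle and a device ID. A minimal sketch, assuming the utility module
is available as tritonclient.utils.cuda_shared_memory and the same
placeholder names::

  import numpy as np
  import tritonclient.http as httpclient
  import tritonclient.utils.cuda_shared_memory as cudashm  # location may differ

  client = httpclient.InferenceServerClient(url="localhost:8000")

  data = np.arange(16, dtype=np.int32).reshape(1, 16)
  byte_size = data.size * data.itemsize

  # Allocate the region on GPU 0 and copy the input tensor into it.
  handle = cudashm.create_shared_memory_region("input_data", byte_size, 0)
  cudashm.set_shared_memory_region(handle, [data])

  # Registration passes a raw CUDA IPC handle so Triton can map the region.
  client.register_cuda_shared_memory(
      "input_data", cudashm.get_raw_handle(handle), 0, byte_size)

  # "simple" and "INPUT0" are placeholders, as in the earlier sketches.
  input0 = httpclient.InferInput("INPUT0", [1, 16], "INT32")
  input0.set_shared_memory("input_data", byte_size)
  response = client.infer("simple", inputs=[input0])

  # Clean up on both sides.
  client.unregister_cuda_shared_memory("input_data")
  cudashm.destroy_shared_memory_region(handle)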
.. _section-client-api-stateful-models:

Client API for Stateful Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When performing inference using a :ref:`stateful model
<section-stateful-models>`, a client must identify which inference
requests belong to the same sequence and also when a sequence starts
and ends.

Each sequence is identified by a sequence ID that is provided when an
inference request is made; it is up to the client to create a unique
sequence ID. For each sequence, the first inference request should be
marked as the start of the sequence and the last inference request
should be marked as the end of the sequence. The use of sequence IDs
and start/end flags is demonstrated in the C++ example applications
`simple\_http\_sequence\_stream\_infer\_client.cc and
simple\_grpc\_sequence\_stream\_infer\_client.cc `_ and in the Python
example applications
`simple\_http\_sequence\_stream\_infer\_client.py and
simple\_grpc\_sequence\_stream\_infer\_client.py `_.
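A minimal sketch of the GRPC streaming form of this API, with
placeholder model and tensor names (the real examples stream more
interesting sequences)::

  import numpy as np
  import tritonclient.grpc as grpcclient  # module name may differ by release

  # Responses arrive asynchronously on the stream through this callback.
  def callback(result, error):
      if error is not None:
          print(error)
      else:
          print(result.as_numpy("OUTPUT"))

  client = grpcclient.InferenceServerClient(url="localhost:8001")
  client.start_stream(callback=callback)

  # Every request carrying the same sequence_id belongs to one
  # sequence; the first is flagged as the start, the last as the end.
  values = [0, 1, 2, 3]
  for i, v in enumerate(values):
      inp = grpcclient.InferInput("INPUT", [1, 1], "INT32")
      inp.set_data_from_numpy(np.array([[v]], dtype=np.int32))
      client.async_stream_infer(
          "sequence_model", [inp],
          sequence_id=1001,
          sequence_start=(i == 0),
          sequence_end=(i == len(values) - 1))

  client.stop_stream()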
.. _section-image-classification-example:

Image Classification Example Application
----------------------------------------

The image classification example that uses the C++ client API is
available at `src/clients/c++/examples/image\_client.cc `_. The
Python version of the image classification client is available at
`src/clients/python/examples/image\_client.py `_.

To use image\_client (or image\_client.py) you must first have a
running Triton that is serving one or more image classification
models. The image\_client application requires that the model have a
single image input and produce a single classification output. If you
don't have a model repository with image classification models, see
:ref:`section-example-model-repository` for instructions on how to
create one.

Follow the instructions in :ref:`section-running-triton` to launch
Triton using the model repository. Once Triton is running you can use
the image\_client application to send inference requests. You can
specify a single image or a directory holding images. Here we send a
request for the resnet50_netdef model from the :ref:`example model
repository <section-example-model-repository>` for an image from the
`qa/images `_ directory::

  $ image_client -m resnet50_netdef -s INCEPTION qa/images/mug.jpg
  Request 0, batch size 1
  Image '../qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.723991

The Python version of the application accepts the same command-line
arguments::

  $ python image_client.py -m resnet50_netdef -s INCEPTION qa/images/mug.jpg
  Request 0, batch size 1
  Image '../qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.778078556061

The image\_client and image\_client.py applications use the client
library to talk to Triton. By default image\_client instructs the
client library to use the HTTP/REST protocol, but you can use the
GRPC protocol by providing the \-i flag. You must also use the \-u
flag to point at the GRPC endpoint on Triton::

  $ image_client -i grpc -u localhost:8001 -m resnet50_netdef -s INCEPTION qa/images/mug.jpg
  Request 0, batch size 1
  Image '../qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.723991

By default the client prints the most probable classification for the
image. Use the \-c flag to see more classifications::

  $ image_client -m resnet50_netdef -s INCEPTION -c 3 qa/images/mug.jpg
  Request 0, batch size 1
  Image '../qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.723991
      968 (CUP) = 0.270953
      967 (ESPRESSO) = 0.00115996

The \-b flag allows you to send a batch of images for inferencing.
The image\_client application will form the batch from the image or
images that you specify. If the batch is bigger than the number of
images then image\_client will simply repeat the images to fill the
batch::

  $ image_client -m resnet50_netdef -s INCEPTION -c 3 -b 2 qa/images/mug.jpg
  Request 0, batch size 2
  Image '../qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.778078556061
      968 (CUP) = 0.213262036443
      967 (ESPRESSO) = 0.00293014757335
  Image '../qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.778078556061
      968 (CUP) = 0.213262036443
      967 (ESPRESSO) = 0.00293014757335

Provide a directory instead of a single image to perform inferencing
on all images in the directory::

  $ image_client -m resnet50_netdef -s INCEPTION -c 3 -b 2 qa/images
  Request 0, batch size 2
  Image '../qa/images/car.jpg':
      817 (SPORTS CAR) = 0.836187
      511 (CONVERTIBLE) = 0.0708251
      751 (RACER) = 0.0597549
  Image '../qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.723991
      968 (CUP) = 0.270953
      967 (ESPRESSO) = 0.00115996
  Request 1, batch size 2
  Image '../qa/images/vulture.jpeg':
      23 (VULTURE) = 0.992326
      8 (HEN) = 0.00231854
      84 (PEACOCK) = 0.00201471
  Image '../qa/images/car.jpg':
      817 (SPORTS CAR) = 0.836187
      511 (CONVERTIBLE) = 0.0708251
      751 (RACER) = 0.0597549

The `grpc\_image\_client.py `_ application behaves the same as
image\_client except that instead of using the client library it uses
the generated GRPC library to communicate with Triton.

.. _section-ensemble-image-classification-example:

Ensemble Image Classification Example Application
-------------------------------------------------

In comparison to the image classification example above, this example
uses an ensemble of an image-preprocessing model implemented as a
custom backend and a Caffe2 ResNet50 model. This ensemble allows you
to send the raw image binaries in the request and receive
classification results without preprocessing the images on the
client.

The ensemble image classification example that uses the C++ client
API is available at
`src/clients/c++/examples/ensemble\_image\_client.cc `_. The Python
version of the client is available at
`src/clients/python/examples/ensemble\_image\_client.py `_.

To use ensemble\_image\_client (or ensemble\_image\_client.py) you
must first have a running Triton that is serving the
"preprocess_resnet50_ensemble" model and the models it depends on.
The models are provided in an example ensemble model repository. See
:ref:`section-example-model-repository` for instructions on how to
create one.

Follow the instructions in :ref:`section-running-triton` to launch
Triton using the ensemble model repository. Once Triton is running
you can use the ensemble\_image\_client application to send inference
requests. You can specify a single image or a directory holding
images. Here we send a request for the ensemble from the
:ref:`example ensemble model repository
<section-example-model-repository>` for an image from the
`qa/images `_ directory::

  $ ensemble_image_client qa/images/mug.jpg
  Image 'qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.723991

The Python version of the application accepts the same command-line
arguments::

  $ python ensemble_image_client.py qa/images/mug.jpg
  Image 'qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.778078556061

Similar to image\_client, by default ensemble\_image\_client
instructs the client library to use the HTTP protocol to talk to
Triton, but you can use the GRPC protocol by providing the \-i flag.
You must also use the \-u flag to point at the GRPC endpoint on
Triton::

  $ ensemble_image_client -i grpc -u localhost:8001 qa/images/mug.jpg
  Image 'qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.723991

By default the client prints the most probable classification for the
image. Use the \-c flag to see more classifications::

  $ ensemble_image_client -c 3 qa/images/mug.jpg
  Image 'qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.723991
      968 (CUP) = 0.270953
      967 (ESPRESSO) = 0.00115996

Provide a directory instead of a single image to perform inferencing
on all images in the directory. If the number of images exceeds the
maximum batch size of the ensemble, only the images within the
maximum batch size will be sent::

  $ ensemble_image_client -c 3 qa/images
  Image 'qa/images/car.jpg':
      817 (SPORTS CAR) = 0.836187
      511 (CONVERTIBLE) = 0.0708251
      751 (RACER) = 0.0597549
  Image 'qa/images/mug.jpg':
      504 (COFFEE MUG) = 0.723991
      968 (CUP) = 0.270953
      967 (ESPRESSO) = 0.00115996
  Image 'qa/images/vulture.jpeg':
      23 (VULTURE) = 0.992326
      8 (HEN) = 0.00231854
      84 (PEACOCK) = 0.00201471
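For readers who want to issue the ensemble request from their own
code rather than the example binary, the following is a minimal
sketch. It assumes the ensemble's raw-image input is a BYTES tensor
named INPUT and the classification output is named OUTPUT; consult
the config.pbtxt of the example ensemble for the actual names. The
class_count option asks Triton to return the top classifications::

  import numpy as np
  import tritonclient.http as httpclient  # module name may differ by release

  client = httpclient.InferenceServerClient(url="localhost:8000")

  # Read the raw, un-preprocessed image bytes; the ensemble's first
  # stage decodes and preprocesses the image on the server.
  with open("qa/images/mug.jpg", "rb") as f:
      raw = np.array([f.read()], dtype=object)

  # "INPUT" and "OUTPUT" are assumed tensor names (see config.pbtxt).
  input0 = httpclient.InferInput("INPUT", [1], "BYTES")
  input0.set_data_from_numpy(raw)

  # class_count=3 requests the top-3 classifications for the image.
  output0 = httpclient.InferRequestedOutput("OUTPUT", class_count=3)
  response = client.infer("preprocess_resnet50_ensemble",
                          inputs=[input0], outputs=[output0])
  print(response.as_numpy("OUTPUT"))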