NvTritonExt
-----------

NVIDIA Triton Inference components. This extension is intended to be used with Triton 2.5.0 (x86_64) and 2.7.0 (L4T).

.. _NVIDIA Triton: https://github.com/triton-inference-server/server

Refer to the official `NVIDIA Triton`_ documentation for the support matrix and more details. A minimal sketch of how these components are configured in a GXF application graph is provided at the end of this section.

* UUID: a3c95d1c-c06c-4a4e-a2f9-8d9078ab645c
* Version: 0.0.3
* Author: NVIDIA
* License: Proprietary

Components
~~~~~~~~~~

nvidia::triton::TritonServer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _Triton C API: https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#c-api

Triton inference server component using the `Triton C API`_.

* Component ID: 26228984-ffc4-4162-9af5-6e3008aa2982
* Base Type: nvidia::gxf::Component

Parameters
++++++++++

**log_level**

Logging level for Triton. Valid values:

- 0: Error
- 1: Warn
- 2: Info
- 3+: Verbose

* Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
* Type: GXF_PARAMETER_TYPE_UINT32

|

**enable_strict_model_config**

Enables strict model configuration to enforce the presence of a model configuration file. If disabled, TensorRT, TensorFlow saved-model, and ONNX models do not require a model configuration file; Triton can derive all the required settings automatically.

* Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
* Type: GXF_PARAMETER_TYPE_BOOL

|

**min_compute_capability**

Minimum compute capability of the GPUs that Triton will use.

* Flags: GXF_PARAMETER_FLAGS_NONE (6.0 = default)
* Type: GXF_PARAMETER_TYPE_FLOAT64

|

**model_repository_path**

Path to the Triton model repository.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_STRING

|

**tf_gpu_memory_fraction**

The portion of GPU memory to be reserved for TensorFlow models.

* Flags: GXF_PARAMETER_FLAGS_NONE (0.0 = default)
* Type: GXF_PARAMETER_TYPE_FLOAT64

|

**tf_disable_soft_placement_**

Allows TensorFlow to use CPU operations when a GPU implementation is not available.

* Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
* Type: GXF_PARAMETER_TYPE_BOOL

|

**backend_directory_path**

Path to the Triton backend directory.

* Flags: GXF_PARAMETER_FLAGS_NONE ("" = default)
* Type: GXF_PARAMETER_TYPE_STRING

|

**model_control_mode**

Triton model control mode. Valid values:

- "none": Load all models in the model repository at startup.
- "explicit": Load models only when needed.

* Flags: GXF_PARAMETER_FLAGS_NONE ("explicit" = default)
* Type: GXF_PARAMETER_TYPE_STRING

|

**backend_configs**

Triton backend configurations in the format ``backend,setting=value``.

* Flags: GXF_PARAMETER_FLAGS_OPTIONAL
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

nvidia::triton::TritonInferencerInterface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Helper component that provides an interface for Triton inferencing.

* Component ID: 1661c015-6b1c-422d-a6f0-248cdc197b1a
* Base Type: nvidia::gxf::Component

nvidia::triton::TritonInferencerDirectImpl
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Component that implements the ``TritonInferencerInterface`` to obtain inferences from the ``TritonServer``.

* Component ID: b84cf267-b223-4df5-ac82-752d9fae1014
* Base Type: nvidia::triton::TritonInferencerInterface

Parameters
++++++++++

**server**

Triton server.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonServer

|

**model_name**

Name of the Triton model to run inference on.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_STRING

|

**model_version**

Version of the named Triton model to run inference on.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_INT64

|

**max_batch_size**

Maximum batch size to run inference with. This should match the value in the model's configuration in the Triton model repository.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_UINT32

|

**num_concurrent_requests**

Maximum number of concurrent inference requests for this model version. This is used to define a pool of requests.

* Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
* Type: GXF_PARAMETER_TYPE_UINT32

|

**async_scheduling_term**

Asynchronous scheduling term that determines when a response is ready.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::AsynchronousSchedulingTerm

|

nvidia::triton::TritonInferenceRequest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Generic codelet that requests a Triton inference. It uses a handle to a ``TritonInferencerInterface`` implementation to interface with Triton.

* Component ID: 34395920-232c-446f-b5b7-46f642ce84df
* Base Type: nvidia::gxf::Codelet

Parameters
++++++++++

**inferencer**

Handle to the Triton inference implementation. This is used to request an inference.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonInferencerInterface

|

**rx**

Receivers for the input tensors of the inference request.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of Handles)
* Handle Type: nvidia::gxf::Receiver

|

**input_tensor_names**

Names of input tensors that exist in the ordered receivers in ``rx``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**input_binding_names**

Names of input bindings corresponding to Triton's config inputs, in the same order as ``input_tensor_names``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

nvidia::triton::TritonInferenceResponse
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Generic codelet that obtains a response from a Triton inference. It uses a handle to a ``TritonInferencerInterface`` implementation to interface with Triton.

* Component ID: 4dd957a7-aa55-4117-90d3-9a98e31ee176
* Base Type: nvidia::gxf::Codelet

Parameters
++++++++++

**inferencer**

Handle to the Triton inference implementation. This is used to retrieve the inference response.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonInferencerInterface

|

**output_tensor_names**

Names of output tensors in the order they are to be retrieved from the model.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**output_binding_names**

Names of output bindings in the model, in the same order as ``output_tensor_names``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**tx**

Single transmitter to publish output tensors.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::Transmitter

|
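Example
~~~~~~~

The following is a minimal sketch of how a ``TritonServer`` component might be configured in a GXF YAML application graph. The entity name, paths, and parameter values below are placeholders chosen for illustration, not values prescribed by the extension.

.. code-block:: yaml

   %YAML 1.2
   ---
   # Hypothetical entity hosting the Triton server; names and paths are placeholders.
   name: triton_server_entity
   components:
   - name: server
     type: nvidia::triton::TritonServer
     parameters:
       model_repository_path: /tmp/models               # path to a Triton model repository
       log_level: 1                                      # 0: Error, 1: Warn, 2: Info, 3+: Verbose
       enable_strict_model_config: true                  # require a model configuration file
       model_control_mode: explicit                      # load models only when needed
       backend_directory_path: /opt/tritonserver/backends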
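A companion sketch shows how ``TritonInferencerDirectImpl``, ``TritonInferenceRequest``, and ``TritonInferenceResponse`` could be wired together against the server entity above. The model name, tensor names, binding names, and the queue components (``nvidia::gxf::DoubleBufferReceiver`` / ``DoubleBufferTransmitter``) are illustrative assumptions; the scheduler, scheduling terms gating the codelets, and connections to upstream and downstream entities are omitted.

.. code-block:: yaml

   ---
   # Hypothetical inference entity; model, tensor, and binding names are placeholders.
   name: inference
   components:
   - name: input_tensors
     type: nvidia::gxf::DoubleBufferReceiver
   - name: response_ready
     type: nvidia::gxf::AsynchronousSchedulingTerm
   - name: inferencer
     type: nvidia::triton::TritonInferencerDirectImpl
     parameters:
       server: triton_server_entity/server        # handle to the TritonServer above
       model_name: my_model
       model_version: 1
       max_batch_size: 8                          # should match the model configuration
       num_concurrent_requests: 1
       async_scheduling_term: response_ready
   - name: request
     type: nvidia::triton::TritonInferenceRequest
     parameters:
       inferencer: inferencer
       rx: [ input_tensors ]
       input_tensor_names: [ input_tensor ]       # tensor names in incoming messages
       input_binding_names: [ INPUT0 ]            # corresponding bindings in the Triton config
   - name: output_tensors
     type: nvidia::gxf::DoubleBufferTransmitter
   - name: response
     type: nvidia::triton::TritonInferenceResponse
     parameters:
       inferencer: inferencer
       output_tensor_names: [ output_tensor ]
       output_binding_names: [ OUTPUT0 ]
       tx: output_tensors

In a complete application, the request codelet would typically be gated by a scheduling term on ``input_tensors`` (for example, ``nvidia::gxf::MessageAvailableSchedulingTerm``), and ``input_tensors`` / ``output_tensors`` would be connected to the rest of the graph.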