.. Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.
   NVIDIA CORPORATION and its licensors retain all intellectual property and proprietary
   rights in and to this software, related documentation and any modifications thereto.
   Any use, reproduction, disclosure or distribution of this software and related
   documentation without an express license agreement from NVIDIA CORPORATION is
   strictly prohibited.

NvTritonExt
-----------

NVIDIA Triton Inference components.

This extension is intended to be used with Triton 2.20.0 (x86_64) and 2.20.0 (Jetpack 5.0).

.. _NVIDIA Triton: https://github.com/triton-inference-server/server

Refer to the official `NVIDIA Triton`_ documentation for the support matrix and more.

* UUID: a3c95d1c-c06c-4a4e-a2f9-8d9078ab645c
* Version: 0.0.7
* Author: NVIDIA
* License: Proprietary

Components
~~~~~~~~~~

nvidia::triton::TritonServer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _Triton C API: https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#c-api

Triton inference server component using the `Triton C API`_.

* Component ID: 26228984-ffc4-4162-9af5-6e3008aa2982
* Base Type: nvidia::gxf::Component

Parameters
++++++++++

**log_level**

Logging level for Triton. Valid values:

- 0: Error
- 1: Warn
- 2: Info
- 3+: Verbose

* Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
* Type: GXF_PARAMETER_TYPE_UINT32

|

**enable_strict_model_config**

Enables strict model configuration to enforce the presence of a model configuration file.
If disabled, TensorRT, TensorFlow saved-model, and ONNX models do not require a model
configuration file, since Triton can derive all the required settings automatically.

* Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
* Type: GXF_PARAMETER_TYPE_BOOL

|

**min_compute_capability**

Minimum compute capability for the GPU Triton will use.

* Flags: GXF_PARAMETER_FLAGS_NONE (6.0 = default)
* Type: GXF_PARAMETER_TYPE_FLOAT64

|

**model_repository_paths**

Paths to Triton model repositories.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**tf_gpu_memory_fraction**

The portion of GPU memory to be reserved for TensorFlow models.

* Flags: GXF_PARAMETER_FLAGS_NONE (0.0 = default)
* Type: GXF_PARAMETER_TYPE_FLOAT64

|

**tf_disable_soft_placement_**

Allow TensorFlow to use a CPU operation when a GPU implementation is not available.

* Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
* Type: GXF_PARAMETER_TYPE_BOOL

|

**backend_directory_path**

Path to the Triton backend directory.

* Flags: GXF_PARAMETER_FLAGS_NONE ("" = default)
* Type: GXF_PARAMETER_TYPE_STRING

|

**model_control_mode**

Triton model control mode. **Valid values**:

- "none": Load all models in the model repository at startup.
- "explicit": Allow models to be loaded when needed.

* Flags: GXF_PARAMETER_FLAGS_NONE ("explicit" = default)
* Type: GXF_PARAMETER_TYPE_STRING

|

**backend_configs**

Triton backend configurations in the format: ``backend,setting=value``.

* Flags: GXF_PARAMETER_FLAGS_OPTIONAL
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|
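For orientation, the sketch below shows how a ``TritonServer`` component might be declared
in a GXF application graph YAML file using the parameters above. This is a minimal sketch,
not a shipped example: the entity and component names, the model repository path, the
backend directory, and the backend configuration entry are all illustrative assumptions.

.. code-block:: yaml

   # Minimal sketch of a TritonServer component in a GXF application graph.
   # Entity/component names and all paths below are assumptions for illustration.
   name: triton_server_entity
   components:
     - name: triton_server
       type: nvidia::triton::TritonServer
       parameters:
         log_level: 2                                          # Info
         enable_strict_model_config: true
         min_compute_capability: 6.0
         model_repository_paths: ["/opt/models/triton"]        # assumed repository path
         model_control_mode: "explicit"
         backend_directory_path: "/opt/tritonserver/backends"  # assumed backend path
         backend_configs: ["tensorflow,version=2"]             # example backend,setting=value entry

Downstream components typically reference this server through a handle such as
``triton_server_entity/triton_server``; confirm the exact handle syntax against your GXF
version.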
nvidia::triton::TritonInferencerInterface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Helper component that provides an interface for Triton inferencing.

* Component ID: 1661c015-6b1c-422d-a6f0-248cdc197b1a
* Base Type: nvidia::gxf::Component

nvidia::triton::TritonInferencerImpl
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Component that implements the ``TritonInferencerInterface`` to obtain inferences from the
``TritonServer`` component or from an external Triton instance.

* Component ID: b84cf267-b223-4df5-ac82-752d9fae1014
* Base Type: nvidia::triton::TritonInferencerInterface

Parameters
++++++++++

**server**

Triton server. This optional handle must be specified if the ``inference_mode`` of this
component is ``Direct``.

* Flags: GXF_PARAMETER_FLAGS_OPTIONAL
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonServer

|

**model_name**

Name of the Triton model to run inference on.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_STRING

|

**model_version**

Version of the Triton model to run inference on.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_INT64

|

**max_batch_size**

Maximum batch size to run inference. This should match the value in the Triton model
repository.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_UINT32

|

**num_concurrent_requests**

Maximum number of concurrent inference requests for this model version. This is used to
define a pool of requests.

* Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
* Type: GXF_PARAMETER_TYPE_UINT32

|

**async_scheduling_term**

Asynchronous scheduling term that determines when a response is ready.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::AsynchronousSchedulingTerm

|

**inference_mode**

Triton inferencing mode. Valid values:

- ``Direct``: This mode requires a ``TritonServer`` component handle to be passed to the
  optional ``server`` parameter.
- ``RemoteGrpc``: This mode requires the optional ``server_endpoint`` parameter to point
  to an external Triton gRPC server URL.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_STRING

|

**server_endpoint**

Server endpoint URL for an external Triton instance. This optional string must be
specified if the ``inference_mode`` of this component is one of the ``Remote`` variants
(for example, ``RemoteGrpc``).

* Flags: GXF_PARAMETER_FLAGS_OPTIONAL
* Type: GXF_PARAMETER_TYPE_STRING

|

nvidia::triton::TritonInferenceRequest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Generic codelet that requests a Triton inference. This will use a handle to an
``InferencerImpl`` to interface with Triton.

* Component ID: 34395920-232c-446f-b5b7-46f642ce84df
* Base Type: nvidia::gxf::Codelet

Parameters
++++++++++

**inferencer**

Handle to Triton inference implementation. This is used to request an inference.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonInferencerInterface

|

**rx**

Receivers for the input tensors for the inference request.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of Handles)
* Handle Type: nvidia::gxf::Receiver

|

**input_tensor_names**

Names of input tensors that exist in the ordered receivers in ``rx``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**input_binding_names**

Names of input bindings corresponding to Triton's config inputs, in the same order as
``input_tensor_names``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|
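Continuing the sketch from the ``TritonServer`` section, a ``TritonInferencerImpl`` running
in ``Direct`` mode and a ``TritonInferenceRequest`` codelet might be wired together as
follows. The entity name, model name, tensor/binding names, and the use of
``nvidia::gxf::DoubleBufferReceiver`` for the input queue are assumptions for illustration;
the binding names must match the inputs declared in the model's Triton config.

.. code-block:: yaml

   # Hypothetical inference entity; names and values are illustrative assumptions.
   name: inference_entity
   components:
     - name: response_ready
       type: nvidia::gxf::AsynchronousSchedulingTerm
     - name: input_tensors
       type: nvidia::gxf::DoubleBufferReceiver
     - name: inferencer
       type: nvidia::triton::TritonInferencerImpl
       parameters:
         server: triton_server_entity/triton_server   # handle to the TritonServer above
         model_name: "my_model"                        # assumed model in the repository
         model_version: 1
         max_batch_size: 8                             # should match the model config
         num_concurrent_requests: 4
         inference_mode: "Direct"
         async_scheduling_term: response_ready
     - name: request
       type: nvidia::triton::TritonInferenceRequest
       parameters:
         inferencer: inferencer
         rx: [input_tensors]
         input_tensor_names: ["input_tensor"]          # tensor name in the incoming message
         input_binding_names: ["INPUT0"]               # binding name in the Triton model config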
nvidia::triton::TritonInferenceResponse
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Generic codelet that obtains a response from a Triton inference. This will use a handle to
an ``InferencerImpl`` to interface with Triton.

* Component ID: 4dd957a7-aa55-4117-90d3-9a98e31ee176
* Base Type: nvidia::gxf::Codelet

Parameters
++++++++++

**inferencer**

Handle to Triton inference implementation. This is used to retrieve the inference
response.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonInferencerInterface

|

**output_tensor_names**

Names of output tensors that will be published to ``tx``. These names must appear in the
same order as their corresponding entries in ``output_binding_names``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**output_binding_names**

Names of output bindings corresponding to Triton's config outputs, in the same order as
``output_tensor_names``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**tx**

Single transmitter to publish output tensors.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::Transmitter

|

nvidia::triton::TritonOptions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Generic struct that represents Triton Inference Options for model control and sequence
control.

* Component ID: 087696ed-229d-4199-876f-05b92d3887f0

nvidia::triton::TritonRequestReceptiveSchedulingTerm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Triton scheduling term that schedules the request codelet when the inferencer can accept a
new request.

* Component ID: f8602412-1242-4e43-9dbf-9c559d496b84
* Base Type: nvidia::gxf::SchedulingTerm

Parameters
++++++++++

**inferencer**

Handle to Triton inference implementation. This is used to check whether a new request can
be accepted.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonInferencerInterface

|
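To complete the sketch, the response side and the request-receptive scheduling term might
be declared as follows within the same hypothetical entity. As before, component names,
tensor/binding names, and the use of ``nvidia::gxf::DoubleBufferTransmitter`` are
illustrative assumptions; the output binding names must match the model's Triton config.

.. code-block:: yaml

   # Hypothetical continuation of the inference entity from the previous sketch.
   - name: output_tensors
     type: nvidia::gxf::DoubleBufferTransmitter
   - name: response
     type: nvidia::triton::TritonInferenceResponse
     parameters:
       inferencer: inferencer
       output_tensor_names: ["output_tensor"]        # tensor name placed on the outgoing message
       output_binding_names: ["OUTPUT0"]             # binding name in the Triton model config
       tx: output_tensors
   - name: request_receptive
     type: nvidia::triton::TritonRequestReceptiveSchedulingTerm
     parameters:
       inferencer: inferencer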