NvTritonExt
-----------

NVIDIA Triton Inference components. This extension is intended to be used with Triton 2.5.0 (x86_64) and 2.7.0 (L4T).

.. _NVIDIA Triton: https://github.com/triton-inference-server/server

Refer to the official `NVIDIA Triton`_ documentation for the support matrix and more details. A minimal sketch of how these components are configured in a GXF application graph is provided at the end of this section.

* UUID: a3c95d1c-c06c-4a4e-a2f9-8d9078ab645c
* Version: 0.0.3
* Author: NVIDIA
* License: Proprietary

Components
~~~~~~~~~~

nvidia::triton::TritonServer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _Triton C API: https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#c-api

Triton inference server component using the `Triton C API`_.

* Component ID: 26228984-ffc4-4162-9af5-6e3008aa2982
* Base Type: nvidia::gxf::Component

Parameters
++++++++++

**log_level**

Logging level for Triton. Valid values:

- 0: Error
- 1: Warn
- 2: Info
- 3+: Verbose

* Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
* Type: GXF_PARAMETER_TYPE_UINT32

|

**enable_strict_model_config**

Enables strict model configuration to enforce the presence of a model configuration file. If disabled, TensorRT, TensorFlow saved-model, and ONNX models do not require a model configuration file; Triton can derive all the required settings automatically.

* Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
* Type: GXF_PARAMETER_TYPE_BOOL

|

**min_compute_capability**

Minimum compute capability of the GPUs that Triton will use.

* Flags: GXF_PARAMETER_FLAGS_NONE (6.0 = default)
* Type: GXF_PARAMETER_TYPE_FLOAT64

|

**model_repository_path**

Path to the Triton model repository.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_STRING

|

**tf_gpu_memory_fraction**

The portion of GPU memory to be reserved for TensorFlow models.

* Flags: GXF_PARAMETER_FLAGS_NONE (0.0 = default)
* Type: GXF_PARAMETER_TYPE_FLOAT64

|

**tf_disable_soft_placement_**

Allows TensorFlow to use CPU operations when a GPU implementation is not available.

* Flags: GXF_PARAMETER_FLAGS_NONE (true = default)
* Type: GXF_PARAMETER_TYPE_BOOL

|

**backend_directory_path**

Path to the Triton backend directory.

* Flags: GXF_PARAMETER_FLAGS_NONE ("" = default)
* Type: GXF_PARAMETER_TYPE_STRING

|

**model_control_mode**

Triton model control mode. Valid values:

- "none": Load all models in the model repository at startup.
- "explicit": Load models only when needed.

* Flags: GXF_PARAMETER_FLAGS_NONE ("explicit" = default)
* Type: GXF_PARAMETER_TYPE_STRING

|

**backend_configs**

Triton backend configurations in the format ``backend,setting=value``.

* Flags: GXF_PARAMETER_FLAGS_OPTIONAL
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

nvidia::triton::TritonInferencerInterface
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Helper component that provides an interface for Triton inferencing.

* Component ID: 1661c015-6b1c-422d-a6f0-248cdc197b1a
* Base Type: nvidia::gxf::Component

nvidia::triton::TritonInferencerDirectImpl
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Component that implements the ``TritonInferencerInterface`` to obtain inferences from the ``TritonServer``.

* Component ID: b84cf267-b223-4df5-ac82-752d9fae1014
* Base Type: nvidia::triton::TritonInferencerInterface

Parameters
++++++++++

**server**

Triton server.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonServer

|

**model_name**

Name of the Triton model to run inference on.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_STRING

|

**model_version**

Version of the named Triton model to run inference on.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_INT64

|

**max_batch_size**

Maximum batch size to run inference with. This should match the value in the model's configuration in the Triton model repository.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_UINT32

|

**num_concurrent_requests**

Maximum number of concurrent inference requests for this model version. This is used to define a pool of requests.

* Flags: GXF_PARAMETER_FLAGS_NONE (1 = default)
* Type: GXF_PARAMETER_TYPE_UINT32

|

**async_scheduling_term**

Asynchronous scheduling term that determines when a response is ready.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::AsynchronousSchedulingTerm

|

nvidia::triton::TritonInferenceRequest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Generic codelet that requests a Triton inference. It uses a handle to a ``TritonInferencerInterface`` implementation to interface with Triton.

* Component ID: 34395920-232c-446f-b5b7-46f642ce84df
* Base Type: nvidia::gxf::Codelet

Parameters
++++++++++

**inferencer**

Handle to the Triton inference implementation. This is used to request an inference.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonInferencerInterface

|

**rx**

Receivers for the input tensors of the inference request.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of Handles)
* Handle Type: nvidia::gxf::Receiver

|

**input_tensor_names**

Names of input tensors that exist in the ordered receivers in ``rx``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**input_binding_names**

Names of input bindings corresponding to Triton's config inputs, in the same order as ``input_tensor_names``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

nvidia::triton::TritonInferenceResponse
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Generic codelet that obtains a response from a Triton inference. It uses a handle to a ``TritonInferencerInterface`` implementation to interface with Triton.

* Component ID: 4dd957a7-aa55-4117-90d3-9a98e31ee176
* Base Type: nvidia::gxf::Codelet

Parameters
++++++++++

**inferencer**

Handle to the Triton inference implementation. This is used to retrieve the inference response.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::triton::TritonInferencerInterface

|

**output_tensor_names**

Names of output tensors in the order they are to be retrieved from the model.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**output_binding_names**

Names of output bindings in the model, in the same order as ``output_tensor_names``.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_CUSTOM (List of strings)

|

**tx**

Single transmitter to publish output tensors.

* Flags: GXF_PARAMETER_FLAGS_NONE
* Type: GXF_PARAMETER_TYPE_HANDLE
* Handle Type: nvidia::gxf::Transmitter

|
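Example
~~~~~~~

The following is a minimal sketch of how a ``TritonServer`` component might be configured in a GXF YAML application graph. The entity name, paths, and parameter values below are placeholders chosen for illustration, not values prescribed by the extension.

.. code-block:: yaml

   %YAML 1.2
   ---
   # Hypothetical entity hosting the Triton server; names and paths are placeholders.
   name: triton_server_entity
   components:
   - name: server
     type: nvidia::triton::TritonServer
     parameters:
       model_repository_path: /tmp/models               # path to a Triton model repository
       log_level: 1                                      # 0: Error, 1: Warn, 2: Info, 3+: Verbose
       enable_strict_model_config: true                  # require a model configuration file
       model_control_mode: explicit                      # load models only when needed
       backend_directory_path: /opt/tritonserver/backends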
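A companion sketch shows how ``TritonInferencerDirectImpl``, ``TritonInferenceRequest``, and ``TritonInferenceResponse`` could be wired together against the server entity above. The model name, tensor names, binding names, and the queue components (``nvidia::gxf::DoubleBufferReceiver`` / ``DoubleBufferTransmitter``) are illustrative assumptions; the scheduler, scheduling terms gating the codelets, and connections to upstream and downstream entities are omitted.

.. code-block:: yaml

   ---
   # Hypothetical inference entity; model, tensor, and binding names are placeholders.
   name: inference
   components:
   - name: input_tensors
     type: nvidia::gxf::DoubleBufferReceiver
   - name: response_ready
     type: nvidia::gxf::AsynchronousSchedulingTerm
   - name: inferencer
     type: nvidia::triton::TritonInferencerDirectImpl
     parameters:
       server: triton_server_entity/server        # handle to the TritonServer above
       model_name: my_model
       model_version: 1
       max_batch_size: 8                          # should match the model configuration
       num_concurrent_requests: 1
       async_scheduling_term: response_ready
   - name: request
     type: nvidia::triton::TritonInferenceRequest
     parameters:
       inferencer: inferencer
       rx: [ input_tensors ]
       input_tensor_names: [ input_tensor ]       # tensor names in incoming messages
       input_binding_names: [ INPUT0 ]            # corresponding bindings in the Triton config
   - name: output_tensors
     type: nvidia::gxf::DoubleBufferTransmitter
   - name: response
     type: nvidia::triton::TritonInferenceResponse
     parameters:
       inferencer: inferencer
       output_tensor_names: [ output_tensor ]
       output_binding_names: [ OUTPUT0 ]
       tx: output_tensors

In a complete application, the request codelet would typically be gated by a scheduling term on ``input_tensors`` (for example, ``nvidia::gxf::MessageAvailableSchedulingTerm``), and ``input_tensors`` / ``output_tensors`` would be connected to the rest of the graph.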