NVIDIA Morpheus (24.10.01)

morpheus.stages.inference.triton_inference_stage.TritonInferenceWorker

class TritonInferenceWorker(inf_queue, c, model_name, server_url, force_convert_inputs, input_mapping=None, output_mapping=None, use_shared_memory=False, needs_logits=False)[source]

Bases: morpheus.stages.inference.inference_stage.InferenceWorker

Inference worker class for all Triton inference server requests.

Parameters
inf_queue: morpheus.utils.producer_consumer_queue.ProducerConsumerQueue

Inference queue.

c: morpheus.config.Config

Pipeline configuration instance.

model_name: str

Name of the model that will handle the inference requests sent to the Triton inference server.

server_url: str

Triton server gRPC URL including the port.

force_convert_inputs: bool

Whether to convert the inputs to the type specified by Triton. This will happen automatically if no data would be lost in the conversion (e.g., float -> double). Set this to True to convert the input even if data would be lost (e.g., double -> float).

input_mapping: dict[str, str], default = None

Dictionary used to map pipeline input names to Triton model input names. Use this if the Morpheus names do not match the model.

output_mapping: dict[str, str], default = None

Dictionary used to map pipeline output names to Triton model output names. Use this if the Morpheus names do not match the model.

use_shared_memory: bool, default = False

Whether to use CUDA Shared IPC Memory for transferring data to Triton. Using CUDA IPC reduces network transfer time but requires that Morpheus and Triton are located on the same machine.

needs_logits: bool, default = False

Determines whether a logits calculation needs to be applied to the values returned by the Triton inference response.
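
For illustration, here is a minimal sketch of constructing a worker directly with the documented parameters. In practice the worker is typically created for you by TritonInferenceStage; the model name, server URL, and mapping dictionaries below are placeholders, and constructing ProducerConsumerQueue with no arguments is an assumption.

from morpheus.config import Config
from morpheus.stages.inference.triton_inference_stage import TritonInferenceWorker
from morpheus.utils.producer_consumer_queue import ProducerConsumerQueue

config = Config()

worker = TritonInferenceWorker(
    inf_queue=ProducerConsumerQueue(),       # inference queue shared with the stage
    c=config,                                # pipeline configuration instance
    model_name="sid-minibert-onnx",          # placeholder model name
    server_url="localhost:8001",             # Triton gRPC URL including the port
    force_convert_inputs=True,               # convert inputs even if data would be lost
    input_mapping={"seq_ids": "input_ids"},  # hypothetical Morpheus -> Triton input name mapping
    output_mapping={"output": "probs"},      # hypothetical Triton -> Morpheus output name mapping
    use_shared_memory=False,                 # CUDA IPC requires Morpheus and Triton on the same machine
    needs_logits=True,                       # apply a logits calculation to the response values
)

worker.init()  # instantiates the Triton client and allocates input/output memory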

Attributes
needs_logits

Methods

<a href="#morpheus.stages.inference.triton_inference_stage.TritonInferenceWorker.build_output_message">build_output_message</a>(msg) Create initial inference response message with result values initialized to zero.
<a href="#morpheus.stages.inference.triton_inference_stage.TritonInferenceWorker.calc_output_dims">calc_output_dims</a>(msg) Calculates the dimensions of the inference output message data given an input message.
<a href="#morpheus.stages.inference.triton_inference_stage.TritonInferenceWorker.init">init</a>() This function instantiate triton client and memory allocation for inference input and output.
<a href="#morpheus.stages.inference.triton_inference_stage.TritonInferenceWorker.process">process</a>(batch, callback) This function sends batch of events as a requests to Triton inference server using triton client API.
<a href="#morpheus.stages.inference.triton_inference_stage.TritonInferenceWorker.stop">stop</a>() Override this function to stop the inference workers or carry out any additional cleanups.
supports_cpp_node
build_output_message(msg)[source]

Create initial inference response message with result values initialized to zero. Results will be set in message as each inference mini-batch is processed.

Parameters
msg: morpheus.messages.ControlMessage

Batch of ControlMessage.

Returns
<a href="morpheus.messages.html#morpheus.messages.ControlMessage">morpheus.messages.ControlMessage</a>

Response message with probabilities calculated from inference results.

calc_output_dims(msg)[source]

Calculates the dimensions of the inference output message data given an input message.

Parameters
msg: morpheus.messages.ControlMessage

Pipeline inference input batch before splitting into smaller inference batches.

Returns
tuple

Output dimensions of response.

init()[source]

Instantiates the Triton client and allocates memory for inference input and output.

process(batch, callback)[source]

Sends a batch of events as requests to the Triton inference server using the Triton client API.

Parameters
batch: morpheus.messages.ControlMessage

Mini-batch of inference messages.

callback: typing.Callable[[morpheus.pipeline.messages.TensorMemory], None]

Callback to set the values for the inference response.
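
For illustration, a hypothetical callback matching this signature. The tensor inspection below is only a placeholder for whatever the caller does with the response, and importing TensorMemory from morpheus.messages (and its get_tensors() accessor) is an assumption about the message API.

from morpheus.messages import TensorMemory

def on_response(mem: TensorMemory) -> None:
    # Placeholder handler: report the tensors produced for this mini-batch.
    for name, tensor in mem.get_tensors().items():
        print(name, tensor.shape)

# The stage would then invoke the worker roughly as:
#     worker.process(batch, on_response)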

stop()[source]

Override this function to stop the inference workers or carry out any additional cleanups.
