Class InferenceClientStage

Base Type

  • public mrc::pymrc::PythonNode< std::shared_ptr< MultiInferenceMessage >, std::shared_ptr< MultiResponseMessage > >

class InferenceClientStage : public mrc::pymrc::PythonNode<std::shared_ptr<MultiInferenceMessage>, std::shared_ptr<MultiResponseMessage>>

Perform inference with Triton Inference Server. This class specifies which inference implementation category (e.g. NLP or FIL) is needed to run inference.

Public Types

using base_t = mrc::pymrc::PythonNode<std::shared_ptr<MultiInferenceMessage>, std::shared_ptr<MultiResponseMessage>>

Public Functions

InferenceClientStage(std::string model_name, std::string server_url, bool force_convert_inputs, bool use_shared_memory, bool needs_logits, std::map<std::string, std::string> inout_mapping = {})

Construct a new Inference Client Stage object.

Parameters
  • model_name – : Name of the model that handles the inference requests sent to the Triton inference server.

  • server_url – : Triton server URL.

  • force_convert_inputs – : Instructs the stage to convert the incoming data to the same format that Triton is expecting. If set to false, data will only be converted if the conversion would not result in a loss of data.

  • use_shared_memory – : Whether or not to use CUDA Shared IPC Memory for transferring data to Triton. Using CUDA IPC reduces network transfer time but requires that Morpheus and Triton are located on the same machine.

  • needs_logits – : Determines if logits are required.

  • inout_mapping – : Dictionary used to map pipeline input/output names to Triton input/output names. Use this if the Morpheus names do not match the model.
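The sketch below only demonstrates the constructor arguments listed above. The header path, namespace, model name, server URL, and name mapping are assumptions for illustration and should be adjusted to your Morpheus installation; in a full pipeline the stage would normally be attached to an MRC segment rather than constructed standalone.

#include <map>
#include <string>

#include <morpheus/stages/triton_inference.hpp>  // assumed header location

using morpheus::InferenceClientStage;  // assumed namespace

void build_inference_stage()
{
    // Map Morpheus tensor names to Triton input/output names. Only needed when
    // the names differ; an empty map (the default) means the names already match.
    std::map<std::string, std::string> inout_mapping = {
        {"input_ids", "INPUT_IDS"}  // hypothetical mapping for illustration
    };

    // "my_nlp_model" and "localhost:8001" are placeholder values.
    InferenceClientStage stage("my_nlp_model",    // model_name
                               "localhost:8001",  // server_url
                               true,              // force_convert_inputs
                               false,             // use_shared_memory
                               true,              // needs_logits
                               inout_mapping);
}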
