Class InferenceClientStage
Defined in File triton_inference.hpp
Base Type
public mrc::pymrc::PythonNode<std::shared_ptr<MultiInferenceMessage>, std::shared_ptr<MultiResponseMessage>>
class InferenceClientStage : public mrc::pymrc::PythonNode<std::shared_ptr<MultiInferenceMessage>, std::shared_ptr<MultiResponseMessage>>
Perform inference with Triton Inference Server. This class specifies which inference implementation category (e.g., NLP or FIL) is required for inference.
Public Types
- using base_t = mrc::pymrc::PythonNode<std::shared_ptr<MultiInferenceMessage>, std::shared_ptr<MultiResponseMessage>>
Public Functions
InferenceClientStage(std::string model_name, std::string server_url, bool force_convert_inputs, bool use_shared_memory, bool needs_logits, std::map<std::string, std::string> inout_mapping = {})
Construct a new Inference Client Stage object.
- Parameters
model_name – : Name of the model that will handle the inference requests sent to the Triton inference server.
server_url – : Triton server URL.
force_convert_inputs – : Instructs the stage to convert the incoming data to the format that Triton expects. If set to false, data will only be converted when the conversion would not result in a loss of data.
use_shared_memory – : Whether or not to use CUDA Shared IPC Memory for transferring data to Triton. Using CUDA IPC reduces network transfer time but requires that Morpheus and Triton are located on the same machine.
needs_logits – : Determines if logits are required.
inout_mapping – : Dictionary used to map pipeline input/output names to Triton input/output names. Use this if the Morpheus names do not match the model.
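The following is a minimal sketch of calling this constructor directly. It assumes the class lives in the morpheus namespace and that the header is included as morpheus/stages/triton_inference.hpp; the model name, server address, and name mapping are illustrative placeholders, not shipped defaults.

#include <map>
#include <string>

#include "morpheus/stages/triton_inference.hpp"  // assumed include path for InferenceClientStage

int main()
{
    // Map a Morpheus tensor name to the Triton model's input name (hypothetical names).
    std::map<std::string, std::string> inout_mapping = {
        {"seq_ids", "input_ids"}};

    morpheus::InferenceClientStage stage(
        "example-model",   // model_name: model registered with the Triton server
        "localhost:8001",  // server_url: Triton endpoint
        true,              // force_convert_inputs: convert inputs to Triton's expected types
        false,             // use_shared_memory: no CUDA IPC, e.g. when Triton runs on another machine
        true,              // needs_logits: the model's output is treated as logits
        inout_mapping);

    return 0;
}

In practice the stage is normally created through the pipeline/segment builder rather than instantiated standalone; the sketch only illustrates the constructor arguments.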