Class InferenceClientStage

Base Type

  • public mrc::pymrc::AsyncioRunnable< std::shared_ptr< MultiInferenceMessage >, std::shared_ptr< MultiResponseMessage > >

class InferenceClientStage : public mrc::pymrc::AsyncioRunnable<std::shared_ptr<MultiInferenceMessage>, std::shared_ptr<MultiResponseMessage>>

Performs inference with Triton Inference Server. This class specifies which inference implementation category (e.g., NLP or FIL) is required for the inference requests.

Public Types

using sink_type_t = std::shared_ptr<MultiInferenceMessage>

using source_type_t = std::shared_ptr<MultiResponseMessage>

Public Functions

InferenceClientStage(std::unique_ptr<IInferenceClient> &&client, std::string model_name, bool needs_logits, std::vector<TensorModelMapping> input_mapping, std::vector<TensorModelMapping> output_mapping)

Construct a new Inference Client Stage object.

Parameters
  • client – Inference client instance.

  • model_name – Name of the Triton model that handles the inference requests sent by this stage.

  • needs_logits – Determines whether logits are required.

  • input_mapping – Mapping of Morpheus input tensor names to Triton model input names. Use this if the Morpheus names do not match the model.

  • output_mapping – Mapping of Triton model output names to Morpheus output tensor names. Use this if the Morpheus names do not match the model (a construction sketch follows this list).
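
A minimal construction sketch, assuming the relevant Morpheus headers are included and their namespace imported. The make_triton_client() factory, the server address, and the model name "my_nlp_model" are illustrative assumptions; only the InferenceClientStage constructor signature comes from this page.

#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical factory, not part of the documented API: returns a Triton-backed client.
std::unique_ptr<IInferenceClient> make_triton_client(const std::string& server_url);

std::shared_ptr<InferenceClientStage> make_stage()
{
    std::unique_ptr<IInferenceClient> client = make_triton_client("localhost:8001");

    // Leave the mappings empty when the Morpheus tensor names already match the
    // model's input/output names; otherwise add TensorModelMapping entries.
    std::vector<TensorModelMapping> input_mapping;
    std::vector<TensorModelMapping> output_mapping;

    return std::make_shared<InferenceClientStage>(std::move(client),
                                                  "my_nlp_model",  // model_name
                                                  true,            // needs_logits
                                                  std::move(input_mapping),
                                                  std::move(output_mapping));
}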

mrc::coroutines::AsyncGenerator<std::shared_ptr<MultiResponseMessage>> on_data(std::shared_ptr<MultiInferenceMessage> &&data, std::shared_ptr<mrc::coroutines::Scheduler> on) override

Processes a single MultiInferenceMessage by running the constructor-provided inference client against its tensors, and yields the result as a MultiResponseMessage.
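
Because on_data returns an AsyncGenerator, its results are consumed inside a coroutine; in a running pipeline, AsyncioRunnable drives this iteration itself. The sketch below is a consumption illustration only and assumes a cppcoro-style begin()/end() interface on mrc::coroutines::AsyncGenerator, which may not match the exact MRC API.

// Consumption sketch; the begin()/end() iteration pattern is an assumption
// based on cppcoro-style async generators.
mrc::coroutines::Task<void> run_inference(InferenceClientStage& stage,
                                          std::shared_ptr<MultiInferenceMessage> message,
                                          std::shared_ptr<mrc::coroutines::Scheduler> scheduler)
{
    auto responses = stage.on_data(std::move(message), scheduler);

    // Each yielded element is a std::shared_ptr<MultiResponseMessage>.
    for (auto iter = co_await responses.begin(); iter != responses.end(); co_await ++iter)
    {
        std::shared_ptr<MultiResponseMessage> response = *iter;
        // Hand the response to the next stage, collect it for testing, etc.
    }

    co_return;
}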
