NVIDIA Holoscan SDK v2.4.0

Class InferenceOp

Base Type

class InferenceOp : public holoscan::Operator

Inference operator class to perform single- or multi-model inference.

==Named Inputs==

  • receivers : multi-receiver accepting nvidia::gxf::Tensor(s)

    • Any number of upstream ports may be connected to this receivers port. The operator will search across all messages for tensors matching those specified in in_tensor_names. These are the set of input tensors used by the models in inference_map.

==Named Outputs==

  • transmitter : nvidia::gxf::Tensor(s)

    • A message containing tensors corresponding to the inference results from all models will be emitted. The names of the tensors transmitted correspond to those in out_tensor_names.
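
As an illustration of how these ports are wired, the sketch below connects the operator inside an application's compose(); the upstream and downstream operators ("preprocessor", "viz") and their port names ("out", "in") are assumptions for illustration, not part of this API.

// Inside holoscan::Application::compose(); "preprocessor" and "viz" are
// hypothetical operators created earlier with make_operator<...>(...).
auto inference = make_operator<holoscan::ops::InferenceOp>("inference" /* , args... */);

// Any number of upstream ports may feed the multi-receiver "receivers" port.
add_flow(preprocessor, inference, {{"out", "receivers"}});

// Inference results for all models are emitted on the "transmitter" port.
add_flow(inference, viz, {{"transmitter", "in"}});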

==Parameters==

For more details on InferenceOp parameters, see Customizing the Inference Operator or refer to Inference.

  • backend: Backend to use for inference. Set "trt" for TensorRT, "torch" for LibTorch, and "onnxrt" for ONNX Runtime.

  • allocator: Memory allocator to use for the output.

  • inference_map: Tensor to model map.

  • model_path_map: Mapping of model (DataMap) to the path of the model file to be loaded.

  • pre_processor_map: Pre-processed data to model map.

  • device_map: Mapping of model (DataMap) to GPU ID for inference. Optional.

  • backend_map: Mapping of model (DataMap) to backend type for inference. Backend options: "trt" or "torch". Optional.

  • temporal_map: Mapping of model (DataMap) to a frame delay for model inference. Optional.

  • activation_map: Mapping of model (DataMap) to an activation state for model inference. Optional.

  • in_tensor_names: Input tensors (std::vector<std::string>). Optional.

  • out_tensor_names: Output tensors (std::vector<std::string>). Optional.

  • infer_on_cpu: Whether to run the computation on the CPU instead of GPU. Optional (default: false).

  • parallel_inference: Whether to enable parallel execution. Optional (default: true).

  • input_on_cuda: Whether the input buffer is on the GPU. Optional (default: true).

  • output_on_cuda: Whether the output buffer is on the GPU. Optional (default: true).

  • transmit_on_cuda: Whether to transmit the message on the GPU. Optional (default: true).

  • enable_fp16: Use 16-bit floating point computations. Optional (default: false).

  • is_engine_path: Whether the input model path mapping points to pre-built TensorRT engine files. Optional (default: false).

  • cuda_stream_pool: holoscan::CudaStreamPool instance to allocate CUDA streams. Optional (default: nullptr).
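
Putting the main parameters together, here is a minimal compose() sketch; the model name ("my_model"), the model file path, and the tensor names ("source_video", "inference_output") are placeholders, and only a subset of the parameters above is set.

#include <holoscan/holoscan.hpp>
#include <holoscan/operators/inference/inference.hpp>

class MyApp : public holoscan::Application {
 public:
  void compose() override {
    using namespace holoscan;

    // Placeholder model name and model file path.
    ops::InferenceOp::DataMap model_path_map;
    model_path_map.insert("my_model", "/workspace/models/my_model.onnx");

    // Input tensor(s) consumed by each model ...
    ops::InferenceOp::DataVecMap pre_processor_map;
    pre_processor_map.insert("my_model", {"source_video"});

    // ... and the tensor name(s) under which each model's results are emitted.
    ops::InferenceOp::DataVecMap inference_map;
    inference_map.insert("my_model", {"inference_output"});

    auto inference = make_operator<ops::InferenceOp>(
        "inference",
        Arg("backend", std::string("trt")),
        Arg("model_path_map", model_path_map),
        Arg("pre_processor_map", pre_processor_map),
        Arg("inference_map", inference_map),
        Arg("allocator") = make_resource<UnboundedAllocator>("allocator"));

    // Create upstream/downstream operators and connect them with add_flow().
  }
};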

==Device Memory Requirements==

When using this operator with a BlockMemoryPool, num_blocks must be greater than or equal to the number of output tensors that will be produced. The block_size in bytes must be greater than or equal to the size of the largest output tensor (in bytes). If output_on_cuda is true, the blocks should be in device memory (storage_type=1); otherwise they should be CUDA pinned host memory (storage_type=0).
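
For example, a pool serving two device-resident output tensors whose largest is a 1920x1080x4 float32 tensor could be sized as in the sketch below; the tensor count and shape are assumptions, so substitute your models' actual outputs.

// Inside Application::compose(); shapes and counts are assumptions for illustration.
constexpr uint64_t kLargestOutputBytes = 1920ULL * 1080ULL * 4ULL * sizeof(float);
constexpr uint64_t kNumOutputTensors = 2;

auto pool = make_resource<holoscan::BlockMemoryPool>(
    "inference_pool",
    holoscan::Arg("storage_type") = static_cast<int32_t>(1),  // 1 = device memory
    holoscan::Arg("block_size") = kLargestOutputBytes,
    holoscan::Arg("num_blocks") = kNumOutputTensors);
// Pass as Arg("allocator") = pool when creating the InferenceOp.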

Public Functions

HOLOSCAN_OPERATOR_FORWARD_ARGS(InferenceOp)
InferenceOp() = default
virtual void setup(OperatorSpec &spec) override

Define the operator specification.

Parameters

spec – The reference to the operator specification.

virtual void initialize() override

Initialize the operator.

This function is called when the fragment is initialized by Executor::initialize_fragment().

virtual void start() override

Implement the startup logic of the operator.

This method is called multiple times over the lifecycle of the operator, in the order defined by the lifecycle, and is used for heavy initialization tasks such as allocating memory resources.

virtual void compute(InputContext &op_input, OutputContext &op_output, ExecutionContext &context) override

Implement the compute method.

This method is called by the runtime repeatedly until the operator is stopped.

Parameters
  • op_input – The input context of the operator.

  • op_output – The output context of the operator.

  • context – The execution context of the operator.

virtual void stop() override

Implement the shutdown logic of the operator.

This method is called multiple times over the lifecycle of the operator, in the order defined by the lifecycle, and is used for heavy deinitialization tasks such as deallocating resources previously allocated in start().

struct DataMap

DataMap specification

Public Functions

DataMap() = default
inline explicit operator bool() const noexcept
inline void insert(const std::string &key, const std::string &value)
inline std::map<std::string, std::string> get_map() const

Public Members

std::map<std::string, std::string> mappings_

struct DataVecMap

DataVecMap specification

Public Functions

DataVecMap() = default
inline explicit operator bool() const noexcept
inline void insert(const std::string &key, const std::vector<std::string> &value)
inline std::map<std::string, std::vector<std::string>> get_map() const

Public Members

std::map<std::string, std::vector<std::string>> mappings_
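
A brief usage sketch of the two helper structs above; the model names, GPU IDs, and tensor names are placeholders.

#include <holoscan/operators/inference/inference.hpp>
using holoscan::ops::InferenceOp;

// DataMap: one string value per model (used by e.g. device_map, backend_map).
InferenceOp::DataMap device_map;
device_map.insert("model_a", "0");  // run model_a on GPU 0
device_map.insert("model_b", "1");  // run model_b on GPU 1

// DataVecMap: a vector of tensor names per model (used by e.g. inference_map).
InferenceOp::DataVecMap inference_map;
inference_map.insert("model_a", {"output_a"});

// The explicit operator bool() allows a non-empty check before use
// (assumed semantics), and get_map() exposes the underlying std::map.
if (device_map) {
  std::map<std::string, std::string> plain = device_map.get_map();
}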
