Class OnnxInfer
- Defined in File core.hpp 
Base Type
- public holoscan::inference::InferBase (Class InferBase)
class OnnxInfer : public holoscan::inference::InferBase
- Onnxruntime-based inference class.

Public Functions
OnnxInfer(const std::string &model_file_path, bool enable_fp16, int32_t dla_core, bool dla_gpu_fallback, bool cuda_flag, bool cuda_buf_in, bool cuda_buf_out, std::function<cudaStream_t(int32_t device_id)> allocate_cuda_stream)
- Constructor. (See the construction sketch after the parameter list.)
- Parameters
- model_file_path – Path to the ONNX model file
- enable_fp16 – Flag indicating whether the TensorRT engine file conversion will use FP16.
- dla_core – The DLA core index to execute the engine on; starts at 0. Set to -1 to disable DLA.
- dla_gpu_fallback – If DLA is enabled, fall back to the GPU when a layer cannot be executed on DLA. If the fallback is disabled, engine creation will fail if a layer cannot be executed on DLA.
- cuda_flag – Flag indicating whether inference will run using CUDA
- cuda_buf_in – Flag indicating whether the input data buffer is in CUDA memory
- cuda_buf_out – Flag indicating whether the output data buffer will be in CUDA memory
- allocate_cuda_stream – Function to allocate a CUDA stream (optional)
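A minimal construction sketch, assuming the holoinfer headers are on the include path; the include path, model file name, and stream-allocation policy below are illustrative, not taken from this page:

```cpp
#include <cuda_runtime.h>

#include <cstdint>
#include <functional>
#include <string>

#include "infer/onnx/core.hpp"  // declares OnnxInfer; exact path is an assumption

int main() {
  // Stream allocator handed to the class; creating one non-blocking
  // stream per requested device is an illustrative policy.
  std::function<cudaStream_t(int32_t)> allocate_cuda_stream =
      [](int32_t device_id) -> cudaStream_t {
    cudaSetDevice(device_id);
    cudaStream_t stream = nullptr;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
    return stream;
  };

  // FP16 engine conversion enabled, DLA disabled (dla_core = -1),
  // CUDA inference with device-resident input and output buffers.
  holoscan::inference::OnnxInfer infer(
      "model.onnx",  // model_file_path (illustrative)
      /*enable_fp16=*/true,
      /*dla_core=*/-1,
      /*dla_gpu_fallback=*/true,
      /*cuda_flag=*/true,
      /*cuda_buf_in=*/true,
      /*cuda_buf_out=*/true,
      allocate_cuda_stream);

  infer.print_model_details();
  return 0;
}
```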
~OnnxInfer()
- Destructor. 
virtual InferStatus do_inference(const std::vector<std::shared_ptr<DataBuffer>> &input_data, std::vector<std::shared_ptr<DataBuffer>> &output_buffer, cudaEvent_t cuda_event_data, cudaEvent_t *cuda_event_inference)
- Does the core inference using Onnxruntime. Input and output buffers are supported on host. Inference is supported on host and device. The provided CUDA data event is used to prepare the input data; any execution of CUDA work should be in sync with this event. If the inference is using CUDA, it should record a CUDA event and pass it back in cuda_event_inference. (See the call sketch after the Returns entry below.)
- Parameters
- input_data – Input DataBuffer 
- output_buffer – Output DataBuffer, populated with the inferred results
- cuda_event_data – CUDA event to synchronize input data preparation 
- cuda_event_inference – Pointer to a CUDA event for inference synchronization
- Returns
- InferStatus indicating the success or failure of the inference.
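A sketch of the event-synchronized call pattern described above. The vector-of-DataBuffer types mirror the parameter list in this entry; the helper name, stream arguments, and include path are illustrative assumptions:

```cpp
#include <cuda_runtime.h>

#include <memory>
#include <vector>

#include "infer/onnx/core.hpp"  // exact path is an assumption

using holoscan::inference::DataBuffer;
using holoscan::inference::OnnxInfer;

// Event-synchronized inference call: input preparation is recorded on
// cuda_event_data, and the backend may hand back cuda_event_inference
// for downstream synchronization.
void run(OnnxInfer& infer,
         const std::vector<std::shared_ptr<DataBuffer>>& inputs,
         std::vector<std::shared_ptr<DataBuffer>>& outputs,
         cudaStream_t data_stream, cudaStream_t consumer_stream) {
  cudaEvent_t cuda_event_data = nullptr;
  cudaEvent_t cuda_event_inference = nullptr;
  cudaEventCreateWithFlags(&cuda_event_data, cudaEventDisableTiming);

  // All CUDA work that prepares the inputs was issued on data_stream;
  // record the event so inference can wait on input readiness.
  cudaEventRecord(cuda_event_data, data_stream);

  auto status = infer.do_inference(inputs, outputs, cuda_event_data,
                                   &cuda_event_inference);
  (void)status;  // a real caller should check the returned status

  // If inference ran on CUDA, an event was recorded and returned here;
  // downstream CUDA work must wait on it before touching the outputs.
  if (cuda_event_inference != nullptr) {
    cudaStreamWaitEvent(consumer_stream, cuda_event_inference, 0);
  }
  cudaEventDestroy(cuda_event_data);
}
```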
void populate_model_details()
- Populate class parameters with model details and values. 
void print_model_details()
- Print model details. 
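A short usage sketch for the two calls above; whether the constructor already invokes populate_model_details() internally is not stated here, so calling it explicitly before printing is a conservative assumption:

```cpp
// `infer` is an OnnxInfer instance as constructed earlier.
void dump_model_details(holoscan::inference::OnnxInfer& infer) {
  infer.populate_model_details();  // populate class parameters with model details and values
  infer.print_model_details();     // print model details
}
```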
int set_holoscan_inf_onnx_session_options()
- Create session options for inference. 
virtual std::vector<std::vector<int64_t>> get_input_dims() const
- Get input data dimensions to the model.
- Returns
- Vector of input dimensions. Each dimension is a vector of int64_t corresponding to the shape of the input tensor.
virtual std::vector<std::vector<int64_t>> get_output_dims() const
- Get output data dimensions from the model.
- Returns
- Vector of output dimensions. Each dimension is a vector of int64_t corresponding to the shape of the output tensor.
virtual std::vector<holoinfer_datatype> get_input_datatype() const
- Get input data types from the model. (See the query sketch below.)
- Returns
- Vector of datatypes, one per input tensor
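A query sketch combining get_input_dims() and get_input_datatype(), e.g. to size buffers before the first inference call; the include path and the treatment of dynamic dimensions are assumptions:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

#include "infer/onnx/core.hpp"  // exact path is an assumption

// Log per-tensor shapes and datatypes. Dynamic dimensions may be
// reported as -1 and must be resolved first (see
// set_dynamic_input_dimension below) before computing buffer sizes.
void log_input_shapes(const holoscan::inference::OnnxInfer& infer) {
  const std::vector<std::vector<int64_t>> dims = infer.get_input_dims();
  const auto dtypes = infer.get_input_datatype();  // one holoinfer_datatype per tensor

  for (size_t t = 0; t < dims.size(); ++t) {
    int64_t elements = 1;
    for (int64_t d : dims[t]) { elements *= d; }
    // Datatype printed as its underlying integer value for brevity.
    std::printf("input %zu: %lld elements, datatype %d\n", t,
                static_cast<long long>(elements), static_cast<int>(dtypes[t]));
  }
}
```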
virtual bool set_dynamic_input_dimension(const std::vector<std::string> &input_tensors, const std::map<std::string, std::vector<int>> &dims_per_tensor)
- Updates the dimensions per tensor in case of dynamic inputs. Using the input Holoscan tensors and their dimension mapping, the internal input size vector is updated.
- Parameters
- input_tensors – Vector of input Holoscan tensor names 
- dims_per_tensor – Map storing the dimensions as values and Holoscan tensor names as keys. 
 
- Returns
- true if the dynamic input dimensions were successfully updated, false otherwise
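A sketch of resolving dynamic input dimensions before inference; the tensor name and the NCHW shape below are purely illustrative:

```cpp
#include <map>
#include <string>
#include <vector>

#include "infer/onnx/core.hpp"  // exact path is an assumption

// Pin down concrete shapes for dynamic model inputs.
bool resolve_dynamic_shapes(holoscan::inference::OnnxInfer& infer) {
  const std::vector<std::string> input_tensors = {"source_video"};
  const std::map<std::string, std::vector<int>> dims_per_tensor = {
      {"source_video", {1, 3, 224, 224}}};  // batch of one, NCHW (illustrative)

  // Returns false if, e.g., a tensor name does not match a model input.
  return infer.set_dynamic_input_dimension(input_tensors, dims_per_tensor);
}
```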
virtual std::vector<holoinfer_datatype> get_output_datatype() const
- Get output data types from the model.
- Returns
- Vector of datatypes, one per output tensor
virtual void cleanup()