Struct InferenceSpecs
Defined in File holoinfer_buffer.hpp
-
struct InferenceSpecs
Struct that holds specifications related to inference, along with the input and output data buffers.
Public Functions
-
InferenceSpecs() = default
-
inline InferenceSpecs(const std::string &backend, const Mappings &backend_map, const Mappings &model_path_map, const MultiMappings &pre_processor_map, const MultiMappings &inference_map, const Mappings &device_map, const Mappings &dla_core_map, const Mappings &temporal_map, const Mappings &activation_map, const std::vector<int32_t> &trt_opt_profile, bool is_engine_path, bool oncpu, bool parallel_proc, bool use_fp16, bool cuda_buffer_in, bool cuda_buffer_out, bool use_cuda_graphs, int32_t dla_core, bool dla_gpu_fallback)
Constructor. A usage sketch follows the parameter list below.
- Parameters
backend – Inference backend (trt or onnxrt)
backend_map – Backend inference map with model name as key, and backend as value
model_path_map – Map with model name as key, path to model as value
pre_processor_map – Map with model name as key, input tensor names in vector form as value
inference_map – Map with model name as key, output tensor names in vector form as value
device_map – Map with model name as key, GPU ID for inference as value
dla_core_map – Map with model name as key, DLA core index for inference as value
temporal_map – Map with model name as key, frame number to skip for inference as value
activation_map – Map with model name as key, activation state for inference as value
trt_opt_profile – Vector of values for TensorRT optimization profile during engine creation
is_engine_path – Flag indicating that the input model path points to a pre-built trt engine
oncpu – Perform inference on CPU
parallel_proc – Perform parallel inference of multiple models
use_fp16 – Use FP16 conversion, only supported for trt
cuda_buffer_in – Input buffers on CUDA
cuda_buffer_out – Output buffers on CUDA
use_cuda_graphs – Use CUDA graphs, only supported for trt
dla_core – The DLA core index to execute the engine on, only supported for trt. Set to -1 to disable DLA.
dla_gpu_fallback – If DLA is enabled, use the GPU if a layer cannot be executed on DLA.
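The fragment below is a minimal, hypothetical sketch of constructing an InferenceSpecs for a single TensorRT model. The model name, path, and tensor names are placeholders, and it assumes Mappings is a string-to-string map and MultiMappings a string-to-vector-of-string map, as declared in holoinfer_buffer.hpp.

// Hypothetical single-model configuration; names and paths are placeholders.
Mappings backend_map;                                             // empty: use the global backend
Mappings model_path_map = {{"model_a", "/path/to/model_a.onnx"}};
MultiMappings pre_processor_map = {{"model_a", {"input_tensor"}}};
MultiMappings inference_map = {{"model_a", {"output_tensor"}}};
Mappings device_map = {{"model_a", "0"}};      // run model_a on GPU 0
Mappings dla_core_map;                         // no per-model DLA core
Mappings temporal_map = {{"model_a", "1"}};    // infer on every frame
Mappings activation_map = {{"model_a", "1"}};  // model is active
std::vector<int32_t> trt_opt_profile = {1, 1, 1};  // placeholder optimization profile

InferenceSpecs specs("trt",              // backend
                     backend_map,
                     model_path_map,
                     pre_processor_map,
                     inference_map,
                     device_map,
                     dla_core_map,
                     temporal_map,
                     activation_map,
                     trt_opt_profile,
                     false,   // is_engine_path: model path is not a pre-built engine
                     false,   // oncpu: run on GPU
                     false,   // parallel_proc
                     true,    // use_fp16
                     true,    // cuda_buffer_in
                     true,    // cuda_buffer_out
                     true,    // use_cuda_graphs
                     -1,      // dla_core: DLA disabled
                     true);   // dla_gpu_fallback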
-
inline Mappings get_path_map() const
Get the model data path map.
- Returns
Mappings data
-
inline Mappings get_backend_map() const
Get the model backend map.
- Returns
Mappings data
-
inline Mappings get_device_map() const
Get the device map.
- Returns
Mappings data
-
inline Mappings get_dla_core_map() const
Get the DLA core map.
- Returns
Mappings data
-
inline Mappings get_temporal_map() const
Get the Temporal map.
- Returns
Mappings data
-
inline Mappings get_activation_map() const
Get the Activation map.
- Returns
Mappings data
-
inline void set_activation_map(const Mappings &activation_map)
Set the Activation map; a usage sketch follows the parameter description below.
- Parameters
activation_map – Map that will be used to update the activation_map_ of specs.
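As an illustration, the fragment below toggles the activation state of a placeholder model at runtime using the getter/setter pair above; the "0"/"1" string encoding of the activation state is an assumption, and specs is the object from the constructor sketch earlier.

Mappings activations = specs.get_activation_map();
activations["model_a"] = "0";          // assumed encoding: "0" deactivates, "1" activates
specs.set_activation_map(activations);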
Public Members
-
std::string backend_type_ = {""}
Backend type (for all models)
-
Mappings backend_map_
Backend map.
-
MultiMappings pre_processor_map_
Map with key as model name and value as vector of input tensor names.
-
MultiMappings inference_map_
Map with key as model name and value as vector of inferred (output) tensor names.
-
std::vector<int32_t> trt_opt_profile_
TensorRT optimization profile during engine creation for dynamic inputs.
-
bool is_engine_path_ = false
Flag showing if input model path is path to engine files.
-
bool oncuda_ = true
Flag showing if inference runs on CUDA. Default is True.
-
bool parallel_processing_ = false
Flag to enable parallel inference. Default is False.
-
bool use_fp16_ = false
Flag showing if trt engine file conversion will use FP16. Default is False.
-
bool cuda_buffer_in_ = true
Flag showing if input buffers are on CUDA. Default is True.
-
bool cuda_buffer_out_ = true
Flag showing if output buffers are on CUDA. Default is True.
-
bool use_cuda_graphs_ = true
Flag showing if using CUDA Graphs. Default is True.
-
int32_t dla_core_ = -1
The DLA core index to execute the engine on, starts at 0. Set to -1 (the default) to disable DLA.
-
bool dla_gpu_fallback_ = true
If DLA is enabled, use the GPU if a layer cannot be executed on DLA. If the fallback is disabled, engine creation will fail if a layer cannot be executed on DLA.
-
DataMap data_per_tensor_
Input Data Map with key as tensor name and value as DataBuffer.
-
DataMap output_per_model_
Output Data Map with key as tensor name and value as DataBuffer.
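For illustration only, the fragment below lists the tensor names currently held in the input data map; it assumes DataMap maps tensor names to DataBuffer objects as described above, and that specs is the object from the constructor sketch (requires <iostream> and C++17 structured bindings).

for (const auto& [tensor_name, buffer] : specs.data_per_tensor_) {
  std::cout << "Input tensor: " << tensor_name << std::endl;
}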