Struct InferenceSpecs
Defined in File holoinfer_buffer.hpp
-
struct InferenceSpecs
Struct that holds specifications related to inference, along with the input and output data buffers.
Public Functions
-
InferenceSpecs() = default
-
inline InferenceSpecs(const std::string &backend, const Mappings &backend_map, const Mappings &model_path_map, const MultiMappings &pre_processor_map, const MultiMappings &inference_map, const Mappings &device_map, const Mappings &dla_core_map, const Mappings &temporal_map, const Mappings &activation_map, const std::vector<int32_t> &trt_opt_profile, bool is_engine_path, bool oncpu, bool parallel_proc, bool use_fp16, bool cuda_buffer_in, bool cuda_buffer_out, bool use_cuda_graphs, int32_t dla_core, bool dla_gpu_fallback)
Constructor. A usage sketch follows the parameter list below.
- Parameters
backend – Inference backend (trt or onnxrt)
backend_map – Backend inference map with model name as key, and backend as value
model_path_map – Map with model name as key, path to model as value
pre_processor_map – Map with model name as key, input tensor names in vector form as value
inference_map – Map with model name as key, output tensor names in vector form as value
device_map – Map with model name as key, GPU ID for inference as value
dla_core_map – Map with model name as key, DLA core index for inference as value
temporal_map – Map with model name as key, frame number to skip for inference as value
activation_map – Map with model name as key, activation state for inference as value
trt_opt_profile – Vector of values for TensorRT optimization profile during engine creation
is_engine_path – Flag indicating that the input model path points to a pre-built trt engine
oncpu – Perform inference on CPU
parallel_proc – Perform parallel inference of multiple models
use_fp16 – Use FP16 conversion, only supported for trt
cuda_buffer_in – Input buffers on CUDA
cuda_buffer_out – Output buffers on CUDA
use_cuda_graphs – Use CUDA graphs, only supported for trt
dla_core – The DLA core index to execute the engine on, only supported for trt. Set to -1 to disable DLA.
dla_gpu_fallback – If DLA is enabled, use the GPU if a layer cannot be executed on DLA.
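The fragment below is a minimal, hypothetical sketch of constructing an InferenceSpecs for a single TensorRT model. The model name, path, and tensor names are placeholders, and it assumes Mappings is a string-to-string map and MultiMappings a string-to-vector-of-string map, as declared in holoinfer_buffer.hpp.

// Hypothetical single-model configuration; names and paths are placeholders.
Mappings backend_map;                                             // empty: use the global backend
Mappings model_path_map = {{"model_a", "/path/to/model_a.onnx"}};
MultiMappings pre_processor_map = {{"model_a", {"input_tensor"}}};
MultiMappings inference_map = {{"model_a", {"output_tensor"}}};
Mappings device_map = {{"model_a", "0"}};      // run model_a on GPU 0
Mappings dla_core_map;                         // no per-model DLA core
Mappings temporal_map = {{"model_a", "1"}};    // infer on every frame
Mappings activation_map = {{"model_a", "1"}};  // model is active
std::vector<int32_t> trt_opt_profile = {1, 1, 1};  // placeholder optimization profile

InferenceSpecs specs("trt",              // backend
                     backend_map,
                     model_path_map,
                     pre_processor_map,
                     inference_map,
                     device_map,
                     dla_core_map,
                     temporal_map,
                     activation_map,
                     trt_opt_profile,
                     false,   // is_engine_path: model path is not a pre-built engine
                     false,   // oncpu: run on GPU
                     false,   // parallel_proc
                     true,    // use_fp16
                     true,    // cuda_buffer_in
                     true,    // cuda_buffer_out
                     true,    // use_cuda_graphs
                     -1,      // dla_core: DLA disabled
                     true);   // dla_gpu_fallback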
-
inline Mappings get_path_map() const
Get the model data path map.
- Returns
Mappings data
-
inline Mappings get_backend_map() const
Get the model backend map.
- Returns
Mappings data
-
inline Mappings get_device_map() const
Get the device map.
- Returns
Mappings data
-
inline Mappings get_dla_core_map() const
Get the DLA core map.
- Returns
Mappings data
-
inline Mappings get_temporal_map() const
Get the Temporal map.
- Returns
Mappings data
-
inline Mappings get_activation_map() const
Get the Activation map.
- Returns
Mappings data
-
inline void set_activation_map(const Mappings &activation_map)
Set the Activation map; a usage sketch follows the parameter description below.
- Parameters
activation_map – Map that will be used to update the activation_map_ of specs.
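As an illustration, the fragment below toggles the activation state of a placeholder model at runtime using the getter/setter pair above; the "0"/"1" string encoding of the activation state is an assumption, and specs is the object from the constructor sketch earlier.

Mappings activations = specs.get_activation_map();
activations["model_a"] = "0";          // assumed encoding: "0" deactivates, "1" activates
specs.set_activation_map(activations);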
Public Members
-
std::string backend_type_ = {""}
Backend type (for all models)
-
Mappings backend_map_
Backend map.
-
MultiMappings pre_processor_map_
Map with key as model name and value as vector of input tensor names.
-
MultiMappings inference_map_
Map with key as model name and value as vector of inferred (output) tensor names.
-
std::vector<int32_t> trt_opt_profile_
TensorRT optimization profile during engine creation for dynamic inputs.
-
bool is_engine_path_ = false
Flag showing if input model path is path to engine files.
-
bool oncuda_ = true
Flag showing if inference runs on CUDA. Default is True.
-
bool parallel_processing_ = false
Flag to enable parallel inference. Default is False.
-
bool use_fp16_ = false
Flag showing if trt engine file conversion will use FP16. Default is False.
-
bool cuda_buffer_in_ = true
Flag showing if input buffers are on CUDA. Default is True.
-
bool cuda_buffer_out_ = true
Flag showing if output buffers are on CUDA. Default is True.
-
bool use_cuda_graphs_ = true
Flag showing if using CUDA Graphs. Default is True.
-
int32_t dla_core_ = -1
The DLA core index to execute the engine on, starts at 0. Set to -1 (the default) to disable DLA.
-
bool dla_gpu_fallback_ = true
If DLA is enabled, use the GPU if a layer cannot be executed on DLA. If the fallback is disabled, engine creation will fail if a layer cannot be executed on DLA.
-
DataMap data_per_tensor_
Input Data Map with key as tensor name and value as DataBuffer.
-
DataMap output_per_model_
Output Data Map with key as tensor name and value as DataBuffer.
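For illustration only, the fragment below lists the tensor names currently held in the input data map; it assumes DataMap maps tensor names to DataBuffer objects as described above, and that specs is the object from the constructor sketch (requires <iostream> and C++17 structured bindings).

for (const auto& [tensor_name, buffer] : specs.data_per_tensor_) {
  std::cout << "Input tensor: " << tensor_name << std::endl;
}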