Struct InferenceSpecs

Struct Documentation

struct InferenceSpecs

Struct that holds specifications related to inference, along with the input and output data buffers.

Public Functions

InferenceSpecs() = default
inline InferenceSpecs(const std::string &backend, const Mappings &backend_map, const Mappings &model_path_map, const MultiMappings &pre_processor_map, const MultiMappings &inference_map, const Mappings &device_map, const Mappings &dla_core_map, const Mappings &temporal_map, const Mappings &activation_map, const MultiMappings &trt_opt_profile, bool dynamic_input_dims, bool is_engine_path, bool oncpu, bool parallel_proc, bool use_fp16, bool cuda_buffer_in, bool cuda_buffer_out, bool use_cuda_graphs, int32_t dla_core, bool dla_gpu_fallback, std::function<cudaStream_t(int32_t)> allocate_cuda_stream)

Constructor.

Parameters

  • backend – Inference backend (trt or onnxrt)

  • backend_map – Inference backend map with model name as key, and backend as value

  • model_path_map – Map with model name as key, path to model as value

  • pre_processor_map – Map with model name as key, input tensor names in vector form as value

  • inference_map – Map with model name as key, output tensor names in vector form as value

  • device_map – Map with model name as key, GPU ID for inference as value

  • dla_core_map – Map with model name as key, DLA core index for inference as value

  • temporal_map – Map with model name as key, frame number to skip for inference as value

  • activation_map – Map with model name as key, activation state for inference as value

  • trt_opt_profile – Vector of values for TensorRT optimization profile during engine creation

  • dynamic_input_dims – Input dimensions to the model are dynamic

  • is_engine_path – Input path to the model is a path to a trt engine file

  • oncpu – Perform inference on CPU

  • parallel_proc – Perform parallel inference of multiple models

  • use_fp16 – Use FP16 conversion, only supported for trt

  • cuda_buffer_in – Input buffers on CUDA

  • cuda_buffer_out – Output buffers on CUDA

  • use_cuda_graphs – Use CUDA graphs, only supported for trt

  • dla_core – The DLA core index to execute the engine on, only supported for trt. Set to -1 to disable DLA.

  • dla_gpu_fallback – If DLA is enabled, use the GPU if a layer cannot be executed on DLA.

  • allocate_cuda_stream – Function to allocate a CUDA stream (optional)
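
The per-model map arguments above all share the model name as their key, so it can help to build them together and check that the keys line up before passing them to the constructor. The following is a minimal sketch, not the library's own code. It assumes that Mappings is a map from string to string and MultiMappings is a map from string to a vector of strings; the ModelMaps struct, the helper functions, the tensor names, and the "0" GPU ID are all hypothetical placeholders.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Assumption: stand-ins for the library's Mappings and MultiMappings aliases.
using Mappings = std::map<std::string, std::string>;
using MultiMappings = std::map<std::string, std::vector<std::string>>;

// Hypothetical bundle of some of the per-model constructor arguments.
struct ModelMaps {
  Mappings model_path_map;          // model name -> path to model
  MultiMappings pre_processor_map;  // model name -> input tensor names
  MultiMappings inference_map;      // model name -> output tensor names
  Mappings device_map;              // model name -> GPU ID
};

// Fill every map under the same model-name key (values are placeholders).
ModelMaps make_single_model_maps(const std::string& model, const std::string& path) {
  ModelMaps m;
  m.model_path_map[model] = path;
  m.pre_processor_map[model] = {"input_tensor"};
  m.inference_map[model] = {"output_tensor"};
  m.device_map[model] = "0";
  return m;
}

// Return true if every model in model_path_map also appears in the other maps.
bool keys_consistent(const ModelMaps& m) {
  for (const auto& entry : m.model_path_map) {
    const std::string& model = entry.first;
    if (m.pre_processor_map.count(model) == 0) { return false; }
    if (m.inference_map.count(model) == 0) { return false; }
    if (m.device_map.count(model) == 0) { return false; }
  }
  return true;
}
```

The resulting maps would then be passed as the corresponding constructor arguments. The consistency check is a design convenience of this sketch, not something the documented constructor is stated to perform.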

inline Mappings get_path_map() const

Get the model data path map.

Returns

Mappings data

inline Mappings get_backend_map() const

Get the model backend map.

Returns

Mappings data

inline Mappings get_device_map() const

Get the device map.

Returns

Mappings data

inline Mappings get_dla_core_map() const

Get the DLA core map.

Returns

Mappings data

inline Mappings get_temporal_map() const

Get the Temporal map.

Returns

Mappings data

inline Mappings get_activation_map() const

Get the Activation map.

Returns

Mappings data

inline void set_activation_map(const Mappings &activation_map)

Set the Activation map.

Parameters

activation_map – Map that will be used to update the activation_map_ of specs.
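
Because get_activation_map() returns its Mappings by value, editing the returned map does not change the specs; the change has to be written back with set_activation_map(). The sketch below illustrates that round trip on a hypothetical stand-in struct (SpecsSketch) that mirrors only the two documented accessor signatures. It assumes Mappings is a map from string to string, that the stand-in setter does a plain assignment (the real setter may merge entries instead), and that "1" and "0" are a placeholder encoding of the activation state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumption: stand-in for the library's Mappings alias.
using Mappings = std::map<std::string, std::string>;

// Hypothetical stand-in with the same accessor signatures as InferenceSpecs.
struct SpecsSketch {
  Mappings get_activation_map() const { return activation_map_; }

  // Assumption: plain assignment; the real setter may update entries differently.
  void set_activation_map(const Mappings& activation_map) {
    activation_map_ = activation_map;
  }

  Mappings activation_map_;
};

// Copy the current map, change one model's state, and write the copy back.
void set_model_active(SpecsSketch& specs, const std::string& model, bool active) {
  Mappings updated = specs.get_activation_map();  // returned by value: a copy
  updated[model] = active ? "1" : "0";            // placeholder encoding
  specs.set_activation_map(updated);
}
```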

Public Members

std::string backend_type_ = {""}

Backend type (for all models)

Mappings backend_map_

Backend map.

Mappings model_path_map_

Map with key as model name and value as model file path.

MultiMappings pre_processor_map_

Map with key as model name and value as vector of input tensor names.

MultiMappings inference_map_

Map with key as model name and value as vector of output tensor names.

Mappings device_map_

Map with key as model name and value as GPU ID for inference.

Mappings dla_core_map_

Map with key as model name and value as DLA core index for inference.

Mappings temporal_map_

Map with key as model name and frame number to skip for inference as value.

Mappings activation_map_

Map with key as model name and activation state for inference as value.

std::map<std::string, std::vector<int>> dims_per_tensor_

Map holding dimensions per tensor. Key is tensor name and value is a vector with dimensions.
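
As an illustration of how a map of this shape can be consumed, the sketch below computes the number of elements in a tensor from its stored dimensions. The map type matches the declaration above; the element_count helper, the DimsPerTensor alias, and the -1 return value for a missing tensor are hypothetical choices made for this example only.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Same shape as dims_per_tensor_: tensor name -> vector of dimensions.
using DimsPerTensor = std::map<std::string, std::vector<int>>;

// Return the product of a tensor's dimensions, or -1 if the tensor is unknown.
// An empty dimension vector yields 1 (treated here as a scalar).
std::int64_t element_count(const DimsPerTensor& dims, const std::string& tensor) {
  const auto it = dims.find(tensor);
  if (it == dims.end()) { return -1; }
  return std::accumulate(it->second.begin(), it->second.end(), std::int64_t{1},
                         std::multiplies<std::int64_t>());
}
```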

MultiMappings trt_opt_profile_

TensorRT optimization profile during engine creation for dynamic inputs.

bool dynamic_input_dims_ = false

Flag showing if the inputs to the models are dynamic.

bool is_engine_path_ = false

Flag showing if input model path is path to engine files.

bool oncuda_ = true

Flag showing if inference on CUDA. Default is True.

bool parallel_processing_ = false

Flag to enable parallel inference. Default is False.

bool use_fp16_ = false

Flag showing if trt engine file conversion will use FP16. Default is False.

bool cuda_buffer_in_ = true

Flag showing if input buffers are on CUDA. Default is True.

bool cuda_buffer_out_ = true

Flag showing if output buffers are on CUDA. Default is True.

bool use_cuda_graphs_ = true

Flag showing if using CUDA Graphs. Default is True.

int32_t dla_core_ = -1

The DLA core index to execute the engine on, starts at 0. Set to -1 (the default) to disable DLA.

bool dla_gpu_fallback_ = true

If DLA is enabled, use the GPU if a layer cannot be executed on DLA. If the fallback is disabled, engine creation will fail if a layer cannot be executed on DLA.

DataMap data_per_tensor_

Input Data Map with key as tensor name and value as DataBuffer.

DataMap output_per_model_

Output Data Map with key as tensor name and value as DataBuffer.

std::function<cudaStream_t(int32_t device_id)> allocate_cuda_stream_

Function to allocate a CUDA stream.

