Definition at line 21 of file infer_simple_runtime.h.
Public Types

enum { kLTpLayerDesc, kTpLayerNum }
enum { kInShapeName, kInShapeDims }
using InferenceDone = std::function< void(NvDsInferStatus, SharedBatchArray) >
    Function wrapper for post-inference processing. More...
using InputsConsumed = std::function< void(SharedBatchArray) >
    Function wrapper called after the input buffer is consumed. More...
using LayersTuple = std::tuple< const LayerDescription *, int >
    Tuple containing a pointer to the layer descriptions and the number of layers. More...
using InputShapeTuple = std::tuple< std::string, InferBatchDims >
    Tuple of layer name and dimensions, including batch size. More...
using InputShapes = std::vector< InputShapeTuple >
Public Member Functions

TritonSimpleRuntime (std::string model, int64_t version)
~TritonSimpleRuntime () override
void setOutputs (const std::set< std::string > &names)
NvDsInferStatus initialize () override
void addClassifyParams (const TritonClassParams &c)
    Add Triton classification parameters to the list. More...
void setTensorMaxBytes (const std::string &name, size_t maxBytes)
    Set the maximum size for the tensor; the larger of the existing size and the new input size is used. More...
InferTensorOrder getInputTensorOrder () const final
    Returns the input tensor order. More...
void setUniqueId (uint32_t id)
    Set the unique ID for the object instance. More...
int uniqueId () const
    Get the unique ID of the object instance. More...
void setFirstDimBatch (bool flag)
    Set the flag indicating that the input is batched. More...
bool isFirstDimBatch () const final
    Returns a boolean indicating whether batched input is expected. More...
uint32_t getLayerSize () const final
    Returns the total number of layers (input + output) for the model. More...
uint32_t getInputLayerSize () const final
    Returns the number of input layers for the model. More...
const LayerDescription * getLayerInfo (const std::string &bindingName) const final
    Retrieve the layer information from the layer name. More...
LayersTuple getInputLayers () const final
    Get the LayersTuple for input layers. More...
LayersTuple getOutputLayers () const final
    Get the LayersTuple for output layers. More...
bool checkInputDims (const InputShapes &shapes) const
    Check that the listed input shapes have fixed dimensions and that the corresponding layers are marked as input layers. More...
const LayerDescriptionList & allLayers () const
    Returns the descriptions of all layers, input and output. More...
void setKeepInputs (bool enable)
    Set the flag indicating whether to keep input buffers. More...
int32_t maxBatchSize () const final
    Returns the maximum batch size set for the backend. More...
bool isNonBatching () const
    Checks whether the batch size indicates non-batched processing. More...
void setOutputPoolSize (int size)
    Helper functions to access the member variables. More...
int outputPoolSize () const
void setOutputMemType (InferMemType memType)
InferMemType outputMemType () const
void setOutputDevId (int64_t devId)
int64_t outputDevId () const
std::vector< TritonClassParams > getClassifyParams ()
const std::string & model () const
int64_t version () const
Protected Types

enum { kName, kGpuId, kMemType }
    Tuple keys as <tensor-name, gpu-id, memType>. More...
using AsyncDone = std::function< void(NvDsInferStatus, SharedBatchArray) >
    Asynchronous inference done function: AsyncDone(Status, outputs). More...
using PoolKey = std::tuple< std::string, int64_t, InferMemType >
    Tuple holding tensor name, GPU ID, and memory type. More...
using PoolValue = SharedBufPool< UniqSysMem >
    The buffer pool for the specified tensor, GPU and memory type combination. More...
using ReorderItemPtr = std::shared_ptr< ReorderItem >
using LayerIdxMap = std::unordered_map< std::string, int >
    Map of layer name to layer index. More...
Protected Member Functions

NvDsInferStatus specifyInputDims (const InputShapes &shapes) override
NvDsInferStatus enqueue (SharedBatchArray inputs, SharedCuStream stream, InputsConsumed bufConsumed, InferenceDone inferenceDone) override
void requestTritonOutputNames (std::set< std::string > &names) override
virtual NvDsInferStatus ensureServerReady ()
    Check that the Triton inference server is live. More...
virtual NvDsInferStatus ensureModelReady ()
    Check that the model is ready; load the model if it is not. More...
NvDsInferStatus setupReorderThread ()
    Create a loop thread that calls inferenceDoneReorderLoop on the queued items. More...
void setAllocator (UniqTritonAllocator allocator)
    Set the output tensor allocator. More...
virtual NvDsInferStatus setupLayersInfo ()
    Get the model configuration from the server and populate the layer information. More...
TrtServerPtr & server ()
    Get the Triton server handle. More...
virtual NvDsInferStatus Run (SharedBatchArray inputs, InputsConsumed bufConsumed, AsyncDone asyncDone)
    Create an inference request and trigger asynchronous inference. More...
NvDsInferStatus fixateDims (const SharedBatchArray &bufs)
    Extend the dimensions to include the batch size for the buffers in the input array. More...
SharedSysMem allocateResponseBuf (const std::string &tensor, size_t bytes, InferMemType memType, int64_t devId)
    Acquire a buffer from the output buffer pool associated with the device ID and memory type. More...
void releaseResponseBuf (const std::string &tensor, SharedSysMem mem)
    Release the output tensor buffer. More...
NvDsInferStatus ensureInputs (SharedBatchArray &inputs)
    Ensure that the array of input buffers is expected by the model, and reshape the input buffers if required. More...
PoolValue findResponsePool (PoolKey &key)
    Find the buffer pool for the given key. More...
PoolValue createResponsePool (PoolKey &key, size_t bytes)
    Create a new buffer pool for the key. More...
void serverInferCompleted (std::shared_ptr< TrtServerRequest > request, std::unique_ptr< TrtServerResponse > uniqResponse, InputsConsumed inputsConsumed, AsyncDone asyncDone)
    Call the inputs-consumed function, parse the inference response to form the array of output batch buffers, and call asyncDone on it. More...
bool inferenceDoneReorderLoop (ReorderItemPtr item)
    Add input buffers to the output buffer list if required. More...
bool debatchingOutput (SharedBatchArray &outputs, SharedBatchArray &inputs)
    Separate the batch dimension from the output buffer descriptors. More...
void resetLayers (LayerDescriptionList layers, int inputSize)
    Set the layer description list of the backend. More...
LayerDescription * mutableLayerInfo (const std::string &bindingName)
    Get the mutable layer description structure for the layer name. More...
void setInputTensorOrder (InferTensorOrder order)
    Set the tensor order for the input layers. More...
bool needKeepInputs () const
    Check whether the keep-inputs flag is set. More...
void setMaxBatchSize (uint32_t size)
    Set the maximum batch size to be used for the backend. More...
AsyncDone (protected, inherited)
Asynchronous inference done function: AsyncDone(Status, outputs).
Definition at line 169 of file infer_trtis_backend.h.

InferenceDone (inherited)
Function wrapper for post-inference processing.
Definition at line 66 of file infer_ibackend.h.

InputsConsumed (inherited)
Function wrapper called after the input buffer is consumed.
Definition at line 70 of file infer_ibackend.h.
InputShapes (inherited)
Definition at line 84 of file infer_ibackend.h.

InputShapeTuple (inherited)
Tuple of layer name and dimensions, including batch size.
Definition at line 83 of file infer_ibackend.h.

LayerIdxMap (protected, inherited)
Map of layer name to layer index.
Definition at line 136 of file infer_base_backend.h.

LayersTuple (inherited)
Tuple containing a pointer to the layer descriptions and the number of layers.
Definition at line 77 of file infer_ibackend.h.

PoolKey (protected, inherited)
Tuple holding tensor name, GPU ID, and memory type.
Definition at line 224 of file infer_trtis_backend.h.

PoolValue (protected, inherited)
The buffer pool for the specified tensor, GPU and memory type combination.
Definition at line 229 of file infer_trtis_backend.h.
ReorderItemPtr (protected, inherited)
Definition at line 293 of file infer_trtis_backend.h.

anonymous enum (protected, inherited)
Tuple keys as <tensor-name, gpu-id, memType>.
Enumerators: kName, kGpuId, kMemType.
Definition at line 220 of file infer_trtis_backend.h.

anonymous enum (inherited)
Enumerators: kLTpLayerDesc, kTpLayerNum.
Definition at line 72 of file infer_ibackend.h.

anonymous enum (inherited)
Enumerators: kInShapeName, kInShapeDims.
Definition at line 79 of file infer_ibackend.h.
nvdsinferserver::TritonSimpleRuntime::TritonSimpleRuntime (std::string model, int64_t version)

~TritonSimpleRuntime () (override)
addClassifyParams (inline, inherited)
Add Triton classification parameters to the list.
Definition at line 58 of file infer_trtis_backend.h.

allLayers (inline, inherited)
Returns the descriptions of all layers, input and output.
Definition at line 113 of file infer_base_backend.h.

allocateResponseBuf (protected, inherited)
Acquire a buffer from the output buffer pool associated with the device ID and memory type. Create the pool if it doesn't exist.
Parameters:
  [in] tensor   Name of the output tensor.
  [in] bytes    Buffer size.
  [in] memType  Requested memory type.
  [in] devId    Device ID for the allocation.
checkInputDims (inherited)
Check that the listed input shapes have fixed dimensions and that the corresponding layers are marked as input layers.

createResponsePool (protected, inherited)
Create a new buffer pool for the key.
Parameters:
  [in] key    The pool key combination.
  [in] bytes  Size of the requested buffer.

debatchingOutput (protected, inherited)
Separate the batch dimension from the output buffer descriptors.
Parameters:
  [in] outputs  Array of output batch buffers.
  [in] inputs   Array of input batch buffers.
enqueue (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.

ensureInputs (protected, inherited)
Ensure that the array of input buffers is expected by the model, and reshape the input buffers if required.
Parameters:
  inputs  Array of input batch buffers.

ensureModelReady (protected, virtual, inherited)
Check that the model is ready; load the model if it is not.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

ensureServerReady (protected, virtual, inherited)
Check that the Triton inference server is live.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

findResponsePool (protected, inherited)
Find the buffer pool for the given key.

fixateDims (protected, inherited)
Extend the dimensions to include the batch size for the buffers in the input array. Do nothing if batch input is not required.
getClassifyParams (inline, inherited)
Definition at line 71 of file infer_trtis_backend.h.

getInputLayers (final, virtual, inherited)
Get the LayersTuple for input layers.
Implements nvdsinferserver::IBackend.

getInputLayerSize (inline, final, virtual, inherited)
Returns the number of input layers for the model.
Implements nvdsinferserver::IBackend.
Definition at line 83 of file infer_base_backend.h.

getInputTensorOrder (inline, final, virtual, inherited)
Returns the input tensor order.
Implements nvdsinferserver::IBackend.
Definition at line 49 of file infer_base_backend.h.

getLayerInfo (final, virtual, inherited)
Retrieve the layer information from the layer name.
Implements nvdsinferserver::IBackend.
Referenced by nvdsinferserver::BaseBackend::mutableLayerInfo().

getLayerSize (inline, final, virtual, inherited)
Returns the total number of layers (input + output) for the model.
Implements nvdsinferserver::IBackend.
Definition at line 75 of file infer_base_backend.h.
getOutputLayers (final, virtual, inherited)
Get the LayersTuple for output layers.
Implements nvdsinferserver::IBackend.

inferenceDoneReorderLoop (protected, inherited)
Add input buffers to the output buffer list if required. De-batch and run the inference-done callback.
Parameters:
  [in] item  The reorder task.

initialize (override, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.

isFirstDimBatch (inline, final, virtual, inherited)
Returns a boolean indicating whether batched input is expected.
Implements nvdsinferserver::IBackend.
Definition at line 69 of file infer_base_backend.h.

isNonBatching (inline, inherited)
Checks whether the batch size indicates non-batched processing.
Definition at line 130 of file infer_base_backend.h.
References INFER_EXPORT_API::isNonBatch(), and nvdsinferserver::BaseBackend::maxBatchSize().

maxBatchSize (inline, final, virtual, inherited)
Returns the maximum batch size set for the backend.
Implements nvdsinferserver::IBackend.
Definition at line 125 of file infer_base_backend.h.
Referenced by nvdsinferserver::BaseBackend::isNonBatching().
model (inline, inherited)
Definition at line 73 of file infer_trtis_backend.h.

mutableLayerInfo (inline, protected, inherited)
Get the mutable layer description structure for the layer name.
Definition at line 153 of file infer_base_backend.h.
References nvdsinferserver::BaseBackend::getLayerInfo().

needKeepInputs (inline, protected, inherited)
Check whether the keep-inputs flag is set.
Definition at line 167 of file infer_base_backend.h.

outputDevId (inline, inherited)
Definition at line 70 of file infer_trtis_backend.h.

outputMemType (inline, inherited)
Definition at line 68 of file infer_trtis_backend.h.

outputPoolSize (inline, inherited)
Definition at line 66 of file infer_trtis_backend.h.
releaseResponseBuf (protected, inherited)
Release the output tensor buffer.
Parameters:
  [in] tensor  Name of the output tensor.
  [in] mem     Pointer to the memory buffer.

requestTritonOutputNames (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.

resetLayers (protected, inherited)
Set the layer description list of the backend. This function sets the layer descriptions for the backend and updates the number of input layers and the layer name to index map.
Parameters:
  [in] layers     The list of descriptions for all layers, input layers followed by output layers.
  [in] inputSize  The number of input layers in the list.

Run (protected, virtual, inherited)
Create an inference request and trigger asynchronous inference. serverInferCompleted() is set as the callback function, which in turn calls asyncDone.
Parameters:
  [in] inputs       Array of input batch buffers.
  [in] bufConsumed  Callback function for releasing the input buffer.
  [in] asyncDone    Callback function for processing the response.
Reimplemented in nvdsinferserver::TritonGrpcBackend.
server (inline, protected, inherited)
Get the Triton server handle.
Definition at line 164 of file infer_trtis_backend.h.

serverInferCompleted (protected, inherited)
Call the inputs-consumed function, parse the inference response to form the array of output batch buffers, and call asyncDone on it.
Parameters:
  [in] request         Pointer to the inference request.
  [in] uniqResponse    Pointer to the inference response from the server.
  [in] inputsConsumed  Callback function for releasing the input buffer.
  [in] asyncDone       Callback function for processing the response.

setAllocator (inline, protected, inherited)
Set the output tensor allocator.
Definition at line 148 of file infer_trtis_backend.h.

setFirstDimBatch (inline, inherited)
Set the flag indicating that the input is batched.
Definition at line 64 of file infer_base_backend.h.

setInputTensorOrder (inline, protected, inherited)
Set the tensor order for the input layers.
Definition at line 162 of file infer_base_backend.h.

setKeepInputs (inline, inherited)
Set the flag indicating whether to keep input buffers.
Definition at line 118 of file infer_base_backend.h.
setMaxBatchSize (inline, protected, inherited)
Set the maximum batch size to be used for the backend.
Definition at line 174 of file infer_base_backend.h.

setOutputDevId (inline, inherited)
Definition at line 69 of file infer_trtis_backend.h.

setOutputMemType (inline, inherited)
Definition at line 67 of file infer_trtis_backend.h.

setOutputPoolSize (inline, inherited)
Helper function to access the member variables.
Definition at line 65 of file infer_trtis_backend.h.

setOutputs (inline)
Definition at line 26 of file infer_simple_runtime.h.

setTensorMaxBytes (inline, inherited)
Set the maximum size for the tensor; the larger of the existing size and the new input size is used. The size is rounded up to INFER_MEM_ALIGNMENT bytes.
Parameters:
  name      Name of the tensor.
  maxBytes  New maximum number of bytes for the buffer.
Definition at line 110 of file infer_trtis_backend.h.
References INFER_MEM_ALIGNMENT, and INFER_ROUND_UP.
setUniqueId (inline, inherited)
Set the unique ID for the object instance.
Definition at line 54 of file infer_base_backend.h.

setupLayersInfo (protected, virtual, inherited)
Get the model configuration from the server and populate the layer information. Set the maximum batch size as specified in the configuration settings.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

setupReorderThread (protected, inherited)
Create a loop thread that calls inferenceDoneReorderLoop on the queued items.

specifyInputDims (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.

uniqueId (inline, inherited)
Get the unique ID of the object instance.
Definition at line 59 of file infer_base_backend.h.

version (inline, inherited)
Definition at line 74 of file infer_trtis_backend.h.