nvdsinferserver::TrtISBackend Class Reference

Triton backend processing class.
Definition at line 39 of file infer_trtis_backend.h.


Data Structures

| Type | Name | Description |
|---|---|---|
| struct | ReorderItem | Reorder thread task. |
Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | TrtISBackend (const std::string &name, int64_t version, TrtServerPtr ptr=nullptr) | Constructor. |
| | ~TrtISBackend () override | Destructor. |
| void | addClassifyParams (const TritonClassParams &c) | Add Triton classification parameters to the list. |
| NvDsInferStatus | initialize () override | Check that the server and model are ready, query the layer information, and set up the reorder thread and output tensor allocator. |
| NvDsInferStatus | specifyInputDims (const InputShapes &shapes) override | Specify the input layers for the backend. |
| NvDsInferStatus | enqueue (SharedBatchArray inputs, SharedCuStream stream, InputsConsumed bufConsumed, InferenceDone inferenceDone) override | Enqueue an input for an inference request by calling Run() and adding the corresponding task to the reorder thread queue. |
| void | setTensorMaxBytes (const std::string &name, size_t maxBytes) | Set the maximum size for the tensor; the larger of the existing size and the new input size is used. |
Protected Types

| Type | Name | Description |
|---|---|---|
| enum | { kName, kGpuId, kMemType } | Tuple keys as <tensor-name, gpu-id, memType>. |
| using | AsyncDone = std::function< void(NvDsInferStatus, SharedBatchArray)> | Asynchronous inference done function: AsyncDone(Status, outputs). |
| using | PoolKey = std::tuple< std::string, int64_t, InferMemType > | Tuple holding tensor name, GPU ID, and memory type. |
| using | PoolValue = SharedBufPool< UniqSysMem > | The buffer pool for a given tensor, GPU, and memory type combination. |
| using | ReorderItemPtr = std::shared_ptr< ReorderItem > | Pointer to a reorder thread task. |
Protected Member Functions

| Return type | Member | Description |
|---|---|---|
| virtual void | requestTritonOutputNames (std::set< std::string > &outNames) | Get the list of output tensor names. |
| virtual NvDsInferStatus | ensureServerReady () | Check that the Triton inference server is live. |
| virtual NvDsInferStatus | ensureModelReady () | Check that the model is ready; load the model if it is not. |
| NvDsInferStatus | setupReorderThread () | Create a loop thread that calls inferenceDoneReorderLoop() on the queued items. |
| void | setAllocator (UniqTritonAllocator allocator) | Set the output tensor allocator. |
| virtual NvDsInferStatus | setupLayersInfo () | Get the model configuration from the server and populate the layer information. |
| TrtServerPtr & | server () | Get the Triton server handle. |
| virtual NvDsInferStatus | Run (SharedBatchArray inputs, InputsConsumed bufConsumed, AsyncDone asyncDone) | Create an inference request and trigger asynchronous inference. |
| NvDsInferStatus | fixateDims (const SharedBatchArray &bufs) | Extend the dimensions to include the batch size for the buffers in the input array. |
| SharedSysMem | allocateResponseBuf (const std::string &tensor, size_t bytes, InferMemType memType, int64_t devId) | Acquire a buffer from the output buffer pool associated with the device ID and memory type. |
| void | releaseResponseBuf (const std::string &tensor, SharedSysMem mem) | Release the output tensor buffer. |
| NvDsInferStatus | ensureInputs (SharedBatchArray &inputs) | Ensure that the array of input buffers is as expected by the model; reshape the input buffers if required. |
| PoolValue | findResponsePool (PoolKey &key) | Find the buffer pool for the given key. |
| PoolValue | createResponsePool (PoolKey &key, size_t bytes) | Create a new buffer pool for the key. |
| void | serverInferCompleted (std::shared_ptr< TrtServerRequest > request, std::unique_ptr< TrtServerResponse > uniqResponse, InputsConsumed inputsConsumed, AsyncDone asyncDone) | Call the inputs-consumed function, parse the inference response into an array of output batch buffers, and call asyncDone on it. |
| bool | inferenceDoneReorderLoop (ReorderItemPtr item) | Add input buffers to the output buffer list if required, de-batch, and run the inference done callback. |
| bool | debatchingOutput (SharedBatchArray &outputs, SharedBatchArray &inputs) | Separate the batch dimension from the output buffer descriptors. |
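Taken together, the public interface above follows a fixed call order: construct the backend, initialize() it, then enqueue() one batch at a time. The sketch below illustrates only that order. It is a minimal, hedged example, not library code: it assumes the nvdsinferserver headers are on the include path, that InputsConsumed and InferenceDone are std::function aliases compatible with the lambdas shown (InferenceDone mirroring the AsyncDone signature documented below), and it leaves real input-buffer construction to the caller.

```cpp
#include "infer_trtis_backend.h"

using namespace nvdsinferserver;

// Hypothetical driver: shows the documented call order only. Building
// `inputs` and `stream` is application-specific and elided here.
NvDsInferStatus runOnce(SharedBatchArray inputs, SharedCuStream stream)
{
    TrtISBackend backend("my_model", 1 /* version */);

    // Checks server/model readiness, queries layer info, starts the
    // reorder thread, and installs the output tensor allocator.
    NvDsInferStatus status = backend.initialize();
    if (status != NVDSINFER_SUCCESS)
        return status;

    return backend.enqueue(
        std::move(inputs), std::move(stream),
        [](SharedBatchArray consumed) {
            // Input buffers may be recycled as soon as this fires.
        },
        [](NvDsInferStatus s, SharedBatchArray outputs) {
            // Called in enqueue order thanks to the reorder thread.
        });
}
```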
AsyncDone [protected]

using AsyncDone = std::function< void(NvDsInferStatus, SharedBatchArray)>

Asynchronous inference done function: AsyncDone(Status, outputs).
Definition at line 169 of file infer_trtis_backend.h.
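Because the alias is a plain std::function, any callable of this shape can serve as the completion hook. A self-contained illustration using local stand-ins for the library's NvDsInferStatus and SharedBatchArray types (the real definitions live in the DeepStream headers):

```cpp
#include <functional>
#include <memory>

// Local stand-ins for the library types, for illustration only.
enum NvDsInferStatus { NVDSINFER_SUCCESS, NVDSINFER_TRITON_ERROR };
struct BatchArray { /* batched tensor buffers */ };
using SharedBatchArray = std::shared_ptr<BatchArray>;

using AsyncDone = std::function<void(NvDsInferStatus, SharedBatchArray)>;

int main()
{
    AsyncDone done = [](NvDsInferStatus status, SharedBatchArray outputs) {
        if (status != NVDSINFER_SUCCESS) {
            // Propagate or log the failure; outputs may be empty.
            return;
        }
        // Hand `outputs` to post-processing.
    };
    done(NVDSINFER_SUCCESS, std::make_shared<BatchArray>());
}
```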
PoolKey [protected]

using PoolKey = std::tuple< std::string, int64_t, InferMemType >

Tuple holding tensor name, GPU ID, and memory type.
Definition at line 224 of file infer_trtis_backend.h.
PoolValue [protected]

using PoolValue = SharedBufPool< UniqSysMem >

The buffer pool for a given tensor, GPU, and memory type combination.
Definition at line 229 of file infer_trtis_backend.h.
ReorderItemPtr [protected]

using ReorderItemPtr = std::shared_ptr< ReorderItem >

Definition at line 293 of file infer_trtis_backend.h.
anonymous enum [protected]

Tuple keys as <tensor-name, gpu-id, memType>.

| Enumerator | Description |
|---|---|
| kName | Tensor name. |
| kGpuId | GPU ID. |
| kMemType | Memory type. |

Definition at line 220 of file infer_trtis_backend.h.
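The unscoped enumerators line up with the tuple fields of PoolKey, so they can presumably serve as readable std::get indices. A self-contained sketch of that idiom, with a stand-in for the library's InferMemType enum:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <tuple>

enum class InferMemType { kCpu, kGpuCuda };  // stand-in for the library enum

enum { kName, kGpuId, kMemType };  // tuple keys, as documented above
using PoolKey = std::tuple<std::string, int64_t, InferMemType>;

int main()
{
    PoolKey key{"output_tensor", 0, InferMemType::kGpuCuda};
    // Unscoped enumerators convert to the index std::get expects,
    // so fields read by name instead of bare position.
    std::cout << std::get<kName>(key) << " on GPU "
              << std::get<kGpuId>(key) << '\n';
}
```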
TrtISBackend()

nvdsinferserver::TrtISBackend::TrtISBackend (const std::string &name, int64_t version, TrtServerPtr ptr = nullptr)

Constructor. Save the model name, version, and server handle.

| [in] | name | Model name. |
| [in] | version | Model version. |
| [in] | ptr | Handle to the Triton server class instance. |
~TrtISBackend() [override]

Destructor. Unload the model if needed.
addClassifyParams() [inline]

Add Triton classification parameters to the list.
Definition at line 58 of file infer_trtis_backend.h.
allocateResponseBuf() [protected]

Acquire a buffer from the output buffer pool associated with the device ID and memory type. Create the pool if it does not exist.

| [in] | tensor | Name of the output tensor. |
| [in] | bytes | Buffer size. |
| [in] | memType | Requested memory type. |
| [in] | devId | Device ID for the allocation. |
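Together with findResponsePool() and createResponsePool() below, this is a find-or-create cache keyed by PoolKey. A generic, self-contained sketch of the pattern; the map type, locking, and pool constructor here are assumptions, not the library's internals:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

template <typename Key, typename Pool>
class PoolCache {
public:
    // Return the pool for `key`, creating it on first use.
    std::shared_ptr<Pool> acquire(const Key& key, std::size_t bytes)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        auto it = m_Pools.find(key);                // findResponsePool() step
        if (it != m_Pools.end())
            return it->second;
        auto pool = std::make_shared<Pool>(bytes);  // createResponsePool() step
        m_Pools.emplace(key, pool);
        return pool;
    }

private:
    std::mutex m_Mutex;
    std::map<Key, std::shared_ptr<Pool>> m_Pools;
};
```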
createResponsePool() [protected]

Create a new buffer pool for the key.

| [in] | key | The pool key combination. |
| [in] | bytes | Size of the requested buffer. |
debatchingOutput() [protected]

Separate the batch dimension from the output buffer descriptors.

| [in] | outputs | Array of output batch buffers. |
| [in] | inputs | Array of input batch buffers. |
enqueue() [override]

Enqueue an input for an inference request by calling Run() and adding the corresponding task to the reorder thread queue.

| [in] | inputs | The array of input batch buffers. |
| [in] | stream | The CUDA stream to be used. |
| [in] | bufConsumed | Callback function for releasing the input buffer. |
| [in] | inferenceDone | Callback function for processing the result. |
ensureInputs() [protected]

Ensure that the array of input buffers is as expected by the model; reshape the input buffers if required.

| | inputs | Array of input batch buffers. |
ensureModelReady() [protected, virtual]

Check that the model is ready; load the model if it is not.
Reimplemented in nvdsinferserver::TritonGrpcBackend.
ensureServerReady() [protected, virtual]

Check that the Triton inference server is live.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

findResponsePool() [protected]

Find the buffer pool for the given key.

fixateDims() [protected]

Extend the dimensions to include the batch size for the buffers in the input array. Do nothing if batch input is not required.
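fixateDims() and debatchingOutput() are inverses at the shape level: one prepends the batch size to the per-sample dimensions, the other strips it off again. A minimal, self-contained sketch of that bookkeeping only; the real code also rewrites buffer descriptors and per-sample byte offsets:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Dims = std::vector<int64_t>;

// fixateDims direction: {3, 224, 224} with batch 4 -> {4, 3, 224, 224}.
Dims withBatch(Dims perSample, int64_t batchSize)
{
    perSample.insert(perSample.begin(), batchSize);
    return perSample;
}

// debatchingOutput direction: {4, 3, 224, 224} -> (4, {3, 224, 224}).
std::pair<int64_t, Dims> splitBatch(const Dims& full)
{
    return {full.front(), Dims(full.begin() + 1, full.end())};
}
```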
getClassifyParams() [inline]

Get the list of Triton classification parameters.
Definition at line 71 of file infer_trtis_backend.h.
inferenceDoneReorderLoop() [protected]

Add input buffers to the output buffer list if required, de-batch, and run the inference done callback.

| [in] | item | The reorder task. |
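The reorder loop exists so that inference-done callbacks fire in enqueue order even when the server completes requests out of order. A generic, self-contained sketch of the pattern; the queue and task types are stand-ins, not the library's ReorderItem:

```cpp
#include <deque>
#include <functional>
#include <future>

// Tasks enter in submission order; the loop blocks on the *front*
// future, so `done` fires in that same order even if later requests
// finish first.
struct ReorderQueue {
    std::deque<std::future<int>> tasks;  // `int` stands in for a result batch
    std::function<void(int)> done;       // stand-in for InferenceDone

    void drainLoop()
    {
        while (!tasks.empty()) {
            int result = tasks.front().get();  // wait for the oldest request
            tasks.pop_front();
            done(result);
        }
    }
};
```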
initialize() [override]

Check that the server and model are ready, query the layer information, and set up the reorder thread and output tensor allocator.
model() [inline]

Get the model name.
Definition at line 73 of file infer_trtis_backend.h.
outputDevId() [inline]

Get the device ID for the output tensors.
Definition at line 70 of file infer_trtis_backend.h.
outputMemType() [inline]

Get the memory type for the output tensors.
Definition at line 68 of file infer_trtis_backend.h.
outputPoolSize() [inline]

Get the output buffer pool size.
Definition at line 66 of file infer_trtis_backend.h.
releaseResponseBuf() [protected]

Release the output tensor buffer.

| [in] | tensor | Name of the output tensor. |
| [in] | mem | Pointer to the memory buffer. |
requestTritonOutputNames() [protected, virtual]

Get the list of output tensor names.

| [out] | outNames | The set of strings to which the names are added. |

Reimplemented in nvdsinferserver::TritonGrpcBackend and nvdsinferserver::TritonSimpleRuntime.
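Subclasses reimplement this hook to declare which tensors they want in the response. A hedged sketch of such an override; HypotheticalBackend and the tensor names are illustrative, not part of the library:

```cpp
#include <set>
#include <string>

// Illustrative subclass; the hook's signature matches the one documented
// above. Inheritance is commented out so the sketch stands alone.
class HypotheticalBackend /* : public nvdsinferserver::TrtISBackend */ {
protected:
    virtual void requestTritonOutputNames(std::set<std::string>& outNames)
    {
        // Request every tensor expected in the inference response.
        outNames.insert("detection_boxes");
        outNames.insert("detection_scores");
    }
};
```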
Run() [protected, virtual]

Create an inference request and trigger asynchronous inference. serverInferCompleted() is set as the completion callback, which in turn calls asyncDone.

| [in] | inputs | Array of input batch buffers. |
| [in] | bufConsumed | Callback function for releasing the input buffer. |
| [in] | asyncDone | Callback function for processing the response. |

Reimplemented in nvdsinferserver::TritonGrpcBackend.
server() [inline, protected]

Get the Triton server handle.
Definition at line 164 of file infer_trtis_backend.h.
serverInferCompleted() [protected]

Call the inputs-consumed function, parse the inference response into an array of output batch buffers, and call asyncDone on it.

| [in] | request | Pointer to the inference request. |
| [in] | uniqResponse | Pointer to the inference response from the server. |
| [in] | inputsConsumed | Callback function for releasing the input buffer. |
| [in] | asyncDone | Callback function for processing the response. |
setAllocator() [inline, protected]

Set the output tensor allocator.
Definition at line 148 of file infer_trtis_backend.h.
setOutputDevId() [inline]

Set the device ID for the output tensors.
Definition at line 69 of file infer_trtis_backend.h.
setOutputMemType() [inline]

Set the memory type for the output tensors.
Definition at line 67 of file infer_trtis_backend.h.
setOutputPoolSize() [inline]

Helper function to access the member variables.
Definition at line 65 of file infer_trtis_backend.h.
setTensorMaxBytes() [inline]

Set the maximum size for the tensor; the larger of the existing size and the new input size is used. The size is rounded up to INFER_MEM_ALIGNMENT bytes.

| | name | Name of the tensor. |
| | maxBytes | New maximum number of bytes for the buffer. |

Definition at line 110 of file infer_trtis_backend.h.
References INFER_MEM_ALIGNMENT and INFER_ROUND_UP.
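The rounding referenced above is standard round-up-to-alignment arithmetic. A self-contained illustration of the max-then-align step; the alignment constant is a local stand-in for INFER_MEM_ALIGNMENT, whose real value is defined in the library headers:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>

constexpr std::size_t kAlignment = 1024;  // stand-in for INFER_MEM_ALIGNMENT

// Round `bytes` up to the next multiple of `align` (INFER_ROUND_UP pattern).
constexpr std::size_t roundUp(std::size_t bytes, std::size_t align)
{
    return ((bytes + align - 1) / align) * align;
}

int main()
{
    std::size_t existing = 4096, incoming = 5000;
    // Keep the larger of the two sizes, then align it.
    std::size_t maxBytes = roundUp(std::max(existing, incoming), kAlignment);
    std::cout << maxBytes << '\n';  // prints 5120
}
```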
setupLayersInfo() [protected, virtual]

Get the model configuration from the server and populate the layer information. Set the maximum batch size as specified in the configuration settings.
Reimplemented in nvdsinferserver::TritonGrpcBackend.
setupReorderThread() [protected]

Create a loop thread that calls inferenceDoneReorderLoop() on the queued items.
specifyInputDims() [override]

Specify the input layers for the backend.

| | shapes | List of names and shapes of the input layers. |
version() [inline]

Get the model version.
Definition at line 74 of file infer_trtis_backend.h.