Triton gRPC mode backend processing class.
Definition at line 34 of file infer_grpc_backend.h.
Public Member Functions

  TritonGrpcBackend(std::string model, int64_t version)
  ~TritonGrpcBackend() override
  void setOutputs(const std::set<std::string> &names)
  void setUrl(const std::string &url)
  void setEnableCudaBufferSharing(const bool enableSharing)
  NvDsInferStatus initialize() override
  void addClassifyParams(const TritonClassParams &c)
      Add Triton Classification parameters to the list.
  NvDsInferStatus specifyInputDims(const InputShapes &shapes) override
      Specify the input layers for the backend.
  void setTensorMaxBytes(const std::string &name, size_t maxBytes)
      Set the maximum size for the tensor; the larger of the existing size and the new input size is used.
Protected Types

  enum { kName, kGpuId, kMemType }
      Tuple keys as <tensor-name, gpu-id, memType>.
  using AsyncDone = std::function<void(NvDsInferStatus, SharedBatchArray)>
      Asynchronous inference done function: AsyncDone(Status, outputs).
  using PoolKey = std::tuple<std::string, int64_t, InferMemType>
      Tuple holding tensor name, GPU ID, memory type.
  using PoolValue = SharedBufPool<UniqSysMem>
      The buffer pool for the specified tensor, GPU, and memory type combination.
  using ReorderItemPtr = std::shared_ptr<ReorderItem>
Protected Member Functions

  NvDsInferStatus enqueue(SharedBatchArray inputs, SharedCuStream stream, InputsConsumed bufConsumed, InferenceDone inferenceDone) override
  void requestTritonOutputNames(std::set<std::string> &names) override
  NvDsInferStatus ensureServerReady() override
  NvDsInferStatus ensureModelReady() override
  NvDsInferStatus setupLayersInfo() override
  NvDsInferStatus Run(SharedBatchArray inputs, InputsConsumed bufConsumed, AsyncDone asyncDone) override
  NvDsInferStatus setupReorderThread()
      Create a loop thread that calls inferenceDoneReorderLoop on the queued items.
  void setAllocator(UniqTritonAllocator allocator)
      Set the output tensor allocator.
  TrtServerPtr &server()
      Get the Triton server handle.
  NvDsInferStatus fixateDims(const SharedBatchArray &bufs)
      Extend the dimensions to include the batch size for the buffers in the input array.
  SharedSysMem allocateResponseBuf(const std::string &tensor, size_t bytes, InferMemType memType, int64_t devId)
      Acquire a buffer from the output buffer pool associated with the device ID and memory type.
  void releaseResponseBuf(const std::string &tensor, SharedSysMem mem)
      Release the output tensor buffer.
  NvDsInferStatus ensureInputs(SharedBatchArray &inputs)
      Ensure that the array of input buffers is expected by the model, and reshape the input buffers if required.
  PoolValue findResponsePool(PoolKey &key)
      Find the buffer pool for the given key.
  PoolValue createResponsePool(PoolKey &key, size_t bytes)
      Create a new buffer pool for the key.
  void serverInferCompleted(std::shared_ptr<TrtServerRequest> request, std::unique_ptr<TrtServerResponse> uniqResponse, InputsConsumed inputsConsumed, AsyncDone asyncDone)
      Call the inputs-consumed function, parse the inference response to form the array of output batch buffers, and call asyncDone on it.
  bool inferenceDoneReorderLoop(ReorderItemPtr item)
      Add input buffers to the output buffer list if required.
  bool debatchingOutput(SharedBatchArray &outputs, SharedBatchArray &inputs)
      Separate the batch dimension from the output buffer descriptors.
AsyncDone (protected, inherited)
Asynchronous inference done function: AsyncDone(Status, outputs).
Definition at line 169 of file infer_trtis_backend.h.
PoolKey (protected, inherited)
Tuple holding tensor name, GPU ID, memory type.
Definition at line 224 of file infer_trtis_backend.h.
PoolValue (protected, inherited)
The buffer pool for the specified tensor, GPU, and memory type combination.
Definition at line 229 of file infer_trtis_backend.h.
ReorderItemPtr (protected, inherited)
Definition at line 293 of file infer_trtis_backend.h.
anonymous enum (protected, inherited)
Tuple keys as <tensor-name, gpu-id, memType>.
Enumerators: kName, kGpuId, kMemType.
Definition at line 220 of file infer_trtis_backend.h.
TritonGrpcBackend
nvdsinferserver::TritonGrpcBackend::TritonGrpcBackend(std::string model, int64_t version)

~TritonGrpcBackend (override)
nvdsinferserver::TritonGrpcBackend::~TritonGrpcBackend()
addClassifyParams (inline, inherited)
Add Triton Classification parameters to the list.
Definition at line 58 of file infer_trtis_backend.h.
allocateResponseBuf (protected, inherited)
Acquire a buffer from the output buffer pool associated with the device ID and memory type.
Create the pool if it doesn't exist.
Parameters:
  [in] tensor: Name of the output tensor.
  [in] bytes: Buffer size.
  [in] memType: Requested memory type.
  [in] devId: Device ID for the allocation.
createResponsePool (protected, inherited)
Create a new buffer pool for the key.
Parameters:
  [in] key: The pool key combination.
  [in] bytes: Size of the requested buffer.
debatchingOutput (protected, inherited)
Separate the batch dimension from the output buffer descriptors.
Parameters:
  [in] outputs: Array of output batch buffers.
  [in] inputs: Array of input batch buffers.
enqueue (override, protected)
ensureInputs (protected, inherited)
Ensure that the array of input buffers is expected by the model, and reshape the input buffers if required.
Parameters:
  inputs: Array of input batch buffers.
ensureModelReady (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.
ensureServerReady (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.
findResponsePool (protected, inherited)
Find the buffer pool for the given key.
fixateDims (protected, inherited)
Extend the dimensions to include batch size for the buffers in the input array.
Do nothing if batch input is not required.
(inline, inherited)
Definition at line 71 of file infer_trtis_backend.h.
inferenceDoneReorderLoop (protected, inherited)
Add input buffers to the output buffer list if required.
De-batch and run the inference done callback.
Parameters:
  [in] item: The reorder task.
initialize (override)
(inline, inherited)
Definition at line 73 of file infer_trtis_backend.h.

(inline, inherited)
Definition at line 70 of file infer_trtis_backend.h.

(inline, inherited)
Definition at line 68 of file infer_trtis_backend.h.

(inline, inherited)
Definition at line 66 of file infer_trtis_backend.h.
releaseResponseBuf (protected, inherited)
Release the output tensor buffer.
Parameters:
  [in] tensor: Name of the output tensor.
  [in] mem: Pointer to the memory buffer.
requestTritonOutputNames (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.

Run (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.
server (inline, protected, inherited)
Get the Triton server handle.
Definition at line 164 of file infer_trtis_backend.h.
serverInferCompleted (protected, inherited)
Call the inputs-consumed function, parse the inference response to form the array of output batch buffers, and call asyncDone on it.
Parameters:
  [in] request: Pointer to the inference request.
  [in] uniqResponse: Pointer to the inference response from the server.
  [in] inputsConsumed: Callback function for releasing input buffers.
  [in] asyncDone: Callback function for processing the response.
setAllocator (inline, protected, inherited)
Set the output tensor allocator.
Definition at line 148 of file infer_trtis_backend.h.
setEnableCudaBufferSharing (inline)
Definition at line 43 of file infer_grpc_backend.h.
(inline, inherited)
Definition at line 69 of file infer_trtis_backend.h.

(inline, inherited)
Definition at line 67 of file infer_trtis_backend.h.

(inline, inherited)
Helper function to access the member variables.
Definition at line 65 of file infer_trtis_backend.h.
setOutputs (inline)
Definition at line 39 of file infer_grpc_backend.h.
setTensorMaxBytes (inline, inherited)
Set the maximum size for the tensor; the larger of the existing size and the new input size is used.
The size is rounded up to INFER_MEM_ALIGNMENT bytes.
Parameters:
  name: Name of the tensor.
  maxBytes: New maximum number of bytes for the buffer.
Definition at line 110 of file infer_trtis_backend.h.
References INFER_MEM_ALIGNMENT, and INFER_ROUND_UP.
setupLayersInfo (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.
setupReorderThread (protected, inherited)
Create a loop thread that calls inferenceDoneReorderLoop on the queued items.
setUrl (inline)
Definition at line 42 of file infer_grpc_backend.h.
specifyInputDims (override, inherited)
Specify the input layers for the backend.
Parameters:
  shapes: List of names and shapes of the input layers.
(inline, inherited)
Definition at line 74 of file infer_trtis_backend.h.