Definition at line 20 of file infer_simple_runtime.h.


Public Member Functions

| | |
|---|---|
| | TritonSimpleRuntime (std::string model, int64_t version) |
| | ~TritonSimpleRuntime () override |
| void | setOutputs (const std::set< std::string > &names) |
| NvDsInferStatus | initialize () override |
| void | addClassifyParams (const TritonClassParams &c) |
| | Add Triton classification parameters to the list. |
| void | setTensorMaxBytes (const std::string &name, size_t maxBytes) |
| | Set the maximum size for the tensor; the larger of the existing size and the new input size is used. |
| void | setOutputPoolSize (int size) |
| | Helper functions to access the member variables. |
| int | outputPoolSize () const |
| void | setOutputMemType (InferMemType memType) |
| InferMemType | outputMemType () const |
| void | setOutputDevId (int64_t devId) |
| int64_t | outputDevId () const |
| std::vector< TritonClassParams > | getClassifyParams () |
| const std::string & | model () const |
| int64_t | version () const |
Protected Types

| | |
|---|---|
| enum | { kName, kGpuId, kMemType } |
| | Tuple keys as <tensor-name, gpu-id, memType>. |
| using | AsyncDone = std::function< void(NvDsInferStatus, SharedBatchArray)> |
| | Asynchronous inference done function: AsyncDone(Status, outputs). |
| using | PoolKey = std::tuple< std::string, int64_t, InferMemType > |
| | Tuple holding tensor name, GPU ID, and memory type. |
| using | PoolValue = SharedBufPool< UniqSysMem > |
| | The buffer pool for the specified tensor, GPU, and memory type combination. |
| using | ReorderItemPtr = std::shared_ptr< ReorderItem > |
Protected Member Functions

| | |
|---|---|
| NvDsInferStatus | specifyInputDims (const InputShapes &shapes) override |
| NvDsInferStatus | enqueue (SharedBatchArray inputs, SharedCuStream stream, InputsConsumed bufConsumed, InferenceDone inferenceDone) override |
| void | requestTritonOutputNames (std::set< std::string > &names) override |
| virtual NvDsInferStatus | ensureServerReady () |
| | Check that the Triton Inference Server is live. |
| virtual NvDsInferStatus | ensureModelReady () |
| | Check that the model is ready; load the model if it is not. |
| NvDsInferStatus | setupReorderThread () |
| | Create a loop thread that calls inferenceDoneReorderLoop on the queued items. |
| void | setAllocator (UniqTritonAllocator allocator) |
| | Set the output tensor allocator. |
| virtual NvDsInferStatus | setupLayersInfo () |
| | Get the model configuration from the server and populate the layer information. |
| TrtServerPtr & | server () |
| | Get the Triton server handle. |
| virtual NvDsInferStatus | Run (SharedBatchArray inputs, InputsConsumed bufConsumed, AsyncDone asyncDone) |
| | Create an inference request and trigger asynchronous inference. |
| NvDsInferStatus | fixateDims (const SharedBatchArray &bufs) |
| | Extend the dimensions to include the batch size for the buffers in the input array. |
| SharedSysMem | allocateResponseBuf (const std::string &tensor, size_t bytes, InferMemType memType, int64_t devId) |
| | Acquire a buffer from the output buffer pool associated with the device ID and memory type. |
| void | releaseResponseBuf (const std::string &tensor, SharedSysMem mem) |
| | Release the output tensor buffer. |
| NvDsInferStatus | ensureInputs (SharedBatchArray &inputs) |
| | Ensure that the array of input buffers is expected by the model, and reshape the input buffers if required. |
| PoolValue | findResponsePool (PoolKey &key) |
| | Find the buffer pool for the given key. |
| PoolValue | createResponsePool (PoolKey &key, size_t bytes) |
| | Create a new buffer pool for the key. |
| void | serverInferCompleted (std::shared_ptr< TrtServerRequest > request, std::unique_ptr< TrtServerResponse > uniqResponse, InputsConsumed inputsConsumed, AsyncDone asyncDone) |
| | Call the inputs-consumed function, parse the inference response to form the array of output batch buffers, and call asyncDone on it. |
| bool | inferenceDoneReorderLoop (ReorderItemPtr item) |
| | Add input buffers to the output buffer list if required. |
| bool | debatchingOutput (SharedBatchArray &outputs, SharedBatchArray &inputs) |
| | Separate the batch dimension from the output buffer descriptors. |
AsyncDone (protected, inherited)
Asynchronous inference done function: AsyncDone(Status, outputs).
Definition at line 169 of file infer_trtis_backend.h.
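Since AsyncDone is a plain std::function alias, any callable matching the signature can receive the inference result. The following is a minimal sketch of the callback pattern; NvDsInferStatus and SharedBatchArray below are simplified stand-ins for the SDK types, not the real definitions:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins; the real definitions live in the DeepStream headers.
enum NvDsInferStatus { NVDSINFER_SUCCESS = 0, NVDSINFER_TRITON_ERROR = 1 };
using SharedBatchArray = std::shared_ptr<std::vector<std::string>>;

// Mirrors the protected alias: AsyncDone(Status, outputs).
using AsyncDone = std::function<void(NvDsInferStatus, SharedBatchArray)>;

// A backend invokes the callback once the server response has been parsed
// into an array of output batch buffers.
NvDsInferStatus completeInference(SharedBatchArray outputs, AsyncDone done) {
    NvDsInferStatus status =
        outputs && !outputs->empty() ? NVDSINFER_SUCCESS : NVDSINFER_TRITON_ERROR;
    done(status, std::move(outputs));
    return status;
}
```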
PoolKey (protected, inherited)
Tuple holding tensor name, GPU ID, and memory type.
Definition at line 224 of file infer_trtis_backend.h.

PoolValue (protected, inherited)
The buffer pool for the specified tensor, GPU, and memory type combination.
Definition at line 229 of file infer_trtis_backend.h.

ReorderItemPtr (protected, inherited)
Definition at line 293 of file infer_trtis_backend.h.

anonymous enum (protected, inherited)
Tuple keys as <tensor-name, gpu-id, memType>.

| Enumerator | |
|---|---|
| kName | |
| kGpuId | |
| kMemType | |

Definition at line 220 of file infer_trtis_backend.h.
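The unnamed enum exists so that std::get over the PoolKey tuple can be written with readable indices instead of bare 0/1/2. A short sketch under that assumption; InferMemType below is a hypothetical stand-in enum, not the SDK definition:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for the SDK's memory-type enum.
enum class InferMemType { kCpu, kGpuCuda };

// The anonymous enum names the tuple positions.
enum { kName, kGpuId, kMemType };
using PoolKey = std::tuple<std::string, int64_t, InferMemType>;

// Build a pool key for an output tensor's buffer pool.
PoolKey makePoolKey(const std::string& tensor, int64_t gpuId, InferMemType mt) {
    return PoolKey{tensor, gpuId, mt};
}
```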
TritonSimpleRuntime()
nvdsinferserver::TritonSimpleRuntime::TritonSimpleRuntime (std::string model, int64_t version)

~TritonSimpleRuntime() (override)

addClassifyParams() (inline, inherited)
Add Triton classification parameters to the list.
Definition at line 58 of file infer_trtis_backend.h.
allocateResponseBuf() (protected, inherited)
Acquire a buffer from the output buffer pool associated with the device ID and memory type.
Create the pool if it doesn't exist.
| [in] | tensor | Name of the output tensor. |
| [in] | bytes | Buffer size. |
| [in] | memType | Requested memory type. |
| [in] | devId | Device ID for the allocation. |
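The create-on-first-use behavior can be sketched as a find-or-create lookup keyed by the (tensor, gpu-id, mem-type) tuple. The BufPool type below is a hypothetical placeholder for the SDK's SharedBufPool, kept only to make the lookup logic concrete:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical stand-ins; the SDK uses SharedBufPool<UniqSysMem> and its own enum.
enum class InferMemType { kCpu, kGpuCuda };
using PoolKey = std::tuple<std::string, int64_t, InferMemType>;
struct BufPool { std::size_t bufBytes = 0; };
using PoolValue = std::shared_ptr<BufPool>;

// Find-or-create, mirroring allocateResponseBuf's behavior of creating the
// pool on first use for a (tensor, gpu-id, mem-type) combination.
PoolValue findOrCreatePool(std::map<PoolKey, PoolValue>& pools,
                           const PoolKey& key, std::size_t bytes) {
    auto it = pools.find(key);
    if (it != pools.end())
        return it->second;  // reuse the existing pool
    auto pool = std::make_shared<BufPool>();
    pool->bufBytes = bytes;
    pools.emplace(key, pool);
    return pool;
}
```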
createResponsePool() (protected, inherited)
Create a new buffer pool for the key.
| [in] | key | The pool key combination. |
| [in] | bytes | Size of the requested buffer. |

debatchingOutput() (protected, inherited)
Separate the batch dimension from the output buffer descriptors.
| [in] | outputs | Array of output batch buffers. |
| [in] | inputs | Array of input batch buffers. |

enqueue() (override, protected)
ensureInputs() (protected, inherited)
Ensure that the array of input buffers is expected by the model, and reshape the input buffers if required.
| | inputs | Array of input batch buffers. |

ensureModelReady() (protected, virtual, inherited)
Check that the model is ready; load the model if it is not.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

ensureServerReady() (protected, virtual, inherited)
Check that the Triton Inference Server is live.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

findResponsePool() (protected, inherited)
Find the buffer pool for the given key.

fixateDims() (protected, inherited)
Extend the dimensions to include the batch size for the buffers in the input array.
Do nothing if batch input is not required.
getClassifyParams() (inline, inherited)
Definition at line 71 of file infer_trtis_backend.h.

inferenceDoneReorderLoop() (protected, inherited)
Add input buffers to the output buffer list if required.
De-batch and run the inference-done callback.
| [in] | item | The reorder task. |

initialize() (override)

model() (inline, inherited)
Definition at line 73 of file infer_trtis_backend.h.

outputDevId() (inline, inherited)
Definition at line 70 of file infer_trtis_backend.h.

outputMemType() (inline, inherited)
Definition at line 68 of file infer_trtis_backend.h.

outputPoolSize() (inline, inherited)
Definition at line 66 of file infer_trtis_backend.h.
releaseResponseBuf() (protected, inherited)
Release the output tensor buffer.
| [in] | tensor | Name of the output tensor. |
| [in] | mem | Pointer to the memory buffer. |

requestTritonOutputNames() (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.

Run() (protected, virtual, inherited)
Create an inference request and trigger asynchronous inference.
serverInferCompleted() is set as the callback function, which in turn calls asyncDone.
| [in] | inputs | Array of input batch buffers. |
| [in] | bufConsumed | Callback function for releasing the input buffer. |
| [in] | asyncDone | Callback function for processing the response. |
Reimplemented in nvdsinferserver::TritonGrpcBackend.

server() (inline, protected, inherited)
Get the Triton server handle.
Definition at line 164 of file infer_trtis_backend.h.
serverInferCompleted() (protected, inherited)
Call the inputs-consumed function, parse the inference response to form the array of output batch buffers, and call asyncDone on it.
| [in] | request | Pointer to the inference request. |
| [in] | uniqResponse | Pointer to the inference response from the server. |
| [in] | inputsConsumed | Callback function for releasing the input buffer. |
| [in] | asyncDone | Callback function for processing the response. |

setAllocator() (inline, protected, inherited)
Set the output tensor allocator.
Definition at line 148 of file infer_trtis_backend.h.

setOutputDevId() (inline, inherited)
Definition at line 69 of file infer_trtis_backend.h.

setOutputMemType() (inline, inherited)
Definition at line 67 of file infer_trtis_backend.h.

setOutputPoolSize() (inline, inherited)
Helper function to access the member variables.
Definition at line 65 of file infer_trtis_backend.h.

setOutputs() (inline)
Definition at line 25 of file infer_simple_runtime.h.
setTensorMaxBytes() (inline, inherited)
Set the maximum size for the tensor; the larger of the existing size and the new input size is used.
The size is rounded up to INFER_MEM_ALIGNMENT bytes.
| | name | Name of the tensor. |
| | maxBytes | New maximum number of bytes for the buffer. |
Definition at line 110 of file infer_trtis_backend.h.
References INFER_MEM_ALIGNMENT and INFER_ROUND_UP.
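The keep-the-larger-then-align behavior can be sketched as follows. roundUp is an illustrative stand-in for the INFER_ROUND_UP macro, assuming the conventional round-up-to-a-multiple formula rather than the SDK's exact definition, and the alignment value is a hypothetical parameter standing in for INFER_MEM_ALIGNMENT:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for INFER_ROUND_UP: round value up to the next
// multiple of align (align must be > 0).
inline std::size_t roundUp(std::size_t value, std::size_t align) {
    return ((value + align - 1) / align) * align;
}

// setTensorMaxBytes keeps the larger of the existing and requested sizes,
// then rounds the result up to the alignment boundary.
inline std::size_t newMaxBytes(std::size_t existing, std::size_t requested,
                               std::size_t align) {
    std::size_t maxBytes = existing > requested ? existing : requested;
    return roundUp(maxBytes, align);
}
```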
setupLayersInfo() (protected, virtual, inherited)
Get the model configuration from the server and populate the layer information.
Set the maximum batch size as specified in the configuration settings.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

setupReorderThread() (protected, inherited)
Create a loop thread that calls inferenceDoneReorderLoop on the queued items.
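The reorder-thread pattern, a worker that drains a queue in strict enqueue order so that inference-done callbacks fire in submission order, can be sketched with standard threading primitives. ReorderWorker and ReorderItem here are hypothetical simplifications, not the SDK types:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Minimal stand-in for the queued reorder task.
struct ReorderItem { int id = 0; };

// Single-consumer queue plus worker thread; items are processed strictly in
// enqueue order. The destructor drains remaining items before joining.
class ReorderWorker {
public:
    explicit ReorderWorker(std::function<void(const ReorderItem&)> fn)
        : fn_(std::move(fn)), thread_([this] { loop(); }) {}
    ~ReorderWorker() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }
    void push(ReorderItem item) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            queue_.push_back(item);
        }
        cv_.notify_one();
    }
private:
    void loop() {
        for (;;) {
            std::unique_lock<std::mutex> lk(mtx_);
            cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
            if (queue_.empty() && stop_) return;
            ReorderItem item = queue_.front();
            queue_.pop_front();
            lk.unlock();
            fn_(item);  // e.g. the per-item work done by inferenceDoneReorderLoop
        }
    }
    std::function<void(const ReorderItem&)> fn_;
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<ReorderItem> queue_;
    bool stop_ = false;
    std::thread thread_;  // declared last: started after the other members
};
```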
specifyInputDims() (override, protected)

version() (inline, inherited)
Definition at line 74 of file infer_trtis_backend.h.