NVIDIA DeepStream SDK API Reference

6.2 Release
nvdsinferserver::TrtISBackend Class Reference

Detailed Description

Triton backend processing class.

Definition at line 39 of file infer_trtis_backend.h.

Inheritance diagram for nvdsinferserver::TrtISBackend:
Collaboration diagram for nvdsinferserver::TrtISBackend:

Data Structures

struct  ReorderItem
 Reorder thread task. More...
 

Public Member Functions

 TrtISBackend (const std::string &name, int64_t version, TrtServerPtr ptr=nullptr)
 Constructor. More...
 
 ~TrtISBackend () override
 Destructor. More...
 
void addClassifyParams (const TritonClassParams &c)
 Add Triton Classification parameters to the list. More...
 
NvDsInferStatus initialize () override
 Check that the server and model are ready, get the layer information, and set up the reorder thread and output tensor allocator. More...
 
NvDsInferStatus specifyInputDims (const InputShapes &shapes) override
 Specify the input layers for the backend. More...
 
NvDsInferStatus enqueue (SharedBatchArray inputs, SharedCuStream stream, InputsConsumed bufConsumed, InferenceDone inferenceDone) override
 Enqueue an input for an inference request by calling Run() and adding the corresponding task to the reorder thread queue. More...
 
void setTensorMaxBytes (const std::string &name, size_t maxBytes)
 Set the maximum size for the tensor; the larger of the existing size and the new input size is used. More...
 

Protected Types

enum  {
  kName,
  kGpuId,
  kMemType
}
 Tuple keys as <tensor-name, gpu-id, memType> More...
 
using AsyncDone = std::function< void(NvDsInferStatus, SharedBatchArray)>
 Asynchronous inference done function: AsyncDone(Status, outputs). More...
 
using PoolKey = std::tuple< std::string, int64_t, InferMemType >
 Tuple holding tensor name, GPU ID, memory type. More...
 
using PoolValue = SharedBufPool< UniqSysMem >
 The buffer pool for the specified tensor, GPU and memory type combination. More...
 
using ReorderItemPtr = std::shared_ptr< ReorderItem >
 

Protected Member Functions

virtual void requestTritonOutputNames (std::set< std::string > &outNames)
 Get the list of output tensor names. More...
 
virtual NvDsInferStatus ensureServerReady ()
 Check that the Triton inference server is live. More...
 
virtual NvDsInferStatus ensureModelReady ()
 Check that the model is ready; load the model if it is not. More...
 
NvDsInferStatus setupReorderThread ()
 Create a loop thread that calls inferenceDoneReorderLoop on the queued items. More...
 
void setAllocator (UniqTritonAllocator allocator)
 Set the output tensor allocator. More...
 
virtual NvDsInferStatus setupLayersInfo ()
 Get the model configuration from the server and populate layer information. More...
 
TrtServerPtr & server ()
 Get the Triton server handle. More...
 
virtual NvDsInferStatus Run (SharedBatchArray inputs, InputsConsumed bufConsumed, AsyncDone asyncDone)
 Create an inference request and trigger asynchronous inference. More...
 
NvDsInferStatus fixateDims (const SharedBatchArray &bufs)
 Extend the dimensions to include batch size for the buffers in input array. More...
 
SharedSysMem allocateResponseBuf (const std::string &tensor, size_t bytes, InferMemType memType, int64_t devId)
 Acquire a buffer from the output buffer pool associated with the device ID and memory type. More...
 
void releaseResponseBuf (const std::string &tensor, SharedSysMem mem)
 Release the output tensor buffer. More...
 
NvDsInferStatus ensureInputs (SharedBatchArray &inputs)
 Ensure that the array of input buffers matches what the model expects and reshape the input buffers if required. More...
 
PoolValue findResponsePool (PoolKey &key)
 Find the buffer pool for the given key. More...
 
PoolValue createResponsePool (PoolKey &key, size_t bytes)
 Create a new buffer pool for the key. More...
 
void serverInferCompleted (std::shared_ptr< TrtServerRequest > request, std::unique_ptr< TrtServerResponse > uniqResponse, InputsConsumed inputsConsumed, AsyncDone asyncDone)
 Call the inputs-consumed function, parse the inference response into the array of output batch buffers, and call asyncDone on it. More...
 
bool inferenceDoneReorderLoop (ReorderItemPtr item)
 Add input buffers to the output buffer list if required. More...
 
bool debatchingOutput (SharedBatchArray &outputs, SharedBatchArray &inputs)
 Separate the batch dimension from the output buffer descriptors. More...
 

Member Typedef Documentation

◆ AsyncDone

using nvdsinferserver::TrtISBackend::AsyncDone = std::function<void (NvDsInferStatus, SharedBatchArray)>
protected

Asynchronous inference done function: AsyncDone(Status, outputs).

Definition at line 169 of file infer_trtis_backend.h.

◆ PoolKey

using nvdsinferserver::TrtISBackend::PoolKey = std::tuple<std::string, int64_t, InferMemType>
protected

Tuple holding tensor name, GPU ID, memory type.

Definition at line 224 of file infer_trtis_backend.h.

◆ PoolValue

using nvdsinferserver::TrtISBackend::PoolValue = SharedBufPool<UniqSysMem>
protected

The buffer pool for the specified tensor, GPU and memory type combination.

Definition at line 229 of file infer_trtis_backend.h.

◆ ReorderItemPtr

using nvdsinferserver::TrtISBackend::ReorderItemPtr = std::shared_ptr<ReorderItem>
protected

Definition at line 293 of file infer_trtis_backend.h.

Member Enumeration Documentation

◆ anonymous enum

anonymous enum
protected

Tuple keys as <tensor-name, gpu-id, memType>

Enumerator
kName 
kGpuId 
kMemType 

Definition at line 220 of file infer_trtis_backend.h.
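
A minimal sketch of how these enumerators index into a PoolKey tuple, written as if inside a member function of a TrtISBackend subclass. InferMemType::kGpuCuda is used only as an assumed example enumerator; check the SDK headers for the available values.

// Inside a TrtISBackend (or subclass) member function.
PoolKey key{"output_tensor", /*gpu-id=*/0, InferMemType::kGpuCuda};   // assumed enumerator
const std::string &tensorName = std::get<kName>(key);
int64_t gpuId = std::get<kGpuId>(key);
InferMemType memType = std::get<kMemType>(key);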

Constructor & Destructor Documentation

◆ TrtISBackend()

nvdsinferserver::TrtISBackend::TrtISBackend ( const std::string &  name,
int64_t  version,
TrtServerPtr  ptr = nullptr 
)

Constructor.

Save the model name, version and server handle.

Parameters
[in] name  Model name.
[in] version  Model version.
[in] ptr  Handle to the Triton server class instance.
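
A minimal usage sketch (not part of the SDK documentation): construct the backend and initialize it. How the TrtServerPtr handle is obtained is SDK-specific and assumed to happen elsewhere; the model name is a placeholder.

#include "infer_trtis_backend.h"

using namespace nvdsinferserver;

// Hypothetical helper; the Triton server handle is assumed to be created elsewhere.
NvDsInferStatus setupBackend(TrtServerPtr server)
{
    // Model name and version as registered in the Triton model repository.
    TrtISBackend backend("my_model", /*version=*/1, server);

    // Checks server/model readiness, queries layer info, and sets up the
    // reorder thread and output tensor allocator.
    return backend.initialize();
}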

◆ ~TrtISBackend()

nvdsinferserver::TrtISBackend::~TrtISBackend ( )
override

Destructor.

Unload the model if needed.

Member Function Documentation

◆ addClassifyParams()

void nvdsinferserver::TrtISBackend::addClassifyParams ( const TritonClassParams &  c )
inline

Add Triton Classification parameters to the list.

Definition at line 58 of file infer_trtis_backend.h.

◆ allocateResponseBuf()

SharedSysMem nvdsinferserver::TrtISBackend::allocateResponseBuf ( const std::string &  tensor,
size_t  bytes,
InferMemType  memType,
int64_t  devId 
)
protected

Acquire a buffer from the output buffer pool associated with the device ID and memory type.

Create the pool if it doesn't exist.

Parameters
[in] tensor  Name of the output tensor.
[in] bytes  Buffer size.
[in] memType  Requested memory type.
[in] devId  Device ID for the allocation.
Returns
Pointer to the allocated buffer.
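
A minimal sketch, written as if from the output allocation path of a TrtISBackend subclass, of pairing allocateResponseBuf() with releaseResponseBuf(). The tensor name, byte count, and memory type enumerator are assumptions for illustration.

// Acquire an output buffer; the pool for this (tensor, device, memType)
// combination is created on first use.
SharedSysMem mem = allocateResponseBuf("output_tensor", bytes,
                                       InferMemType::kGpuCuda, /*devId=*/0);
// ... Triton writes the response tensor into mem ...
releaseResponseBuf("output_tensor", std::move(mem));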

◆ createResponsePool()

PoolValue nvdsinferserver::TrtISBackend::createResponsePool ( PoolKey &  key,
size_t  bytes 
)
protected

Create a new buffer pool for the key.

Parameters
[in] key  The pool key combination.
[in] bytes  Size of the requested buffer.
Returns
The newly created buffer pool.
◆ debatchingOutput()

bool nvdsinferserver::TrtISBackend::debatchingOutput ( SharedBatchArray &  outputs,
SharedBatchArray &  inputs 
)
protected

Separate the batch dimension from the output buffer descriptors.

Parameters
[in] outputs  Array of output batch buffers.
[in] inputs  Array of input batch buffers.
Returns
Boolean indicating success or failure.

◆ enqueue()

NvDsInferStatus nvdsinferserver::TrtISBackend::enqueue ( SharedBatchArray  inputs,
SharedCuStream  stream,
InputsConsumed  bufConsumed,
InferenceDone  inferenceDone 
)
override

Enqueue an input for an inference request by calling Run() and adding the corresponding task to the reorder thread queue.

Parameters
[in] inputs  The array of input batch buffers.
[in] stream  The CUDA stream to be used.
[in] bufConsumed  Callback function for releasing the input buffers.
[in] inferenceDone  Callback function for processing the result.
Returns
Status code of the type NvDsInferStatus.
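
A minimal usage sketch. The callback signatures are assumptions modeled on the AsyncDone typedef documented above (InputsConsumed taking a SharedBatchArray, InferenceDone taking NvDsInferStatus and SharedBatchArray); check the backend base header for the exact types. inputs and stream are assumed to be prepared by the caller.

// Hedged sketch; callback parameter types are assumptions.
NvDsInferStatus status = backend.enqueue(
    inputs,   // SharedBatchArray holding the input tensors
    stream,   // SharedCuStream used for this request
    [](SharedBatchArray consumed) {
        // Safe to recycle the input buffers once this fires.
    },
    [](NvDsInferStatus s, SharedBatchArray outputs) {
        // Called after the reorder thread restores request order.
    });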
◆ ensureInputs()

NvDsInferStatus nvdsinferserver::TrtISBackend::ensureInputs ( SharedBatchArray &  inputs )
protected

Ensure that the array of input buffers matches what the model expects and reshape the input buffers if required.

Parameters
inputs  Array of input batch buffers.
Returns
NVDSINFER_SUCCESS or NVDSINFER_TRITON_ERROR.

◆ ensureModelReady()

virtual NvDsInferStatus nvdsinferserver::TrtISBackend::ensureModelReady ( )
protectedvirtual

Check that the model is ready; load the model if it is not.

Returns
NVDSINFER_SUCCESS or NVDSINFER_TRITON_ERROR.

Reimplemented in nvdsinferserver::TritonGrpcBackend.

◆ ensureServerReady()

virtual NvDsInferStatus nvdsinferserver::TrtISBackend::ensureServerReady ( )
protectedvirtual

Check that the Triton inference server is live.

Returns
NVDSINFER_SUCCESS or NVDSINFER_TRITON_ERROR.

Reimplemented in nvdsinferserver::TritonGrpcBackend.

◆ findResponsePool()

PoolValue nvdsinferserver::TrtISBackend::findResponsePool ( PoolKey &  key )
protected

Find the buffer pool for the given key.

◆ fixateDims()

NvDsInferStatus nvdsinferserver::TrtISBackend::fixateDims ( const SharedBatchArray &  bufs )
protected

Extend the dimensions to include batch size for the buffers in input array.

Do nothing if batch input is not required.

◆ getClassifyParams()

std::vector<TritonClassParams> nvdsinferserver::TrtISBackend::getClassifyParams ( )
inline

Definition at line 71 of file infer_trtis_backend.h.

◆ inferenceDoneReorderLoop()

bool nvdsinferserver::TrtISBackend::inferenceDoneReorderLoop ( ReorderItemPtr  item)
protected

Add input buffers to the output buffer list if required.

De-batch and run inference done callback.

Parameters
[in] item  The reorder task.
Returns
Boolean indicating success or failure.

◆ initialize()

NvDsInferStatus nvdsinferserver::TrtISBackend::initialize ( )
override

Check that the server and model are ready, get the layer information, and set up the reorder thread and output tensor allocator.

Returns
NVDSINFER_SUCCESS or NVDSINFER_TRITON_ERROR.

◆ model()

const std::string& nvdsinferserver::TrtISBackend::model ( ) const
inline

Definition at line 73 of file infer_trtis_backend.h.

◆ outputDevId()

int64_t nvdsinferserver::TrtISBackend::outputDevId ( ) const
inline

Definition at line 70 of file infer_trtis_backend.h.

◆ outputMemType()

InferMemType nvdsinferserver::TrtISBackend::outputMemType ( ) const
inline

Definition at line 68 of file infer_trtis_backend.h.

◆ outputPoolSize()

int nvdsinferserver::TrtISBackend::outputPoolSize ( ) const
inline

Definition at line 66 of file infer_trtis_backend.h.

◆ releaseResponseBuf()

void nvdsinferserver::TrtISBackend::releaseResponseBuf ( const std::string &  tensor,
SharedSysMem  mem 
)
protected

Release the output tensor buffer.

Parameters
[in] tensor  Name of the output tensor.
[in] mem  Pointer to the memory buffer.

◆ requestTritonOutputNames()

virtual void nvdsinferserver::TrtISBackend::requestTritonOutputNames ( std::set< std::string > &  outNames)
protectedvirtual

Get the list of output tensor names.

Parameters
[out] outNames  The set of strings to which the names are added.

Reimplemented in nvdsinferserver::TritonGrpcBackend, and nvdsinferserver::TritonSimpleRuntime.
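
A minimal sketch of a hypothetical subclass that restricts the requested Triton outputs to a fixed set; the tensor names are placeholders.

#include <set>
#include <string>
#include "infer_trtis_backend.h"

class MyTritonBackend : public nvdsinferserver::TrtISBackend {
public:
    using TrtISBackend::TrtISBackend;   // inherit the constructor

protected:
    void requestTritonOutputNames(std::set<std::string> &outNames) override {
        // Request only the outputs this pipeline consumes (placeholder names).
        outNames.insert("detection_boxes");
        outNames.insert("detection_scores");
    }
};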

◆ Run()

virtual NvDsInferStatus nvdsinferserver::TrtISBackend::Run ( SharedBatchArray  inputs,
InputsConsumed  bufConsumed,
AsyncDone  asyncDone 
)
protectedvirtual

Create an inference request and trigger asynchronous inference.

serverInferCompleted() is set as the completion callback, which in turn calls asyncDone.

Parameters
[in] inputs  Array of input batch buffers.
[in] bufConsumed  Callback function for releasing the input buffers.
[in] asyncDone  Callback function for processing the response.
Returns
Status code of the type NvDsInferStatus.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

◆ server()

TrtServerPtr& nvdsinferserver::TrtISBackend::server ( )
inlineprotected

Get the Triton server handle.

Definition at line 164 of file infer_trtis_backend.h.

◆ serverInferCompleted()

void nvdsinferserver::TrtISBackend::serverInferCompleted ( std::shared_ptr< TrtServerRequest >  request,
std::unique_ptr< TrtServerResponse >  uniqResponse,
InputsConsumed  inputsConsumed,
AsyncDone  asyncDone 
)
protected

Call the inputs-consumed function, parse the inference response into the array of output batch buffers, and call asyncDone on it.

Parameters
[in] request  Pointer to the inference request.
[in] uniqResponse  Pointer to the inference response from the server.
[in] inputsConsumed  Callback function for releasing the input buffers.
[in] asyncDone  Callback function for processing the response.

◆ setAllocator()

void nvdsinferserver::TrtISBackend::setAllocator ( UniqTritonAllocator  allocator)
inlineprotected

Set the output tensor allocator.

Definition at line 148 of file infer_trtis_backend.h.

◆ setOutputDevId()

void nvdsinferserver::TrtISBackend::setOutputDevId ( int64_t  devId)
inline

Definition at line 69 of file infer_trtis_backend.h.

◆ setOutputMemType()

void nvdsinferserver::TrtISBackend::setOutputMemType ( InferMemType  memType)
inline

Definition at line 67 of file infer_trtis_backend.h.

◆ setOutputPoolSize()

void nvdsinferserver::TrtISBackend::setOutputPoolSize ( int  size)
inline

Helper function to access the member variables.

Definition at line 65 of file infer_trtis_backend.h.
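
A minimal sketch of configuring output buffer placement with these helper functions before initialize() is called. InferMemType::kGpuCuda is an assumed example enumerator.

backend.setOutputPoolSize(4);                       // buffers per output pool
backend.setOutputMemType(InferMemType::kGpuCuda);   // assumed enumerator
backend.setOutputDevId(0);                          // GPU used for output buffers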

◆ setTensorMaxBytes()

void nvdsinferserver::TrtISBackend::setTensorMaxBytes ( const std::string &  name,
size_t  maxBytes 
)
inline

Set the maximum size for the tensor; the larger of the existing size and the new input size is used.

The size is rounded up to INFER_MEM_ALIGNMENT bytes.

Parameters
name  Name of the tensor.
maxBytes  New maximum number of bytes for the buffer.

Definition at line 110 of file infer_trtis_backend.h.

References INFER_MEM_ALIGNMENT, and INFER_ROUND_UP.
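
A minimal sketch: request a larger pre-allocation for an output tensor. The registered size becomes the larger of the existing and requested values, rounded up to the INFER_MEM_ALIGNMENT boundary defined in the SDK headers; the tensor name and size are placeholders.

backend.setTensorMaxBytes("output_tensor", 1 << 20);   // request roughly 1 MiB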

◆ setupLayersInfo()

virtual NvDsInferStatus nvdsinferserver::TrtISBackend::setupLayersInfo ( )
protectedvirtual

Get the model configuration from the server and populate layer information.

Set maximum batch size as specified in configuration settings.

Returns
NVDSINFER_SUCCESS or NVDSINFER_TRITON_ERROR.

Reimplemented in nvdsinferserver::TritonGrpcBackend.

◆ setupReorderThread()

NvDsInferStatus nvdsinferserver::TrtISBackend::setupReorderThread ( )
protected

Create a loop thread that calls inferenceDoneReorderLoop on the queued items.

Returns
NVDSINFER_SUCCESS or NVDSINFER_TRITON_ERROR.

◆ specifyInputDims()

NvDsInferStatus nvdsinferserver::TrtISBackend::specifyInputDims ( const InputShapes &  shapes)
override

Specify the input layers for the backend.

Parameters
shapes  List of names and shapes of the input layers.
Returns
Status code of the type NvDsInferStatus.
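
A minimal sketch, assuming InputShapes is a list of (layer name, batched dims) entries as the parameter description suggests; see the backend base header for the exact element type.

InputShapes shapes;
// shapes.emplace_back("input_tensor", ...);   // fill in the full batched dims
NvDsInferStatus s = backend.specifyInputDims(shapes);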

◆ version()

int64_t nvdsinferserver::TrtISBackend::version ( ) const
inline

Definition at line 74 of file infer_trtis_backend.h.


The documentation for this class was generated from the following file:

infer_trtis_backend.h