Definition at line 20 of file infer_simple_runtime.h.


Public Member Functions

| | |
|---|---|
| | TritonSimpleRuntime (std::string model, int64_t version) |
| | ~TritonSimpleRuntime () override |
| void | setOutputs (const std::set< std::string > &names) |
| NvDsInferStatus | initialize () override |
| void | addClassifyParams (const TritonClassParams &c) |
| | Add Triton classification parameters to the list. |
| void | setTensorMaxBytes (const std::string &name, size_t maxBytes) |
| | Set the maximum size for the tensor; the larger of the existing size and the new input size is used. |
| void | setOutputPoolSize (int size) |
| | Helper functions to access the member variables. |
| int | outputPoolSize () const |
| void | setOutputMemType (InferMemType memType) |
| InferMemType | outputMemType () const |
| void | setOutputDevId (int64_t devId) |
| int64_t | outputDevId () const |
| std::vector< TritonClassParams > | getClassifyParams () |
| const std::string & | model () const |
| int64_t | version () const |
Protected Types

| | |
|---|---|
| enum | { kName, kGpuId, kMemType } |
| | Tuple keys as <tensor-name, gpu-id, memType>. |
| using | AsyncDone = std::function< void(NvDsInferStatus, SharedBatchArray)> |
| | Asynchronous inference done function: AsyncDone(Status, outputs). |
| using | PoolKey = std::tuple< std::string, int64_t, InferMemType > |
| | Tuple holding tensor name, GPU ID, and memory type. |
| using | PoolValue = SharedBufPool< UniqSysMem > |
| | The buffer pool for the specified tensor, GPU, and memory type combination. |
| using | ReorderItemPtr = std::shared_ptr< ReorderItem > |
Protected Member Functions

| | |
|---|---|
| NvDsInferStatus | specifyInputDims (const InputShapes &shapes) override |
| NvDsInferStatus | enqueue (SharedBatchArray inputs, SharedCuStream stream, InputsConsumed bufConsumed, InferenceDone inferenceDone) override |
| void | requestTritonOutputNames (std::set< std::string > &names) override |
| virtual NvDsInferStatus | ensureServerReady () |
| | Check that the Triton Inference Server is live. |
| virtual NvDsInferStatus | ensureModelReady () |
| | Check that the model is ready; load the model if it is not. |
| NvDsInferStatus | setupReorderThread () |
| | Create a loop thread that calls inferenceDoneReorderLoop on the queued items. |
| void | setAllocator (UniqTritonAllocator allocator) |
| | Set the output tensor allocator. |
| virtual NvDsInferStatus | setupLayersInfo () |
| | Get the model configuration from the server and populate the layer information. |
| TrtServerPtr & | server () |
| | Get the Triton server handle. |
| virtual NvDsInferStatus | Run (SharedBatchArray inputs, InputsConsumed bufConsumed, AsyncDone asyncDone) |
| | Create an inference request and trigger asynchronous inference. |
| NvDsInferStatus | fixateDims (const SharedBatchArray &bufs) |
| | Extend the dimensions to include the batch size for the buffers in the input array. |
| SharedSysMem | allocateResponseBuf (const std::string &tensor, size_t bytes, InferMemType memType, int64_t devId) |
| | Acquire a buffer from the output buffer pool associated with the device ID and memory type. |
| void | releaseResponseBuf (const std::string &tensor, SharedSysMem mem) |
| | Release the output tensor buffer. |
| NvDsInferStatus | ensureInputs (SharedBatchArray &inputs) |
| | Ensure that the array of input buffers is expected by the model, and reshape the input buffers if required. |
| PoolValue | findResponsePool (PoolKey &key) |
| | Find the buffer pool for the given key. |
| PoolValue | createResponsePool (PoolKey &key, size_t bytes) |
| | Create a new buffer pool for the key. |
| void | serverInferCompleted (std::shared_ptr< TrtServerRequest > request, std::unique_ptr< TrtServerResponse > uniqResponse, InputsConsumed inputsConsumed, AsyncDone asyncDone) |
| | Call the inputs-consumed function, parse the inference response to form the array of output batch buffers, and call asyncDone on it. |
| bool | inferenceDoneReorderLoop (ReorderItemPtr item) |
| | Add input buffers to the output buffer list if required. |
| bool | debatchingOutput (SharedBatchArray &outputs, SharedBatchArray &inputs) |
| | Separate the batch dimension from the output buffer descriptors. |
AsyncDone (protected, inherited)
Asynchronous inference done function: AsyncDone(Status, outputs).
Definition at line 169 of file infer_trtis_backend.h.
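Since AsyncDone is a plain std::function alias, any callable matching the signature can receive the inference result. The following is a minimal sketch of the callback pattern; NvDsInferStatus and SharedBatchArray below are simplified stand-ins for the SDK types, not the real definitions:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins; the real definitions live in the DeepStream headers.
enum NvDsInferStatus { NVDSINFER_SUCCESS = 0, NVDSINFER_TRITON_ERROR = 1 };
using SharedBatchArray = std::shared_ptr<std::vector<std::string>>;

// Mirrors the protected alias: AsyncDone(Status, outputs).
using AsyncDone = std::function<void(NvDsInferStatus, SharedBatchArray)>;

// A backend invokes the callback once the server response has been parsed
// into an array of output batch buffers.
NvDsInferStatus completeInference(SharedBatchArray outputs, AsyncDone done) {
    NvDsInferStatus status =
        outputs && !outputs->empty() ? NVDSINFER_SUCCESS : NVDSINFER_TRITON_ERROR;
    done(status, std::move(outputs));
    return status;
}
```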
PoolKey (protected, inherited)
Tuple holding tensor name, GPU ID, and memory type.
Definition at line 224 of file infer_trtis_backend.h.

PoolValue (protected, inherited)
The buffer pool for the specified tensor, GPU, and memory type combination.
Definition at line 229 of file infer_trtis_backend.h.

ReorderItemPtr (protected, inherited)
Definition at line 293 of file infer_trtis_backend.h.

anonymous enum (protected, inherited)
Tuple keys as <tensor-name, gpu-id, memType>.

| Enumerator | |
|---|---|
| kName | |
| kGpuId | |
| kMemType | |

Definition at line 220 of file infer_trtis_backend.h.
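The unnamed enum exists so that std::get over the PoolKey tuple can be written with readable indices instead of bare 0/1/2. A short sketch under that assumption; InferMemType below is a hypothetical stand-in enum, not the SDK definition:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for the SDK's memory-type enum.
enum class InferMemType { kCpu, kGpuCuda };

// The anonymous enum names the tuple positions.
enum { kName, kGpuId, kMemType };
using PoolKey = std::tuple<std::string, int64_t, InferMemType>;

// Build a pool key for an output tensor's buffer pool.
PoolKey makePoolKey(const std::string& tensor, int64_t gpuId, InferMemType mt) {
    return PoolKey{tensor, gpuId, mt};
}
```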
TritonSimpleRuntime()
nvdsinferserver::TritonSimpleRuntime::TritonSimpleRuntime (std::string model, int64_t version)

~TritonSimpleRuntime() (override)

addClassifyParams() (inline, inherited)
Add Triton classification parameters to the list.
Definition at line 58 of file infer_trtis_backend.h.
allocateResponseBuf() (protected, inherited)
Acquire a buffer from the output buffer pool associated with the device ID and memory type.
Create the pool if it doesn't exist.
| [in] | tensor | Name of the output tensor. |
| [in] | bytes | Buffer size. |
| [in] | memType | Requested memory type. |
| [in] | devId | Device ID for the allocation. |
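The create-on-first-use behavior can be sketched as a find-or-create lookup keyed by the (tensor, gpu-id, mem-type) tuple. The BufPool type below is a hypothetical placeholder for the SDK's SharedBufPool, kept only to make the lookup logic concrete:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical stand-ins; the SDK uses SharedBufPool<UniqSysMem> and its own enum.
enum class InferMemType { kCpu, kGpuCuda };
using PoolKey = std::tuple<std::string, int64_t, InferMemType>;
struct BufPool { std::size_t bufBytes = 0; };
using PoolValue = std::shared_ptr<BufPool>;

// Find-or-create, mirroring allocateResponseBuf's behavior of creating the
// pool on first use for a (tensor, gpu-id, mem-type) combination.
PoolValue findOrCreatePool(std::map<PoolKey, PoolValue>& pools,
                           const PoolKey& key, std::size_t bytes) {
    auto it = pools.find(key);
    if (it != pools.end())
        return it->second;  // reuse the existing pool
    auto pool = std::make_shared<BufPool>();
    pool->bufBytes = bytes;
    pools.emplace(key, pool);
    return pool;
}
```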
createResponsePool() (protected, inherited)
Create a new buffer pool for the key.
| [in] | key | The pool key combination. |
| [in] | bytes | Size of the requested buffer. |

debatchingOutput() (protected, inherited)
Separate the batch dimension from the output buffer descriptors.
| [in] | outputs | Array of output batch buffers. |
| [in] | inputs | Array of input batch buffers. |

enqueue() (override, protected)
ensureInputs() (protected, inherited)
Ensure that the array of input buffers is expected by the model, and reshape the input buffers if required.
| | inputs | Array of input batch buffers. |

ensureModelReady() (protected, virtual, inherited)
Check that the model is ready; load the model if it is not.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

ensureServerReady() (protected, virtual, inherited)
Check that the Triton Inference Server is live.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

findResponsePool() (protected, inherited)
Find the buffer pool for the given key.

fixateDims() (protected, inherited)
Extend the dimensions to include the batch size for the buffers in the input array.
Do nothing if batch input is not required.
getClassifyParams() (inline, inherited)
Definition at line 71 of file infer_trtis_backend.h.

inferenceDoneReorderLoop() (protected, inherited)
Add input buffers to the output buffer list if required.
De-batch and run the inference-done callback.
| [in] | item | The reorder task. |

initialize() (override)

model() (inline, inherited)
Definition at line 73 of file infer_trtis_backend.h.

outputDevId() (inline, inherited)
Definition at line 70 of file infer_trtis_backend.h.

outputMemType() (inline, inherited)
Definition at line 68 of file infer_trtis_backend.h.

outputPoolSize() (inline, inherited)
Definition at line 66 of file infer_trtis_backend.h.
releaseResponseBuf() (protected, inherited)
Release the output tensor buffer.
| [in] | tensor | Name of the output tensor. |
| [in] | mem | Pointer to the memory buffer. |

requestTritonOutputNames() (override, protected, virtual)
Reimplemented from nvdsinferserver::TrtISBackend.

Run() (protected, virtual, inherited)
Create an inference request and trigger asynchronous inference.
serverInferCompleted() is set as the callback function, which in turn calls asyncDone.
| [in] | inputs | Array of input batch buffers. |
| [in] | bufConsumed | Callback function for releasing the input buffer. |
| [in] | asyncDone | Callback function for processing the response. |
Reimplemented in nvdsinferserver::TritonGrpcBackend.

server() (inline, protected, inherited)
Get the Triton server handle.
Definition at line 164 of file infer_trtis_backend.h.
serverInferCompleted() (protected, inherited)
Call the inputs-consumed function, parse the inference response to form the array of output batch buffers, and call asyncDone on it.
| [in] | request | Pointer to the inference request. |
| [in] | uniqResponse | Pointer to the inference response from the server. |
| [in] | inputsConsumed | Callback function for releasing the input buffer. |
| [in] | asyncDone | Callback function for processing the response. |

setAllocator() (inline, protected, inherited)
Set the output tensor allocator.
Definition at line 148 of file infer_trtis_backend.h.

setOutputDevId() (inline, inherited)
Definition at line 69 of file infer_trtis_backend.h.

setOutputMemType() (inline, inherited)
Definition at line 67 of file infer_trtis_backend.h.

setOutputPoolSize() (inline, inherited)
Helper function to access the member variables.
Definition at line 65 of file infer_trtis_backend.h.

setOutputs() (inline)
Definition at line 25 of file infer_simple_runtime.h.
setTensorMaxBytes() (inline, inherited)
Set the maximum size for the tensor; the larger of the existing size and the new input size is used.
The size is rounded up to INFER_MEM_ALIGNMENT bytes.
| | name | Name of the tensor. |
| | maxBytes | New maximum number of bytes for the buffer. |
Definition at line 110 of file infer_trtis_backend.h.
References INFER_MEM_ALIGNMENT and INFER_ROUND_UP.
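The keep-the-larger-then-align behavior can be sketched as follows. roundUp is an illustrative stand-in for the INFER_ROUND_UP macro, assuming the conventional round-up-to-a-multiple formula rather than the SDK's exact definition, and the alignment value is a hypothetical parameter standing in for INFER_MEM_ALIGNMENT:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for INFER_ROUND_UP: round value up to the next
// multiple of align (align must be > 0).
inline std::size_t roundUp(std::size_t value, std::size_t align) {
    return ((value + align - 1) / align) * align;
}

// setTensorMaxBytes keeps the larger of the existing and requested sizes,
// then rounds the result up to the alignment boundary.
inline std::size_t newMaxBytes(std::size_t existing, std::size_t requested,
                               std::size_t align) {
    std::size_t maxBytes = existing > requested ? existing : requested;
    return roundUp(maxBytes, align);
}
```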
setupLayersInfo() (protected, virtual, inherited)
Get the model configuration from the server and populate the layer information.
Set the maximum batch size as specified in the configuration settings.
Reimplemented in nvdsinferserver::TritonGrpcBackend.

setupReorderThread() (protected, inherited)
Create a loop thread that calls inferenceDoneReorderLoop on the queued items.
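The reorder-thread pattern, a worker that drains a queue in strict enqueue order so that inference-done callbacks fire in submission order, can be sketched with standard threading primitives. ReorderWorker and ReorderItem here are hypothetical simplifications, not the SDK types:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Minimal stand-in for the queued reorder task.
struct ReorderItem { int id = 0; };

// Single-consumer queue plus worker thread; items are processed strictly in
// enqueue order. The destructor drains remaining items before joining.
class ReorderWorker {
public:
    explicit ReorderWorker(std::function<void(const ReorderItem&)> fn)
        : fn_(std::move(fn)), thread_([this] { loop(); }) {}
    ~ReorderWorker() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }
    void push(ReorderItem item) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            queue_.push_back(item);
        }
        cv_.notify_one();
    }
private:
    void loop() {
        for (;;) {
            std::unique_lock<std::mutex> lk(mtx_);
            cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
            if (queue_.empty() && stop_) return;
            ReorderItem item = queue_.front();
            queue_.pop_front();
            lk.unlock();
            fn_(item);  // e.g. the per-item work done by inferenceDoneReorderLoop
        }
    }
    std::function<void(const ReorderItem&)> fn_;
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<ReorderItem> queue_;
    bool stop_ = false;
    std::thread thread_;  // declared last: started after the other members
};
```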
specifyInputDims() (override, protected)

version() (inline, inherited)
Definition at line 74 of file infer_trtis_backend.h.