morpheus.stages.inference.triton_inference_stage.ShmInputWrapper

class ShmInputWrapper(client, model_name, config)

Bases: morpheus.stages.inference.triton_inference_stage.InputWrapper

This class wraps a CUDA shared memory object shared between this process and a Triton server instance. Since the Triton server only accepts numpy arrays as inputs, this class is used to pass references to input memory on the device to the server without copying through the host, eliminating serialization and network overhead.

Parameters

client: tritonclient.InferenceServerClient
model_name: str
config: typing.Dict[str, TritonInOut]

Methods

`build_input`(name, data, force_convert_inputs)	This helper function builds a Triton InferInput object that can be directly used by `tritonclient.async_infer`.
`get_bytes`(name)	Get the bytes needed for a particular input/output.
`get_offset`(name)	Get the offset needed for a particular input/output.
`get_ptr`(name)	Returns the `cupy.cuda.MemoryPointer` object to the internal `ShmWrapper` for the specified input/output name.

build_input(name, data, force_convert_inputs)

This helper function builds a Triton InferInput object that can be directly used by tritonclient.async_infer. It uses the config passed to the constructor to determine the shape, size, and type of the input.

Parameters

name: str
data: cupy.ndarray
force_convert_inputs: bool
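
The flow described for build_input can be sketched without a GPU. This is only an illustration under stated assumptions, not the Morpheus implementation: a `bytearray` stands in for the CUDA shared memory region, a plain dict stands in for the Triton InferInput object, and `RegionSketch`, its `layout` argument, and the behavior of `force_convert_inputs` (cast instead of reject on a type mismatch) are hypothetical.

```python
import struct

# Hypothetical mapping from Triton-style datatype names to struct codes.
STRUCT_CODES = {"INT32": "i", "INT64": "q", "FP32": "f"}


class RegionSketch:
    """A bytearray standing in for a registered shared memory region."""

    def __init__(self, name, layout):
        # layout: {input name: (expected datatype, offset, max bytes)}
        self.name = name
        self.layout = layout
        self.buffer = bytearray(sum(size for _, _, size in layout.values()))

    def build_input(self, name, values, datatype, force_convert_inputs=False):
        expected, offset, max_bytes = self.layout[name]
        # Assumption: a mismatch is an error unless conversion is forced.
        if datatype != expected and not force_convert_inputs:
            raise TypeError(f"{name}: expected {expected}, got {datatype}")
        cast = float if expected == "FP32" else int
        payload = struct.pack(
            f"<{len(values)}{STRUCT_CODES[expected]}",
            *(cast(v) for v in values),
        )
        if len(payload) > max_bytes:
            raise ValueError(f"{name}: {len(payload)} bytes exceeds {max_bytes}")
        # Copy the data into the region at this input's offset.
        self.buffer[offset:offset + len(payload)] = payload
        # The request carries only a reference to the region, not the data.
        return {
            "name": name,
            "shape": [len(values)],
            "datatype": expected,
            "region": self.name,
            "byte_size": len(payload),
            "offset": offset,
        }


region = RegionSketch(
    "inputs",
    {"input_ids": ("INT32", 0, 16), "mask": ("INT32", 16, 16)},
)
request_input = region.build_input("mask", [1, 1, 0, 0], "INT32")
```

The returned descriptor holds a region name, byte size, and offset rather than the values themselves, which is the property that lets the server read the input in place.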

get_bytes(name)

Get the bytes needed for a particular input/output.

Parameters

name: str

Returns

bytes

get_offset(name)

Get the offset needed for a particular input/output.

Parameters

name: str

Returns

int
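
To make the relationship between get_bytes and get_offset concrete, here is a minimal sketch of one way byte sizes and offsets could be derived from a config. It assumes entries are packed back to back in dictionary order; `InOutSketch` is a hypothetical stand-in for TritonInOut, and its fields and the input names are illustrative only.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class InOutSketch:
    """Hypothetical stand-in for TritonInOut; the real class may differ."""
    name: str
    shape: tuple
    itemsize: int  # bytes per element, e.g. 4 for a 32-bit integer
    offset: int = 0

    @property
    def nbytes(self) -> int:
        # Total size of this input/output in bytes.
        return prod(self.shape) * self.itemsize


def assign_offsets(config):
    """Pack entries contiguously and return the total region size in bytes."""
    total = 0
    for entry in config.values():
        entry.offset = total
        total += entry.nbytes
    return total


config = {
    "input_ids": InOutSketch("input_ids", (8, 128), 4),
    "attention_mask": InOutSketch("attention_mask", (8, 128), 4),
}
total_bytes = assign_offsets(config)
```

Under this layout, each entry's offset is the sum of the byte sizes of the entries before it, so a single allocation of `total_bytes` can hold every input.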

get_ptr(name)

Returns the cupy.cuda.MemoryPointer object to the internal ShmWrapper for the specified input/output name.

Parameters

name: str

Returns

cupy.cuda.MemoryPointer
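
Conceptually, a pointer for a named input is the base of the shared region plus that input's offset. The sketch below shows the idea on the host, with a `memoryview` over a `bytearray` standing in for a `cupy.cuda.MemoryPointer` into device memory; `get_view`, `layout`, and the input names are hypothetical and not part of Morpheus.

```python
# Stand-in for the shared region; real code would hold device memory.
region = bytearray(32)

# Hypothetical layout: input name -> (offset, size in bytes).
layout = {"input_ids": (0, 16), "mask": (16, 16)}


def get_view(name):
    """Return a writable view starting at the named input's offset."""
    offset, size = layout[name]
    return memoryview(region)[offset:offset + size]


view = get_view("mask")
# Writing through the view changes the underlying region in place.
view[:4] = b"\x01\x02\x03\x04"
```

Because the view aliases the region instead of copying it, data written through it is visible to anything else that reads the same region.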