Experimental gRPC Python API

Client Core

This module contains most of the core functionality of the library, including setting up a connection, sending requests to, and receiving responses from an active Triton server.

class tritongrpcclient.core.InferInput(name, shape=None, datatype=None)

An object of InferInput class is used to describe an input tensor for an inference request.

Parameters
  • name (str) – The name of the input whose data will be described by this object

  • shape (list) – The shape of the associated input. Default value is None.

  • datatype (str) – The datatype of the associated input. Default is None.

clear_parameters()

Clears all the parameters that have been added to the input request.

datatype()

Get the datatype of the input associated with this object.

Returns

The datatype of the input

Return type

str

name()

Get the name of the input associated with this object.

Returns

The name of the input

Return type

str

set_data_from_numpy(input_tensor)

Set the tensor data (datatype, shape, contents) from the specified numpy array for input associated with this object.

Parameters

input_tensor (numpy array) – The tensor data in numpy array format
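
For illustration, a minimal sketch of describing an input and populating it from a numpy array; the tensor name "INPUT0", its shape, and its datatype are assumptions and must match the model configuration.

    import numpy as np
    from tritongrpcclient.core import InferInput

    # Hypothetical model input named "INPUT0" holding a 1x16 FP32 tensor.
    input0 = InferInput("INPUT0", shape=[1, 16], datatype="FP32")

    # Overwrites the datatype, shape and contents from the numpy array.
    input0.set_data_from_numpy(np.ones([1, 16], dtype=np.float32))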

set_parameter(key, value)

Adds the specified key-value pair to the parameters of this input request.

Parameters
  • key (str) – The name of the parameter to be included in the request.

  • value (str/int/bool) – The value of the parameter

shape()

Get the shape of the input associated with this object.

Returns

The shape of the input

Return type

list

class tritongrpcclient.core.InferOutput(name)

An object of InferOutput class is used to describe a requested output tensor for an inference request.

Parameters

name (str) – The name of output tensor to associate with this object

clear_parameters()

Clears all the parameters that have been added to the output request.

name()

Get the name of the output associated with this object.

Returns

The name of the output

Return type

str

set_parameter(key, value)

Adds the specified key-value pair to the parameters of this output request.

Parameters
  • key (str) – The name of the parameter to be included in the request.

  • value (str/int/bool) – The value of the parameter
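
A sketch of requesting a named output; the tensor name "OUTPUT0" and the "classification" parameter are illustrative assumptions that depend on the model and on which parameters the server recognizes.

    from tritongrpcclient.core import InferOutput

    # Hypothetical output tensor named "OUTPUT0".
    output0 = InferOutput("OUTPUT0")

    # Hypothetical parameter asking the server for the top-3 classes.
    output0.set_parameter("classification", 3)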

class tritongrpcclient.core.InferResult(result)

An object of InferResult class holds the response of an inference request and provides methods to retrieve inference results.

Parameters

result (protobuf message) – The ModelInferResponse returned by the server

as_numpy(name)

Get the tensor data for the specified output tensor in numpy format.

Parameters

name (str) – The name of the output tensor whose result is to be retrieved.

Returns

The numpy array containing the response data for the tensor, or None if data for the specified tensor name is not found.

Return type

numpy array
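
Assuming "result" is an InferResult returned by infer() (documented below) and the model exposes an output named "OUTPUT0" (both assumptions), the response data can be read back as a numpy array:

    # "OUTPUT0" is a hypothetical output name.
    output_data = result.as_numpy("OUTPUT0")
    if output_data is None:
        print("no data returned for OUTPUT0")
    else:
        print(output_data.shape, output_data.dtype)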

get_response(as_json=False)

Retrieves the complete ModelInferResponse as a json dict object or protobuf message

Parameters

as_json (bool) – If True then returns response as a json dict, otherwise as a protobuf message. Default value is False.

Returns

The underlying ModelInferResponse as a protobuf message or dict.

Return type

protobuf message or dict

get_statistics(as_json=False)

Retrieves the InferStatistics for this response as a json dict object or protobuf message

Parameters

as_json (bool) – If True then returns statistics as a json dict, otherwise as a protobuf message. Default value is False.

Returns

The InferStatistics protobuf message or dict for this response.

Return type

protobuf message or dict

class tritongrpcclient.core.InferenceServerClient(url, verbose=False)

An InferenceServerClient object is used to perform any kind of communication with the InferenceServer using the gRPC protocol.

Parameters
  • url (str) – The inference server URL, e.g. ‘localhost:8001’.

  • verbose (bool) – If True generate verbose output. Default value is False.

Raises

Exception – If unable to create a client.
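
A minimal sketch of creating a client; localhost:8001 is the default gRPC port from the example URL above.

    from tritongrpcclient.core import InferenceServerClient

    # Connect to a Triton server exposing gRPC on localhost:8001.
    client = InferenceServerClient(url="localhost:8001", verbose=False)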

async_infer(callback, inputs, outputs, model_name, model_version='', request_id=None, parameters=None)

Run asynchronous inference using the supplied ‘inputs’, requesting the outputs specified by ‘outputs’.

Parameters
  • callback (function) – Python function that is invoked once the request is completed. The function must reserve the last argument to hold the InferResult object which will be provided to the function when executing the callback. The ownership of this InferResult object will be given to the user and its lifetime is limited to the scope of this function.

  • inputs (list) – A list of InferInput objects, each describing data for an input tensor required by the model.

  • outputs (list) – A list of InferOutput objects, each describing how the output data must be returned. Only the output tensors present in the list will be requested from the server.

  • model_name (str) – The name of the model to run inference.

  • model_version (str) – The version of the model to run inference on. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • request_id (str) – Optional identifier for the request. If specified, it will be returned in the response. Default value is None, which means no request_id will be used.

  • parameters (dict) – Optional inference parameters described as key-value pairs.

Raises

InferenceServerException – If server fails to issue inference.
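
A sketch of an asynchronous request, assuming "client", "inputs" and "outputs" were prepared as in the earlier examples and that the model name "my_model" is hypothetical. The callback keeps its last argument for the InferResult, as described above; any leading arguments are bound by the caller.

    import queue
    from functools import partial

    completed = queue.Queue()

    # The last argument receives the InferResult delivered by the client.
    def callback(user_data, result):
        user_data.put(result)

    client.async_infer(partial(callback, completed), inputs, outputs,
                       model_name="my_model")

    # Block until the callback has delivered the result.
    result = completed.get()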

close()

Close the client. Any future calls to the server will result in an error.

get_cuda_shared_memory_status(region_name='', as_json=False)

Request cuda shared memory status from the server.

Parameters
  • region_name (str) – The name of the region whose status is to be queried. The default value is an empty string, which means that the status of all active cuda shared memory regions will be returned.

  • as_json (bool) – If True then returns cuda shared memory status as a json dict, otherwise as a protobuf message. Default value is False.

Returns

The JSON dict or CudaSharedMemoryStatusResponse message holding the metadata.

Return type

dict or protobuf message

Raises

InferenceServerException – If unable to get the status of specified shared memory.

get_model_config(model_name, model_version='', as_json=False)

Contact the inference server and get the configuration for the specified model.

Parameters
  • model_name (str) – The name of the model

  • model_version (str) – The version of the model to get the configuration for. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • as_json (bool) – If True then returns configuration as a json dict, otherwise as a protobuf message. Default value is False.

Returns

The JSON dict or ModelConfigResponse message holding the metadata.

Return type

dict or protobuf message

Raises

InferenceServerException – If unable to get model configuration.

get_model_metadata(model_name, model_version='', as_json=False)

Contact the inference server and get the metadata for the specified model.

Parameters
  • model_name (str) – The name of the model

  • model_version (str) – The version of the model to get the metadata for. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • as_json (bool) – If True then returns model metadata as a json dict, otherwise as a protobuf message. Default value is False.

Returns

The JSON dict or ModelMetadataResponse message holding the metadata.

Return type

dict or protobuf message

Raises

InferenceServerException – If unable to get model metadata.
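
A sketch of fetching the configuration and metadata of a hypothetical model named "my_model", using as_json=True to get plain dicts rather than protobuf messages; "client" is assumed from the constructor example above.

    # Both calls assume a model named "my_model" exists in the repository.
    config = client.get_model_config("my_model", as_json=True)
    metadata = client.get_model_metadata("my_model", as_json=True)

    print(config)    # dict form of the ModelConfigResponse
    print(metadata)  # dict form of the ModelMetadataResponse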

get_model_repository_index(as_json=False)

Get the index of the model repository contents.

Parameters

as_json (bool) – If True then returns model repository index as a json dict, otherwise as a protobuf message. Default value is False.

Returns

The JSON dict or RepositoryIndexResponse message holding the model repository index.

Return type

dict or protobuf message

get_server_metadata(as_json=False)

Contact the inference server and get its metadata.

Parameters

as_json (bool) – If True then returns server metadata as a json dict, otherwise as a protobuf message. Default value is False.

Returns

The JSON dict or ServerMetadataResponse message holding the metadata.

Return type

dict or protobuf message

Raises

InferenceServerException – If unable to get server metadata.
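
A short sketch of reading the server metadata, again as a dict:

    # Basic information about the server itself (e.g. name and version).
    server_metadata = client.get_server_metadata(as_json=True)
    print(server_metadata)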

get_system_shared_memory_status(region_name='', as_json=False)

Request system shared memory status from the server.

Parameters
  • region_name (str) – The name of the region whose status is to be queried. The default value is an empty string, which means that the status of all active system shared memory regions will be returned.

  • as_json (bool) – If True then returns system shared memory status as a json dict, otherwise as a protobuf message. Default value is False.

Returns

The JSON dict or SystemSharedMemoryStatusResponse message holding the metadata.

Return type

dict or protobuf message

Raises

InferenceServerException – If unable to get the status of specified shared memory.

infer(inputs, outputs, model_name, model_version='', request_id=None, parameters=None)

Run synchronous inference using the supplied ‘inputs’, requesting the outputs specified by ‘outputs’.

Parameters
  • inputs (list) – A list of InferInput objects, each describing data for an input tensor required by the model.

  • outputs (list) – A list of InferOutput objects, each describing how the output data must be returned. Only the output tensors present in the list will be requested from the server.

  • model_name (str) – The name of the model to run inference.

  • model_version (str) – The version of the model to run inference on. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • request_id (str) – Optional identifier for the request. If specified, it will be returned in the response. Default value is None, which means no request_id will be used.

  • parameters (dict) – Optional inference parameters described as key-value pairs.

Returns

The object holding the result of the inference, including the statistics.

Return type

InferResult

Raises

InferenceServerException – If server fails to perform inference.
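
Putting the pieces together, a sketch of a synchronous request against a hypothetical model "my_model" with one input "INPUT0" and one output "OUTPUT0":

    import numpy as np
    from tritongrpcclient.core import (InferenceServerClient, InferInput,
                                       InferOutput)

    client = InferenceServerClient(url="localhost:8001")

    # Describe the (hypothetical) input and output tensors.
    input0 = InferInput("INPUT0")
    input0.set_data_from_numpy(np.ones([1, 16], dtype=np.float32))
    output0 = InferOutput("OUTPUT0")

    # Run inference and read the result back as a numpy array.
    result = client.infer([input0], [output0], model_name="my_model",
                          request_id="1")
    print(result.as_numpy("OUTPUT0"))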

is_model_ready(model_name, model_version='')

Contact the inference server and get the readiness of the specified model.

Parameters
  • model_name (str) – The name of the model to check for readiness.

  • model_version (str) – The version of the model to check for readiness. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

Returns

True if the model is ready, False if not ready.

Return type

bool

Raises

InferenceServerException – If unable to get model readiness.

is_server_live()

Contact the inference server and get liveness.

Returns

True if server is live, False if server is not live.

Return type

bool

Raises

InferenceServerException – If unable to get liveness.

is_server_ready()

Contact the inference server and get readiness.

Returns

True if server is ready, False if server is not ready.

Return type

bool

Raises

InferenceServerException – If unable to get readiness.
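
A sketch chaining the health checks; the model name "my_model" is hypothetical.

    # Only query the model once the server itself reports live and ready.
    if client.is_server_live() and client.is_server_ready():
        if client.is_model_ready("my_model"):
            print("my_model is ready to serve requests")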

load_model(model_name)

Request the inference server to load or reload the specified model.

Parameters

model_name (str) – The name of the model to be loaded.

Raises

InferenceServerException – If unable to load the model.

register_cuda_shared_memory(name, raw_handle, device_id, byte_size)

Request the server to register a cuda shared memory region with the following specification.

Parameters
  • name (str) – The name of the region to register.

  • raw_handle (bytes) – The raw serialized cudaIPC handle in base64 encoding.

  • device_id (int) – The GPU device ID on which the cudaIPC handle was created.

  • byte_size (int) – The size of the cuda shared memory region, in bytes.

Raises

InferenceServerException – If unable to register the specified cuda shared memory.

register_system_shared_memory(name, key, byte_size, offset=0)

Request the server to register a system shared memory region with the following specification.

Parameters
  • name (str) – The name of the region to register.

  • key (str) – The key of the underlying memory object that contains the system shared memory region.

  • byte_size (int) – The size of the system shared memory region, in bytes.

  • offset (int) – Offset, in bytes, within the underlying memory object to the start of the system shared memory region. The default value is zero.

Raises

InferenceServerException – If unable to register the specified system shared memory.
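
A sketch of registering a system shared memory region and checking that it was registered. The region name "input_data", the key "/input_shm", and the size are assumptions, and the underlying shared memory object must already have been created by other means (the utilities for that are not part of this module).

    # Register a 128-byte region backed by a POSIX shared memory object
    # "/input_shm" that is assumed to already exist.
    client.register_system_shared_memory("input_data", "/input_shm", 128)

    # Verify the registration by querying the region's status.
    print(client.get_system_shared_memory_status("input_data", as_json=True))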

unload_model(model_name)

Request the inference server to unload the specified model.

Parameters

model_name (str) – The name of the model to be unloaded.

Raises

InferenceServerException – If unable to unload the model.
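
A sketch of explicit model management, assuming the server is configured to allow load/unload requests and that "my_model" exists in the model repository:

    # Inspect the repository, then load and later unload a hypothetical model.
    print(client.get_model_repository_index(as_json=True))

    client.load_model("my_model")
    # ... run inference against "my_model" ...
    client.unload_model("my_model")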

unregister_cuda_shared_memory(name='')

Request the server to unregister a cuda shared memory region with the specified name.

Parameters

name (str) – The name of the region to unregister. The default value is an empty string, which means all the cuda shared memory regions will be unregistered.

Raises

InferenceServerException – If unable to unregister the specified cuda shared memory region.

unregister_system_shared_memory(name='')

Request the server to unregister a system shared memory region with the specified name.

Parameters

name (str) – The name of the region to unregister. The default value is an empty string, which means all the system shared memory regions will be unregistered.

Raises

InferenceServerException – If unable to unregister the specified system shared memory region.
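
A sketch of tearing the registrations down again; calling either method without a name clears every region of that kind.

    # Unregister the single hypothetical region from the example above,
    # then unregister any remaining system shared memory regions.
    client.unregister_system_shared_memory("input_data")
    client.unregister_system_shared_memory()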

Client Utils

This module exposes additional supporting utilities.

tritongrpcclient.utils.raise_error(msg)

Raise an error with the provided message.

exception tritongrpcclient.utils.InferenceServerException(msg, status=None, debug_details=None)

Exception indicating non-Success status.

Parameters
  • msg (str) – A brief description of the error

  • status (str) – The error code

  • debug_details (str) – Additional details about the error

debug_details()

Get the detailed information about the exception for debugging purposes

Returns

Returns the exception details

Return type

str

message()

Get the exception message.

Returns

The message associated with this exception, or None if no message.

Return type

str

status()

Get the status of the exception.

Returns

Returns the status of the exception

Return type

str
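
A sketch of handling the exception around a client call; "client" is assumed from the earlier examples and the model name "my_model" is hypothetical.

    from tritongrpcclient.utils import InferenceServerException

    try:
        metadata = client.get_model_metadata("my_model")
    except InferenceServerException as e:
        # message(), status() and debug_details() expose the error fields.
        print(e.message())
        print(e.status())
        print(e.debug_details())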

tritongrpcclient.utils.serialize_byte_tensor(input_tensor)

Serializes a bytes tensor into a flat numpy array of length-prepended bytes. The bytes tensor can be passed as a numpy array of bytes with dtype np.bytes_, numpy strings with dtype np.str_, or python strings with dtype np.object_.

Parameters

input_tensor (np.array) – The bytes tensor to serialize.

Returns

serialized_bytes_tensor – The 1-D numpy array of type uint8 containing the serialized bytes in ‘C’ order.

Return type

np.array

Raises

InferenceServerException – If unable to serialize the given tensor.
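
A sketch of serializing a small string tensor; per the description above, the helper produces the flat uint8 buffer used for BYTES data on the wire.

    import numpy as np
    from tritongrpcclient.utils import serialize_byte_tensor

    # Tensor of python strings (dtype object), one of the accepted forms.
    strings = np.array(["hello", "world"], dtype=np.object_)
    serialized = serialize_byte_tensor(strings)
    print(serialized.dtype, serialized.shape)  # 1-D uint8 array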

tritongrpcclient.utils.deserialize_bytes_tensor(encoded_tensor)

Deserializes an encoded bytes tensor into a numpy array with dtype of python objects.

Parameters

encoded_tensor (bytes) – The encoded bytes tensor, where each element has its length in the first 4 bytes followed by the content

Returns

string_tensor – The 1-D numpy array of type object containing the deserialized bytes in ‘C’ order.

Return type

np.array
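
And the inverse direction, round-tripping the buffer from the serialize_byte_tensor example above (the "serialized" array and its conversion to bytes are assumptions of this sketch):

    from tritongrpcclient.utils import deserialize_bytes_tensor

    # The encoded buffer stores each element as a 4-byte length plus content.
    decoded = deserialize_bytes_tensor(serialized.tobytes())
    print(decoded)  # 1-D array of python objects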