tritonclient.grpc.aio#
Classes
- class tritonclient.grpc.aio.InferenceServerClient(url, verbose=False, ssl=False, root_certificates=None, private_key=None, certificate_chain=None, creds=None, keepalive_options=None, channel_args=None)#
This feature is currently in beta and may be subject to change.
An async analog of
tritonclient.grpc.InferenceServerClient that enables calling via asyncio syntax. The object is intended to be used by a single thread; calling methods simultaneously from different threads is not supported and can cause undefined behavior.
- _get_metadata(headers)#
- _return_response(response, as_json)#
- async close()#
Close the client. Any future calls to server will result in an Error.
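For context, a minimal sketch of the client lifecycle, assuming a Triton server reachable at localhost:8001 (a placeholder endpoint); it checks server liveness and readiness, then closes the client:

```python
import asyncio

async def check_server(url: str = "localhost:8001") -> bool:
    # Deferred import so this sketch can be defined without a Triton install.
    import tritonclient.grpc.aio as grpcclient

    client = grpcclient.InferenceServerClient(url=url)
    try:
        live = await client.is_server_live()
        ready = await client.is_server_ready()
        return live and ready
    finally:
        # Always close; any later call on a closed client raises an error.
        await client.close()

# asyncio.run(check_server())  # run against a live server
```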
- async get_inference_statistics(model_name='', model_version='', headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.get_inference_statistics()
- async get_log_settings(headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.get_log_settings()
- async get_model_config(model_name, model_version='', headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.get_model_config()
- async get_model_metadata(model_name, model_version='', headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.get_model_metadata()
- async get_model_repository_index(headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.get_model_repository_index()
- async get_server_metadata(headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.get_server_metadata()
- async get_trace_settings(model_name=None, headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.get_trace_settings()
- async infer(model_name, inputs, model_version='', outputs=None, request_id='', sequence_id=0, sequence_start=False, sequence_end=False, priority=0, timeout=None, client_timeout=None, headers=None, compression_algorithm=None, parameters=None)#
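A hedged sketch of a single infer call; the model name my_model and the tensor names INPUT0/OUTPUT0 are placeholders that must match your model's configuration:

```python
import asyncio
import numpy as np

async def run_infer(url: str = "localhost:8001") -> np.ndarray:
    # Deferred imports so this sketch can be defined without a Triton install.
    import tritonclient.grpc.aio as grpcclient
    from tritonclient.grpc import InferInput, InferRequestedOutput

    client = grpcclient.InferenceServerClient(url=url)
    try:
        data = np.ones((1, 16), dtype=np.float32)
        infer_input = InferInput("INPUT0", list(data.shape), "FP32")  # placeholder name
        infer_input.set_data_from_numpy(data)
        result = await client.infer(
            model_name="my_model",  # placeholder model name
            inputs=[infer_input],
            outputs=[InferRequestedOutput("OUTPUT0")],  # placeholder name
        )
        return result.as_numpy("OUTPUT0")
    finally:
        await client.close()
```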
- async is_model_ready(model_name, model_version='', headers=None, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.is_model_ready()
- async is_server_live(headers=None, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.is_server_live()
- async is_server_ready(headers=None, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.is_server_ready()
- async load_model(model_name, headers=None, config=None, files=None, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.load_model()
- stream_infer(inputs_iterator, stream_timeout=None, headers=None, compression_algorithm=None)#
Runs an asynchronous inference over gRPC bi-directional streaming API.
- Parameters:
inputs_iterator (asynchronous iterator) – Async iterator that yields dict(s) consisting of the input parameters to the
tritonclient.grpc.InferenceServerClient.async_stream_infer() function defined in tritonclient.grpc.InferenceServerClient.
stream_timeout (float) – Optional stream timeout. The stream will be closed once the specified timeout expires.
headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.
compression_algorithm (str) – Optional grpc compression algorithm to be used on client side. Currently supports “deflate”, “gzip” and None. By default, no compression is used.
- Returns:
Yields tuples holding (
tritonclient.grpc.InferResult, tritonclient.grpc.InferenceServerException) objects.
Note
This object can be used to cancel the inference request, as below:
>>> it = stream_infer(...)
>>> ret = it.cancel()
- Return type:
asynchronous iterator
- Raises:
tritonclient.grpc.InferenceServerException – If inputs_iterator does not yield the correct input.
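The pieces above can be sketched end to end: each dict yielded by the iterator carries keyword arguments for tritonclient.grpc.InferenceServerClient.async_stream_infer(), and responses arrive as (result, error) tuples. Endpoint, model, and tensor names are placeholders:

```python
import asyncio
import numpy as np

async def run_streaming(url: str = "localhost:8001") -> None:
    # Deferred imports so this sketch can be defined without a Triton install.
    import tritonclient.grpc.aio as grpcclient
    from tritonclient.grpc import InferInput

    async def inputs_iterator():
        for i in range(4):
            data = np.full((1, 16), i, dtype=np.float32)
            infer_input = InferInput("INPUT0", list(data.shape), "FP32")  # placeholder
            infer_input.set_data_from_numpy(data)
            # Each dict holds keyword arguments for async_stream_infer.
            yield {"model_name": "my_model", "inputs": [infer_input]}

    client = grpcclient.InferenceServerClient(url=url)
    try:
        async for result, error in client.stream_infer(inputs_iterator()):
            if error is not None:
                raise error
            print(result.as_numpy("OUTPUT0"))  # placeholder output name
    finally:
        await client.close()
```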
- async unload_model(model_name, headers=None, unload_dependents=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.unload_model()
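A sketch combining load_model, is_model_ready, and unload_model; it assumes the server is running with explicit model control enabled (otherwise load/unload requests are rejected), and the endpoint is a placeholder:

```python
import asyncio

async def cycle_model(model_name: str, url: str = "localhost:8001") -> bool:
    # Deferred import so this sketch can be defined without a Triton install.
    import tritonclient.grpc.aio as grpcclient

    client = grpcclient.InferenceServerClient(url=url)
    try:
        await client.load_model(model_name)
        ready = await client.is_model_ready(model_name)
        await client.unload_model(model_name)
        return ready
    finally:
        await client.close()
```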
- async update_log_settings(settings, headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.update_log_settings()
- async update_trace_settings(model_name=None, settings={}, headers=None, as_json=False, client_timeout=None)#
Refer to
tritonclient.grpc.InferenceServerClient.update_trace_settings()
Modules