tritonclient.grpc.aio#
Classes
This feature is currently in beta and may be subject to change.
- class tritonclient.grpc.aio.InferenceServerClient(
- url,
- verbose=False,
- ssl=False,
- root_certificates=None,
- private_key=None,
- certificate_chain=None,
- creds=None,
- keepalive_options=None,
- channel_args=None,
This feature is currently in beta and may be subject to change.
An analogy of the
tritonclient.grpc.InferenceServerClient to enable calling via asyncio syntax. The object is intended to be used by a single thread; simultaneously calling methods from different threads is not supported and can cause undefined behavior.
- _get_metadata(headers)#
- _return_response(response, as_json)#
- async close()#
Close the client. Any future calls to the server will result in an error.
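A minimal sketch of the client lifecycle under asyncio; the URL is an assumption (8001 is Triton's default gRPC port):

import asyncio

import tritonclient.grpc.aio as grpcclient


async def main():
    # Assumed: a Triton server listening on localhost:8001.
    client = grpcclient.InferenceServerClient(url="localhost:8001")
    try:
        print(await client.is_server_live())
    finally:
        # After close(), any further call on this client raises an error.
        await client.close()


asyncio.run(main())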
- async get_cuda_shared_memory_status(
- region_name='',
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_cuda_shared_memory_status()
- async get_inference_statistics(
- model_name='',
- model_version='',
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_inference_statistics()
- async get_log_settings(
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_log_settings()
- async get_model_config(
- model_name,
- model_version='',
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_model_config()
- async get_model_metadata(
- model_name,
- model_version='',
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_model_metadata()
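A hedged sketch of fetching model metadata and config as JSON-like dicts; the model name "simple" and the keys accessed below are assumptions for illustration:

import tritonclient.grpc.aio as grpcclient


async def show_model_info(client: grpcclient.InferenceServerClient):
    # "simple" is a placeholder model name.
    metadata = await client.get_model_metadata("simple", as_json=True)
    print(metadata["versions"], [inp["name"] for inp in metadata["inputs"]])

    config = await client.get_model_config("simple", as_json=True)
    # With as_json=True the response nests the model config under "config".
    print(config["config"]["max_batch_size"])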
- async get_model_repository_index(
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_model_repository_index()
- async get_server_metadata(
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_server_metadata()
- async get_system_shared_memory_status(
- region_name='',
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_system_shared_memory_status()
- async get_trace_settings(
- model_name=None,
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.get_trace_settings()
- async infer(
- model_name,
- inputs,
- model_version='',
- outputs=None,
- request_id='',
- sequence_id=0,
- sequence_start=False,
- sequence_end=False,
- priority=0,
- timeout=None,
- client_timeout=None,
- headers=None,
- compression_algorithm=None,
- parameters=None,
Refer to
tritonclient.grpc.InferenceServerClient.infer()
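A minimal end-to-end sketch of an awaited inference; the model "simple", its FP32 input "INPUT0" of shape [1, 16], and output "OUTPUT0" are assumptions for illustration:

import numpy as np

import tritonclient.grpc.aio as grpcclient
from tritonclient.grpc import InferInput, InferRequestedOutput


async def run_infer(client: grpcclient.InferenceServerClient):
    # Hypothetical model "simple" with one FP32 input and one output.
    inputs = [InferInput("INPUT0", [1, 16], "FP32")]
    inputs[0].set_data_from_numpy(np.zeros((1, 16), dtype=np.float32))
    outputs = [InferRequestedOutput("OUTPUT0")]

    result = await client.infer(
        model_name="simple",
        inputs=inputs,
        outputs=outputs,
        client_timeout=10.0,
    )
    return result.as_numpy("OUTPUT0")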
- async is_model_ready(
- model_name,
- model_version='',
- headers=None,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.is_model_ready()
- async is_server_live(
- headers=None,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.is_server_live()
- async is_server_ready(
- headers=None,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.is_server_ready()
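The three readiness probes compose naturally in a coroutine; the model name below is a placeholder:

import tritonclient.grpc.aio as grpcclient


async def all_ready(client: grpcclient.InferenceServerClient) -> bool:
    # Server liveness, server readiness, then readiness of one model.
    return (
        await client.is_server_live()
        and await client.is_server_ready()
        and await client.is_model_ready("simple")  # placeholder name
    )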
- async load_model(
- model_name,
- headers=None,
- config=None,
- files=None,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.load_model()
- async register_cuda_shared_memory(
- name,
- raw_handle,
- device_id,
- byte_size,
- headers=None,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.register_cuda_shared_memory()
- async register_system_shared_memory(
- name,
- key,
- byte_size,
- offset=0,
- headers=None,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.register_system_shared_memory()
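A hedged sketch of registering a system shared-memory region, using the tritonclient.utils.shared_memory helpers; the region name, shared-memory key, and byte size are examples:

import tritonclient.utils.shared_memory as shm

import tritonclient.grpc.aio as grpcclient


async def register_region(client: grpcclient.InferenceServerClient):
    byte_size = 64  # example size
    # Create the region locally, then tell the server about it.
    handle = shm.create_shared_memory_region(
        "input_data", "/input_simple", byte_size
    )
    await client.register_system_shared_memory(
        "input_data", "/input_simple", byte_size
    )
    return handle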
- stream_infer(
- inputs_iterator,
- stream_timeout=None,
- headers=None,
- compression_algorithm=None,
Runs an asynchronous inference over the gRPC bi-directional streaming API.
- Parameters:
inputs_iterator (asynchronous iterator) – Async iterator that yields a dict(s) consisting of the input parameters to the
tritonclient.grpc.InferenceServerClient.async_stream_infer() function defined in tritonclient.grpc.InferenceServerClient.
stream_timeout (float) – Optional stream timeout. The stream will be closed once the specified timeout expires.
headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.
compression_algorithm (str) – Optional grpc compression algorithm to be used on client side. Currently supports “deflate”, “gzip” and None. By default, no compression is used.
- Returns:
Yields tuples holding (
tritonclient.grpc.InferResult, tritonclient.grpc.InferenceServerException) objects.
Note
This object can be used to cancel the inference request like below:
>>> it = stream_infer(...)
>>> ret = it.cancel()
- Return type:
asynchronous iterator
- Raises:
tritonclient.grpc.InferenceServerException – If inputs_iterator does not yield the correct input.
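A sketch of driving the stream with an async generator; the model and tensor names are assumptions, and each yielded dict maps to the keyword arguments of async_stream_infer():

import numpy as np

import tritonclient.grpc.aio as grpcclient
from tritonclient.grpc import InferInput


async def run_stream(client: grpcclient.InferenceServerClient):
    async def requests():
        for step in range(3):
            inp = InferInput("INPUT0", [1, 16], "FP32")
            inp.set_data_from_numpy(np.full((1, 16), step, dtype=np.float32))
            yield {"model_name": "simple", "inputs": [inp]}

    # Each iteration yields an (InferResult, InferenceServerException) pair.
    async for result, error in client.stream_infer(requests()):
        if error is not None:
            raise error
        print(result.as_numpy("OUTPUT0"))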
- async unload_model(
- model_name,
- headers=None,
- unload_dependents=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.unload_model()
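Loading and unloading only succeed when the server runs in explicit model-control mode; "simple" below is a placeholder model name:

import tritonclient.grpc.aio as grpcclient


async def reload_model(client: grpcclient.InferenceServerClient):
    # Requires the server to be started with --model-control-mode=explicit.
    await client.load_model("simple")
    # ... serve traffic ...
    await client.unload_model("simple", unload_dependents=False)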
- async unregister_cuda_shared_memory(
- name='',
- headers=None,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.unregister_cuda_shared_memory()
- async unregister_system_shared_memory(
- name='',
- headers=None,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.unregister_system_shared_memory()
- async update_log_settings(
- settings,
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.update_log_settings()
- async update_trace_settings(
- model_name=None,
- settings={},
- headers=None,
- as_json=False,
- client_timeout=None,
Refer to
tritonclient.grpc.InferenceServerClient.update_trace_settings()
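A hedged sketch of updating settings; the setting keys and values below are examples drawn from Triton's logging and trace protocol extensions and may differ per server version:

import tritonclient.grpc.aio as grpcclient


async def tune_settings(client: grpcclient.InferenceServerClient):
    log = await client.update_log_settings(
        settings={"log_verbose_level": 1},  # example key/value
        as_json=True,
    )
    trace = await client.update_trace_settings(
        model_name="simple",  # placeholder model name
        settings={"trace_level": ["TIMESTAMPS"]},  # example key/value
        as_json=True,
    )
    return log, trace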
Modules