tritonclient.http#

class tritonclient.http.InferAsyncRequest(greenlet, verbose=False)#

An object of InferAsyncRequest class is used to describe a handle to an ongoing asynchronous inference request.

Parameters:
  • greenlet (gevent.Greenlet) – The greenlet object which will provide the results. For further details about greenlets refer to http://www.gevent.org/api/gevent.greenlet.html.

  • verbose (bool) – If True generate verbose output. Default value is False.

get_result(block=True, timeout=None)#

Get the results of the associated asynchronous inference.

Parameters:
  • block (bool) – If block is True, the function will wait until the corresponding response is received from the server. Default value is True.

  • timeout (int) – The maximum wait time for the function. This setting is ignored if ‘block’ is set to False. Default is None, which means the function will block indefinitely until the corresponding response is received.

Returns:

The object holding the result of the async inference.

Return type:

InferResult

Raises:

InferenceServerException – If server fails to perform inference or fails to respond within the specified timeout.
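
For example, a handle returned by InferenceServerClient.async_infer() can be resolved with get_result(). A minimal sketch, assuming a server at localhost:8000 and a hypothetical model "simple" with an input named "INPUT0":

    import numpy as np
    import tritonclient.http as httpclient

    # Placeholder server address, model and tensor names.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    input0 = httpclient.InferInput("INPUT0", [1, 16], "FP32")
    input0.set_data_from_numpy(np.zeros((1, 16), dtype=np.float32))

    # async_infer() returns an InferAsyncRequest handle immediately.
    handle = client.async_infer(model_name="simple", inputs=[input0])

    # Block for at most 10 seconds waiting for the response.
    result = handle.get_result(block=True, timeout=10)
    print(result.get_response())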

class tritonclient.http.InferInput(name, shape, datatype)#

An object of InferInput class is used to describe an input tensor for an inference request.

Parameters:
  • name (str) – The name of the input whose data will be described by this object.

  • shape (list) – The shape of the associated input.

  • datatype (str) – The datatype of the associated input.

_get_binary_data()#

Returns the raw binary data if available

Returns:

The raw data for the input tensor

Return type:

bytes

_get_tensor()#

Retrieve the underlying input as a JSON dict.

Returns:

The underlying tensor specification as dict

Return type:

dict

datatype()#

Get the datatype of input associated with this object.

Returns:

The datatype of input

Return type:

str

name()#

Get the name of input associated with this object.

Returns:

The name of input

Return type:

str

set_data_from_numpy(input_tensor, binary_data=True)#

Set the tensor data from the specified numpy array for input associated with this object.

Parameters:
  • input_tensor (numpy array) – The tensor data in numpy array format

  • binary_data (bool) – Indicates whether to set data for the input in binary format or as an explicit tensor within JSON. The default value is True, which means the data will be delivered as binary data in the HTTP body after the JSON object.

Returns:

The updated input

Return type:

InferInput

Raises:

InferenceServerException – If unable to set data for the tensor.
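
A minimal sketch of preparing an input tensor from a numpy array (the tensor name is a placeholder; the shape and datatype must match the target model's configuration):

    import numpy as np
    import tritonclient.http as httpclient
    from tritonclient.utils import np_to_triton_dtype

    data = np.random.rand(1, 3).astype(np.float32)

    # "INPUT0" is a placeholder tensor name.
    infer_input = httpclient.InferInput("INPUT0", list(data.shape), np_to_triton_dtype(data.dtype))
    infer_input.set_data_from_numpy(data, binary_data=True)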

set_shape(shape)#

Set the shape of input.

Parameters:

shape (list) – The shape of the associated input.

Returns:

The updated input

Return type:

InferInput

set_shared_memory(region_name, byte_size, offset=0)#

Set the tensor data from the specified shared memory region.

Parameters:
  • region_name (str) – The name of the shared memory region holding tensor data.

  • byte_size (int) – The size of the shared memory region holding tensor data.

  • offset (int) – The offset, in bytes, into the region where the data for the tensor starts. The default value is 0.

Returns:

The updated input

Return type:

InferInput

shape()#

Get the shape of input associated with this object.

Returns:

The shape of input

Return type:

list

class tritonclient.http.InferRequestedOutput(name, binary_data=True, class_count=0)#

An object of InferRequestedOutput class is used to describe a requested output tensor for an inference request.

Parameters:
  • name (str) – The name of output tensor to associate with this object.

  • binary_data (bool) – Indicates whether to return result data for the output in binary format or as an explicit tensor within JSON. The default value is True, which means the data will be delivered as binary data in the HTTP body after the JSON object. This field will be unset if shared memory is set for the output.

  • class_count (int) – The number of classifications to be requested. The default value is 0 which means the classification results are not requested.
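
For example, a top-3 classification request for a hypothetical output tensor could be described as follows (a sketch; the tensor name is a placeholder):

    import tritonclient.http as httpclient

    # Request the top-3 classification results for a placeholder output tensor.
    output = httpclient.InferRequestedOutput("OUTPUT0", binary_data=True, class_count=3)
    # The object is then passed to infer()/async_infer() via the 'outputs' argument.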

_get_tensor()#

Retrieve the underlying output as a JSON dict.

Returns:

The underlying tensor as a dict

Return type:

dict

name()#

Get the name of output associated with this object.

Returns:

The name of output

Return type:

str

set_shared_memory(region_name, byte_size, offset=0)#

Marks the output to return the inference result in the specified shared memory region.

Parameters:
  • region_name (str) – The name of the shared memory region to hold tensor data.

  • byte_size (int) – The size of the shared memory region to hold tensor data.

  • offset (int) – The offset, in bytes, into the region where the data for the tensor starts. The default value is 0.

unset_shared_memory()#

Clears the shared memory option set by the last call to InferRequestedOutput.set_shared_memory(). After calling this function, the requested output will no longer be returned in a shared memory region.

class tritonclient.http.InferResult(response, verbose)#

An object of InferResult class holds the response of an inference request and provides methods to retrieve inference results.

Parameters:
  • response (geventhttpclient.response.HTTPSocketPoolResponse) – The inference response from the server

  • verbose (bool) – If True generate verbose output. Default value is False.

as_numpy(name)#

Get the tensor data for output associated with this object in numpy format

Parameters:

name (str) – The name of the output tensor whose result is to be retrieved.

Returns:

The numpy array containing the response data for the tensor or None if the data for specified tensor name is not found.

Return type:

numpy array
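
A minimal sketch of reading an output tensor back as a numpy array (server address, model and tensor names are placeholders):

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    inp = httpclient.InferInput("INPUT0", [1, 4], "FP32")
    inp.set_data_from_numpy(np.ones((1, 4), dtype=np.float32))

    result = client.infer(model_name="simple", inputs=[inp])

    # Returns None if no output named "OUTPUT0" is present in the response.
    out = result.as_numpy("OUTPUT0")
    if out is not None:
        print(out.shape, out.dtype)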

classmethod from_response_body(response_body, verbose=False, header_length=None, content_encoding=None)#

A class method to construct an InferResult object from the given ‘response_body’.

Parameters:
  • response_body (bytes) – The inference response from the server

  • verbose (bool) – If True generate verbose output. Default value is False.

  • header_length (int) – The length of the inference header if the header does not occupy the whole response body. Default value is None.

  • content_encoding (string) – The encoding of the response body if it is compressed. Default value is None.

Returns:

The InferResult object generated from the response body

Return type:

InferResult

get_output(name)#

Retrieves the output tensor corresponding to the named output.

Parameters:

name (str) – The name of the tensor for which Output is to be retrieved.

Returns:

If an output tensor with specified name is present in the infer response then returns it as a json dict, otherwise returns None.

Return type:

Dict

get_response()#

Retrieves the complete response

Returns:

The underlying response dict.

Return type:

dict

class tritonclient.http.InferenceServerClient(url, verbose=False, concurrency=1, connection_timeout=60.0, network_timeout=60.0, max_greenlets=None, ssl=False, ssl_options=None, ssl_context_factory=None, insecure=False)#

An InferenceServerClient object is used to perform any kind of communication with the inference server using the HTTP protocol. None of the methods are thread safe. The object is intended to be used by a single thread; simultaneously calling different methods from different threads is not supported and will cause undefined behavior.

Parameters:
  • url (str) – The inference server name, port and optional base path in the following format: host:port/<base-path>, e.g. ‘localhost:8000’.

  • verbose (bool) – If True generate verbose output. Default value is False.

  • concurrency (int) – The number of connections to create for this client. Default value is 1.

  • connection_timeout (float) – The timeout value for the connection. Default value is 60.0 sec.

  • network_timeout (float) – The timeout value for the network. Default value is 60.0 sec

  • max_greenlets (int) – Determines the maximum allowed number of worker greenlets for handling asynchronous inference requests. Default value is None, which means there will be no restriction on the number of greenlets created.

  • ssl (bool) – If True, channels the requests over the encrypted https scheme. Some improper settings may cause the connection to terminate prematurely with an unsuccessful handshake. See the ssl_context_factory option for using secure default settings. Default value for this option is False.

  • ssl_options (dict) – Any options supported by ssl.wrap_socket specified as a dictionary. The argument is ignored if ‘ssl’ is specified as False.

  • ssl_context_factory (SSLContext callable) – It must be a callable that returns an SSLContext. Set to gevent.ssl.create_default_context to use contexts with secure default settings. This should most likely resolve connection issues in a secure way. The default value for this option is None, which directly wraps the socket with the options provided via ssl_options. The argument is ignored if ‘ssl’ is specified as False.

  • insecure (bool) – If True, the host name will not be matched against the certificate. Default value is False. The argument is ignored if ‘ssl’ is specified as False.

Raises:

Exception – If unable to create a client.
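
A sketch of constructing clients for plain HTTP and for HTTPS with secure defaults (the addresses are placeholders):

    import gevent.ssl
    import tritonclient.http as httpclient

    # Plain HTTP client against a local server (placeholder address).
    client = httpclient.InferenceServerClient(url="localhost:8000", verbose=False)

    # HTTPS client using an SSLContext with secure default settings.
    secure_client = httpclient.InferenceServerClient(
        url="triton.example.com:443",  # placeholder address
        ssl=True,
        ssl_context_factory=gevent.ssl.create_default_context,
    )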

_get(request_uri, headers, query_params)#

Issues the GET request to the server

Parameters:
  • request_uri (str) – The request URI to be used in GET request.

  • headers (dict) – Additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction.

Returns:

The response from server.

Return type:

geventhttpclient.response.HTTPSocketPoolResponse

_post(request_uri, request_body, headers, query_params)#

Issues the POST request to the server

Parameters:
  • request_uri (str) – The request URI to be used in POST request.

  • request_body (str) – The body of the request

  • headers (dict) – Additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction.

Returns:

The response from server.

Return type:

geventhttpclient.response.HTTPSocketPoolResponse

_validate_headers(headers)#

Checks for any unsupported HTTP headers before processing a request.

Parameters:

headers (dict) – HTTP headers to validate before processing the request.

Raises:

InferenceServerException – If an unsupported HTTP header is included in a request.

async_infer(model_name, inputs, model_version='', outputs=None, request_id='', sequence_id=0, sequence_start=False, sequence_end=False, priority=0, timeout=None, headers=None, query_params=None, request_compression_algorithm=None, response_compression_algorithm=None, parameters=None)#

Run asynchronous inference using the supplied ‘inputs’ requesting the outputs specified by ‘outputs’. Even though this call is non-blocking, the actual number of concurrent requests to the server is limited by the ‘concurrency’ parameter specified when creating this client. In other words, if the number of in-flight async_infer requests exceeds the specified ‘concurrency’, delivery of the excess requests to the server will be blocked until a slot is made available by retrieving the results of previously issued requests.

Parameters:
  • model_name (str) – The name of the model to run inference.

  • inputs (list) – A list of InferInput objects, each describing data for an input tensor required by the model.

  • model_version (str) – The version of the model to run inference. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • outputs (list) – A list of InferRequestedOutput objects, each describing how the output data must be returned. If not specified all outputs produced by the model will be returned using default settings.

  • request_id (str) – Optional identifier for the request. If specified, it will be returned in the response. Default value is an empty string, which means no request_id will be used.

  • sequence_id (int) – The unique identifier for the sequence being represented by the object. Default value is 0 which means that the request does not belong to a sequence.

  • sequence_start (bool) – Indicates whether the request being added marks the start of the sequence. Default value is False. This argument is ignored if ‘sequence_id’ is 0.

  • sequence_end (bool) – Indicates whether the request being added marks the end of the sequence. Default value is False. This argument is ignored if ‘sequence_id’ is 0.

  • priority (int) – Indicates the priority of the request. Priority value zero indicates that the default priority level should be used (i.e. same behavior as not specifying the priority parameter). Lower value priorities indicate higher priority levels. Thus the highest priority level is indicated by setting the parameter to 1, the next highest is 2, etc. If not provided, the server will handle the request using default setting for the model.

  • timeout (int) – The timeout value for the request, in microseconds. If the request cannot be completed within the time, the server can take a model-specific action such as terminating the request. If not provided, the server will handle the request using the default setting for the model. This option is only respected by models configured with dynamic batching. See the triton-inference-server/server repository for more details.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction.

  • request_compression_algorithm (str) – Optional HTTP compression algorithm to use for the request body on client side. Currently supports “deflate”, “gzip” and None. By default, no compression is used.

  • response_compression_algorithm (str) – Optional HTTP compression algorithm to request for the response body. Note that the response may not be compressed if the server does not support the specified algorithm. Currently supports “deflate”, “gzip” and None. By default, no compression is requested.

  • parameters (dict) – Optional custom parameters to be included in the inference request.

Returns:

The handle to the asynchronous inference request.

Return type:

InferAsyncRequest

Raises:

InferenceServerException – If server fails to issue inference.
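
A sketch of issuing several requests concurrently and then collecting the results; the ‘concurrency’ value set on the client bounds how many of them are actually in flight (model and tensor names are placeholders):

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=4)

    handles = []
    for i in range(8):
        inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
        inp.set_data_from_numpy(np.full((1, 16), i, dtype=np.float32))
        # Returns immediately with an InferAsyncRequest handle.
        handles.append(client.async_infer(model_name="simple", inputs=[inp], request_id=str(i)))

    # get_result() blocks until each corresponding response arrives.
    for handle in handles:
        print(handle.get_result().get_response())

    client.close()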

close()#

Close the client. Any future calls to the server will result in an error.

static generate_request_body(inputs, outputs=None, request_id='', sequence_id=0, sequence_start=False, sequence_end=False, priority=0, timeout=None, parameters=None)#

Generate a request body for inference using the supplied ‘inputs’ requesting the outputs specified by ‘outputs’.

Parameters:
  • inputs (list) – A list of InferInput objects, each describing data for an input tensor required by the model.

  • outputs (list) – A list of InferRequestedOutput objects, each describing how the output data must be returned. If not specified all outputs produced by the model will be returned using default settings.

  • request_id (str) – Optional identifier for the request. If specified will be returned in the response. Default value is an empty string which means no request_id will be used.

  • sequence_id (int or str) – The unique identifier for the sequence being represented by the object. A value of 0 or “” means that the request does not belong to a sequence. Default is 0.

  • sequence_start (bool) – Indicates whether the request being added marks the start of the sequence. Default value is False. This argument is ignored if ‘sequence_id’ is 0.

  • sequence_end (bool) – Indicates whether the request being added marks the end of the sequence. Default value is False. This argument is ignored if ‘sequence_id’ is 0.

  • priority (int) – Indicates the priority of the request. Priority value zero indicates that the default priority level should be used (i.e. same behavior as not specifying the priority parameter). Lower value priorities indicate higher priority levels. Thus the highest priority level is indicated by setting the parameter to 1, the next highest is 2, etc. If not provided, the server will handle the request using default setting for the model.

  • timeout (int) – The timeout value for the request, in microseconds. If the request cannot be completed within the time, the server can take a model-specific action such as terminating the request. If not provided, the server will handle the request using the default setting for the model. This option is only respected by models configured with dynamic batching. See the triton-inference-server/server repository for more details.

  • parameters (dict) – Optional custom parameters to be included in the ‘parameters’ field of the request.

Returns:

  • Bytes – The request body of the inference.

  • Int – The byte size of the inference request header in the request body. Returns None if the whole request body constitutes the request header.

Raises:

InferenceServerException – If server fails to perform inference.
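
Together with parse_response_body() (documented further below), this helper lets the Triton request/response encoding be reused with a different HTTP stack. A sketch, assuming the standard /v2/models/<name>/infer endpoint and the Inference-Header-Content-Length header from the binary tensor data extension; the model and tensor names, and the use of the requests library, are illustrative:

    import numpy as np
    import requests  # any HTTP library could be used here
    import tritonclient.http as httpclient

    inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
    inp.set_data_from_numpy(np.zeros((1, 16), dtype=np.float32))

    # Encode the request; header_length is None if the body is pure JSON.
    request_body, header_length = httpclient.InferenceServerClient.generate_request_body(
        inputs=[inp], outputs=[httpclient.InferRequestedOutput("OUTPUT0")]
    )

    headers = {}
    if header_length is not None:
        headers["Inference-Header-Content-Length"] = str(header_length)

    resp = requests.post(
        "http://localhost:8000/v2/models/simple/infer", data=request_body, headers=headers
    )

    result = httpclient.InferenceServerClient.parse_response_body(
        resp.content,
        header_length=int(resp.headers.get("Inference-Header-Content-Length", 0)) or None,
    )
    print(result.as_numpy("OUTPUT0"))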

get_cuda_shared_memory_status(region_name='', headers=None, query_params=None)#

Request cuda shared memory status from the server.

Parameters:
  • region_name (str) – The name of the region to query status. The default value is an empty string, which means that the status of all active CUDA shared memory regions will be returned.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding cuda shared memory status.

Return type:

dict

Raises:

InferenceServerException – If unable to get the status of specified shared memory.

get_inference_statistics(model_name='', model_version='', headers=None, query_params=None)#

Get the inference statistics for the specified model name and version.

Parameters:
  • model_name (str) – The name of the model to get statistics. The default value is an empty string, which means statistics of all models will be returned.

  • model_version (str) – The version of the model to get inference statistics. The default value is an empty string, which means the server will return the statistics of all available model versions.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding the model inference statistics.

Return type:

dict

Raises:

InferenceServerException – If unable to get the model inference statistics.

get_log_settings(headers=None, query_params=None)#

Get the global log settings for the Triton server

Parameters:
  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding the log settings.

Return type:

dict

Raises:

InferenceServerException – If unable to get the log settings.

get_model_config(model_name, model_version='', headers=None, query_params=None)#

Contact the inference server and get the configuration for specified model.

Parameters:
  • model_name (str) – The name of the model

  • model_version (str) – The version of the model to get configuration. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding the model config.

Return type:

dict

Raises:

InferenceServerException – If unable to get model configuration.

get_model_metadata(model_name, model_version='', headers=None, query_params=None)#

Contact the inference server and get the metadata for specified model.

Parameters:
  • model_name (str) – The name of the model

  • model_version (str) – The version of the model to get metadata. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding the metadata.

Return type:

dict

Raises:

InferenceServerException – If unable to get model metadata.

get_model_repository_index(headers=None, query_params=None)#

Get the index of model repository contents

Parameters:
  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding the model repository index.

Return type:

dict

Raises:

InferenceServerException – If unable to get the repository index.

get_server_metadata(headers=None, query_params=None)#

Contact the inference server and get its metadata.

Parameters:
  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction.

Returns:

The JSON dict holding the metadata.

Return type:

dict

Raises:

InferenceServerException – If unable to get server metadata.

get_system_shared_memory_status(region_name='', headers=None, query_params=None)#

Request system shared memory status from the server.

Parameters:
  • region_name (str) – The name of the region to query status. The default value is an empty string, which means that the status of all active system shared memory regions will be returned.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding system shared memory status.

Return type:

dict

Raises:

InferenceServerException – If unable to get the status of specified shared memory.

get_trace_settings(model_name=None, headers=None, query_params=None)#

Get the trace settings for the specified model name, or the global trace settings if a model name is not given.

Parameters:
  • model_name (str) – The name of the model to get trace settings. Specifying None or empty string will return the global trace settings. The default value is None.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding the trace settings.

Return type:

dict

Raises:

InferenceServerException – If unable to get the trace settings.

infer(model_name, inputs, model_version='', outputs=None, request_id='', sequence_id=0, sequence_start=False, sequence_end=False, priority=0, timeout=None, headers=None, query_params=None, request_compression_algorithm=None, response_compression_algorithm=None, parameters=None)#

Run synchronous inference using the supplied ‘inputs’ requesting the outputs specified by ‘outputs’.

Parameters:
  • model_name (str) – The name of the model to run inference.

  • inputs (list) – A list of InferInput objects, each describing data for an input tensor required by the model.

  • model_version (str) – The version of the model to run inference. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • outputs (list) – A list of InferRequestedOutput objects, each describing how the output data must be returned. If not specified all outputs produced by the model will be returned using default settings.

  • request_id (str) – Optional identifier for the request. If specified will be returned in the response. Default value is an empty string which means no request_id will be used.

  • sequence_id (int) – The unique identifier for the sequence being represented by the object. Default value is 0 which means that the request does not belong to a sequence.

  • sequence_start (bool) – Indicates whether the request being added marks the start of the sequence. Default value is False. This argument is ignored if ‘sequence_id’ is 0.

  • sequence_end (bool) – Indicates whether the request being added marks the end of the sequence. Default value is False. This argument is ignored if ‘sequence_id’ is 0.

  • priority (int) – Indicates the priority of the request. Priority value zero indicates that the default priority level should be used (i.e. same behavior as not specifying the priority parameter). Lower value priorities indicate higher priority levels. Thus the highest priority level is indicated by setting the parameter to 1, the next highest is 2, etc. If not provided, the server will handle the request using default setting for the model.

  • timeout (int) – The timeout value for the request, in microseconds. If the request cannot be completed within the time, the server can take a model-specific action such as terminating the request. If not provided, the server will handle the request using the default setting for the model. This option is only respected by models configured with dynamic batching. See the triton-inference-server/server repository for more details.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction.

  • request_compression_algorithm (str) – Optional HTTP compression algorithm to use for the request body on client side. Currently supports “deflate”, “gzip” and None. By default, no compression is used.

  • response_compression_algorithm (str) – Optional HTTP compression algorithm to request for the response body. Note that the response may not be compressed if the server does not support the specified algorithm. Currently supports “deflate”, “gzip” and None. By default, no compression is requested.

  • parameters (dict) – Optional custom parameters to be included in the ‘parameters’ field of the request.

Returns:

The object holding the result of the inference.

Return type:

InferResult

Raises:

InferenceServerException – If server fails to perform inference.
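
A sketch of a complete synchronous request (server address, model, tensor names and shapes are placeholders for whatever the target model defines):

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
    inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

    out = httpclient.InferRequestedOutput("OUTPUT0", binary_data=True)

    result = client.infer(
        model_name="simple",
        inputs=[inp],
        outputs=[out],
        request_id="1",
        request_compression_algorithm="gzip",  # optional, compresses the request body
    )
    print(result.as_numpy("OUTPUT0"))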

is_model_ready(model_name, model_version='', headers=None, query_params=None)#

Contact the inference server and get the readiness of specified model.

Parameters:
  • model_name (str) – The name of the model to check for readiness.

  • model_version (str) – The version of the model to check for readiness. The default value is an empty string, which means the server will choose a version based on the model and internal policy.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction.

Returns:

True if the model is ready, False if not ready.

Return type:

bool

Raises:

Exception – If unable to get model readiness.

is_server_live(headers=None, query_params=None)#

Contact the inference server and get liveness.

Parameters:
  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction.

Returns:

True if server is live, False if server is not live.

Return type:

bool

Raises:

Exception – If unable to get liveness.

is_server_ready(headers=None, query_params=None)#

Contact the inference server and get readiness.

Parameters:
  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction.

Returns:

True if server is ready, False if server is not ready.

Return type:

bool

Raises:

Exception – If unable to get readiness.
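
A sketch of the usual start-up health checks (server address and model name are placeholders):

    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    if not client.is_server_live():
        raise RuntimeError("server is not live")
    if not client.is_server_ready():
        raise RuntimeError("server is not ready")
    if not client.is_model_ready("simple"):
        raise RuntimeError("model 'simple' is not ready")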

load_model(model_name, headers=None, query_params=None, config=None, files=None)#

Request the inference server to load or reload specified model.

Parameters:
  • model_name (str) – The name of the model to be loaded.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction.

  • config (str) – Optional JSON representation of a model config provided for the load request, if provided, this config will be used for loading the model.

  • files (dict) – Optional dictionary specifying file path (with “file:” prefix) in the override model directory to the file content as bytes. The files will form the model directory that the model will be loaded from. If specified, ‘config’ must be provided to be the model configuration of the override model directory.

Raises:

InferenceServerException – If unable to load the model.
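
A sketch of explicitly loading and unloading a model, optionally with a configuration override; this assumes the server is running with explicit model control, and the model name and config values are placeholders:

    import json
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Optional config override supplied as a JSON string (placeholder values).
    config = json.dumps({"max_batch_size": 8})

    client.load_model("simple", config=config)
    print(client.is_model_ready("simple"))

    client.unload_model("simple", unload_dependents=False)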

static parse_response_body(response_body, verbose=False, header_length=None, content_encoding=None)#

Generate an InferResult object from the given ‘response_body’.

Parameters:
  • response_body (bytes) – The inference response from the server

  • verbose (bool) – If True generate verbose output. Default value is False.

  • header_length (int) – The length of the inference header if the header does not occupy the whole response body. Default value is None.

  • content_encoding (string) – The encoding of the response body if it is compressed. Default value is None.

Returns:

The InferResult object generated from the response body

Return type:

InferResult

register_cuda_shared_memory(name, raw_handle, device_id, byte_size, headers=None, query_params=None)#

Request the server to register a CUDA shared memory region with the following specification.

Parameters:
  • name (str) – The name of the region to register.

  • raw_handle (bytes) – The raw serialized cudaIPC handle in base64 encoding.

  • device_id (int) – The GPU device ID on which the cudaIPC handle was created.

  • byte_size (int) – The size of the cuda shared memory region, in bytes.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Raises:

InferenceServerException – If unable to register the specified cuda shared memory.

register_system_shared_memory(name, key, byte_size, offset=0, headers=None, query_params=None)#

Request the server to register a system shared memory region with the following specification.

Parameters:
  • name (str) – The name of the region to register.

  • key (str) – The key of the underlying memory object that contains the system shared memory region.

  • byte_size (int) – The size of the system shared memory region, in bytes.

  • offset (int) – Offset, in bytes, within the underlying memory object to the start of the system shared memory region. The default value is zero.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Raises:

InferenceServerException – If unable to register the specified system shared memory.
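
A sketch of the typical system shared memory workflow with the helpers in tritonclient.utils.shared_memory: create and fill a region, register it with the server, and point the InferInput at it (region name, key, sizes, model and tensor names are placeholders):

    import numpy as np
    import tritonclient.http as httpclient
    import tritonclient.utils.shared_memory as shm

    client = httpclient.InferenceServerClient(url="localhost:8000")

    data = np.zeros((1, 16), dtype=np.float32)
    byte_size = data.size * data.itemsize

    # Create and fill a system shared memory region (placeholder name and key).
    shm_handle = shm.create_shared_memory_region("input_data", "/input_simple", byte_size)
    shm.set_shared_memory_region(shm_handle, [data])

    # Make the region known to the server, then reference it from the input.
    client.register_system_shared_memory("input_data", "/input_simple", byte_size)

    inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
    inp.set_shared_memory("input_data", byte_size)

    result = client.infer(model_name="simple", inputs=[inp])

    # Clean up when finished.
    client.unregister_system_shared_memory("input_data")
    shm.destroy_shared_memory_region(shm_handle)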

unload_model(model_name, headers=None, query_params=None, unload_dependents=False)#

Request the inference server to unload specified model.

Parameters:
  • model_name (str) – The name of the model to be unloaded.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

  • unload_dependents (bool) – Whether the dependents of the model should also be unloaded.

Raises:

InferenceServerException – If unable to unload the model.

unregister_cuda_shared_memory(name='', headers=None, query_params=None)#

Request the server to unregister a cuda shared memory with the specified name.

Parameters:
  • name (str) – The name of the region to unregister. The default value is an empty string, which means all the CUDA shared memory regions will be unregistered.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Raises:

InferenceServerException – If unable to unregister the specified cuda shared memory region.

unregister_system_shared_memory(name='', headers=None, query_params=None)#

Request the server to unregister a system shared memory with the specified name.

Parameters:
  • name (str) – The name of the region to unregister. The default value is an empty string, which means all the system shared memory regions will be unregistered.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request

  • query_params (dict) – Optional url query parameters to use in network transaction

Raises:

InferenceServerException – If unable to unregister the specified system shared memory region.

update_log_settings(settings, headers=None, query_params=None)#

Update the global log settings of the Triton server.

Parameters:
  • settings (dict) – The new log setting values. Only the settings listed will be updated.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding the updated log settings.

Return type:

dict

Raises:

InferenceServerException – If unable to update the log settings.

update_trace_settings(model_name=None, settings={}, headers=None, query_params=None)#

Update the trace settings for the specified model name, or global trace settings if model name is not given. Returns the trace settings after the update.

Parameters:
  • model_name (str) – The name of the model to update trace settings. Specifying None or empty string will update the global trace settings. The default value is None.

  • settings (dict) – The new trace setting values. Only the settings listed will be updated. If a trace setting is listed in the dictionary with a value of ‘None’, that setting will be cleared.

  • headers (dict) – Optional dictionary specifying additional HTTP headers to include in the request.

  • query_params (dict) – Optional url query parameters to use in network transaction

Returns:

The JSON dict holding the updated trace settings.

Return type:

dict

Raises:

InferenceServerException – If unable to update the trace settings.
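
A sketch of inspecting and updating trace settings; the specific setting names and value formats (‘trace_level’, ‘trace_rate’) follow the Triton trace protocol and should be checked against the server version in use:

    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Inspect the current global trace settings.
    print(client.get_trace_settings())

    # Enable timestamp tracing for one model (model name is a placeholder).
    updated = client.update_trace_settings(
        model_name="simple",
        settings={"trace_level": ["TIMESTAMPS"], "trace_rate": "1"},
    )
    print(updated)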

class tritonclient.http.InferenceServerClientPlugin#

Every Triton Client Plugin should extend this class. Each plugin needs to implement the __call__() method.

abstract __call__(request)#

This method will be called when any of the client functions are invoked. Note that the request object must be modified in-place.

Parameters:

request (Request) – The request object.
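
For example, a plugin that attaches a custom header to every outgoing request might look like the sketch below; the header name is arbitrary, and attaching the plugin via the client's register_plugin() method is an assumption to be verified against the client version in use:

    import tritonclient.http as httpclient


    class HeaderPlugin(httpclient.InferenceServerClientPlugin):
        """Adds a fixed header to every request (header name is arbitrary)."""

        def __call__(self, request):
            # The request object must be modified in place.
            request.headers["x-example-header"] = "example-value"


    client = httpclient.InferenceServerClient(url="localhost:8000")
    # Assumes the client exposes register_plugin() for attaching plugins.
    client.register_plugin(HeaderPlugin())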

exception tritonclient.http.InferenceServerException(msg, status=None, debug_details=None)#

Exception indicating non-Success status.

Parameters:
  • msg (str) – A brief description of error

  • status (str) – The error code

  • debug_details (str) – The additional details on the error

debug_details()#

Get the detailed information about the exception for debugging purposes

Returns:

Returns the exception details

Return type:

str

message()#

Get the exception message.

Returns:

The message associated with this exception, or None if no message.

Return type:

str

status()#

Get the status of the exception.

Returns:

Returns the status of the exception

Return type:

str

class tritonclient.http.Request(headers)#

A request object.

Parameters:

headers (dict) – A dictionary containing the request headers.

Modules

tritonclient.http.aio

tritonclient.http.auth