Python API

Client
class tensorrtserver.api.InferContext(url, protocol, model_name, model_version=None, verbose=False)

An InferContext object is used to run inference on an inference server for a specific model. Once created, an InferContext object can be used repeatedly to perform inference using the model.

Parameters:
- url (str) – The inference server URL, e.g. localhost:8000.
- protocol (ProtocolType) – The protocol used to communicate with the server.
- model_name (str) – The name of the model to use for inference.
- model_version (int) – The version of the model to use for inference, or None to indicate that the latest (i.e. highest version number) version should be used.
- verbose (bool) – If True, generate verbose output.
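For example, a context for a hypothetical model named "my_model" (the model name, address, and port are placeholder assumptions, not part of the API):

    from tensorrtserver.api import InferContext, ProtocolType

    # Connect over HTTP and use the latest available version of the model.
    ctx = InferContext("localhost:8000", ProtocolType.HTTP,
                       "my_model", model_version=None, verbose=False)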
class ResultFormat

Formats for output tensor results.

- RAW – All values of the output are returned as a numpy array of the appropriate type.
- CLASS – Specified as a tuple (CLASS, k). The top 'k' results are returned as an array of (index, value, label) tuples.
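For example, an 'outputs' dictionary that requests raw values for one output and top-3 classifications for another might look like the following sketch (the output names are placeholders; ResultFormat is assumed to be reachable through InferContext, matching the nesting above):

    from tensorrtserver.api import InferContext

    outputs = {
        # Full tensor returned as a numpy array.
        "logits": InferContext.ResultFormat.RAW,
        # Top-3 (index, value, label) tuples.
        "probabilities": (InferContext.ResultFormat.CLASS, 3),
    }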
async_run(inputs, outputs, batch_size=1)

Run inference using the supplied 'inputs' to calculate the outputs specified by 'outputs'. Unlike run(), async_run() returns immediately after sending the inference request to the server. The returned integer identifier must be used subsequently to wait on and retrieve the actual inference results.

Parameters:
- inputs (dict) – Dictionary from input name to the value(s) for that input. An input value is specified as a numpy array. Each input in the dictionary maps to a list of values (i.e. a list of numpy array objects), where the length of the list must equal 'batch_size'.
- outputs (dict) – Dictionary from output name to a value indicating the ResultFormat that should be used for that output. For RAW the value should be ResultFormat.RAW. For CLASS the value should be a tuple (ResultFormat.CLASS, k), where 'k' indicates how many classification results should be returned for the output.
- batch_size (int) – The batch size of the inference. Each input must provide an appropriately sized batch of inputs.

Returns: An integer identifier which must be passed to get_async_run_results() to wait on and retrieve the inference results.

Return type: int

Raises: InferenceServerException – If all inputs are not specified, if the size of input data does not match expectations, if unknown output names are specified, or if the server fails to perform inference.
close()

Close the context. Any future calls to the object will result in an Error.
get_async_run_results(request_id, wait)

Retrieve the results of a previous async_run() using the supplied 'request_id'.

Parameters:
- request_id (int) – The integer ID of the asynchronous request returned by async_run().
- wait (bool) – If True, block until the request results are ready. If False, return immediately even if results are not ready.

Returns: None if the results are not ready and 'wait' is False. Otherwise, a dictionary from output name to the list of values for that output (one list element for each entry of the batch). The format of a value returned for an output depends on the output format specified in 'outputs'. For format RAW a value is a numpy array of the appropriate type and shape for the output. For format CLASS a value is the top 'k' output values returned as an array of (class index, class value, class label) tuples.

Return type: dict

Raises: InferenceServerException – If the supplied request ID is not valid, or if the server fails to perform inference.
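A minimal sketch of the asynchronous flow, assuming a server at localhost:8000 and a model "my_model" with a single input 'input' and a single output 'output' (all placeholder assumptions):

    import numpy as np
    from tensorrtserver.api import InferContext, ProtocolType

    ctx = InferContext("localhost:8000", ProtocolType.HTTP, "my_model")

    # One numpy array per batch entry; here batch_size=2.
    batch = [np.zeros((3, 224, 224), dtype=np.float32) for _ in range(2)]

    # async_run() returns immediately with a request identifier.
    request_id = ctx.async_run({"input": batch},
                               {"output": InferContext.ResultFormat.RAW},
                               batch_size=2)

    # ... do other work, then block until the results arrive.
    results = ctx.get_async_run_results(request_id, wait=True)
    # results["output"] is a list with one numpy array per batch entry.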
get_last_request_id()

Get the request ID of the most recent run() request.

Returns: The request ID, or None if a request has not yet been made or if the last request was not successful.

Return type: int

get_last_request_model_name()

Get the model name used in the most recent run() request.

Returns: The model name, or None if a request has not yet been made or if the last request was not successful.

Return type: str

get_last_request_model_version()

Get the model version used in the most recent run() request.

Returns: The model version, or None if a request has not yet been made or if the last request was not successful.

Return type: int
get_ready_async_request(wait)

Get the request ID of an async_run() request that has completed but has not yet had its results read with get_async_run_results().

Parameters:
- wait (bool) – If True, block until an async request is ready. If False, return immediately even if no request is ready.

Returns: None if no asynchronous results are ready and 'wait' is False. Otherwise, an integer identifier which must be passed to get_async_run_results() to wait on and retrieve the inference results.

Return type: int

Raises: InferenceServerException – If no asynchronous request is in flight or completed.
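A sketch that keeps several requests in flight and drains them in completion order, under the same placeholder assumptions:

    import numpy as np
    from tensorrtserver.api import InferContext, ProtocolType

    ctx = InferContext("localhost:8000", ProtocolType.HTTP, "my_model")
    img = np.zeros((3, 224, 224), dtype=np.float32)  # placeholder input

    # Launch several asynchronous requests back to back.
    ids = [ctx.async_run({"input": [img]},
                         {"output": InferContext.ResultFormat.RAW},
                         batch_size=1)
           for _ in range(4)]

    # Retrieve results in whatever order the server completes them.
    for _ in ids:
        ready_id = ctx.get_ready_async_request(wait=True)
        results = ctx.get_async_run_results(ready_id, wait=True)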
run(inputs, outputs, batch_size=1)

Run inference using the supplied 'inputs' to calculate the outputs specified by 'outputs'.

Parameters:
- inputs (dict) – Dictionary from input name to the value(s) for that input. An input value is specified as a numpy array. Each input in the dictionary maps to a list of values (i.e. a list of numpy array objects), where the length of the list must equal 'batch_size'.
- outputs (dict) – Dictionary from output name to a value indicating the ResultFormat that should be used for that output. For RAW the value should be ResultFormat.RAW. For CLASS the value should be a tuple (ResultFormat.CLASS, k), where 'k' indicates how many classification results should be returned for the output.
- batch_size (int) – The batch size of the inference. Each input must provide an appropriately sized batch of inputs.

Returns: A dictionary from output name to the list of values for that output (one list element for each entry of the batch). The format of a value returned for an output depends on the output format specified in 'outputs'. For format RAW a value is a numpy array of the appropriate type and shape for the output. For format CLASS a value is the top 'k' output values returned as an array of (class index, class value, class label) tuples.

Return type: dict

Raises: InferenceServerException – If all inputs are not specified, if the size of input data does not match expectations, if unknown output names are specified, or if the server fails to perform inference.
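A synchronous sketch under the same placeholder assumptions, this time requesting top-3 classification results:

    import numpy as np
    from tensorrtserver.api import InferContext, ProtocolType

    ctx = InferContext("localhost:8000", ProtocolType.HTTP, "my_model")
    img = np.zeros((3, 224, 224), dtype=np.float32)  # placeholder input

    # Blocking call; returns once the server has computed the outputs.
    results = ctx.run({"input": [img]},
                      {"output": (InferContext.ResultFormat.CLASS, 3)},
                      batch_size=1)

    # For the single batch entry, print the top-3
    # (class index, class value, class label) tuples.
    for index, value, label in results["output"][0]:
        print(index, value, label)

    ctx.close()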
exception tensorrtserver.api.InferenceServerException(err)

Exception indicating a non-Success status.

Parameters:
- err (c_void_p) – Pointer to an Error that should be used to initialize the exception.

message()

Get the exception message.

Returns: The message associated with this exception, or None if no message.

Return type: str

request_id()

Get the ID of the request associated with this exception.

Returns: The ID of the request associated with this exception, or 0 (zero) if no request is associated.

Return type: int

server_id()

Get the ID of the server associated with this exception.

Returns: The ID of the server associated with this exception, or None if no server is associated.

Return type: str
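A sketch of catching the exception around an inference call (the context and tensor names follow the placeholder assumptions above):

    import numpy as np
    from tensorrtserver.api import (InferContext, InferenceServerException,
                                    ProtocolType)

    ctx = InferContext("localhost:8000", ProtocolType.HTTP, "my_model")
    img = np.zeros((3, 224, 224), dtype=np.float32)  # placeholder input

    try:
        results = ctx.run({"input": [img]},
                          {"output": InferContext.ResultFormat.RAW},
                          batch_size=1)
    except InferenceServerException as ex:
        # message() carries the server's description of the failure.
        print("inference failed:", ex.message())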
class tensorrtserver.api.ProtocolType

Protocol types supported by the client API.

- HTTP – The HTTP protocol.
- GRPC – The GRPC protocol.
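For example, selecting GRPC instead of HTTP when constructing any of the contexts (the port is an assumption; use whatever port the server's GRPC endpoint is configured on):

    from tensorrtserver.api import InferContext, ProtocolType

    # Same constructor, different protocol and endpoint.
    ctx = InferContext("localhost:8001", ProtocolType.GRPC, "my_model")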
class tensorrtserver.api.ServerHealthContext(url, protocol, verbose=False)

Performs a health request to an inference server.

Parameters:
- url (str) – The inference server URL, e.g. localhost:8000.
- protocol (ProtocolType) – The protocol used to communicate with the server.
- verbose (bool) – If True, generate verbose output.
close()

Close the context. Any future calls to is_ready() or is_live() will result in an Error.

get_last_request_id()

Get the request ID of the most recent is_ready() or is_live() request.

Returns: The request ID, or None if a request has not yet been made or if the last request was not successful.

Return type: int
is_live()

Contact the inference server and get liveness.

Returns: True if the server is live, False if it is not.

Return type: bool

Raises: InferenceServerException – If unable to get liveness.

is_ready()

Contact the inference server and get readiness.

Returns: True if the server is ready, False if it is not.

Return type: bool

Raises: InferenceServerException – If unable to get readiness.
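A sketch of a liveness/readiness probe (the server address is a placeholder):

    from tensorrtserver.api import ServerHealthContext, ProtocolType

    health_ctx = ServerHealthContext("localhost:8000", ProtocolType.HTTP)
    print("live: ", health_ctx.is_live())
    print("ready:", health_ctx.is_ready())
    health_ctx.close()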
class tensorrtserver.api.ServerStatusContext(url, protocol, model_name=None, verbose=False)

Performs a status request to an inference server. A request can be made to get status for the server and all models managed by the server, or to get status for only a single model.

Parameters:
- url (str) – The inference server URL, e.g. localhost:8000.
- protocol (ProtocolType) – The protocol used to communicate with the server.
- model_name (str) – The name of the model to get status for, or None to get status for all models managed by the server.
- verbose (bool) – If True, generate verbose output.
close()

Close the context. Any future calls to get_server_status() will result in an Error.

get_last_request_id()

Get the request ID of the most recent get_server_status() request.

Returns: The request ID, or None if a request has not yet been made or if the last request was not successful.

Return type: int

get_server_status()

Contact the inference server and get status.

Returns: The ServerStatus protobuf containing the status.

Return type: ServerStatus

Raises: InferenceServerException – If unable to get status.
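A sketch of fetching status for a single placeholder model:

    from tensorrtserver.api import ServerStatusContext, ProtocolType

    status_ctx = ServerStatusContext("localhost:8000", ProtocolType.HTTP,
                                     model_name="my_model")
    server_status = status_ctx.get_server_status()  # ServerStatus protobuf
    print(server_status)
    status_ctx.close()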