Python API

Client
class tensorrtserver.api.InferContext(url, protocol, model_name, model_version=None, verbose=False, correlation_id=0, streaming=False)

An InferContext object is used to run inference on an inference server for a specific model. Once created, an InferContext object can be used repeatedly to perform inference using the model.
- Parameters
url (str) – The inference server URL, e.g. localhost:8000.
protocol (ProtocolType) – The protocol used to communicate with the server.
model_name (str) – The name of the model to use for inference.
model_version (int) – The version of the model to use for inference, or None to indicate that the latest (i.e. highest version number) version should be used.
verbose (bool) – If True generate verbose output.
correlation_id (int) – The correlation ID for the inference. If not specified (or if specified as 0), the inference will have no correlation ID.
streaming (bool) – If True create a streaming context. Streaming is only allowed with the gRPC protocol.
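A minimal construction sketch. The server address and the model name "simple" are assumptions, and the client calls need a running server plus the tensorrtserver package, so they appear only as comments; the small helper below merely illustrates the bare host:port URL form used by the constructor:

```python
# Hypothetical construction of an InferContext (comments only, since it
# requires a live server and the tensorrtserver package):
#
#   from tensorrtserver.api import InferContext, ProtocolType
#   ctx = InferContext("localhost:8000", ProtocolType.HTTP, "simple",
#                      model_version=None)  # None -> latest version
#
# The url argument is a bare host:port string (e.g. localhost:8000), with no
# scheme prefix. A small helper to validate that form:
def split_url(url):
    """Split a 'host:port' inference server URL into (host, port)."""
    host, sep, port = url.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected 'host:port', got %r" % url)
    return host, int(port)

print(split_url("localhost:8000"))  # -> ('localhost', 8000)
```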
class ResultFormat

Formats for output tensor results.
- RAW
All values of the output are returned as a numpy array of the appropriate type.
- CLASS
Specified as the tuple (CLASS, k). The top 'k' results are returned as an array of (index, value, label) tuples.
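To make the two formats concrete, here is a sketch of an 'outputs' dictionary. RAW and CLASS below are stand-in sentinel strings (the real values come from InferContext.ResultFormat), and the output names are hypothetical; only the shape of the dictionary is being illustrated:

```python
# Stand-ins for the real ResultFormat enum values in tensorrtserver.api.
RAW = "RAW"
CLASS = "CLASS"

# Hypothetical output names: one full tensor, one top-3 classification.
outputs = {
    "OUTPUT0": RAW,         # entire tensor returned as a numpy array
    "OUTPUT1": (CLASS, 3),  # top-3 (index, value, label) tuples
}

# A CLASS entry is a 2-tuple whose second element is k.
k = outputs["OUTPUT1"][1]
print(k)  # -> 3
```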
async_run(inputs, outputs, batch_size=1, flags=0)

Run inference using the supplied 'inputs' to calculate the outputs specified by 'outputs'. Unlike run(), async_run() returns immediately after sending the inference request to the server. The returned integer identifier must be used subsequently to wait on and retrieve the actual inference results.
- Parameters
inputs (dict) – Dictionary from input name to the value(s) for that input. Each input maps to a list of numpy arrays (one array per batch entry), so the length of the list must equal 'batch_size'.
outputs (dict) – Dictionary from output name to a value indicating the ResultFormat that should be used for that output. For RAW the value should be ResultFormat.RAW. For CLASS the value should be a tuple (ResultFormat.CLASS, k), where ‘k’ indicates how many classification results should be returned for the output.
batch_size (int) – The batch size of the inference. Each input must provide an appropriately sized batch of inputs.
flags (int) – The flags to use for the inference. The bitwise-or of InferRequestHeader.Flag values.
- Returns
Integer identifier which must be passed to get_async_run_results() to wait on and retrieve the inference results.
- Return type
int
- Raises
InferenceServerException – If all inputs are not specified, if the size of input data does not match expectations, if unknown output names are specified or if server fails to perform inference.
close()

Close the context. Any future calls to the object will result in an Error.
correlation_id()

Get the correlation ID associated with the context.
- Returns
The correlation ID.
- Return type
int
get_async_run_results(request_id, wait)

Retrieve the results of a previous async_run() using the supplied 'request_id'.
- Parameters
request_id (int) – The integer ID of the asynchronous request returned by async_run().
wait (bool) – If True block until the request results are ready. If False return immediately even if results are not ready.
- Returns
None if the results are not ready and ‘wait’ is False. A dictionary from output name to the list of values for that output (one list element for each entry of the batch). The format of a value returned for an output depends on the output format specified in ‘outputs’. For format RAW a value is a numpy array of the appropriate type and shape for the output. For format CLASS a value is the top ‘k’ output values returned as an array of (class index, class value, class label) tuples.
- Return type
dict
- Raises
InferenceServerException – If the request ID supplied is not valid, or if the server fails to perform inference.
get_last_request_id()

Get the request ID of the most recent run() request.
- Returns
The request ID, or None if a request has not yet been made or if the last request was not successful.
- Return type
int
get_last_request_model_name()

Get the model name used in the most recent run() request.
- Returns
The model name, or None if a request has not yet been made or if the last request was not successful.
- Return type
str
get_last_request_model_version()

Get the model version used in the most recent run() request.
- Returns
The model version, or None if a request has not yet been made or if the last request was not successful.
- Return type
int
get_ready_async_request(wait)

Get the request ID of an async_run() request that has completed but not yet had its results read with get_async_run_results().
- Parameters
wait (bool) – If True block until an async request is ready. If False return immediately even if results are not ready.
- Returns
None if no asynchronous results are ready and ‘wait’ is False. An integer identifier which must be passed to get_async_run_results() to wait on and retrieve the inference results.
- Return type
int
- Raises
InferenceServerException – If no asynchronous request is in flight or completed.
run(inputs, outputs, batch_size=1, flags=0)

Run inference using the supplied 'inputs' to calculate the outputs specified by 'outputs'.
- Parameters
inputs (dict) – Dictionary from input name to the value(s) for that input. Each input maps to a list of numpy arrays (one array per batch entry), so the length of the list must equal 'batch_size'.
outputs (dict) – Dictionary from output name to a value indicating the ResultFormat that should be used for that output. For RAW the value should be ResultFormat.RAW. For CLASS the value should be a tuple (ResultFormat.CLASS, k), where ‘k’ indicates how many classification results should be returned for the output.
batch_size (int) – The batch size of the inference. Each input must provide an appropriately sized batch of inputs.
flags (int) – The flags to use for the inference. The bitwise-or of InferRequestHeader.Flag values.
- Returns
A dictionary from output name to the list of values for that output (one list element for each entry of the batch). The format of a value returned for an output depends on the output format specified in ‘outputs’. For format RAW a value is a numpy array of the appropriate type and shape for the output. For format CLASS a value is the top ‘k’ output values returned as an array of (class index, class value, class label) tuples.
- Return type
dict
- Raises
InferenceServerException – If all inputs are not specified, if the size of input data does not match expectations, if unknown output names are specified or if server fails to perform inference.
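A sketch of how the 'inputs' argument lines up with 'batch_size'. The input names, tensor shapes, and model are hypothetical; the run() call itself needs a live server and is shown only as a comment:

```python
import numpy as np

batch_size = 4

# Each input name maps to a list of numpy arrays, one array per batch entry,
# so every list must have exactly batch_size elements.
inputs = {
    "INPUT0": [np.zeros(16, dtype=np.float32) for _ in range(batch_size)],
    "INPUT1": [np.ones(16, dtype=np.float32) for _ in range(batch_size)],
}
assert all(len(v) == batch_size for v in inputs.values())

# The call itself (comment only; needs a running server and a context):
#   results = ctx.run(inputs,
#                     {"OUTPUT0": InferContext.ResultFormat.RAW},
#                     batch_size=batch_size)
# 'results' would map "OUTPUT0" to a list of batch_size numpy arrays.
```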
exception tensorrtserver.api.InferenceServerException(err)

Exception indicating non-Success status.
- Parameters
err (c_void_p) – Pointer to an Error that should be used to initialize the exception.
message()

Get the exception message.
- Returns
The message associated with this exception, or None if no message.
- Return type
str
request_id()

Get the ID of the request associated with this exception.
- Returns
The ID of the request associated with this exception, or 0 (zero) if no request is associated.
- Return type
int
server_id()

Get the ID of the server associated with this exception.
- Returns
The ID of the server associated with this exception, or None if no server is associated.
- Return type
str
class tensorrtserver.api.ProtocolType

Protocol types supported by the client API.
- HTTP
The HTTP protocol.
- GRPC
The GRPC protocol.
class tensorrtserver.api.ServerHealthContext(url, protocol, verbose=False)

Performs a health request to an inference server.
- Parameters
url (str) – The inference server URL, e.g. localhost:8000.
protocol (ProtocolType) – The protocol used to communicate with the server.
verbose (bool) – If True generate verbose output.
close()

Close the context. Any future calls to is_ready() or is_live() will result in an Error.
get_last_request_id()

Get the request ID of the most recent is_ready() or is_live() request.
- Returns
The request ID, or None if a request has not yet been made or if the last request was not successful.
- Return type
int
is_live()

Contact the inference server and get liveness.
- Returns
True if server is live, False if server is not live.
- Return type
bool
- Raises
InferenceServerException – If unable to get liveness.
is_ready()

Contact the inference server and get readiness.
- Returns
True if server is ready, False if server is not ready.
- Return type
bool
- Raises
InferenceServerException – If unable to get readiness.
class tensorrtserver.api.ServerStatusContext(url, protocol, model_name=None, verbose=False)

Performs a status request to an inference server. A request can be made to get status for the server and all models managed by the server, or to get status for only a single model.
- Parameters
url (str) – The inference server URL, e.g. localhost:8000.
protocol (ProtocolType) – The protocol used to communicate with the server.
model_name (str) – The name of the model to get status for, or None to get status for all models managed by the server.
verbose (bool) – If True generate verbose output.
close()

Close the context. Any future calls to get_server_status() will result in an Error.
get_last_request_id()

Get the request ID of the most recent get_server_status() request.
- Returns
The request ID, or None if a request has not yet been made or if the last request was not successful.
- Return type
int
get_server_status()

Contact the inference server and get status.
- Returns
The ServerStatus protobuf containing the status.
- Return type
ServerStatus
- Raises
InferenceServerException – If unable to get status.