Class InferenceServerHttpClient
Defined in File http_client.h

Inheritance Relationships
Base Type
public nvidia::inferenceserver::client::InferenceServerClient (Class InferenceServerClient)

Class Documentation
- class InferenceServerHttpClient : public nvidia::inferenceserver::client::InferenceServerClient
  An InferenceServerHttpClient object is used to perform any kind of communication with the InferenceServer using the HTTP protocol.
  None of the methods of InferenceServerHttpClient are thread safe. The class is intended to be used by a single thread; calling different methods simultaneously from different threads is not supported and will cause undefined behavior.
  std::unique_ptr<InferenceServerHttpClient> client;
  InferenceServerHttpClient::Create(&client, "localhost:8000");
  bool live;
  client->IsServerLive(&live);
  ...
Public Functions
- ~InferenceServerHttpClient()
- Error IsServerLive(bool *live, const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get its liveness.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    live : Returns whether the server is live or not.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
- Error IsServerReady(bool *ready, const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get its readiness.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    ready : Returns whether the server is ready or not.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
- Error IsModelReady(bool *ready, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get the readiness of specified model.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    ready : Returns whether the specified model is ready or not.
    model_name : The name of the model to check for readiness.
    model_version : The version of the model to check for readiness. The default value is an empty string, which means the server will choose a version based on the model and internal policy.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
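  For illustration, a minimal sketch of checking model readiness before sending requests; the model name "densenet_onnx" is an assumption, and `client` is the object created in the example above.

  bool model_ready = false;
  // Empty version string lets the server pick a version by its own policy.
  Error err = client->IsModelReady(&model_ready, "densenet_onnx", "");
  if (!err.IsOk() || !model_ready) {
    std::cerr << "model is not ready: " << err << std::endl;
  }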
- Error ServerMetadata(std::string *server_metadata, const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get its metadata.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    server_metadata : Returns JSON representation of the metadata as a string.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
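  A short sketch showing how the optional Headers map (a string-to-string map) can be attached to any of these calls; the header name and value below are purely illustrative.

  Headers http_headers;
  http_headers["my-custom-header"] = "my-value";   // hypothetical header
  std::string server_metadata;
  Error err = client->ServerMetadata(&server_metadata, http_headers);
  if (err.IsOk()) {
    std::cout << server_metadata << std::endl;     // JSON string with server name, version, extensions
  }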
- Error ModelMetadata(std::string *model_metadata, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get the metadata of specified model.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    model_metadata : Returns JSON representation of model metadata as a string.
    model_name : The name of the model to get metadata.
    model_version : The version of the model to get metadata. The default value is an empty string, which means the server will choose a version based on the model and internal policy.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
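  A possible usage sketch; the model name "densenet_onnx" and version "1" are illustrative choices, not part of the API.

  std::string model_metadata;
  // Version "1" is illustrative; pass "" to let the server decide.
  Error err = client->ModelMetadata(&model_metadata, "densenet_onnx", "1");
  if (err.IsOk()) {
    std::cout << model_metadata << std::endl;  // JSON describing inputs, outputs, versions
  }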
- Error ModelConfig(std::string *model_config, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get the configuration of specified model.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    model_config : Returns JSON representation of model configuration as a string.
    model_name : The name of the model to get configuration.
    model_version : The version of the model to get configuration. The default value is an empty string, which means the server will choose a version based on the model and internal policy.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
- Error ModelRepositoryIndex(std::string *repository_index, const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get the index of model repository contents.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    repository_index : Returns JSON representation of the repository index as a string.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
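  A minimal sketch of listing the repository contents, assuming the `client` from the earlier example.

  std::string repository_index;
  Error err = client->ModelRepositoryIndex(&repository_index);
  if (err.IsOk()) {
    // JSON array listing every model in the repository and its state.
    std::cout << repository_index << std::endl;
  }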
- Error LoadModel(const std::string &model_name, const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Request the inference server to load or reload specified model.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    model_name : The name of the model to be loaded or reloaded.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
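  A hedged sketch of loading a model and confirming it became ready; the model name is illustrative, and load/unload requests generally require the server to be running with explicit model control enabled.

  Error err = client->LoadModel("densenet_onnx");
  if (err.IsOk()) {
    bool ready = false;
    client->IsModelReady(&ready, "densenet_onnx");
    // Later, when the model is no longer needed:
    // client->UnloadModel("densenet_onnx");
  }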
- Error UnloadModel(const std::string &model_name, const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Request the inference server to unload specified model.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    model_name : The name of the model to be unloaded.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
- Error ModelInferenceStatistics(std::string *infer_stat, const std::string &model_name = "", const std::string &model_version = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get the inference statistics for the specified model name and version.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    infer_stat : Returns the JSON representation of the inference statistics as a string.
    model_name : The name of the model to get inference statistics. The default value is an empty string, which means statistics of all models will be returned in the response.
    model_version : The version of the model to get inference statistics. The default value is an empty string, which means the server will choose a version based on the model and internal policy.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
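  A small sketch that requests statistics for every model by leaving model_name empty; `client` is assumed from the earlier example.

  std::string infer_stat;
  // Empty model name requests statistics for all loaded models.
  Error err = client->ModelInferenceStatistics(&infer_stat, "");
  if (err.IsOk()) {
    std::cout << infer_stat << std::endl;  // JSON: request counts and cumulative durations
  }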
- Error SystemSharedMemoryStatus(std::string *status, const std::string &region_name = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get the status for requested system shared memory.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    status : Returns the JSON representation of the system shared memory status as a string.
    region_name : The name of the region to query status. The default value is an empty string, which means that the status of all active system shared memory will be returned.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
- Error RegisterSystemSharedMemory(const std::string &name, const std::string &key, const size_t byte_size, const size_t offset = 0, const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Request the server to register a system shared memory with the provided details.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    name : The name of the region to register.
    key : The key of the underlying memory object that contains the system shared memory region.
    byte_size : The size of the system shared memory region, in bytes.
    offset : Offset, in bytes, within the underlying memory object to the start of the system shared memory region. The default value is zero.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
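  A hedged sketch of creating a POSIX shared memory region and registering it with the server; the region name "input_data", key "/input_shm", size, and the POSIX calls (Linux host) are assumptions for illustration only.

  #include <sys/mman.h>
  #include <fcntl.h>
  #include <unistd.h>

  size_t byte_size = 64 * sizeof(float);
  int shm_fd = shm_open("/input_shm", O_RDWR | O_CREAT, 0600);  // illustrative key
  ftruncate(shm_fd, byte_size);
  void* shm_addr =
      mmap(nullptr, byte_size, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
  // ... fill shm_addr with input tensor data ...
  Error err = client->RegisterSystemSharedMemory("input_data", "/input_shm", byte_size);
  // Later: client->UnregisterSystemSharedMemory("input_data");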
- Error UnregisterSystemSharedMemory(const std::string &name = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Request the server to unregister a system shared memory with the specified name.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    name : The name of the region to unregister. The default value is an empty string, which means all the system shared memory regions will be unregistered.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
- Error CudaSharedMemoryStatus(std::string *status, const std::string &region_name = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Contact the inference server and get the status for requested CUDA shared memory.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    status : Returns the JSON representation of the CUDA shared memory status as a string.
    region_name : The name of the region to query status. The default value is an empty string, which means that the status of all active CUDA shared memory will be returned.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
- Error RegisterCudaSharedMemory(const std::string &name, const cudaIpcMemHandle_t &cuda_shm_handle, const size_t device_id, const size_t byte_size, const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Request the server to register a CUDA shared memory with the provided details.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    name : The name of the region to register.
    cuda_shm_handle : The cudaIPC handle for the memory object.
    device_id : The GPU device ID on which the cudaIPC handle was created.
    byte_size : The size of the CUDA shared memory region, in bytes.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
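  A hedged sketch of registering a CUDA shared memory region; the region name "output_data", the use of device 0, and the CUDA runtime calls shown are illustrative assumptions.

  #include <cuda_runtime_api.h>

  size_t byte_size = 64 * sizeof(float);
  void* d_ptr = nullptr;
  cudaMalloc(&d_ptr, byte_size);
  cudaIpcMemHandle_t cuda_handle;
  cudaIpcGetMemHandle(&cuda_handle, d_ptr);
  // Device 0 is the GPU on which d_ptr was allocated (illustrative).
  Error err = client->RegisterCudaSharedMemory("output_data", cuda_handle, 0, byte_size);
  // Later: client->UnregisterCudaSharedMemory("output_data");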
- Error UnregisterCudaSharedMemory(const std::string &name = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Request the server to unregister a CUDA shared memory with the specified name.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    name : The name of the region to unregister. The default value is an empty string, which means all the CUDA shared memory regions will be unregistered.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
- Error Infer(InferResult **result, const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>(), const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Run synchronous inference on server.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    result : Returns the result of inference.
    options : The options for inference request.
    inputs : The vector of InferInput describing the model inputs.
    outputs : The vector of InferRequestedOutput describing how the output must be returned. The server will return the result for only these requested outputs.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
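  A sketch of a complete synchronous request, assuming a hypothetical model named "simple" with one INT32 input "INPUT0" of shape [1, 16] and one output "OUTPUT0"; InferInput, InferRequestedOutput, InferOptions and InferResult are the companion classes of this client library.

  std::vector<int32_t> input_data(16, 1);

  InferInput* input;
  InferInput::Create(&input, "INPUT0", {1, 16}, "INT32");
  std::shared_ptr<InferInput> input_ptr(input);
  input_ptr->AppendRaw(reinterpret_cast<const uint8_t*>(input_data.data()),
                       input_data.size() * sizeof(int32_t));

  InferRequestedOutput* output;
  InferRequestedOutput::Create(&output, "OUTPUT0");
  std::shared_ptr<InferRequestedOutput> output_ptr(output);

  InferOptions options("simple");
  std::vector<InferInput*> inputs = {input_ptr.get()};
  std::vector<const InferRequestedOutput*> outputs = {output_ptr.get()};

  InferResult* result;
  Error err = client->Infer(&result, options, inputs, outputs);
  if (err.IsOk()) {
    const uint8_t* buf;
    size_t buf_size;
    result->RawData("OUTPUT0", &buf, &buf_size);  // raw bytes of OUTPUT0
  }
  delete result;  // the caller owns the returned InferResult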
- Error AsyncInfer(OnCompleteFn callback, const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>(), const Headers &headers = Headers(), const Parameters &query_params = Parameters())
  Run asynchronous inference on server.
  Once the request is completed, the InferResult pointer will be passed to the provided 'callback' function. Upon invocation of the callback, ownership of the InferResult object is transferred to the caller. It is then the caller's choice either to retrieve the results inside the callback function or to defer that to a different thread so that the client is unblocked. To prevent a memory leak, the user must ensure this object gets deleted. Note: InferInput::AppendRaw() or InferInput::SetSharedMemory() calls do not copy the data buffers but hold pointers to the data directly. It is advisable not to disturb the buffer contents until the respective callback is invoked.
  - Return
    Error object indicating success or failure of the request.
  - Parameters
    callback : The callback function to be invoked on request completion.
    options : The options for inference request.
    inputs : The vector of InferInput describing the model inputs.
    outputs : The vector of InferRequestedOutput describing how the output must be returned. The server will return the result for only these requested outputs.
    headers : Optional map specifying additional HTTP headers to include in request.
    query_params : Optional map specifying parameters that must be included with URL query.
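  A sketch of waiting for an asynchronous request with a condition variable; `options`, `inputs` and `outputs` are assumed to be prepared as in the synchronous sketch above, and this synchronization scheme is one possible choice, not mandated by the API.

  #include <condition_variable>
  #include <mutex>

  std::mutex mtx;
  std::condition_variable cv;
  InferResult* async_result = nullptr;

  Error err = client->AsyncInfer(
      [&](InferResult* result) {
        // Ownership of 'result' is transferred to this callback.
        {
          std::lock_guard<std::mutex> lk(mtx);
          async_result = result;
        }
        cv.notify_one();
      },
      options, inputs, outputs);

  // Keep the input buffers untouched until the callback has fired.
  std::unique_lock<std::mutex> lk(mtx);
  cv.wait(lk, [&] { return async_result != nullptr; });
  // ... read outputs from async_result, then delete it to avoid a leak ...
  delete async_result;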
Public Static Functions
- static Error Create(std::unique_ptr<InferenceServerHttpClient> *client, const std::string &server_url, bool verbose = false)
  Create a client that can be used to communicate with the server.
  - Return
    Error object indicating success or failure.
  - Parameters
    client : Returns a new InferenceServerHttpClient object.
    server_url : The inference server name and port.
    verbose : If true, generate verbose output when contacting the inference server.