Class InferenceServerHttpClient

Inheritance Relationships

Base Type

public nvidia::inferenceserver::client::InferenceServerClient

Class Documentation

class InferenceServerHttpClient : public nvidia::inferenceserver::client::InferenceServerClient

An InferenceServerHttpClient object is used to communicate with an inference server using the HTTP protocol.

None of the methods of InferenceServerHttpClient are thread safe. The class is intended to be used by a single thread; calling its methods simultaneously from different threads is not supported and will cause undefined behavior.

std::unique_ptr<InferenceServerHttpClient> client;
// Create() and every request method return an Error that should be checked.
Error err = InferenceServerHttpClient::Create(&client, "localhost:8000");
if (!err.IsOk()) {
  std::cerr << "unable to create http client: " << err << std::endl;
}
bool live;
err = client->IsServerLive(&live);
...
...

Public Functions

~InferenceServerHttpClient()
Error IsServerLive(bool *live, const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get its liveness.

Return

Error object indicating success or failure of the request.

Parameters
  • live: Returns whether the server is live or not.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error IsServerReady(bool *ready, const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get its readiness.

Return

Error object indicating success or failure of the request.

Parameters
  • ready: Returns whether the server is ready or not.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error IsModelReady(bool *ready, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get the readiness of the specified model.

Return

Error object indicating success or failure of the request.

Parameters
  • ready: Returns whether the specified model is ready or not.

  • model_name: The name of the model to check for readiness.

  • model_version: The version of the model to check for readiness. The default value is an empty string, which means the server will choose a version based on the model and its internal policy.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

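For example, a client might poll for a model's readiness before issuing requests. A minimal sketch, assuming the client created earlier and using "densenet_onnx" as a placeholder model name:

bool model_ready;
Error err = client->IsModelReady(&model_ready, "densenet_onnx");
if (!err.IsOk()) {
  std::cerr << "failed to get model readiness: " << err << std::endl;
}
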
Error ServerMetadata(std::string *server_metadata, const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get its metadata.

Return

Error object indicating success or failure of the request.

Parameters
  • server_metadata: Returns JSON representation of the metadata as a string.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error ModelMetadata(std::string *model_metadata, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get the metadata of the specified model.

Return

Error object indicating success or failure of the request.

Parameters
  • model_metadata: Returns JSON representation of model metadata as a string.

  • model_name: The name of the model for which to get metadata.

  • model_version: The version of the model for which to get metadata. The default value is an empty string, which means the server will choose a version based on the model and its internal policy.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

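A minimal sketch of retrieving model metadata, again with a placeholder model name; the returned string is the JSON document described above:

std::string model_metadata;
Error err = client->ModelMetadata(&model_metadata, "densenet_onnx");
if (err.IsOk()) {
  std::cout << model_metadata << std::endl;  // JSON metadata for the model
}
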
Error ModelConfig(std::string *model_config, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get the configuration of the specified model.

Return

Error object indicating success or failure of the request.

Parameters
  • model_config: Returns JSON representation of model configuration as a string.

  • model_name: The name of the model for which to get the configuration.

  • model_version: The version of the model for which to get the configuration. The default value is an empty string, which means the server will choose a version based on the model and its internal policy.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error ModelRepositoryIndex(std::string *repository_index, const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get the index of model repository contents.

Return

Error object indicating success or failure of the request.

Parameters
  • repository_index: Returns JSON representation of the repository index as a string.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error LoadModel(const std::string &model_name, const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Request the inference server to load or reload the specified model.

Return

Error object indicating success or failure of the request.

Parameters
  • model_name: The name of the model to be loaded or reloaded.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error UnloadModel(const std::string &model_name, const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Request the inference server to unload the specified model.

Return

Error object indicating success or failure of the request.

Parameters
  • model_name: The name of the model to be unloaded.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

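A sketch of an explicit load/unload cycle; the model name is a placeholder, and these requests succeed only when the server is configured to allow explicit model control:

Error err = client->LoadModel("densenet_onnx");
if (err.IsOk()) {
  ...
  err = client->UnloadModel("densenet_onnx");
}
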
Error ModelInferenceStatistics(std::string *infer_stat, const std::string &model_name = "", const std::string &model_version = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get the inference statistics for the specified model name and version.

Return

Error object indicating success or failure of the request.

Parameters
  • infer_stat: Returns the JSON representation of the inference statistics as a string.

  • model_name: The name of the model for which to get inference statistics. The default value is an empty string, which means statistics for all models will be returned in the response.

  • model_version: The version of the model for which to get inference statistics. The default value is an empty string, which means the server will choose a version based on the model and its internal policy.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error SystemSharedMemoryStatus(std::string *status, const std::string &region_name = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get the status for requested system shared memory.

Return

Error object indicating success or failure of the request.

Parameters
  • status: Returns the JSON representation of the system shared memory status as a string.

  • region_name: The name of the region for which to query status. The default value is an empty string, which means the status of all active system shared memory regions will be returned.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error RegisterSystemSharedMemory(const std::string &name, const std::string &key, const size_t byte_size, const size_t offset = 0, const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Request the server to register a system shared memory region with the provided details.

Return

Error object indicating success or failure of the request.

Parameters
  • name: The name of the region to register.

  • key: The key of the underlying memory object that contains the system shared memory region.

  • byte_size: The size of the system shared memory region, in bytes.

  • offset: Offset, in bytes, within the underlying memory object to the start of the system shared memory region. The default value is zero.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

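A minimal sketch, assuming a 64-byte POSIX shared memory object with key "/input_simple" has already been created (for example via shm_open); the region name "input_data" is a placeholder:

Error err = client->RegisterSystemSharedMemory("input_data", "/input_simple", 64);
if (!err.IsOk()) {
  std::cerr << "failed to register system shared memory: " << err << std::endl;
}
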
Error UnregisterSystemSharedMemory(const std::string &name = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Request the server to unregister a system shared memory region with the specified name.

Return

Error object indicating success or failure of the request.

Parameters
  • name: The name of the region to unregister. The default value is an empty string, which means all system shared memory regions will be unregistered.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error CudaSharedMemoryStatus(std::string *status, const std::string &region_name = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Contact the inference server and get the status for requested CUDA shared memory.

Return

Error object indicating success or failure of the request.

Parameters
  • status: Returns the JSON representation of the CUDA shared memory status as a string.

  • region_name: The name of the region for which to query status. The default value is an empty string, which means the status of all active CUDA shared memory regions will be returned.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error RegisterCudaSharedMemory(const std::string &name, const cudaIpcMemHandle_t &cuda_shm_handle, const size_t device_id, const size_t byte_size, const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Request the server to register a CUDA shared memory region with the provided details.

Return

Error object indicating success or failure of the request.

Parameters
  • name: The name of the region to register.

  • cuda_shm_handle: The cudaIPC handle for the memory object.

  • device_id: The GPU device ID on which the cudaIPC handle was created.

  • byte_size: The size of the CUDA shared memory region, in bytes.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

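A minimal sketch, assuming 'buffer' is a 64-byte allocation created with cudaMalloc on GPU device 0; the region name "output_data" is a placeholder:

// Obtain an IPC handle for the existing device allocation, then register it.
cudaIpcMemHandle_t cuda_handle;
cudaIpcGetMemHandle(&cuda_handle, buffer);
Error err = client->RegisterCudaSharedMemory("output_data", cuda_handle, 0 /* device_id */, 64);
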
Error UnregisterCudaSharedMemory(const std::string &name = "", const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Request the server to unregister a CUDA shared memory region with the specified name.

Return

Error object indicating success or failure of the request.

Parameters
  • name: The name of the region to unregister. The default value is an empty string, which means all CUDA shared memory regions will be unregistered.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

Error Infer(InferResult **result, const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>(), const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Run synchronous inference on the server.

Return

Error object indicating success or failure of the request.

Parameters
  • result: Returns the result of inference.

  • options: The options for the inference request.

  • inputs: The vector of InferInput describing the model inputs.

  • outputs: The vector of InferRequestedOutput describing how the output must be returned. The server will return the result for only these requested outputs.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

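A minimal synchronous inference sketch. The model, input, and output names ("densenet_onnx", "INPUT0", "OUTPUT0") and the shape and datatype are placeholders for whatever the target model actually expects:

// Describe one INT32 input of shape [1, 16] and attach its data.
std::vector<int32_t> input_data(16, 0);
InferInput* input;
InferInput::Create(&input, "INPUT0", {1, 16}, "INT32");
input->AppendRaw(
    reinterpret_cast<uint8_t*>(input_data.data()),
    input_data.size() * sizeof(int32_t));

InferOptions options("densenet_onnx");
InferResult* result;
Error err = client->Infer(&result, options, {input});
if (err.IsOk()) {
  const uint8_t* output;
  size_t output_byte_size;
  result->RawData("OUTPUT0", &output, &output_byte_size);
  delete result;  // the caller owns the returned InferResult
}
delete input;
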
Error AsyncInfer(OnCompleteFn callback, const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>(), const Headers &headers = Headers(), const Parameters &query_params = Parameters())

Run asynchronous inference on the server.

Once the request is completed, the InferResult pointer will be passed to the provided ‘callback’ function, and ownership of the InferResult object is transferred to the function caller. The caller may either retrieve the results inside the callback function or defer that work to a different thread so that the client is unblocked. To prevent a memory leak, the user must ensure this object gets deleted. Note: InferInput::AppendRaw() and InferInput::SetSharedMemory() do not copy the data buffers but hold pointers to the data directly, so it is advisable not to modify the buffer contents until the respective callback is invoked.

Return

Error object indicating success or failure of the request.

Parameters
  • callback: The callback function to be invoked on request completion.

  • options: The options for the inference request.

  • inputs: The vector of InferInput describing the model inputs.

  • outputs: The vector of InferRequestedOutput describing how the output must be returned. The server will return the result for only these requested outputs.

  • headers: Optional map specifying additional HTTP headers to include in request.

  • query_params: Optional map specifying parameters that must be included with URL query.

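A sketch of the asynchronous form, reusing the 'options' and 'input' placeholders from the synchronous example above; the callback takes ownership of the result and must delete it:

Error err = client->AsyncInfer(
    [](InferResult* result) {
      // Retrieve outputs here (or hand the pointer to another thread),
      // then delete the result to avoid a memory leak.
      delete result;
    },
    options, {input});
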
Public Static Functions

static Error Create(std::unique_ptr<InferenceServerHttpClient> *client, const std::string &server_url, bool verbose = false)

Create a client that can be used to communicate with the server.

Return

Error object indicating success or failure.

Parameters
  • client: Returns a new InferenceServerHttpClient object.

  • server_url: The inference server hostname and port, e.g. "localhost:8000".

  • verbose: If true, generate verbose output when contacting the inference server.