Class InferenceServerGrpcClient

Inheritance Relationships

Base Type

  • public nvidia::inferenceserver::client::InferenceServerClient

Class Documentation

class InferenceServerGrpcClient : public nvidia::inferenceserver::client::InferenceServerClient

An InferenceServerGrpcClient object is used for any communication with the inference server over the gRPC protocol.

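// Create a client connected to the server's gRPC endpoint.
std::unique_ptr<InferenceServerGrpcClient> client;
InferenceServerGrpcClient::Create(&client, "localhost:8001");
// Ask the server whether it is live; the answer is written to `live`.
bool live;
client->IsServerLive(&live);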

Public Functions

Error IsServerLive(bool *live, const Headers &headers = Headers())

Contact the inference server and get its liveness.

Returns: Error object indicating success or failure of the request.

Parameters:

  • live: Returns whether the server is live or not.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
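Every function in this class reports success or failure through its Error return value, so callers usually check that value before reading an out-parameter such as live. The following is a minimal sketch of one way to centralize that check. The Error class here is a local stand-in, and its IsOk() and Message() accessors are assumptions about the real type, which this section does not document. ThrowIfError and FakeIsServerLive are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Local stand-in for the client library's Error type. IsOk() and Message()
// are assumed accessors; check the library headers for the real interface.
class Error {
 public:
  explicit Error(const std::string& msg = "") : msg_(msg) {}
  bool IsOk() const { return msg_.empty(); }
  const std::string& Message() const { return msg_; }

 private:
  std::string msg_;
};

// Hypothetical helper: converts a failed Error into an exception that names
// the operation, so each call site stays on one line.
inline void ThrowIfError(const Error& err, const std::string& what) {
  if (!err.IsOk()) {
    throw std::runtime_error(what + ": " + err.Message());
  }
}

// Hypothetical stand-in for a call such as IsServerLive(): writes the
// out-parameter and reports success or failure through the return value.
inline Error FakeIsServerLive(bool* live, bool reachable) {
  assert(live != nullptr);
  if (!reachable) return Error("connection refused");
  *live = true;
  return Error();
}
```

With the real client, the call site would pass the result of client->IsServerLive(&live) to the helper instead of the stand-in function.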

Error IsServerReady(bool *ready, const Headers &headers = Headers())

Contact the inference server and get its readiness.

Returns: Error object indicating success or failure of the request.

Parameters:

  • ready: Returns whether the server is ready or not.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
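A common use of the readiness query is to wait for the server at startup before sending requests. The sketch below polls IsServerReady() a bounded number of times. It is written as a template so it can be exercised with the small fake client shown here. FakeError, FakeClient, and WaitUntilReady are hypothetical names, and the IsOk() accessor on the returned error is an assumption not documented in this section.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical error type exposing the one accessor the helper relies on.
struct FakeError {
  bool ok;
  bool IsOk() const { return ok; }
};

// Hypothetical fake client: reports "not ready" for the first few polls.
class FakeClient {
 public:
  explicit FakeClient(int polls_until_ready)
      : polls_until_ready_(polls_until_ready), calls_(0) {}

  FakeError IsServerReady(bool* ready) {
    ++calls_;
    *ready = calls_ > polls_until_ready_;
    return FakeError{true};
  }

  int calls() const { return calls_; }

 private:
  int polls_until_ready_;
  int calls_;
};

// Polls IsServerReady() until it reports true, a request fails, or the
// attempt budget runs out. Returns true only if the server became ready.
template <typename Client>
bool WaitUntilReady(Client& client, int max_attempts,
                    std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    bool ready = false;
    if (!client.IsServerReady(&ready).IsOk()) return false;
    if (ready) return true;
    // Sleep between attempts, but not after the final one.
    if (attempt + 1 < max_attempts) std::this_thread::sleep_for(delay);
  }
  return false;
}
```

A bounded attempt count keeps the caller from blocking forever when the server never becomes ready.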

Error IsModelReady(bool *ready, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers())

Contact the inference server and get the readiness of the specified model.

Returns: Error object indicating success or failure of the request.

Parameters:

  • ready: Returns whether the specified model is ready or not.

  • model_name: The name of the model to check for readiness.

  • model_version: The version of the model to check for readiness. The default value is an empty string, which means the server will choose a version based on the model and its internal policy.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error ServerMetadata(ServerMetadataResponse *server_metadata, const Headers &headers = Headers())

Contact the inference server and get its metadata.

Returns: Error object indicating success or failure of the request.

Parameters:

  • server_metadata: Returns the server metadata as a ServerMetadataResponse message.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error ModelMetadata(ModelMetadataResponse *model_metadata, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers())

Contact the inference server and get the metadata of the specified model.

Returns: Error object indicating success or failure of the request.

Parameters:

  • model_metadata: Returns the model metadata as a ModelMetadataResponse message.

  • model_name: The name of the model to get metadata for.

  • model_version: The version of the model to get metadata for. The default value is an empty string, which means the server will choose a version based on the model and its internal policy.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error ModelConfig(ModelConfigResponse *model_config, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers())

Contact the inference server and get the configuration of the specified model.

Returns: Error object indicating success or failure of the request.

Parameters:

  • model_config: Returns the model configuration as a ModelConfigResponse message.

  • model_name: The name of the model to get the configuration for.

  • model_version: The version of the model to get the configuration for. The default value is an empty string, which means the server will choose a version based on the model and its internal policy.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error ModelRepositoryIndex(RepositoryIndexResponse *repository_index, const Headers &headers = Headers())

Contact the inference server and get the index of the model repository contents.

Returns: Error object indicating success or failure of the request.

Parameters:

  • repository_index: Returns the repository index as a RepositoryIndexResponse message.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error LoadModel(const std::string &model_name, const Headers &headers = Headers())

Request the inference server to load or reload the specified model.

Returns: Error object indicating success or failure of the request.

Parameters:

  • model_name: The name of the model to be loaded or reloaded.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error UnloadModel(const std::string &model_name, const Headers &headers = Headers())

Request the inference server to unload the specified model.

Returns: Error object indicating success or failure of the request.

Parameters:

  • model_name: The name of the model to be unloaded.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error ModelInferenceStatistics(ModelStatisticsResponse *infer_stat, const std::string &model_name = "", const std::string &model_version = "", const Headers &headers = Headers())

Contact the inference server and get the inference statistics for the specified model name and version.

Returns: Error object indicating success or failure of the request.

Parameters:

  • infer_stat: Returns the inference statistics of the requested model name and version.

  • model_name: The name of the model to get inference statistics for. The default value is an empty string, which means statistics of all models will be returned in the response.

  • model_version: The version of the model to get inference statistics for. The default value is an empty string, which means the server will choose a version based on the model and its internal policy.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error SystemSharedMemoryStatus(SystemSharedMemoryStatusResponse *status, const std::string &region_name = "", const Headers &headers = Headers())

Contact the inference server and get the status of the requested system shared memory.

Returns: Error object indicating success or failure of the request.

Parameters:

  • status: Returns the system shared memory status as a SystemSharedMemoryStatusResponse message.

  • region_name: The name of the region to query the status of. The default value is an empty string, which means that the status of all active system shared memory regions will be returned.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error RegisterSystemSharedMemory(const std::string &name, const std::string &key, const size_t byte_size, const size_t offset = 0, const Headers &headers = Headers())

Request the server to register a system shared memory region with the provided details.

Returns: Error object indicating success or failure of the request.

Parameters:

  • name: The name of the region to register.

  • key: The key of the underlying memory object that contains the system shared memory region.

  • byte_size: The size of the system shared memory region, in bytes.

  • offset: Offset, in bytes, within the underlying memory object to the start of the system shared memory region. The default value is zero.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
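Together, offset and byte_size select a window inside the underlying memory object. Before registering, a caller may want to confirm that the window lies within the object it created. The following is a sketch of that check, assuming the caller already knows the object's total size. RegionFits is a hypothetical helper and not part of the client library.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when [offset, offset + byte_size) lies inside an object of
// object_size bytes. The comparison is arranged so that offset + byte_size
// is never computed, which avoids unsigned overflow.
inline bool RegionFits(std::size_t object_size, std::size_t offset,
                       std::size_t byte_size) {
  return offset <= object_size && byte_size <= object_size - offset;
}
```

For example, in a 4096-byte object an offset of 1024 leaves room for a region of at most 3072 bytes.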

Error UnregisterSystemSharedMemory(const std::string &name = "", const Headers &headers = Headers())

Request the server to unregister the system shared memory region with the specified name.

Returns: Error object indicating success or failure of the request.

Parameters:

  • name: The name of the region to unregister. The default value is an empty string, which means all the system shared memory regions will be unregistered.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error CudaSharedMemoryStatus(CudaSharedMemoryStatusResponse *status, const std::string &region_name = "", const Headers &headers = Headers())

Contact the inference server and get the status of the requested CUDA shared memory.

Returns: Error object indicating success or failure of the request.

Parameters:

  • status: Returns the CUDA shared memory status as a CudaSharedMemoryStatusResponse message.

  • region_name: The name of the region to query the status of. The default value is an empty string, which means that the status of all active CUDA shared memory regions will be returned.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error RegisterCudaSharedMemory(const std::string &name, const cudaIpcMemHandle_t &cuda_shm_handle, const size_t device_id, const size_t byte_size, const Headers &headers = Headers())

Request the server to register a CUDA shared memory region with the provided details.

Returns: Error object indicating success or failure of the request.

Parameters:

  • name: The name of the region to register.

  • cuda_shm_handle: The cudaIPC handle for the memory object.

  • device_id: The GPU device ID on which the cudaIPC handle was created.

  • byte_size: The size of the CUDA shared memory region, in bytes.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error UnregisterCudaSharedMemory(const std::string &name = "", const Headers &headers = Headers())

Request the server to unregister the CUDA shared memory region with the specified name.

Returns: Error object indicating success or failure of the request.

Parameters:

  • name: The name of the region to unregister. The default value is an empty string, which means all the CUDA shared memory regions will be unregistered.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error Infer(InferResult **result, const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>(), const Headers &headers = Headers())

Run synchronous inference on the server.

Returns: Error object indicating success or failure of the request.

Parameters:

  • result: Returns the result of inference.

  • options: The options for the inference request.

  • inputs: The vector of InferInput describing the model inputs.

  • outputs: Optional vector of InferRequestedOutput describing how the outputs must be returned. If not provided, all the outputs in the model config will be returned with default settings.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error AsyncInfer(OnCompleteFn callback, const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>(), const Headers &headers = Headers())

Run asynchronous inference on the server.

Once the request is completed, the InferResult pointer will be passed to the provided callback function. When the callback is invoked, ownership of the InferResult object is transferred to the user. The user may then either retrieve the results inside the callback function or defer that work to a different thread so that the client is unblocked. To prevent a memory leak, the user must ensure this object gets deleted.

Returns: Error object indicating success or failure of the request.

Parameters:

  • callback: The callback function to be invoked on request completion.

  • options: The options for the inference request.

  • inputs: The vector of InferInput describing the model inputs.

  • outputs: Optional vector of InferRequestedOutput describing how the outputs must be returned. If not provided, all the outputs in the model config will be returned with default settings.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
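The ownership rule described for AsyncInfer() is easy to get wrong. One way to satisfy it is to wrap the raw pointer in a std::unique_ptr as the first statement of the callback, so the result is released on every exit path. The sketch below uses a local stand-in for InferResult and assumes OnCompleteFn is callable with an InferResult pointer. The stand-in type, the FakeAsyncInfer driver, and the OnComplete callback are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Local stand-in for the library's InferResult. It counts live instances so
// the example can show that nothing leaks.
struct InferResult {
  static int live_count;
  InferResult() { ++live_count; }
  ~InferResult() { --live_count; }
};
int InferResult::live_count = 0;

// Assumed shape of the callback type: a callable taking InferResult*.
using OnCompleteFn = std::function<void(InferResult*)>;

// Hypothetical driver that mimics request completion: allocates a result and
// hands ownership to the callback, as described for AsyncInfer().
inline void FakeAsyncInfer(const OnCompleteFn& callback) {
  assert(callback);
  callback(new InferResult());
}

// Callback that takes ownership immediately. The unique_ptr deletes the
// result when it goes out of scope, including on an early return.
inline void OnComplete(InferResult* raw_result) {
  std::unique_ptr<InferResult> result(raw_result);
  // Read outputs from *result here, or move `result` to another thread.
}
```

If the callback defers processing, moving the unique_ptr into the deferred task keeps the same guarantee: whichever thread finishes last with the result deletes it.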

Error StartStream(OnCompleteFn callback, bool enable_stats = true, const Headers &headers = Headers())

Starts a gRPC bi-directional stream to send streaming inferences.

Returns: Error object indicating success or failure of the request.

Parameters:

  • callback: The callback function to be invoked on receiving a response on the stream.

  • enable_stats: Indicates whether the client library should record client-side statistics for inference requests on the stream. The library does not support client-side statistics for decoupled streaming. Set this option to false when there is no 1:1 mapping between requests and responses on the stream.

  • headers: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.

Error StopStream()

Stops an active gRPC bi-directional stream, if one is available.

Returns: Error object indicating success or failure of the request.

Error AsyncStreamInfer(const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>())

Runs an asynchronous inference over the gRPC bi-directional streaming API.

A stream must be established with a call to StartStream() before calling this function. All results will be delivered to the callback function that was provided when starting the stream.

Returns: Error object indicating success or failure of the request.

Parameters:

  • options: The options for the inference request.

  • inputs: The vector of InferInput describing the model inputs.

  • outputs: Optional vector of InferRequestedOutput describing how the outputs must be returned. If not provided, all the outputs in the model config will be returned with default settings.
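Because AsyncStreamInfer() is only valid between StartStream() and StopStream(), it can help to tie the stream's lifetime to a scope. The following sketch shows a scope guard that does this. FakeStreamClient and ScopedStream are hypothetical: the fake returns bool instead of Error, takes a string instead of the real inference arguments, and its choice to reject requests sent without an active stream is an assumption about how the ordering requirement is enforced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fake with the same call-order constraint as the streaming
// API: AsyncStreamInfer() fails unless a stream is active.
class FakeStreamClient {
 public:
  FakeStreamClient() : active_(false) {}

  bool StartStream() {
    if (active_) return false;
    active_ = true;
    return true;
  }

  bool AsyncStreamInfer(const std::string& payload) {
    if (!active_) return false;
    sent_.push_back(payload);
    return true;
  }

  bool StopStream() {
    active_ = false;
    return true;
  }

  bool active() const { return active_; }
  std::size_t sent() const { return sent_.size(); }

 private:
  bool active_;
  std::vector<std::string> sent_;
};

// Scope guard: starts the stream on construction and stops it on
// destruction, so StopStream() runs even if the scope exits early.
template <typename Client>
class ScopedStream {
 public:
  explicit ScopedStream(Client& client)
      : client_(client), started_(client.StartStream()) {}
  ~ScopedStream() {
    if (started_) client_.StopStream();
  }
  ScopedStream(const ScopedStream&) = delete;
  ScopedStream& operator=(const ScopedStream&) = delete;

  bool started() const { return started_; }

 private:
  Client& client_;
  bool started_;
};
```

With the real client, the guard's constructor would also need to forward the callback to StartStream() and keep the returned Error for inspection.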

Public Static Functions

static Error Create(std::unique_ptr<InferenceServerGrpcClient> *client, const std::string &server_url, bool verbose = false)

Create a client that can be used to communicate with the server.

Returns: Error object indicating success or failure.

Parameters:

  • client: Returns a new InferenceServerGrpcClient object.

  • server_url: The inference server name and port, for example "localhost:8001".

  • verbose: If true, generate verbose output when contacting the inference server.