Class InferenceServerGrpcClient¶
Defined in File grpc_client.h
Inheritance Relationships¶
Base Type¶
public nvidia::inferenceserver::client::InferenceServerClient
(Class InferenceServerClient)
Class Documentation¶
-
class
InferenceServerGrpcClient
: public nvidia::inferenceserver::client::InferenceServerClient¶ An InferenceServerGrpcClient object is used to perform any kind of communication with the InferenceServer using the gRPC protocol.
std::unique_ptr<InferenceServerGrpcClient> client;
InferenceServerGrpcClient::Create(&client, "localhost:8001");
bool live;
client->IsServerLive(&live);
...
...
Public Functions
-
~InferenceServerGrpcClient
()¶
-
Error
IsServerLive
(bool *live, const Headers &headers = Headers())¶ Contact the inference server and get its liveness.
- Return
Error object indicating success or failure of the request.
- Parameters
live
: Returns whether the server is live or not.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
-
Error
IsServerReady
(bool *ready, const Headers &headers = Headers())¶ Contact the inference server and get its readiness.
- Return
Error object indicating success or failure of the request.
- Parameters
ready
: Returns whether the server is ready or not.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
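The liveness and readiness probes are commonly used together to wait for a server to come up before sending work. A minimal sketch, assuming the client was created as in the synopsis above and that Error exposes an IsOk() check:

// Poll until the server reports both live and ready.
bool live = false;
bool ready = false;
Error err = client->IsServerLive(&live);
if (err.IsOk() && live) {
  err = client->IsServerReady(&ready);
}
if (!err.IsOk() || !ready) {
  // The server cannot accept inference requests yet; retry or back off here.
}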
-
Error
IsModelReady
(bool *ready, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers())¶ Contact the inference server and get the readiness of the specified model.
- Return
Error object indicating success or failure of the request.
- Parameters
ready
: Returns whether the specified model is ready or not.
model_name
: The name of the model to check for readiness.
model_version
: The version of the model to check for readiness. The default value is an empty string, which means the server will choose a version based on the model and internal policy.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
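For example, to check whether a specific model version can already serve requests (the model name "simple" and version "1" are illustrative):

bool model_ready = false;
Error err = client->IsModelReady(&model_ready, "simple", "1");
if (err.IsOk() && model_ready) {
  // The model can accept inference requests.
}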
-
Error
ServerMetadata
(ServerMetadataResponse *server_metadata, const Headers &headers = Headers())¶ Contact the inference server and get its metadata.
- Return
Error object indicating success or failure of the request.
- Parameters
server_metadata
: Returns the server metadata as a ServerMetadataResponse message.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
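A short sketch that prints the reported server name and version; the name() and version() accessors are assumptions about the ServerMetadataResponse protobuf message:

#include <iostream>

ServerMetadataResponse server_metadata;
Error err = client->ServerMetadata(&server_metadata);
if (err.IsOk()) {
  // Field accessors below are assumed from the protobuf definition.
  std::cout << server_metadata.name() << " " << server_metadata.version() << std::endl;
}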
-
Error
ModelMetadata
(ModelMetadataResponse *model_metadata, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers())¶ Contact the inference server and get the metadata of the specified model.
- Return
Error object indicating success or failure of the request.
- Parameters
model_metadata
: Returns the model metadata as a ModelMetadataResponse message.
model_name
: The name of the model to get metadata for.
model_version
: The version of the model to get metadata for. The default value is an empty string, which means the server will choose a version based on the model and internal policy.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
-
Error
ModelConfig
(ModelConfigResponse *model_config, const std::string &model_name, const std::string &model_version = "", const Headers &headers = Headers())¶ Contact the inference server and get the configuration of the specified model.
- Return
Error object indicating success or failure of the request.
- Parameters
model_config
: Returns the model configuration as a ModelConfigResponse message.
model_name
: The name of the model to get configuration for.
model_version
: The version of the model to get configuration for. The default value is an empty string, which means the server will choose a version based on the model and internal policy.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
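ModelMetadata() and ModelConfig() are typically used together to discover a model's inputs and outputs before building a request. A sketch, assuming a model named "simple" and standard protobuf accessors on the response messages:

#include <iostream>

ModelMetadataResponse model_metadata;
Error err = client->ModelMetadata(&model_metadata, "simple");
if (err.IsOk()) {
  // inputs_size() is an assumed protobuf accessor for the repeated inputs field.
  std::cout << "model has " << model_metadata.inputs_size() << " inputs" << std::endl;
}

ModelConfigResponse model_config;
err = client->ModelConfig(&model_config, "simple");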
-
Error
ModelRepositoryIndex
(RepositoryIndexResponse *repository_index, const Headers &headers = Headers())¶ Contact the inference server and get the index of model repository contents.
- Return
Error object indicating success or failure of the request.
- Parameters
repository_index
: Returns the repository index as a RepositoryIndexResponse message.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
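For instance, listing every model the repository knows about; the models(), name() and state() accessors are assumptions about the RepositoryIndexResponse message:

#include <iostream>

RepositoryIndexResponse repository_index;
Error err = client->ModelRepositoryIndex(&repository_index);
if (err.IsOk()) {
  for (const auto& model : repository_index.models()) {
    std::cout << model.name() << " : " << model.state() << std::endl;
  }
}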
-
Error
LoadModel
(const std::string &model_name, const Headers &headers = Headers())¶ Request the inference server to load or reload the specified model.
- Return
Error object indicating success or failure of the request.
- Parameters
model_name
: The name of the model to be loaded or reloaded.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
-
Error
UnloadModel
(const std::string &model_name, const Headers &headers = Headers())¶ Request the inference server to unload the specified model.
- Return
Error object indicating success or failure of the request.
- Parameters
model_name
: The name of the model to be unloaded.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
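LoadModel() and UnloadModel() only take effect when the server runs in a model-control mode that allows explicit loading. A minimal sketch (Error::Message() is assumed for reporting):

#include <iostream>

Error err = client->LoadModel("simple");
if (!err.IsOk()) {
  std::cerr << "failed to load model: " << err.Message() << std::endl;
}
// ... use the model ...
err = client->UnloadModel("simple");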
-
Error
ModelInferenceStatistics
(ModelStatisticsResponse *infer_stat, const std::string &model_name = "", const std::string &model_version = "", const Headers &headers = Headers())¶ Contact the inference server and get the inference statistics for the specified model name and version.
- Return
Error object indicating success or failure of the request.
- Parameters
infer_stat
: Returns the inference statistics of the requested model name and version.
model_name
: The name of the model to get inference statistics for. The default value is an empty string, which means statistics of all models will be returned in the response.
model_version
: The version of the model to get inference statistics for. The default value is an empty string, which means the server will choose a version based on the model and internal policy.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
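For example, to fetch statistics for all models at once (the empty model name is the default) and then for one specific model version:

ModelStatisticsResponse all_stats;
Error err = client->ModelInferenceStatistics(&all_stats);

ModelStatisticsResponse simple_stats;
err = client->ModelInferenceStatistics(&simple_stats, "simple", "1");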
-
Error
SystemSharedMemoryStatus
(SystemSharedMemoryStatusResponse *status, const std::string &region_name = "", const Headers &headers = Headers())¶ Contact the inference server and get the status of the requested system shared memory.
- Return
Error object indicating success or failure of the request.
- Parameters
status
: Returns the system shared memory status as a SystemSharedMemoryStatusResponse message.
region_name
: The name of the region to query status for. The default value is an empty string, which means that the status of all active system shared memory regions will be returned.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
-
Error
RegisterSystemSharedMemory
(const std::string &name, const std::string &key, const size_t byte_size, const size_t offset = 0, const Headers &headers = Headers())¶ Request the server to register a system shared memory region with the provided details.
- Return
Error object indicating success or failure of the request.
- Parameters
name
: The name of the region to register.
key
: The key of the underlying memory object that contains the system shared memory region.
byte_size
: The size of the system shared memory region, in bytes.
offset
: Offset, in bytes, within the underlying memory object to the start of the system shared memory region. The default value is zero.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
-
Error
UnregisterSystemSharedMemory
(const std::string &name = "", const Headers &headers = Headers())¶ Request the server to unregister a system shared memory region with the specified name.
- Return
Error object indicating success or failure of the request.
- Parameters
name
: The name of the region to unregister. The default value is an empty string, which means all system shared memory regions will be unregistered.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
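A sketch of the typical system shared memory flow: create a POSIX shared memory object, register it with the server, query the status, and unregister it. The region name "input_data" and key "/triton_shm" are illustrative, and the register call follows the parameter list above:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

const size_t byte_size = 64;
// Create and size the underlying POSIX shared memory object.
int shm_fd = shm_open("/triton_shm", O_RDWR | O_CREAT, 0666);
ftruncate(shm_fd, byte_size);

Error err = client->RegisterSystemSharedMemory("input_data", "/triton_shm", byte_size);
if (err.IsOk()) {
  SystemSharedMemoryStatusResponse status;
  client->SystemSharedMemoryStatus(&status);      // empty region name: all regions
  client->UnregisterSystemSharedMemory("input_data");
}
close(shm_fd);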
-
Error
CudaSharedMemoryStatus
(CudaSharedMemoryStatusResponse *status, const std::string &region_name = "", const Headers &headers = Headers())¶ Contact the inference server and get the status of the requested CUDA shared memory.
- Return
Error object indicating success or failure of the request.
- Parameters
status
: Returns the CUDA shared memory status as a CudaSharedMemoryStatusResponse message.
region_name
: The name of the region to query status for. The default value is an empty string, which means that the status of all active CUDA shared memory regions will be returned.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
-
Error
RegisterCudaSharedMemory
(const std::string &name, const cudaIpcMemHandle_t &cuda_shm_handle, const size_t device_id, const size_t byte_size, const Headers &headers = Headers())¶ Request the server to register a CUDA shared memory region with the provided details.
- Return
Error object indicating success or failure of the request.
- Parameters
name
: The name of the region to register.
cuda_shm_handle
: The cudaIPC handle for the memory object.
device_id
: The GPU device ID on which the cudaIPC handle was created.
byte_size
: The size of the CUDA shared memory region, in bytes.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
-
Error
UnregisterCudaSharedMemory
(const std::string &name = "", const Headers &headers = Headers())¶ Request the server to unregister a CUDA shared memory region with the specified name.
- Return
Error object indicating success or failure of the request.
- Parameters
name
: The name of the region to unregister. The default value is an empty string, which means all CUDA shared memory regions will be unregistered.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
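A sketch of the CUDA shared memory flow: allocate device memory, export its IPC handle, register it with the server, and unregister it afterwards. The region name is illustrative and the register call follows the parameter list above:

#include <cuda_runtime_api.h>

const size_t byte_size = 64;
void* d_ptr = nullptr;
cudaMalloc(&d_ptr, byte_size);

// Export an IPC handle so the server can map the same device memory.
cudaIpcMemHandle_t cuda_shm_handle;
cudaIpcGetMemHandle(&cuda_shm_handle, d_ptr);

Error err = client->RegisterCudaSharedMemory("gpu_input", cuda_shm_handle, 0 /* device_id */, byte_size);
if (err.IsOk()) {
  client->UnregisterCudaSharedMemory("gpu_input");
}
cudaFree(d_ptr);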
-
Error
Infer
(InferResult **result, const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>(), const Headers &headers = Headers())¶ Run synchronous inference on the server.
- Return
Error object indicating success or failure of the request.
- Parameters
result
: Returns the result of inference.
options
: The options for the inference request.
inputs
: The vector of InferInput objects describing the model inputs.
outputs
: Optional vector of InferRequestedOutput objects describing how the outputs must be returned. If not provided, all outputs specified in the model configuration will be returned with default settings.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
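A synchronous inference sketch against a hypothetical model "simple" with one INT32 input of shape [1, 16]; the InferInput, InferOptions and InferResult usage follows the common pattern for this client library, but exact details may differ between releases:

#include <cstdint>
#include <vector>

std::vector<int32_t> input_data(16, 1);
std::vector<int64_t> shape{1, 16};

InferInput* input;
InferInput::Create(&input, "INPUT0", shape, "INT32");
input->AppendRaw(
    reinterpret_cast<const uint8_t*>(input_data.data()),
    input_data.size() * sizeof(int32_t));
std::vector<InferInput*> inputs{input};

InferOptions options("simple");
InferResult* result = nullptr;
Error err = client->Infer(&result, options, inputs);
if (err.IsOk()) {
  // RawData() is assumed to expose the output bytes, as in other client examples.
  const uint8_t* output;
  size_t output_byte_size;
  result->RawData("OUTPUT0", &output, &output_byte_size);
}
delete result;  // the caller owns the returned InferResult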
-
Error
AsyncInfer
(OnCompleteFn callback, const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>(), const Headers &headers = Headers())¶ Run asynchronous inference on the server.
Once the request is completed, the InferResult pointer will be passed to the provided ‘callback’ function. Upon invocation of the callback function, ownership of the InferResult object is transferred to the caller. It is then the caller’s choice either to retrieve the results inside the callback function or to defer that work to a different thread so that the client is unblocked. To prevent a memory leak, the user must ensure this object gets deleted.
- Return
Error object indicating success or failure of the request.
- Parameters
callback
: The callback function to be invoked on request completion.
options
: The options for the inference request.
inputs
: The vector of InferInput objects describing the model inputs.
outputs
: Optional vector of InferRequestedOutput objects describing how the outputs must be returned. If not provided, all outputs specified in the model configuration will be returned with default settings.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
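A sketch that issues the same request asynchronously and blocks until the callback fires; it assumes OnCompleteFn accepts a callable taking an InferResult pointer and reuses the options and inputs built in the synchronous example above:

#include <condition_variable>
#include <mutex>

std::mutex mtx;
std::condition_variable cv;
bool done = false;

Error err = client->AsyncInfer(
    [&](InferResult* result) {
      // Ownership of 'result' is transferred here; delete it to avoid a leak.
      delete result;
      {
        std::lock_guard<std::mutex> lock(mtx);
        done = true;
      }
      cv.notify_one();
    },
    options, inputs);

std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [&] { return done; });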
-
Error
StartStream
(OnCompleteFn callback, bool enable_stats = true, const Headers &headers = Headers())¶ Starts a gRPC bi-directional stream to send streaming inference requests.
- Return
Error object indicating success or failure of the request.
- Parameters
callback
: The callback function to be invoked on receiving a response on the stream.
enable_stats
: Indicates whether the client library should record client-side statistics for inference requests on the stream. The library does not support client-side statistics for decoupled streaming. Set this option to false when there is no 1:1 mapping between requests and responses on the stream.
headers
: Optional map specifying additional HTTP headers to include in the metadata of the gRPC request.
-
Error
StopStream
()¶ Stops the active gRPC bi-directional stream, if one is available.
- Return
Error object indicating success or failure of the request.
-
Error
AsyncStreamInfer
(const InferOptions &options, const std::vector<InferInput *> &inputs, const std::vector<const InferRequestedOutput *> &outputs = std::vector<const InferRequestedOutput *>())¶ Runs an asynchronous inference over the gRPC bi-directional streaming API.
A stream must be established with a call to StartStream() before calling this function. All results will be passed to the callback function provided when starting the stream.
- Return
Error object indicating success or failure of the request.
- Parameters
options
: The options for the inference request.
inputs
: The vector of InferInput objects describing the model inputs.
outputs
: Optional vector of InferRequestedOutput objects describing how the outputs must be returned. If not provided, all outputs specified in the model configuration will be returned with default settings.
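A streaming sketch: register a callback with StartStream(), enqueue several requests with AsyncStreamInfer(), then close the stream. It reuses the options and inputs from the synchronous example above:

Error err = client->StartStream(
    [](InferResult* result) {
      // Each streamed response arrives here; release it when done.
      delete result;
    });
if (err.IsOk()) {
  for (int i = 0; i < 4; ++i) {
    client->AsyncStreamInfer(options, inputs);
  }
  client->StopStream();  // drain and close the bi-directional stream
}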
Public Static Functions
-
static Error
Create
(std::unique_ptr<InferenceServerGrpcClient> *client, const std::string &server_url, bool verbose = false)¶ Create a client that can be used to communicate with the server.
- Return
Error object indicating success or failure.
- Parameters
client
: Returns a new InferenceServerGrpcClient object.
server_url
: The inference server name and port.
verbose
: If true, generate verbose output when contacting the inference server.