grpc_service.proto

service GRPCService

Inference Server GRPC endpoints.

rpc Status(StatusRequest) returns (StatusResponse)

Get status for the entire inference server or for a specified model.

rpc Profile(ProfileRequest) returns (ProfileResponse)

Enable and disable low-level GPU profiling.

rpc Health(HealthRequest) returns (HealthResponse)

Check liveness and readiness of the inference server.

rpc Infer(InferRequest) returns (InferResponse)

Request inference using a specific model. To handle large input tensors it is likely necessary to increase the maximum gRPC message size so that they can be transmitted in one pass.
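
As an illustration, with a Python gRPC client the maximum message size can be raised through channel options when the channel is created. The option names are standard gRPC channel arguments; the address, port, and size shown are assumptions, not values mandated by the inference server.

import grpc

# Allow messages up to ~64 MB in each direction so that large raw
# input/output tensors fit in a single Infer call (size is an example).
MAX_MSG_BYTES = 64 * 1024 * 1024
channel = grpc.insecure_channel(
    "localhost:8001",  # assumed server address and gRPC port
    options=[
        ("grpc.max_send_message_length", MAX_MSG_BYTES),
        ("grpc.max_receive_message_length", MAX_MSG_BYTES),
    ],
)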

rpc StreamInfer(stream InferRequest) returns (stream InferResponse)

Request inferences using a specific model in a streaming manner. Individual inference requests sent through the same stream are processed in order and returned on completion.
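
A minimal streaming-client sketch follows. It assumes the Python modules generated by protoc from grpc_service.proto keep the default names grpc_service_pb2 and grpc_service_pb2_grpc, and that the channel options above are in effect; the server address is again an assumption.

import grpc
import grpc_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8001")  # assumed address
stub = grpc_service_pb2_grpc.GRPCServiceStub(channel)

prebuilt_requests = []  # fill with InferRequest messages (see Infer below)

def request_iterator(requests):
    # Yield each prebuilt InferRequest; gRPC sends them on one stream.
    for request in requests:
        yield request

# Responses arrive in the same order the requests were sent.
for response in stub.StreamInfer(request_iterator(prebuilt_requests)):
    print(response.request_status)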

message StatusRequest

Request message for Status gRPC endpoint.

string model_name

The specific model status to be returned. If empty, status is returned for all models.

message StatusResponse

Response message for Status gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

ServerStatus server_status

The server and model status.
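
A sketch of a Status call, reusing the assumed generated modules and the stub created above; an empty model_name asks for the status of all models.

import grpc_service_pb2

request = grpc_service_pb2.StatusRequest(model_name="")  # "" = all models
response = stub.Status(request)
print(response.request_status)
print(response.server_status)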

message ProfileRequest

Request message for Profile gRPC endpoint.

string cmd

The requested profiling action: ‘start’ requests that GPU profiling be enabled on all GPUs controlled by the inference server; ‘stop’ requests that GPU profiling be disabled on all GPUs controlled by the inference server.

message ProfileResponse

Response message for Profile gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.
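
A sketch of toggling profiling with the same assumed stub; per the 'cmd' description above, only 'start' and 'stop' are meaningful values.

import grpc_service_pb2

response = stub.Profile(grpc_service_pb2.ProfileRequest(cmd="start"))
print(response.request_status)
# ... run the workload to be profiled ...
response = stub.Profile(grpc_service_pb2.ProfileRequest(cmd="stop"))
print(response.request_status)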

message HealthRequest

Request message for Health gRPC endpoint.

string mode

The requested health action: ‘live’ requests the liveness state of the inference server; ‘ready’ requests the readiness state of the inference server.

message HealthResponse

Response message for Health gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

bool health

The result of the request. True indicates the inference server is live/ready, false indicates the inference server is not live/ready.
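
A sketch of both health probes with the same assumed stub; 'health' is True only when the server reports itself live or ready for the requested mode.

import grpc_service_pb2

for mode in ("live", "ready"):
    response = stub.Health(grpc_service_pb2.HealthRequest(mode=mode))
    print(mode, response.health)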

message InferRequest

Request message for Infer gRPC endpoint.

string model_name

The name of the model to use for inferencing.

int64 version

The version of the model to use for inference. If -1, the latest/most-recent version of the model is used.

InferRequestHeader meta_data

Meta-data for the request: the input tensors being provided and the output tensors being requested.

bytes raw_input (repeated)

The raw input tensor data in the order specified in ‘meta_data’.
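
A sketch of building and sending an InferRequest with the same assumed stub. The model name and tensor shape are hypothetical, and populating 'meta_data' (an InferRequestHeader describing the provided inputs and requested outputs) is left as a placeholder since its schema is defined separately.

import numpy as np
import grpc_service_pb2

input_tensor = np.zeros((3, 224, 224), dtype=np.float32)  # example data

request = grpc_service_pb2.InferRequest(
    model_name="my_model",  # hypothetical model name
    version=-1,             # -1 selects the latest/most-recent version
)
# request.meta_data must also be filled in with an InferRequestHeader
# naming the provided input tensors and the requested output tensors.
request.raw_input.append(input_tensor.tobytes())  # one entry per input

response = stub.Infer(request)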

message InferResponse

Response message for Infer gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

InferResponseHeader meta_data

The response meta-data for the output tensors.

bytes raw_output (repeated)

The raw output tensor data in the order specified in ‘meta_data’.
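
A sketch of unpacking the response from the Infer call above; in practice the data type and shape of each output come from 'meta_data', so the float32 below is only an assumption.

import numpy as np

print(response.request_status)
# Each raw_output entry corresponds, in order, to an output tensor
# described in response.meta_data.
for raw in response.raw_output:
    output = np.frombuffer(raw, dtype=np.float32)  # dtype from meta_data
    print(output.shape, output[:5])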