grpc_service.proto

service GRPCService
Inference Server gRPC endpoints.
rpc Status(StatusRequest) returns (StatusResponse)

Get status for the entire inference server or for a specified model.

rpc Profile(ProfileRequest) returns (ProfileResponse)

Enable and disable low-level GPU profiling.

rpc Health(HealthRequest) returns (HealthResponse)

Check liveness and readiness of the inference server.

rpc Infer(InferRequest) returns (InferResponse)

Request inference using a specific model. To handle large input tensors you will likely need to increase the maximum gRPC message size so that they can be transmitted in one pass.
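
The sketch below shows how a Python client might connect to these endpoints. The module names (grpc_service_pb2, grpc_service_pb2_grpc), the stub class GRPCServiceStub, and the server address are assumptions based on what protoc would typically generate from grpc_service.proto; adjust them to your build. The channel options raise the gRPC message size limits so that large tensors can be transmitted in one pass, per the note on the Infer RPC.

  import grpc

  import grpc_service_pb2        # assumed: messages generated from grpc_service.proto
  import grpc_service_pb2_grpc   # assumed: gRPC stubs generated from grpc_service.proto

  # Example limit; tune to the largest raw_input/raw_output you expect.
  MAX_MSG_BYTES = 64 * 1024 * 1024

  channel = grpc.insecure_channel(
      "localhost:8001",  # assumed server address and port
      options=[
          ("grpc.max_send_message_length", MAX_MSG_BYTES),
          ("grpc.max_receive_message_length", MAX_MSG_BYTES),
      ],
  )
  stub = grpc_service_pb2_grpc.GRPCServiceStub(channel)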

message StatusRequest
Request message for Status gRPC endpoint.
string model_name

The specific model whose status is to be returned. If empty, the status of all models is returned.

message StatusResponse
Response message for Status gRPC endpoint.
RequestStatus request_status

The status of the request, indicating success or failure.

ServerStatus server_status

The server and model status.
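
A minimal usage sketch for the Status RPC, continuing from the client setup above (stub, grpc_service_pb2, and the model name are assumptions):

  # Query the status of one model; an empty model_name would return all models.
  request = grpc_service_pb2.StatusRequest(model_name="my_model")
  response = stub.Status(request)

  print(response.request_status)  # success/failure of this request
  print(response.server_status)   # server-wide and per-model status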

message ProfileRequest
Request message for Profile gRPC endpoint.
string cmd

The requested profiling action: ‘start’ requests that GPU profiling be enabled on all GPUs controlled by the inference server; ‘stop’ requests that GPU profiling be disabled on all GPUs controlled by the inference server.

message ProfileResponse
Response message for Profile gRPC endpoint.
RequestStatus request_status

The status of the request, indicating success or failure.
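
A sketch of enabling and then disabling profiling, continuing from the client setup above:

  # Enable GPU profiling on all GPUs controlled by the inference server.
  start = stub.Profile(grpc_service_pb2.ProfileRequest(cmd="start"))

  # ... issue the inference requests to be profiled ...

  # Disable GPU profiling again.
  stop = stub.Profile(grpc_service_pb2.ProfileRequest(cmd="stop"))

  print(start.request_status, stop.request_status)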

message HealthRequest
Request message for Health gRPC endpoint.
string mode

The requested health action: ‘live’ requests the liveness state of the inference server; ‘ready’ requests the readiness state of the inference server.

message HealthResponse
Response message for Health gRPC endpoint.
RequestStatus request_status

The status of the request, indicating success or failure.

bool health

The result of the request. True indicates the inference server is live/ready; false indicates that it is not.
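
A sketch of both health checks, continuing from the client setup above:

  # 'live' asks whether the server process is up; 'ready' asks whether it is
  # ready to serve inference requests.
  live = stub.Health(grpc_service_pb2.HealthRequest(mode="live"))
  ready = stub.Health(grpc_service_pb2.HealthRequest(mode="ready"))

  print("live:", live.health, "ready:", ready.health)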

message InferRequest
Request message for Infer gRPC endpoint.
string model_name

The name of the model to use for inferencing.

int64 version

The version of the model to use for inference. If -1, the latest/most-recent version of the model is used.

InferRequestHeader meta_data

Meta-data for the request: the input tensors being provided and the output tensors being requested.

bytes raw_input (repeated)

The raw input tensor data in the order specified in ‘meta_data’.

message InferResponse
Response message for Infer gRPC endpoint.
RequestStatus request_status

The status of the request, indicating success or failure.

InferResponseHeader meta_data

The response meta-data for the output tensors.

bytes raw_output (repeated)

The raw output tensor data in the order specified in ‘meta_data’.
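
A sketch of an inference call, continuing from the client setup above. InferRequestHeader is defined in a separate proto not shown here, so infer_request_header is assumed to be an already-populated header describing the input tensors and the requested outputs, and input_tensor_bytes is the serialized data for a single input:

  request = grpc_service_pb2.InferRequest(
      model_name="my_model",            # hypothetical model
      version=-1,                       # -1 selects the latest version
      meta_data=infer_request_header,   # assumed pre-built InferRequestHeader
      raw_input=[input_tensor_bytes],   # one bytes entry per input, in meta_data order
  )
  response = stub.Infer(request)

  print(response.request_status)
  for raw in response.raw_output:       # raw tensor data, in meta_data order
      pass  # deserialize according to the corresponding output in meta_data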