grpc_service.proto

service GRPCService

    Inference Server GRPC endpoints.

    rpc Status(StatusRequest) returns (StatusResponse)

        Get status for the entire inference server or for a specified model.

    rpc Profile(ProfileRequest) returns (ProfileResponse)

        Enable and disable low-level GPU profiling.

    rpc Health(HealthRequest) returns (HealthResponse)

        Check liveness and readiness of the inference server.

    rpc Infer(InferRequest) returns (InferResponse)

        Request inference using a specific model. [ To handle large input
        tensors, clients likely need to increase the maximum gRPC message
        size so that they can be transmitted in one pass. ]
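
As a sketch of how these endpoints can be called, assume Python stubs have
been generated from grpc_service.proto with protoc (giving the modules
grpc_service_pb2 and grpc_service_pb2_grpc, per the standard protoc naming
convention) and that the server's gRPC endpoint is listening at
localhost:8001; the address and port are an assumption, not something this
document specifies.

    import grpc

    # Stubs assumed generated by, e.g.:
    #   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. grpc_service.proto
    import grpc_service_pb2
    import grpc_service_pb2_grpc

    # Open a channel to the inference server and create the service stub.
    channel = grpc.insecure_channel("localhost:8001")
    stub = grpc_service_pb2_grpc.GRPCServiceStub(channel)

The per-message examples below reuse this channel and stub.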

message StatusRequest

    Request message for Status gRPC endpoint.

    string model_name

        The name of the specific model whose status is to be returned. If
        empty, status is returned for all models.

message StatusResponse

    Response message for Status gRPC endpoint.

    RequestStatus request_status

        The status of the request, indicating success or failure.

    ServerStatus server_status

        The server and model status.
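
Continuing the sketch above, a Status call might look as follows; leaving
model_name empty requests the status of every model.

    # Empty model_name -> status for all models; set a name for one model.
    request = grpc_service_pb2.StatusRequest(model_name="")
    response = stub.Status(request)

    # request_status reports success or failure of the request itself;
    # server_status carries the server and model status.
    print(response.request_status)
    print(response.server_status)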

message ProfileRequest

    Request message for Profile gRPC endpoint.

    string cmd

        The requested profiling action: ‘start’ requests that GPU profiling
        be enabled on all GPUs controlled by the inference server; ‘stop’
        requests that GPU profiling be disabled on all GPUs controlled by
        the inference server.

message ProfileResponse

    Response message for Profile gRPC endpoint.

    RequestStatus request_status

        The status of the request, indicating success or failure.
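
Profiling is toggled with the same stub; a minimal sketch, again assuming
the generated modules from above.

    # cmd is 'start' to enable GPU profiling on all GPUs the server
    # controls, 'stop' to disable it.
    response = stub.Profile(grpc_service_pb2.ProfileRequest(cmd="start"))
    print(response.request_status)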

message HealthRequest

    Request message for Health gRPC endpoint.

    string mode

        The requested health action: ‘live’ requests the liveness state of
        the inference server; ‘ready’ requests the readiness state of the
        inference server.

message HealthResponse

    Response message for Health gRPC endpoint.

    RequestStatus request_status

        The status of the request, indicating success or failure.

    bool health

        The result of the request. True indicates the inference server is
        live/ready; false indicates the inference server is not live/ready.
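
A liveness/readiness check with the same stub; the loop simply exercises
both documented modes.

    # 'live' asks for the liveness state, 'ready' for the readiness state.
    for mode in ("live", "ready"):
        response = stub.Health(grpc_service_pb2.HealthRequest(mode=mode))
        # health is True when the server is live/ready, False otherwise.
        print(mode, response.health)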

message InferRequest

    Request message for Infer gRPC endpoint.

    string model_name

        The name of the model to use for inferencing.

    int32 version

        The version of the model to use for inference. If -1, the
        latest/most-recent version of the model is used.

    InferRequestHeader meta_data

        Meta-data for the request: the input tensors being provided and the
        output tensors being requested.

    bytes raw_input (repeated)

        The raw input tensor data in the order specified in ‘meta_data’.

message InferResponse

    Response message for Infer gRPC endpoint.

    RequestStatus request_status

        The status of the request, indicating success or failure.

    InferResponseHeader meta_data

        The response meta-data for the output tensors.

    bytes raw_output (repeated)

        The raw output tensor data in the order specified in ‘meta_data’.
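
Finally, tying back to the note on Infer: a sketch of a raw inference call.
The model name, the 128 MB size limit, and the placeholder input bytes are
all assumptions, and population of meta_data (an InferRequestHeader, defined
elsewhere) is elided since its fields are not documented here.

    # Large tensors can exceed gRPC's default 4 MB message limit, so raise
    # the limit when creating the channel (per the note on Infer above).
    channel = grpc.insecure_channel(
        "localhost:8001",
        options=[
            ("grpc.max_send_message_length", 128 * 1024 * 1024),
            ("grpc.max_receive_message_length", 128 * 1024 * 1024),
        ],
    )
    stub = grpc_service_pb2_grpc.GRPCServiceStub(channel)

    input_tensor_bytes = b"\x00" * 16  # placeholder raw tensor payload

    request = grpc_service_pb2.InferRequest(
        model_name="my_model",           # hypothetical model name
        version=-1,                      # -1 selects the latest version
        raw_input=[input_tensor_bytes],  # raw bytes, in meta_data order
    )
    # request.meta_data must also be filled in to describe the provided
    # inputs and the requested outputs; see InferRequestHeader.
    response = stub.Infer(request)

    print(response.request_status)
    # response.raw_output holds the raw output tensors, in meta_data order.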