
service GRPCService

Inference Server GRPC endpoints.

rpc Status(StatusRequest) returns (StatusResponse)

Get the status of the entire inference server or of a specified model.

rpc Profile(ProfileRequest) returns (ProfileResponse)

Enable and disable low-level GPU profiling.

rpc Health(HealthRequest) returns (HealthResponse)

Check liveness and readiness of the inference server.

rpc Infer(InferRequest) returns (InferResponse)

Request inference using a specific model. To handle large input tensors, you will likely need to increase the maximum gRPC message size so that the tensors can be transmitted in one pass.
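Most gRPC implementations cap received messages at 4 MiB by default, so a client sending large tensors usually has to raise its channel limits. The Python sketch below estimates the raw byte size of the input and output tensors and builds a channel option list. `grpc.max_send_message_length` and `grpc.max_receive_message_length` are standard gRPC channel arguments; the helper names, the `(shape, itemsize)` spec format and the 1 MiB headroom for `meta_data` and framing are assumptions made for illustration.

```python
import math

DEFAULT_RECV_LIMIT = 4 * 1024 * 1024  # common gRPC default receive limit (4 MiB)


def tensor_nbytes(shape, itemsize):
    """Bytes occupied by a dense tensor with the given shape and element size."""
    return math.prod(shape) * itemsize


def channel_options(input_specs, output_specs, headroom=1 << 20):
    """Build gRPC channel arguments large enough for one Infer round trip.

    input_specs / output_specs: lists of (shape, itemsize) pairs (hypothetical format).
    headroom: assumed extra bytes for meta_data and protobuf framing.
    """
    send_limit = sum(tensor_nbytes(s, i) for s, i in input_specs) + headroom
    recv_limit = sum(tensor_nbytes(s, i) for s, i in output_specs) + headroom
    # Never shrink the receive limit below the usual default.
    recv_limit = max(recv_limit, DEFAULT_RECV_LIMIT)
    return [
        ("grpc.max_send_message_length", send_limit),
        ("grpc.max_receive_message_length", recv_limit),
    ]


opts = channel_options(
    input_specs=[((8, 3, 1024, 1024), 4)],  # 8 FP32 images, about 96 MiB
    output_specs=[((8, 1000), 4)],          # 8 FP32 score vectors
)
```

With the Python `grpc` package the list can be passed as `grpc.insecure_channel(target, options=opts)`. The server side must also be configured to accept messages of this size.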

rpc StreamInfer(stream InferRequest) returns (stream

Request inferences using a specific model in a streaming manner. Individual inference requests sent through the same stream are processed in order, and each response is returned when its request completes.
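Because requests on one stream are answered in order, a client does not need to tag each request to match it with its response: a FIFO of outstanding request IDs is enough. In Python gRPC, a generated stub for a bidirectional-streaming RPC takes a request iterator and returns a response iterator. The sketch below replaces the real stub with a hypothetical in-process `fake_stream_infer`, and `Request`/`Response` are simplified stand-ins for the real messages.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Request:   # hypothetical stand-in for InferRequest
    id: int
    payload: bytes


@dataclass
class Response:  # hypothetical stand-in for the streamed response message
    payload: bytes


def fake_stream_infer(requests: Iterator[Request]) -> Iterator[Response]:
    """Hypothetical server: handles each request in arrival order."""
    for req in requests:
        yield Response(payload=req.payload[::-1])  # placeholder "inference"


def run_stream(reqs: Iterable[Request]):
    """Send requests on one stream and pair responses with request IDs."""
    pending = deque()  # IDs of requests sent but not yet answered

    def request_iter():
        for r in reqs:
            pending.append(r.id)  # record before the request is sent
            yield r

    results = []
    for resp in fake_stream_infer(request_iter()):
        # In-order delivery means the oldest pending ID owns this response.
        results.append((pending.popleft(), resp.payload))
    return results
```

With a real stub, `fake_stream_infer(request_iter())` would be replaced by the stub's `StreamInfer(request_iter())` call; the FIFO pairing relies only on the in-order guarantee described above.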

message StatusRequest

Request message for Status gRPC endpoint.

string model_name

The specific model whose status is returned. If empty, the status of all models is returned.

message StatusResponse

Response message for Status gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

ServerStatus server_status

The server and model status.
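A Status call pairs a request with an optional `model_name` and a response whose `request_status` should be checked before `server_status` is used. The Python sketch below models the messages as plain dicts; `RequestStatus` with an `ok` flag and `msg` string is a hypothetical simplification, since the real RequestStatus message is defined elsewhere and may carry different fields.

```python
from dataclasses import dataclass


@dataclass
class RequestStatus:
    """Hypothetical simplification of the RequestStatus message."""
    ok: bool
    msg: str = ""


def make_status_request(model_name=None):
    """Build a StatusRequest; an empty model_name asks for all models."""
    return {"model_name": model_name or ""}


def checked_server_status(response):
    """Return server_status, or raise if request_status reports failure."""
    status = response["request_status"]
    if not status.ok:
        raise RuntimeError(f"Status request failed: {status.msg}")
    return response["server_status"]
```

The same check-then-read pattern applies to every response message in this service, since each one carries a `request_status`.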

message ProfileRequest

Request message for Profile gRPC endpoint.

string cmd

The requested profiling action: ‘start’ requests that GPU profiling be enabled on all GPUs controlled by the inference server; ‘stop’ requests that GPU profiling be disabled on all GPUs controlled by the inference server.

message ProfileResponse

Response message for Profile gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.
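Because profiling stays enabled on every GPU until a `stop` command arrives, it helps to bracket the profiled work so that `stop` is sent even if the work fails. The Python sketch below does this with a context manager; `send_profile` is a hypothetical callable that sends one ProfileRequest, and requests are modeled as plain dicts.

```python
from contextlib import contextmanager

VALID_PROFILE_CMDS = {"start", "stop"}


def make_profile_request(cmd):
    """Build a ProfileRequest, rejecting commands the endpoint does not define."""
    if cmd not in VALID_PROFILE_CMDS:
        raise ValueError(f"unsupported profile cmd: {cmd!r}")
    return {"cmd": cmd}


@contextmanager
def gpu_profiling(send_profile):
    """Enable GPU profiling for the duration of a with-block.

    send_profile: hypothetical callable that sends one ProfileRequest.
    """
    send_profile(make_profile_request("start"))
    try:
        yield
    finally:
        # Always disable profiling, even if the profiled work raised.
        send_profile(make_profile_request("stop"))
```

Usage: `with gpu_profiling(send): run_inferences()` keeps the profiled window limited to the inferences inside the block.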

message HealthRequest

Request message for Health gRPC endpoint.

string mode

The requested health action: ‘live’ requests the liveness state of the inference server; ‘ready’ requests the readiness state of the inference server.

message HealthResponse

Response message for Health gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

bool health

The result of the request. True indicates that the inference server is live/ready; false indicates that it is not live/ready.
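A common use of the Health endpoint is to wait for the server to become ready before sending inference requests. The Python sketch below polls until the `health` flag is true or a timeout expires; `check` is a hypothetical callable that sends a HealthRequest with the given `mode` and returns the response's `health` boolean, and the timeout and interval defaults are arbitrary.

```python
import time

VALID_HEALTH_MODES = ("live", "ready")


def wait_for_health(check, mode="ready", timeout=30.0, interval=0.5):
    """Poll the Health endpoint until it reports healthy or time runs out.

    check(mode): hypothetical callable returning HealthResponse.health.
    Returns True once healthy, False if the timeout elapses first.
    """
    if mode not in VALID_HEALTH_MODES:
        raise ValueError(f"unsupported health mode: {mode!r}")
    deadline = time.monotonic() + timeout
    while True:
        if check(mode):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A fuller client would also inspect `request_status`, since a failed request and an unready server are different conditions.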

message InferRequest

Request message for Infer gRPC endpoint.

string model_name

The name of the model to use for inferencing.

int64 version

The version of the model to use for inference. If -1, the latest (most recent) version of the model is used.
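The `-1` convention can be illustrated with a small resolver. The Python sketch below assumes that "latest" means the highest-numbered available version and that the set of available versions is known; both are assumptions for illustration, not a description of how the server selects versions internally.

```python
LATEST = -1  # sentinel value for InferRequest.version


def resolve_version(requested, available):
    """Map a requested version to a concrete one.

    available: version numbers assumed to be loaded (hypothetical input).
    Assumes "latest" means the highest version number.
    """
    if not available:
        raise LookupError("model has no available versions")
    if requested == LATEST:
        return max(available)
    if requested not in available:
        raise LookupError(f"version {requested} is not available")
    return requested
```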

InferRequestHeader meta_data

Meta-data for the request, describing the input tensors and requesting output tensors.

bytes raw_input(repeated)

The raw input tensor data in the order specified in ‘meta_data’.
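Since `raw_input` is positional, the client must serialize tensors in exactly the order the inputs appear in `meta_data`. The Python sketch below does this with the standard `struct` module, assuming flat FP32 tensors serialized little-endian; `meta_inputs` (a list of input names in `meta_data` order) and the dict of values are hypothetical simplifications of the real InferRequestHeader.

```python
import struct


def pack_raw_inputs(meta_inputs, tensors):
    """Serialize tensors into raw_input entries, ordered as in meta_data.

    meta_inputs: input names in meta_data order (hypothetical simplification).
    tensors: dict mapping input name -> flat list of float32 values.
    Assumes little-endian FP32 encoding.
    """
    raw_input = []
    for name in meta_inputs:  # order comes from meta_data, not from the dict
        values = tensors[name]
        raw_input.append(struct.pack(f"<{len(values)}f", *values))
    return raw_input
```

Driving the loop from `meta_inputs` rather than from the tensor dict ensures that the order of `raw_input` always follows `meta_data`.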

message InferResponse

Response message for Infer gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

InferResponseHeader meta_data

The response meta-data for the output tensors.

bytes raw_output(repeated)

The raw output tensor data in the order specified in ‘meta_data’.
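On the response side, each `raw_output` entry corresponds to the output at the same position in the response `meta_data`. The Python sketch below pairs them up and decodes the bytes, assuming flat little-endian FP32 outputs; `meta_outputs` (a list of output names in `meta_data` order) is a hypothetical simplification of the real InferResponseHeader.

```python
import struct


def unpack_raw_outputs(meta_outputs, raw_output):
    """Map each raw_output entry to its output name and decode it.

    meta_outputs: output names in meta_data order (hypothetical simplification).
    Assumes little-endian FP32 data (4 bytes per element).
    """
    if len(meta_outputs) != len(raw_output):
        raise ValueError("meta_data and raw_output lengths differ")
    result = {}
    for name, blob in zip(meta_outputs, raw_output):
        count = len(blob) // 4
        result[name] = list(struct.unpack(f"<{count}f", blob))
    return result
```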