grpc_service.proto

service GRPCService
    Inference Server GRPC endpoints.
rpc Status(StatusRequest) returns (StatusResponse)
    Get status for the entire inference server or for a specified model.

rpc Health(HealthRequest) returns (HealthResponse)
    Check liveness and readiness of the inference server.

rpc Infer(InferRequest) returns (InferResponse)
    Request inference using a specific model. To handle large input
    tensors you will likely need to increase the maximum message size so
    that they can be transmitted in one pass.
rpc StreamInfer(stream InferRequest) returns (stream InferResponse)
    Request inferences using a specific model in a streaming manner.
    Individual inference requests sent through the same stream will be
    processed in order and be returned on completion.
rpc ModelControl(ModelControlRequest) returns (ModelControlResponse)
    Request to load / unload a specified model.
rpc SharedMemoryControl(SharedMemoryControlRequest) returns (SharedMemoryControlResponse)
    Request to register / unregister a specified shared memory region.
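The note on Infer above says large input tensors may require raising the gRPC maximum message size. A rough sizing sketch in Python (the helper name and the fixed 1024-byte allowance for serialized meta-data are assumptions, not part of the API):

```python
# Estimate the gRPC message size needed for a set of raw input tensors,
# so the client/server maximum message size can be set above it.
# NOTE: this helper and its overhead constant are illustrative assumptions.

def required_message_bytes(tensor_shapes, itemsize, overhead=1024):
    """Sum the raw byte size of each tensor plus a fixed allowance for
    the serialized meta_data fields."""
    total = overhead
    for shape in tensor_shapes:
        n = 1
        for dim in shape:
            n *= dim
        total += n * itemsize
    return total

# Two FP32 tensors of shape (3, 224, 224):
size = required_message_bytes([(3, 224, 224), (3, 224, 224)], itemsize=4)
print(size)  # 1204224 bytes of tensor data plus the 1024-byte allowance
```

A client would then create its channel with send/receive limits at least this large (in gRPC's Python bindings, via the `grpc.max_send_message_length` and `grpc.max_receive_message_length` channel options).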
message StatusRequest
    Request message for Status gRPC endpoint.

    string model_name
        The specific model status to be returned. If empty, return status
        for all models.
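The empty-name convention can be sketched as a simple filter (the helper and its `statuses` mapping are illustrative, not part of the API):

```python
# Illustrate the model_name convention of StatusRequest: an empty name
# selects every model, a non-empty name selects only that model.
# The helper and its arguments are illustrative assumptions.

def select_statuses(statuses, model_name=""):
    if not model_name:          # empty -> status for all models
        return dict(statuses)
    return {model_name: statuses[model_name]}

statuses = {"resnet50": "READY", "bert": "LOADING"}
print(select_statuses(statuses))            # status for both models
print(select_statuses(statuses, "bert"))    # {'bert': 'LOADING'}
```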
message StatusResponse
    Response message for Status gRPC endpoint.

    RequestStatus request_status
        The status of the request, indicating success or failure.

    ServerStatus server_status
        The server and model status.
message HealthRequest
    Request message for Health gRPC endpoint.

    string mode
        The requested health action: 'live' requests the liveness state
        of the inference server; 'ready' requests the readiness state of
        the inference server.
message HealthResponse
    Response message for Health gRPC endpoint.

    RequestStatus request_status
        The status of the request, indicating success or failure.

    bool health
        The result of the request. True indicates the inference server is
        live/ready, false indicates the inference server is not
        live/ready.
message ModelControlRequest
    Request message for ModelControl gRPC endpoint.

    enum Type
        Types of control operation.

        enumerator Type::UNLOAD = 0
            To unload the specified model.

        enumerator Type::LOAD = 1
            To load the specified model. If the model has already been
            loaded, it will be reloaded to fetch the latest changes.
    string model_name
        The target model name.
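The control types map to small integer values; they can be mirrored as a Python IntEnum for client-side code (the Python class is illustrative, but the numeric values come from the enum above):

```python
from enum import IntEnum

# Mirror of ModelControlRequest.Type from grpc_service.proto.
class Type(IntEnum):
    UNLOAD = 0  # unload the specified model
    LOAD = 1    # load the specified model (reloads if already loaded)

print(Type.LOAD.value)  # 1
print(Type(0).name)     # UNLOAD
```

Sending a LOAD request for a model that is already loaded reloads it, which is how a client picks up the latest changes to a model.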
message ModelControlResponse
    Response message for ModelControl gRPC endpoint.

    RequestStatus request_status
        The status of the request, indicating success or failure.
message SharedMemoryControlRequest
    Request message for managing registered shared memory regions in
    TRTIS.

    Register a shared memory region:
        - The name for this shared memory region.
        - The identifier for this shared memory region. Types of shared
          memory identifiers: the identifier for a system shared memory
          region, or the identifier for a CUDA shared memory region.
        - The offset from the start of the shared memory region.
          start = offset, end = offset + size.
        - Size of the memory block, in bytes.

    System shared memory identifier:
        - The name of the shared memory region that holds the input data
          (or where the output data should be written).

    CUDA shared memory identifier:
        - The name of the system shared memory region that holds the
          cudaIPC handle.
        - The offset of the cudaIPC handle from the start of the shared
          memory region. start = offset, end = offset + size.
        - Size of the cudaIPC handle in the shared memory block, in
          bytes.

    Unregister a specified shared memory region:
        - The name of the shared memory region to unregister.

    Unregister all shared memory regions.

    Get the status of all active shared memory regions.

    Types of control operations for shared memory:
        Register register
            To register the specified shared memory region.
        To unregister the specified shared memory region.
        To unregister all active shared memory regions.
        To get the status of all active shared memory regions.

message SharedMemoryControlResponse
    Response message for SharedMemoryControl gRPC endpoint.
    message Status
        Status of all active shared memory regions.

        The list of active/registered shared memory regions.

    RequestStatus request_status
        The status of the request, indicating success or failure.

    The status of all active shared memory regions.
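Several shared memory fields above use the convention start = offset, end = offset + size. A small sketch of that arithmetic (helper names are illustrative, not part of the API):

```python
# Compute the byte range a block occupies inside a shared memory region,
# per the convention start = offset, end = offset + size.
# Helper names are illustrative assumptions.

def block_range(offset, byte_size):
    return offset, offset + byte_size

def fits_in_region(offset, byte_size, region_size):
    """True if the block lies entirely within a region of region_size bytes."""
    start, end = block_range(offset, byte_size)
    return 0 <= start and end <= region_size

print(block_range(4096, 1024))           # (4096, 5120)
print(fits_in_region(4096, 1024, 4096))  # False: end = 5120 > 4096
print(fits_in_region(0, 1024, 4096))     # True
```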
message InferRequest
    Request message for Infer gRPC endpoint.

    string model_name
        The name of the model to use for inferencing.

    int64 version
        The version of the model to use for inference. If -1, the
        latest/most-recent version of the model is used.

    InferRequestHeader meta_data
        Meta-data for the request: input tensors, output tensors, etc.

    bytes raw_input (repeated)
        The raw input tensor data, in the order specified in 'meta_data'.
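raw_input carries one bytes entry per input, and the entries must appear in the same order as the inputs listed in meta_data. A packing sketch using Python's struct module (the input names and the little-endian FP32 encoding are illustrative assumptions):

```python
import struct

# Pack each input tensor's values into raw bytes, appending the entries
# in the same order the inputs appear in the request meta_data.
# Input names and the FP32 encoding are illustrative assumptions.

def pack_raw_inputs(meta_data_order, tensors):
    raw_input = []
    for name in meta_data_order:  # order must match meta_data
        values = tensors[name]
        raw_input.append(struct.pack(f"<{len(values)}f", *values))
    return raw_input

tensors = {"input0": [1.0, 2.0], "input1": [3.0]}
raw = pack_raw_inputs(["input0", "input1"], tensors)
print(len(raw))     # 2 entries, one per input
print(len(raw[0]))  # 8 bytes: two little-endian FP32 values
```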
message InferResponse
    Response message for Infer gRPC endpoint.

    RequestStatus request_status
        The status of the request, indicating success or failure.

    InferResponseHeader meta_data
        The response meta-data for the output tensors.

    bytes raw_output (repeated)
        The raw output tensor data, in the order specified in
        'meta_data'.