grpc_service.proto

service GRPCService
    Inference Server GRPC endpoints.
rpc Status(StatusRequest) returns (StatusResponse)
    Get status for the entire inference server or for a specified model.

rpc Health(HealthRequest) returns (HealthResponse)
    Check liveness and readiness of the inference server.

rpc Infer(InferRequest) returns (InferResponse)
    Request inference using a specific model. To handle large input
    tensors you will likely need to increase the maximum message size so
    that they can be transmitted in one pass.
rpc StreamInfer(stream InferRequest) returns (stream InferResponse)
    Request inferences using a specific model in a streaming manner.
    Individual inference requests sent through the same stream will be
    processed in order and be returned on completion.
rpc ModelControl(ModelControlRequest) returns (ModelControlResponse)
    Request to load / unload a specified model.
rpc SharedMemoryControl(SharedMemoryControlRequest) returns (SharedMemoryControlResponse)
    Request to register / unregister a specified shared memory region.
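The note on Infer above says large input tensors may require raising the gRPC maximum message size. A rough sizing sketch in Python (the helper name and the fixed 1024-byte allowance for serialized meta-data are assumptions, not part of the API):

```python
# Estimate the gRPC message size needed for a set of raw input tensors,
# so the client/server maximum message size can be set above it.
# NOTE: this helper and its overhead constant are illustrative assumptions.

def required_message_bytes(tensor_shapes, itemsize, overhead=1024):
    """Sum the raw byte size of each tensor plus a fixed allowance for
    the serialized meta_data fields."""
    total = overhead
    for shape in tensor_shapes:
        n = 1
        for dim in shape:
            n *= dim
        total += n * itemsize
    return total

# Two FP32 tensors of shape (3, 224, 224):
size = required_message_bytes([(3, 224, 224), (3, 224, 224)], itemsize=4)
print(size)  # 1204224 bytes of tensor data plus the 1024-byte allowance
```

A client would then create its channel with send/receive limits at least this large (in gRPC's Python bindings, via the `grpc.max_send_message_length` and `grpc.max_receive_message_length` channel options).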
message StatusRequest
    Request message for Status gRPC endpoint.

    string model_name
        The specific model status to be returned. If empty, return status
        for all models.
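The empty-name convention can be sketched as a simple filter (the helper and its `statuses` mapping are illustrative, not part of the API):

```python
# Illustrate the model_name convention of StatusRequest: an empty name
# selects every model, a non-empty name selects only that model.
# The helper and its arguments are illustrative assumptions.

def select_statuses(statuses, model_name=""):
    if not model_name:          # empty -> status for all models
        return dict(statuses)
    return {model_name: statuses[model_name]}

statuses = {"resnet50": "READY", "bert": "LOADING"}
print(select_statuses(statuses))            # status for both models
print(select_statuses(statuses, "bert"))    # {'bert': 'LOADING'}
```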
message StatusResponse
    Response message for Status gRPC endpoint.

    RequestStatus request_status
        The status of the request, indicating success or failure.

    ServerStatus server_status
        The server and model status.
message HealthRequest
    Request message for Health gRPC endpoint.

    string mode
        The requested health action: 'live' requests the liveness state
        of the inference server; 'ready' requests the readiness state of
        the inference server.
message HealthResponse
    Response message for Health gRPC endpoint.

    RequestStatus request_status
        The status of the request, indicating success or failure.

    bool health
        The result of the request. True indicates the inference server is
        live/ready, false indicates the inference server is not
        live/ready.
message ModelControlRequest
    Request message for ModelControl gRPC endpoint.

    enum Type
        Types of control operation.

        enumerator Type::UNLOAD = 0
            To unload the specified model.

        enumerator Type::LOAD = 1
            To load the specified model. If the model has already been
            loaded, it will be reloaded to fetch the latest changes.
    string model_name
        The target model name.
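The control types map to small integer values; they can be mirrored as a Python IntEnum for client-side code (the Python class is illustrative, but the numeric values come from the enum above):

```python
from enum import IntEnum

# Mirror of ModelControlRequest.Type from grpc_service.proto.
class Type(IntEnum):
    UNLOAD = 0  # unload the specified model
    LOAD = 1    # load the specified model (reloads if already loaded)

print(Type.LOAD.value)  # 1
print(Type(0).name)     # UNLOAD
```

Sending a LOAD request for a model that is already loaded reloads it, which is how a client picks up the latest changes to a model.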
message ModelControlResponse
    Response message for ModelControl gRPC endpoint.

    RequestStatus request_status
        The status of the request, indicating success or failure.
message SharedMemoryControlRequest
    Request message for managing registered shared memory regions in
    TRTIS.

    Register a shared memory region:
        - The name for this shared memory region.
        - The identifier for this shared memory region. Types of shared
          memory identifiers: the identifier for a system shared memory
          region, or the identifier for a CUDA shared memory region.
        - The offset from the start of the shared memory region.
          start = offset, end = offset + size.
        - Size of the memory block, in bytes.

    System shared memory identifier:
        - The name of the shared memory region that holds the input data
          (or where the output data should be written).

    CUDA shared memory identifier:
        - The name of the system shared memory region that holds the
          cudaIPC handle.
        - The offset of the cudaIPC handle from the start of the shared
          memory region. start = offset, end = offset + size.
        - Size of the cudaIPC handle in the shared memory block, in
          bytes.

    Unregister a specified shared memory region:
        - The name of the shared memory region to unregister.

    Unregister all shared memory regions.

    Get the status of all active shared memory regions.

    Types of control operations for shared memory:
        Register register
            To register the specified shared memory region.
        To unregister the specified shared memory region.
        To unregister all active shared memory regions.
        To get the status of all active shared memory regions.

message SharedMemoryControlResponse
    Response message for SharedMemoryControl gRPC endpoint.
    message Status
        Status of all active shared memory regions.

        The list of active/registered shared memory regions.

    RequestStatus request_status
        The status of the request, indicating success or failure.

    The status of all active shared memory regions.
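Several shared memory fields above use the convention start = offset, end = offset + size. A small sketch of that arithmetic (helper names are illustrative, not part of the API):

```python
# Compute the byte range a block occupies inside a shared memory region,
# per the convention start = offset, end = offset + size.
# Helper names are illustrative assumptions.

def block_range(offset, byte_size):
    return offset, offset + byte_size

def fits_in_region(offset, byte_size, region_size):
    """True if the block lies entirely within a region of region_size bytes."""
    start, end = block_range(offset, byte_size)
    return 0 <= start and end <= region_size

print(block_range(4096, 1024))           # (4096, 5120)
print(fits_in_region(4096, 1024, 4096))  # False: end = 5120 > 4096
print(fits_in_region(0, 1024, 4096))     # True
```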
message InferRequest
    Request message for Infer gRPC endpoint.

    string model_name
        The name of the model to use for inferencing.

    int64 version
        The version of the model to use for inference. If -1, the
        latest/most-recent version of the model is used.

    InferRequestHeader meta_data
        Meta-data for the request: input tensors, output tensors, etc.

    bytes raw_input (repeated)
        The raw input tensor data, in the order specified in 'meta_data'.
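raw_input carries one bytes entry per input, and the entries must appear in the same order as the inputs listed in meta_data. A packing sketch using Python's struct module (the input names and the little-endian FP32 encoding are illustrative assumptions):

```python
import struct

# Pack each input tensor's values into raw bytes, appending the entries
# in the same order the inputs appear in the request meta_data.
# Input names and the FP32 encoding are illustrative assumptions.

def pack_raw_inputs(meta_data_order, tensors):
    raw_input = []
    for name in meta_data_order:  # order must match meta_data
        values = tensors[name]
        raw_input.append(struct.pack(f"<{len(values)}f", *values))
    return raw_input

tensors = {"input0": [1.0, 2.0], "input1": [3.0]}
raw = pack_raw_inputs(["input0", "input1"], tensors)
print(len(raw))     # 2 entries, one per input
print(len(raw[0]))  # 8 bytes: two little-endian FP32 values
```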
message InferResponse
    Response message for Infer gRPC endpoint.

    RequestStatus request_status
        The status of the request, indicating success or failure.

    InferResponseHeader meta_data
        The response meta-data for the output tensors.

    bytes raw_output (repeated)
        The raw output tensor data, in the order specified in
        'meta_data'.