grpc_service.proto

service GRPCService

Inference Server GRPC endpoints.

rpc Status(StatusRequest) returns (StatusResponse)

Get status for entire inference server or for a specified model.

rpc Health(HealthRequest) returns (HealthResponse)

Check liveness and readiness of the inference server.

rpc Infer(InferRequest) returns (InferResponse)

Request inference using a specific model. To handle large input tensors, you will likely need to raise the maximum gRPC message size so that they can be transmitted in one pass.
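As a rough sketch of the message-size remark above: gRPC channels accept the `grpc.max_send_message_length` and `grpc.max_receive_message_length` options, and the helper below only builds that option list. The helper name `message_size_options`, the 1 MiB headroom, the example tensor shape, and the `localhost:8001` address are illustrative assumptions, not part of this API.

```python
from math import prod

def message_size_options(tensor_bytes, headroom=1 << 20):
    """Build gRPC channel options sized for one message carrying tensor_bytes.

    headroom is an assumed allowance for metadata and framing overhead.
    """
    limit = tensor_bytes + headroom
    return [
        ("grpc.max_send_message_length", limit),
        ("grpc.max_receive_message_length", limit),
    ]

# Hypothetical input: a batch of 8 FP32 images of shape 3x1024x1024.
tensor_bytes = 8 * prod((3, 1024, 1024)) * 4
options = message_size_options(tensor_bytes)
# With the grpcio package installed, the list would be passed as:
#   channel = grpc.insecure_channel("localhost:8001", options=options)
```

The receiving side enforces its own limit, so the server would also need to accept messages of that size.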

rpc StreamInfer(stream InferRequest) returns (stream InferResponse)

Request inferences using a specific model in a streaming manner. Individual inference requests sent through the same stream will be processed in order and returned on completion.

rpc ModelControl(ModelControlRequest) returns (ModelControlResponse)

Request to load / unload a specified model.

rpc SharedMemoryControl(SharedMemoryControlRequest) returns (SharedMemoryControlResponse)

Request to register / unregister a specified shared memory region.

rpc Repository(RepositoryRequest) returns (RepositoryResponse)

Get status associated with the model repository.

message StatusRequest

Request message for Status gRPC endpoint.

string model_name

The name of the model whose status is to be returned. If empty, return status for all models.

message StatusResponse

Response message for Status gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

ServerStatus server_status

The server and model status.

message HealthRequest

Request message for Health gRPC endpoint.

string mode

The requested health action: ‘live’ requests the liveness state of the inference server; ‘ready’ requests the readiness state of the inference server.

message HealthResponse

Response message for Health gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

bool health

The result of the request. True indicates the inference server is live/ready, false indicates the inference server is not live/ready.
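The two health modes can be combined into a simple startup check. The sketch below polls liveness and then readiness; `check_health` is a hypothetical callable standing in for the Health RPC (it takes the `mode` string and returns the `health` bool), and the retry count and delay are arbitrary assumptions.

```python
import time

def wait_until_ready(check_health, attempts=5, delay_s=0.0):
    """Poll liveness, then readiness; return True once both hold.

    check_health is a hypothetical stand-in for the Health RPC: it takes
    the HealthRequest mode ('live' or 'ready') and returns the
    HealthResponse 'health' bool.
    """
    for _ in range(attempts):
        if check_health("live") and check_health("ready"):
            return True
        time.sleep(delay_s)
    return False

# Fake server for illustration: live immediately, ready on the third poll.
calls = {"ready": 0}

def fake_health(mode):
    if mode == "live":
        return True
    calls["ready"] += 1
    return calls["ready"] >= 3

ready = wait_until_ready(fake_health)
```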

message ModelControlRequest

Request message for ModelControl gRPC endpoint.

enum Type

Types of control operation.

enumerator Type::UNLOAD = 0

To unload the specified model.

enumerator Type::LOAD = 1

To load the specified model. If the model is already loaded, it will be reloaded to pick up the latest changes.

string model_name

The target model name.

Type type

The control operation to perform on the specified model.
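The enumerator values above can be mirrored on the client side. The sketch below uses a plain `IntEnum` and a dict shaped like ModelControlRequest rather than generated protobuf classes; the names `ControlType` and `control_request` and the model name `resnet50` are hypothetical.

```python
from enum import IntEnum

class ControlType(IntEnum):
    """Mirror of ModelControlRequest.Type as documented above."""
    UNLOAD = 0
    LOAD = 1

def control_request(model_name, control_type):
    """Return a plain dict shaped like ModelControlRequest (illustrative only)."""
    return {"model_name": model_name, "type": int(ControlType(control_type))}

# LOAD on an already loaded model acts as a reload, per the description above.
reload_req = control_request("resnet50", ControlType.LOAD)
```

If the file uses proto3 syntax (an assumption here), an unset `type` field reads as 0, which is UNLOAD, so it seems safer to set the field explicitly.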

message ModelControlResponse

Response message for ModelControl gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

message SharedMemoryControlRequest

Request message for managing registered shared memory regions in TRTIS.

message Register

Register a shared memory region.

string name

The name for this shared memory region.

message SystemSharedMemoryIdentifier

The identifier for this system shared memory region.

string shared_memory_key

The name of the shared memory region that holds the input data (or where the output data should be written).

uint64 offset

The offset of the shared memory block from the start of the shared memory region. The block spans start = offset to end = offset + byte_size.

message CUDASharedMemoryIdentifier

The identifier for this CUDA shared memory region.

bytes raw_handle

The raw serialized cudaIPC handle.

int64 device_id

The GPU device ID on which the cudaIPC handle was created.

oneof shared_memory_types

Types of shared memory identifiers.

SystemSharedMemoryIdentifier system_shared_memory

The identifier for this system shared memory region.

CUDASharedMemoryIdentifier cuda_shared_memory

The identifier for this CUDA shared memory region.

uint64 byte_size

Size of the shared memory block, in bytes.
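To illustrate how `offset` and `byte_size` delimit a block (start = offset, end = offset + byte_size), the sketch below computes the bounds and slices a local buffer. The `bytearray` is only a stand-in for a mapped shared memory region, and the bounds check against `region_size` is an assumed client-side precaution, not documented server behavior.

```python
def block_bounds(offset, byte_size, region_size):
    """Return (start, end) of the block, with end exclusive.

    region_size is the total size of the underlying region; checking
    against it is an assumption about sensible client-side validation.
    """
    if offset < 0 or byte_size < 0:
        raise ValueError("offset and byte_size must be non-negative")
    start, end = offset, offset + byte_size
    if end > region_size:
        raise ValueError("block extends past the end of the region")
    return start, end

region = bytearray(64)  # stand-in for a mapped shared memory region
start, end = block_bounds(offset=16, byte_size=32, region_size=len(region))
block = memoryview(region)[start:end]
block[:4] = b"\x01\x02\x03\x04"  # writes through the view land in the region
```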

message Unregister

Unregister a specified shared memory region.

string name

The name of the shared memory region to unregister.

message UnregisterAll

Unregister all shared memory regions.

message Status

Get the status of all active shared memory regions.

oneof shared_memory_control

Types of control operations for shared memory.

Register register

To register the specified shared memory region.

Unregister unregister

To unregister the specified shared memory region.

UnregisterAll unregister_all

To unregister all active shared memory regions.

Status status

Get the status of all active shared memory regions.

message SharedMemoryControlResponse

Response message for SharedMemoryControl gRPC endpoint.

message Status

Status of all active shared memory regions.

SharedMemoryRegion shared_memory_region(repeated)

The list of active/registered shared memory regions.

RequestStatus request_status

The status of the request, indicating success or failure.

Status shared_memory_status

The status of all active shared memory regions.

message InferRequest

Request message for Infer gRPC endpoint.

string model_name

The name of the model to use for inferencing.

int64 version

The version of the model to use for inference. If -1, the latest/most-recent version of the model is used.
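The -1 convention can be pictured with a small resolver. This is only a sketch of the semantics: it assumes that "latest/most-recent" means the highest-numbered available version, and the function name `resolve_version` is hypothetical rather than anything the server exposes.

```python
def resolve_version(requested, available):
    """Pick the model version to run.

    Assumption: 'latest' is the highest version number in 'available'.
    """
    if not available:
        raise LookupError("no versions available")
    if requested == -1:
        return max(available)
    if requested not in available:
        raise LookupError(f"version {requested} is not available")
    return requested

latest = resolve_version(-1, [1, 3, 2])
pinned = resolve_version(2, [1, 3, 2])
```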

InferRequestHeader meta_data

Meta-data for the request: input tensors, output tensors, etc.

bytes raw_input(repeated)

The raw input tensor data in the order specified in ‘meta_data’.
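Because `raw_input` entries are matched to inputs purely by position, the bytes have to be appended in the same order as the inputs listed in `meta_data`. The sketch below shows that ordering with the standard `struct` module; the input names, the flat FP32 little-endian layout, and the helper `pack_raw_inputs` are assumptions made for illustration.

```python
import struct

def pack_raw_inputs(input_names, tensors):
    """Serialize tensors to bytes in the order the input names are listed.

    Assumes each tensor is a flat list of FP32 values, little-endian.
    """
    return [
        struct.pack(f"<{len(tensors[name])}f", *tensors[name])
        for name in input_names
    ]

# Order as it would appear in meta_data (hypothetical names).
meta_inputs = ["input0", "input1"]
# Dict order deliberately differs; only meta_inputs decides the output order.
tensors = {"input1": [4.0, 5.0], "input0": [1.0, 2.0, 3.0]}
raw_input = pack_raw_inputs(meta_inputs, tensors)
```

The same positional rule applies in reverse to `raw_output` in InferResponse.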

message InferResponse

Response message for Infer gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

InferResponseHeader meta_data

The response meta-data for the output tensors.

bytes raw_output(repeated)

The raw output tensor data in the order specified in ‘meta_data’.

message RepositoryRequest

Request message for Repository gRPC endpoint.

oneof request_type

Types of repository request.

bool index

Request for the index of the model repository.

message RepositoryResponse

Response message for Repository gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

oneof response_type

Types of repository response, which map one-to-one to the repository request types.

bool index

The index of the model repository.