
service GRPCService

Inference Server GRPC endpoints.

rpc Status(StatusRequest) returns (StatusResponse)

Get status for entire inference server or for a specified model.

rpc Profile(ProfileRequest) returns (ProfileResponse)

Enable and disable low-level GPU profiling.

rpc Health(HealthRequest) returns (HealthResponse)

Check liveness and readiness of the inference server.

rpc ModelControl(ModelControlRequest) returns

Request to load / unload a specified model.

rpc SharedMemoryControl(SharedMemoryControlRequest) returns

Request to register / unregister a specified shared memory region.

rpc Infer(InferRequest) returns (InferResponse)

Request inference using a specific model. To handle large input tensors, you will likely need to raise the maximum gRPC message size so that the tensors can be transmitted in one pass.
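As a rough sizing aid, the sketch below estimates how many bytes of raw tensor data one InferRequest carries and builds gRPC channel options that raise the message limits to match. The helper names and the 1 MiB headroom reserved for meta-data and protobuf framing are assumptions for illustration. `grpc.max_send_message_length` and `grpc.max_receive_message_length` are standard gRPC channel arguments.

```python
from math import prod

# Hypothetical helper: total bytes of raw tensor data in one request.
def raw_tensor_bytes(tensors):
    """tensors: iterable of (shape, element_size_in_bytes) pairs."""
    return sum(prod(shape) * elem_size for shape, elem_size in tensors)

# Assumed headroom for meta_data and protobuf framing (not from the spec).
HEADROOM_BYTES = 1 << 20

def channel_options(tensors):
    limit = raw_tensor_bytes(tensors) + HEADROOM_BYTES
    # Standard gRPC channel arguments, expressed as (key, value) pairs.
    return [
        ("grpc.max_send_message_length", limit),
        ("grpc.max_receive_message_length", limit),
    ]

# Example: one 3x1024x1024 FP32 input (4 bytes per element).
opts = channel_options([((3, 1024, 1024), 4)])
```

With the Python `grpc` package, such a list can be passed as `grpc.insecure_channel(url, options=opts)`. The server enforces its own limits independently.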

rpc StreamInfer(stream InferRequest) returns (stream

Request inferences using a specific model in a streaming manner. Individual inference requests sent through the same stream are processed in order, and their responses are returned as they complete.
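Because requests on one stream are processed in order, a client can match the k-th response to the k-th request without a correlation ID. The sketch below models only that guarantee: `fake_stream_infer` is a hypothetical stand-in for the real StreamInfer call, plain dicts stand in for the protobuf messages, and the `id` key is a local tag, not an InferRequest field.

```python
# Hypothetical stand-in for the StreamInfer RPC. It models only the
# documented guarantee: responses come back in the order requests were sent.
def fake_stream_infer(requests):
    for req in requests:
        yield {"model_name": req["model_name"], "echo_id": req["id"]}

def send_and_pair(model_name, payloads):
    # Tag each request locally so the pairing can be checked afterwards.
    requests = [
        {"model_name": model_name, "id": i, "data": p}
        for i, p in enumerate(payloads)
    ]
    responses = fake_stream_infer(iter(requests))
    # In-order delivery means the k-th response answers the k-th request,
    # so zip() is enough to match them up.
    return list(zip(requests, responses))

pairs = send_and_pair("my_model", [b"a", b"b", b"c"])
```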

message StatusRequest

Request message for Status gRPC endpoint.

string model_name

The model whose status is to be returned. If empty, the status of all models is returned.

message StatusResponse

Response message for Status gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

ServerStatus server_status

The server and model status.

message ProfileRequest

Request message for Profile gRPC endpoint.

string cmd

The requested profiling action: ‘start’ requests that GPU profiling be enabled on all GPUs controlled by the inference server; ‘stop’ requests that GPU profiling be disabled on all GPUs controlled by the inference server.

message ProfileResponse

Response message for Profile gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

message HealthRequest

Request message for Health gRPC endpoint.

string mode

The requested health action: ‘live’ requests the liveness state of the inference server; ‘ready’ requests the readiness state of the inference server.

message HealthResponse

Response message for Health gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

bool health

The result of the request. True indicates the inference server is live/ready, false indicates the inference server is not live/ready.
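A common use of the Health endpoint is to wait for the server to become ready before sending inference requests. The polling loop below is a sketch; `check_health` is a hypothetical callable that would send a HealthRequest with the given mode and return `HealthResponse.health`, and the timeout and interval defaults are arbitrary.

```python
import time

def wait_until(check_health, mode="ready", timeout_s=30.0, interval_s=0.5):
    """Poll check_health(mode) until it returns True or the timeout expires.

    check_health is hypothetical: it should wrap the Health RPC, verify that
    request_status reports success, and return HealthResponse.health.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check_health(mode):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Passing `mode="live"` instead checks liveness, matching the two documented modes.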

message ModelControlRequest

Request message for ModelControl gRPC endpoint.

enum Type

Types of control operations.

enumerator Type::UNLOAD = 0

To unload the specified model.

enumerator Type::LOAD = 1

To load the specified model. If the model is already loaded, it is reloaded to pick up the latest changes.

string model_name

The target model name.

Type type

The control type that is operated on the specified model.
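The sketch below mirrors the documented `Type` values and models the load/reload/unload semantics with a toy registry. The dict stands in for the generated ModelControlRequest message, and `ToyModelRegistry` is a hypothetical illustration, not the server's implementation; in particular, how the real server treats an UNLOAD of a model that is not loaded is not specified here.

```python
from enum import IntEnum

# Mirrors ModelControlRequest.Type as documented.
class ControlType(IntEnum):
    UNLOAD = 0
    LOAD = 1

def model_control_request(model_name, control_type):
    # Field names follow ModelControlRequest; a dict stands in for the message.
    return {"model_name": model_name, "type": int(control_type)}

class ToyModelRegistry:
    """Hypothetical model of the documented LOAD/UNLOAD semantics."""

    def __init__(self):
        self.loaded = {}  # model name -> number of times (re)loaded

    def apply(self, request):
        name = request["model_name"]
        if request["type"] == ControlType.LOAD:
            # LOAD on an already-loaded model reloads it.
            self.loaded[name] = self.loaded.get(name, 0) + 1
        else:
            # This toy silently ignores unloading an unknown model.
            self.loaded.pop(name, None)
```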

message ModelControlResponse

Response message for ModelControl gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

message InferSharedMemoryRegion

The meta-data for the shared memory region from which to read the input data and/or write the output data.

string name

The name for this shared memory region.

string shm_key

The name of the shared memory region that holds the input data (or where the output data should be written).

uint64 offset

The offset, in bytes, from the start of the shared memory region. The block starts at offset and ends at offset + byte_size.

uint64 byte_size

Size of the memory block, in bytes.
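To make the offset arithmetic concrete, the sketch below extracts the block an InferSharedMemoryRegion describes from a buffer. A `bytearray` stands in for the mapped segment named by `shm_key`; the function name is hypothetical.

```python
def region_view(buffer, offset, byte_size):
    """Return the bytes described by an InferSharedMemoryRegion.

    buffer stands in for the mapped shared memory segment (any bytes-like
    object); the block covers buffer[offset : offset + byte_size].
    """
    if offset < 0 or byte_size < 0:
        raise ValueError("offset and byte_size must be non-negative")
    end = offset + byte_size
    if end > len(buffer):
        raise ValueError(f"region end {end} exceeds segment size {len(buffer)}")
    # memoryview avoids copying the underlying bytes.
    return memoryview(buffer)[offset:end]
```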

message SharedMemoryControlRequest

Request message for managing registered shared memory regions in TRTIS.

enum Type

Types of control operations for shared memory.

enumerator Type::REGISTER = 0

To register the specified shared memory region.

enumerator Type::UNREGISTER = 1

To unregister the specified shared memory region.

enumerator Type::UNREGISTER_ALL = 2

To unregister all active shared memory regions.

Type type

The control type: whether to register or unregister the specified shared memory region, or to unregister all active shared memory regions.

InferSharedMemoryRegion shared_memory_region

The shared memory region to register or unregister. All fields are needed to REGISTER the shared memory region. Only ‘name’ is needed to UNREGISTER the shared memory region. No fields are needed to UNREGISTER_ALL.
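The field requirements above can be checked on the client before sending a request. The sketch mirrors the documented enum values and uses a dict in place of InferSharedMemoryRegion, treating absent keys as unset. That is a simplification: in proto3 a zero `offset` is indistinguishable from an unset one.

```python
from enum import IntEnum

# Mirrors SharedMemoryControlRequest.Type as documented.
class ShmControlType(IntEnum):
    REGISTER = 0
    UNREGISTER = 1
    UNREGISTER_ALL = 2

REGION_FIELDS = ("name", "shm_key", "offset", "byte_size")

def missing_fields(control_type, region):
    """Return the region fields still needed for control_type.

    region is a dict standing in for InferSharedMemoryRegion;
    absent keys are treated as unset.
    """
    if control_type == ShmControlType.REGISTER:
        required = REGION_FIELDS
    elif control_type == ShmControlType.UNREGISTER:
        required = ("name",)
    else:  # UNREGISTER_ALL needs no region fields.
        required = ()
    return [f for f in required if f not in region]
```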

message SharedMemoryControlResponse

Response message for SharedMemoryControl gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

message InferRequest

Request message for Infer gRPC endpoint.

string model_name

The name of the model to use for inferencing.

int64 version

The version of the model to use for inference. If -1, the latest version of the model is used.

InferRequestHeader meta_data

Meta-data for the request, describing the input tensors and the requested output tensors.

bytes raw_input(repeated)

The raw input tensor data in the order specified in ‘meta_data’.
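Since `raw_input` is positional, each entry must line up with the input order in `meta_data`. The sketch below serializes FP32 tensors into that order. It assumes FP32 data laid out as contiguous little-endian floats in row-major order; `input_order` is a hypothetical list of input names taken from `meta_data`.

```python
import struct

def pack_fp32(values):
    # Assumption: contiguous little-endian FP32 values, row-major order.
    return struct.pack(f"<{len(values)}f", *values)

def build_raw_input(input_order, tensors):
    """Return raw_input entries in the order the inputs appear in meta_data.

    input_order: input names as listed in meta_data (hypothetical).
    tensors: mapping from input name to a flat list of floats.
    """
    return [pack_fp32(tensors[name]) for name in input_order]
```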

message InferResponse

Response message for Infer gRPC endpoint.

RequestStatus request_status

The status of the request, indicating success or failure.

InferResponseHeader meta_data

The response meta-data for the output tensors.

bytes raw_output(repeated)

The raw output tensor data in the order specified in ‘meta_data’.
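On the response side, the same positional rule applies: the k-th `raw_output` entry belongs to the k-th output in the response `meta_data`. The sketch below decodes FP32 outputs under the same little-endian assumption as the input side; `output_order` is a hypothetical list of output names read from `meta_data`.

```python
import struct

def unpack_fp32(raw):
    # Assumption: little-endian FP32, 4 bytes per element.
    if len(raw) % 4:
        raise ValueError("raw output length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

def outputs_by_name(output_order, raw_output):
    """Pair each raw_output entry with its output name from meta_data."""
    if len(output_order) != len(raw_output):
        raise ValueError("meta_data and raw_output disagree on output count")
    return {name: unpack_fp32(raw) for name, raw in zip(output_order, raw_output)}
```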