server_status.proto

message StatDuration

A statistic collecting a duration metric.

uint64 count

Cumulative number of times this metric occurred.

uint64 total_time_ns

Total collected duration of this metric in nanoseconds.
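
Assembled as a message, these two fields form the complete statistic. A sketch in proto form, with field numbers assumed for illustration:

    message StatDuration {
      // Cumulative number of times this metric occurred.
      uint64 count = 1;  // field numbers are assumed, not from the source

      // Total collected duration of this metric in nanoseconds.
      uint64 total_time_ns = 2;
    }

The average duration per occurrence follows directly as total_time_ns / count.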

message StatusRequestStats

Statistics collected for Status requests.

StatDuration success

Total time required to handle successful Status requests, not including HTTP or gRPC endpoint termination time.

message HealthRequestStats

Statistics collected for Health requests.

StatDuration success

Total time required to handle successful Health requests, not including HTTP or gRPC endpoint termination time.

message ModelControlRequestStats

Statistics collected for ModelControl requests.

StatDuration success

Total time required to handle successful ModelControl requests, not including HTTP or gRPC endpoint termination time.

message SharedMemoryControlRequestStats

Statistics collected for SharedMemoryControl requests.

StatDuration success

Total time required to handle successful SharedMemoryControl requests, not including HTTP or gRPC endpoint termination time.

message RepositoryRequestStats

Statistics collected for Repository requests.

StatDuration success

Total time required to handle successful Repository requests, not including HTTP or gRPC endpoint termination time.
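
All five of the request-stats messages above share the same single-field shape. A sketch of one, with the field number assumed:

    message StatusRequestStats {
      // Total time to handle successful Status requests, not including
      // HTTP or gRPC endpoint termination time.
      StatDuration success = 1;  // field number assumed
    }

HealthRequestStats, ModelControlRequestStats, SharedMemoryControlRequestStats, and RepositoryRequestStats differ only in the request type they describe.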

message InferRequestStats

Statistics collected for Infer requests.

StatDuration success

Total time required to handle successful Infer requests, not including HTTP or gRPC endpoint termination time.

StatDuration failed

Total time required to handle failed Infer requests, not including HTTP or gRPC endpoint termination time.

StatDuration compute

Time required to run inferencing for an inference request, including the time to copy input tensors to GPU memory, the time to execute the model, and the time to copy output tensors from GPU memory.

StatDuration queue

Time an inference request waits in the scheduling queue for an available model instance.
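
A sketch of the assembled message, with field numbers assumed:

    message InferRequestStats {
      // Field numbers below are assumed for illustration.
      StatDuration success = 1;  // successful requests, excluding endpoint time
      StatDuration failed = 2;   // failed requests, excluding endpoint time
      StatDuration compute = 3;  // inference execution, including GPU copies
      StatDuration queue = 4;    // wait for an available model instance
    }

Because queue and compute measure sub-intervals of a request's lifetime, success.total_time_ns can be expected to roughly track the sum of the queue and compute totals plus per-request scheduling overhead.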

enum ModelReadyState

Readiness status for models.

enumerator ModelReadyState::MODEL_UNKNOWN = 0

The model is in an unknown state. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_READY = 1

The model is ready and available for inferencing.

enumerator ModelReadyState::MODEL_UNAVAILABLE = 2

The model is unavailable, indicating that the model failed to load or has been implicitly or explicitly unloaded. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_LOADING = 3

The model is being loaded by the inference server. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_UNLOADING = 4

The model is being unloaded by the inference server. The model is not available for inferencing.
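
In proto form the enum reads as follows; the values are exactly those listed above:

    enum ModelReadyState {
      MODEL_UNKNOWN = 0;      // unknown state; not available for inferencing
      MODEL_READY = 1;        // ready and available for inferencing
      MODEL_UNAVAILABLE = 2;  // failed to load, or was unloaded
      MODEL_LOADING = 3;      // load in progress; not yet available
      MODEL_UNLOADING = 4;    // unload in progress; no longer available
    }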

message ModelReadyStateReason

Detail associated with a model’s readiness status.

string message

A message explaining why the model is in its current readiness state.
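
As a message, with the field number assumed:

    message ModelReadyStateReason {
      // Explanation of the current readiness state.
      string message = 1;  // field number assumed
    }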

message ModelVersionStatus

Status for a version of a model.

ModelReadyState ready_state

Current readiness state for the model.

ModelReadyStateReason ready_state_reason

Supplemental information regarding the current readiness state.

map<uint32, InferRequestStats> infer_stats

Inference statistics for the model, as a map from batch size to the statistics. A batch size will not occur in the map unless there has been at least one inference request of that batch size.

uint64 model_execution_count

Cumulative number of model executions performed for the model. A single model execution performs inferencing for the entire request batch and can perform inferencing for multiple requests if dynamic batching is enabled.

uint64 model_inference_count

Cumulative number of model inferences performed for the model. Each inference in a batched request is counted as an individual inference.

uint64 last_inference_timestamp_milliseconds

The timestamp of the last inference request made for this model, given as milliseconds since the epoch.
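
A sketch of the assembled message (field numbers assumed):

    message ModelVersionStatus {
      // Field numbers below are assumed for illustration.
      ModelReadyState ready_state = 1;
      ModelReadyStateReason ready_state_reason = 2;

      // Keyed by batch size; a key appears only after at least one
      // request of that batch size has been seen.
      map<uint32, InferRequestStats> infer_stats = 3;

      uint64 model_execution_count = 4;
      uint64 model_inference_count = 5;
      uint64 last_inference_timestamp_milliseconds = 6;
    }

With dynamic batching enabled, model_inference_count can exceed model_execution_count, since a single execution may serve a batch assembled from several requests.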

message ModelStatus

Status for a model.

ModelConfig config

The configuration for the model.

map<int64, ModelVersionStatus> version_status

Duration statistics for each version of the model, as a map from version to the status. A version will not occur in the map unless there has been at least one inference request of that model version. A version of -1 indicates the status is for requests for which the version could not be determined.
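
A sketch, assuming ModelConfig is imported from the companion model_config.proto and with field numbers assumed:

    message ModelStatus {
      // Assumed to be defined in model_config.proto.
      ModelConfig config = 1;

      // Keyed by model version; -1 collects requests whose version
      // could not be determined.
      map<int64, ModelVersionStatus> version_status = 2;
    }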

enum ServerReadyState

Readiness status for the inference server.

enumerator ServerReadyState::SERVER_INVALID = 0

The server is in an invalid state and will likely not respond correctly to any requests.

enumerator ServerReadyState::SERVER_INITIALIZING = 1

The server is initializing.

enumerator ServerReadyState::SERVER_READY = 2

The server is ready and accepting requests.

enumerator ServerReadyState::SERVER_EXITING = 3

The server is exiting and will not respond to requests.

enumerator ServerReadyState::SERVER_FAILED_TO_INITIALIZE = 10

The server did not initialize correctly. Most requests will fail.
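
In proto form, with the values exactly as listed above:

    enum ServerReadyState {
      SERVER_INVALID = 0;       // invalid state; requests likely fail
      SERVER_INITIALIZING = 1;  // initializing
      SERVER_READY = 2;         // ready and accepting requests
      SERVER_EXITING = 3;       // exiting; not responding

      // Proto3 enum values need not be contiguous; the failure
      // state sits apart at 10, as listed above.
      SERVER_FAILED_TO_INITIALIZE = 10;
    }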

message SharedMemoryRegion

The metadata for a shared memory region registered with the inference server.

string name

The name for this shared memory region.

string shared_memory_key

The name of the shared memory region that holds the input data (or where the output data should be written).

uint64 offset

The offset of the shared memory block from the start of the shared memory region: start = offset, end = offset + byte_size.

int64 device_id

The ID of the GPU device on which the cudaIPC handle was created.

oneof shared_memory_types

The types of shared memory identifiers; exactly one is set for a region.

SystemSharedMemory system_shared_memory

The status of this system shared memory region.

CudaSharedMemory cuda_shared_memory

The status of this CUDA shared memory region.

uint64 byte_size

Size of the shared memory block, in bytes.
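
A sketch of the assembled message. The field descriptions suggest that shared_memory_key and offset belong to the system case while device_id belongs to the CUDA case, so SystemSharedMemory and CudaSharedMemory are modeled below as nested messages; that nesting and all field numbers are assumptions:

    message SharedMemoryRegion {
      // The name for this shared memory region.
      string name = 1;  // all field numbers assumed

      // Assumed nesting: identifies a system shared memory region.
      message SystemSharedMemory {
        string shared_memory_key = 1;
        uint64 offset = 2;  // start = offset, end = offset + byte_size
      }

      // Assumed nesting: identifies a CUDA shared memory region.
      message CudaSharedMemory {
        int64 device_id = 1;  // GPU that created the cudaIPC handle
      }

      // Exactly one identifier type is set for a region.
      oneof shared_memory_types {
        SystemSharedMemory system_shared_memory = 2;
        CudaSharedMemory cuda_shared_memory = 3;
      }

      // Size of the shared memory block, in bytes.
      uint64 byte_size = 4;
    }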

message ServerStatus

Status for the inference server.

string id

The server’s ID.

string version

The server’s version.

ServerReadyState ready_state

Current readiness state for the server.

uint64 uptime_ns

Server uptime in nanoseconds.

map<string, ModelStatus> model_status

Status for each model, as a map from model name to the status.

StatusRequestStats status_stats

Statistics for Status requests.

HealthRequestStats health_stats

Statistics for Health requests.

ModelControlRequestStats model_control_stats

Statistics for ModelControl requests.

SharedMemoryControlRequestStats shm_control_stats

Statistics for SharedMemoryControl requests.

RepositoryRequestStats repository_stats

Statistics for Repository requests.
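
A sketch of the top-level status message (field numbers assumed):

    message ServerStatus {
      // Field numbers below are assumed for illustration.
      string id = 1;
      string version = 2;
      ServerReadyState ready_state = 3;
      uint64 uptime_ns = 4;

      // Keyed by model name.
      map<string, ModelStatus> model_status = 5;

      StatusRequestStats status_stats = 6;
      HealthRequestStats health_stats = 7;
      ModelControlRequestStats model_control_stats = 8;
      SharedMemoryControlRequestStats shm_control_stats = 9;
      RepositoryRequestStats repository_stats = 10;
    }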

message SharedMemoryStatus

Shared memory status for the inference server.

SharedMemoryRegion shared_memory_region (repeated)

The list of active/registered shared memory regions.
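
As a message, the repeated field reads (field number assumed):

    message SharedMemoryStatus {
      // The list of active/registered shared memory regions.
      repeated SharedMemoryRegion shared_memory_region = 1;  // field number assumed
    }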

message ModelRepositoryIndex

Index of the model repository monitored by the inference server.

message ModelEntry

The basic information for a model.

string name

The model’s name.

ModelEntry models (repeated)

The list of models in the model repository.
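
Because ModelEntry is introduced under ModelRepositoryIndex, it is sketched below as a nested message; the nesting and field numbers are assumptions:

    message ModelRepositoryIndex {
      // The basic information for a model. Assumed to nest here.
      message ModelEntry {
        string name = 1;  // the model's name
      }

      // The list of models in the model repository.
      repeated ModelEntry models = 1;
    }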