server_status.proto

message StatDuration

A statistic collecting a duration metric.

uint64 count

Cumulative number of times this metric occurred.

uint64 total_time_ns

Total collected duration of this metric in nanoseconds.
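
Assembled as a message, these two fields form the complete statistic. A sketch in proto form, with field numbers assumed for illustration:

    message StatDuration {
      // Cumulative number of times this metric occurred.
      uint64 count = 1;  // field numbers are assumed, not from the source

      // Total collected duration of this metric in nanoseconds.
      uint64 total_time_ns = 2;
    }

The average duration per occurrence follows directly as total_time_ns / count.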

message StatusRequestStats

Statistics collected for Status requests.

StatDuration success

Total time required to handle successful Status requests, not including HTTP or gRPC endpoint termination time.

message HealthRequestStats

Statistics collected for Health requests.

StatDuration success

Total time required to handle successful Health requests, not including HTTP or gRPC endpoint termination time.

message ModelControlRequestStats

Statistics collected for ModelControl requests.

StatDuration success

Total time required to handle successful ModelControl requests, not including HTTP or gRPC endpoint termination time.

message SharedMemoryControlRequestStats

Statistics collected for SharedMemoryControl requests.

StatDuration success

Total time required to handle successful SharedMemoryControl requests, not including HTTP or gRPC endpoint termination time.

message RepositoryRequestStats

Statistics collected for Repository requests.

StatDuration success

Total time required to handle successful Repository requests, not including HTTP or gRPC endpoint termination time.
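
All five of the request-stats messages above share the same single-field shape. A sketch of one, with the field number assumed:

    message StatusRequestStats {
      // Total time to handle successful Status requests, not including
      // HTTP or gRPC endpoint termination time.
      StatDuration success = 1;  // field number assumed
    }

HealthRequestStats, ModelControlRequestStats, SharedMemoryControlRequestStats, and RepositoryRequestStats differ only in the request type they describe.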

message InferRequestStats

Statistics collected for Infer requests.

StatDuration success

Total time required to handle successful Infer requests, not including HTTP or gRPC endpoint termination time.

StatDuration failed

Total time required to handle failed Infer requests, not including HTTP or gRPC endpoint termination time.

StatDuration compute

Time required to run inferencing for an inference request, including the time to copy input tensors to GPU memory, the time to execute the model, and the time to copy output tensors from GPU memory.

StatDuration queue

Time an inference request waits in the scheduling queue for an available model instance.
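
A sketch of the assembled message, with field numbers assumed:

    message InferRequestStats {
      // Field numbers below are assumed for illustration.
      StatDuration success = 1;  // successful requests, excluding endpoint time
      StatDuration failed = 2;   // failed requests, excluding endpoint time
      StatDuration compute = 3;  // inference execution, including GPU copies
      StatDuration queue = 4;    // wait for an available model instance
    }

Because queue and compute measure sub-intervals of a request's lifetime, success.total_time_ns can be expected to roughly track the sum of the queue and compute totals plus per-request scheduling overhead.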

enum ModelReadyState

Readiness status for models.

enumerator ModelReadyState::MODEL_UNKNOWN = 0

The model is in an unknown state. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_READY = 1

The model is ready and available for inferencing.

enumerator ModelReadyState::MODEL_UNAVAILABLE = 2

The model is unavailable, indicating that the model failed to load or has been implicitly or explicitly unloaded. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_LOADING = 3

The model is being loaded by the inference server. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_UNLOADING = 4

The model is being unloaded by the inference server. The model is not available for inferencing.
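
In proto form the enum reads as follows; the values are exactly those listed above:

    enum ModelReadyState {
      MODEL_UNKNOWN = 0;      // unknown state; not available for inferencing
      MODEL_READY = 1;        // ready and available for inferencing
      MODEL_UNAVAILABLE = 2;  // failed to load, or was unloaded
      MODEL_LOADING = 3;      // load in progress; not yet available
      MODEL_UNLOADING = 4;    // unload in progress; no longer available
    }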

message ModelReadyStateReason

Detail associated with a model’s readiness status.

string message

A message explaining why the model is in its current readiness state.
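
As a message, with the field number assumed:

    message ModelReadyStateReason {
      // Explanation of the current readiness state.
      string message = 1;  // field number assumed
    }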

message ModelVersionStatus

Status for a version of a model.

ModelReadyState ready_state

Current readiness state for the model.

ModelReadyStateReason ready_state_reason

Supplemental information regarding the current readiness state.

map<uint32, InferRequestStats> infer_stats

Inference statistics for the model, as a map from batch size to the statistics. A batch size will not occur in the map unless there has been at least one inference request of that batch size.

uint64 model_execution_count

Cumulative number of model executions performed for the model. A single model execution performs inferencing for the entire request batch and can perform inferencing for multiple requests if dynamic batching is enabled.

uint64 model_inference_count

Cumulative number of model inferences performed for the model. Each inference in a batched request is counted as an individual inference.

uint64 last_inference_timestamp_milliseconds

The timestamp of the last inference request made for this model, given as milliseconds since the epoch.
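
A sketch of the assembled message (field numbers assumed):

    message ModelVersionStatus {
      // Field numbers below are assumed for illustration.
      ModelReadyState ready_state = 1;
      ModelReadyStateReason ready_state_reason = 2;

      // Keyed by batch size; a key appears only after at least one
      // request of that batch size has been seen.
      map<uint32, InferRequestStats> infer_stats = 3;

      uint64 model_execution_count = 4;
      uint64 model_inference_count = 5;
      uint64 last_inference_timestamp_milliseconds = 6;
    }

With dynamic batching enabled, model_inference_count can exceed model_execution_count, since a single execution may serve a batch assembled from several requests.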

message ModelStatus

Status for a model.

ModelConfig config

The configuration for the model.

map<int64, ModelVersionStatus> version_status

Duration statistics for each version of the model, as a map from version to the status. A version will not occur in the map unless there has been at least one inference request of that model version. A version of -1 indicates the status is for requests for which the version could not be determined.
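
A sketch, assuming ModelConfig is imported from the companion model_config.proto and with field numbers assumed:

    message ModelStatus {
      // Assumed to be defined in model_config.proto.
      ModelConfig config = 1;

      // Keyed by model version; -1 collects requests whose version
      // could not be determined.
      map<int64, ModelVersionStatus> version_status = 2;
    }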

enum ServerReadyState

Readiness status for the inference server.

enumerator ServerReadyState::SERVER_INVALID = 0

The server is in an invalid state and will likely not respond correctly to any requests.

enumerator ServerReadyState::SERVER_INITIALIZING = 1

The server is initializing.

enumerator ServerReadyState::SERVER_READY = 2

The server is ready and accepting requests.

enumerator ServerReadyState::SERVER_EXITING = 3

The server is exiting and will not respond to requests.

enumerator ServerReadyState::SERVER_FAILED_TO_INITIALIZE = 10

The server did not initialize correctly. Most requests will fail.
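
In proto form, with the values exactly as listed above:

    enum ServerReadyState {
      SERVER_INVALID = 0;       // invalid state; requests likely fail
      SERVER_INITIALIZING = 1;  // initializing
      SERVER_READY = 2;         // ready and accepting requests
      SERVER_EXITING = 3;       // exiting; not responding

      // Proto3 enum values need not be contiguous; the failure
      // state sits apart at 10, as listed above.
      SERVER_FAILED_TO_INITIALIZE = 10;
    }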

message SharedMemoryRegion

The metadata for a shared memory region registered with the inference server.

string name

The name for this shared memory region.

string shared_memory_key

The name of the shared memory region that holds the input data (or where the output data should be written).

uint64 offset

The offset of the shared memory block from the start of the shared memory region: start = offset, end = offset + byte_size.

int64 device_id

The ID of the GPU device on which the cudaIPC handle was created.

oneof shared_memory_types

The types of shared memory identifiers; exactly one is set for a region.

SystemSharedMemory system_shared_memory

The status of this system shared memory region.

CudaSharedMemory cuda_shared_memory

The status of this CUDA shared memory region.

uint64 byte_size

Size of the shared memory block, in bytes.
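
A sketch of the assembled message. The field descriptions suggest that shared_memory_key and offset belong to the system case while device_id belongs to the CUDA case, so SystemSharedMemory and CudaSharedMemory are modeled below as nested messages; that nesting and all field numbers are assumptions:

    message SharedMemoryRegion {
      // The name for this shared memory region.
      string name = 1;  // all field numbers assumed

      // Assumed nesting: identifies a system shared memory region.
      message SystemSharedMemory {
        string shared_memory_key = 1;
        uint64 offset = 2;  // start = offset, end = offset + byte_size
      }

      // Assumed nesting: identifies a CUDA shared memory region.
      message CudaSharedMemory {
        int64 device_id = 1;  // GPU that created the cudaIPC handle
      }

      // Exactly one identifier type is set for a region.
      oneof shared_memory_types {
        SystemSharedMemory system_shared_memory = 2;
        CudaSharedMemory cuda_shared_memory = 3;
      }

      // Size of the shared memory block, in bytes.
      uint64 byte_size = 4;
    }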

message ServerStatus

Status for the inference server.

string id

The server’s ID.

string version

The server’s version.

ServerReadyState ready_state

Current readiness state for the server.

uint64 uptime_ns

Server uptime in nanoseconds.

map<string, ModelStatus> model_status

Status for each model, as a map from model name to the status.

StatusRequestStats status_stats

Statistics for Status requests.

HealthRequestStats health_stats

Statistics for Health requests.

ModelControlRequestStats model_control_stats

Statistics for ModelControl requests.

SharedMemoryControlRequestStats shm_control_stats

Statistics for SharedMemoryControl requests.

RepositoryRequestStats repository_stats

Statistics for Repository requests.
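
A sketch of the top-level status message (field numbers assumed):

    message ServerStatus {
      // Field numbers below are assumed for illustration.
      string id = 1;
      string version = 2;
      ServerReadyState ready_state = 3;
      uint64 uptime_ns = 4;

      // Keyed by model name.
      map<string, ModelStatus> model_status = 5;

      StatusRequestStats status_stats = 6;
      HealthRequestStats health_stats = 7;
      ModelControlRequestStats model_control_stats = 8;
      SharedMemoryControlRequestStats shm_control_stats = 9;
      RepositoryRequestStats repository_stats = 10;
    }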

message SharedMemoryStatus

Shared memory status for the inference server.

SharedMemoryRegion shared_memory_region (repeated)

The list of active/registered shared memory regions.
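
As a message, the repeated field reads (field number assumed):

    message SharedMemoryStatus {
      // The list of active/registered shared memory regions.
      repeated SharedMemoryRegion shared_memory_region = 1;  // field number assumed
    }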

message ModelRepositoryIndex

Index of the model repository monitored by the inference server.

message ModelEntry

The basic information for a model.

string name

The model’s name.

ModelEntry models (repeated)

The list of models in the model repository.
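
Because ModelEntry is introduced under ModelRepositoryIndex, it is sketched below as a nested message; the nesting and field numbers are assumptions:

    message ModelRepositoryIndex {
      // The basic information for a model. Assumed to nest here.
      message ModelEntry {
        string name = 1;  // the model's name
      }

      // The list of models in the model repository.
      repeated ModelEntry models = 1;
    }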