server_status.proto

message StatDuration

Statistic collecting a duration metric.

uint64 count

Cumulative number of times this metric occurred.

uint64 total_time_ns

Total collected duration of this metric in nanoseconds.
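A minimal sketch of how StatDuration could be declared, assuming proto3 syntax; the field numbers are illustrative and not taken from the actual .proto file:

    message StatDuration {
      // Cumulative number of times this metric occurred.
      uint64 count = 1;

      // Total collected duration of this metric in nanoseconds.
      // The average duration is total_time_ns / count.
      uint64 total_time_ns = 2;
    }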

message StatusRequestStats

Statistics collected for Status requests.

StatDuration success

Total time required to handle successful Status requests, not including HTTP or gRPC endpoint termination time.

message ProfileRequestStats

Statistics collected for Profile requests.

StatDuration success

Total time required to handle successful Profile requests, not including HTTP or gRPC endpoint termination time.

message HealthRequestStats

Statistics collected for Health requests.

StatDuration success

Total time required to handle successful Health requests, not including HTTP or gRPC endpoint termination time.
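StatusRequestStats, ProfileRequestStats, and HealthRequestStats all have the same shape: a single StatDuration covering successful requests. A sketch of the three declarations, again assuming proto3 and illustrative field numbers:

    message StatusRequestStats {
      // Time to handle successful Status requests, excluding
      // HTTP/gRPC endpoint termination time.
      StatDuration success = 1;
    }

    message ProfileRequestStats {
      StatDuration success = 1;
    }

    message HealthRequestStats {
      StatDuration success = 1;
    }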

message InferRequestStats

Statistics collected for Infer requests.

StatDuration success

Total time required to handle successful Infer requests, not including HTTP or gRPC endpoint termination time.

StatDuration failed

Total time required to handle failed Infer requests, not including HTTP or gRPC endpoint termination time.

StatDuration compute

Time required to run inferencing for an inference request, including the time to copy input tensors to GPU memory, the time to execute the model, and the time to copy output tensors from GPU memory.

StatDuration queue

Time an inference request waits in the scheduling queue for an available model instance.
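A sketch of InferRequestStats under the same assumptions (proto3, illustrative field numbers). Note that success and failed measure end-to-end handling time, while compute and queue break out the two main phases of a request's lifetime:

    message InferRequestStats {
      StatDuration success = 1;  // end-to-end handling of successful requests
      StatDuration failed = 2;   // end-to-end handling of failed requests
      StatDuration compute = 3;  // input copy to GPU + model execution + output copy
      StatDuration queue = 4;    // wait in the scheduling queue for a model instance
    }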

enum ModelReadyState

Readiness status for models.

enumerator ModelReadyState::MODEL_UNKNOWN = 0

The model is in an unknown state. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_READY = 1

The model is ready and available for inferencing.

enumerator ModelReadyState::MODEL_UNAVAILABLE = 2

The model is unavailable, indicating that the model failed to load or has been implicitly or explicitly unloaded. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_LOADING = 3

The model is being loaded by the inference server. The model is not available for inferencing.

enumerator ModelReadyState::MODEL_UNLOADING = 4

The model is being unloaded by the inference server. The model is not available for inferencing.
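The enum could be declared as follows (a proto3 sketch; the enumerator values are as documented above):

    enum ModelReadyState {
      MODEL_UNKNOWN = 0;      // unknown state; not available for inferencing
      MODEL_READY = 1;        // ready and available for inferencing
      MODEL_UNAVAILABLE = 2;  // failed to load, or implicitly/explicitly unloaded
      MODEL_LOADING = 3;      // being loaded; not yet available
      MODEL_UNLOADING = 4;    // being unloaded; no longer available
    }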

message ModelVersionStatus

Status for a version of a model.

ModelReadyState ready_state

Current readiness state for the model.

map<uint32, InferRequestStats> infer_stats

Inference statistics for the model, as a map from batch size to the statistics. A batch size will not occur in the map unless there has been at least one inference request of that batch size.

uint64 model_execution_count

Cumulative number of model executions performed for the model. A single model execution performs inferencing for the entire request batch and can perform inferencing for multiple requests if dynamic batching is enabled.

uint64 model_inference_count

Cumulative number of model inferences performed for the model. Each inference in a batched request is counted as an individual inference.
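A sketch of ModelVersionStatus, assuming proto3 and illustrative field numbers. Because each execution handles one or more inferences when dynamic batching is enabled, model_inference_count is always greater than or equal to model_execution_count:

    message ModelVersionStatus {
      // Current readiness state for this version.
      ModelReadyState ready_state = 1;

      // Keyed by batch size; a key appears only after at least one
      // inference request of that batch size has been seen.
      map<uint32, InferRequestStats> infer_stats = 2;

      uint64 model_execution_count = 3;  // batched executions
      uint64 model_inference_count = 4;  // individual inferences
    }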

message ModelStatus

Status for a model.

ModelConfig config

The configuration for the model.

map<int64, ModelVersionStatus> version_status

Duration statistics for each version of the model, as a map from version to the status. A version will not occur in the map unless there has been at least one inference request of that model version. A version of -1 indicates the status is for requests for which the version could not be determined.
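A sketch of ModelStatus (proto3, illustrative field numbers; ModelConfig is assumed to be imported from the model-configuration proto):

    message ModelStatus {
      // The configuration for the model.
      ModelConfig config = 1;

      // Keyed by model version; the key -1 collects requests
      // whose version could not be determined.
      map<int64, ModelVersionStatus> version_status = 2;
    }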

enum ServerReadyState

Readiness status for the inference server.

enumerator ServerReadyState::SERVER_INVALID = 0

The server is in an invalid state and will likely not respond correctly to any requests.

enumerator ServerReadyState::SERVER_INITIALIZING = 1

The server is initializing.

enumerator ServerReadyState::SERVER_READY = 2

The server is ready and accepting requests.

enumerator ServerReadyState::SERVER_EXITING = 3

The server is exiting and will not respond to requests.

enumerator ServerReadyState::SERVER_FAILED_TO_INITIALIZE = 10

The server did not initialize correctly. Most requests will fail.
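Declared as a sketch, this enum could look like the following; the jump from 3 to 10 mirrors the documented enumerator values:

    enum ServerReadyState {
      SERVER_INVALID = 0;       // likely to respond incorrectly to any request
      SERVER_INITIALIZING = 1;  // starting up
      SERVER_READY = 2;         // ready and accepting requests
      SERVER_EXITING = 3;       // shutting down; not responding to requests

      SERVER_FAILED_TO_INITIALIZE = 10;  // most requests will fail
    }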

message ServerStatus

Status for the inference server.

string id

The server’s ID.

string version

The server’s version.

ServerReadyState ready_state

Current readiness state for the server.

uint64 uptime_ns

Server uptime in nanoseconds.

map<string, ModelStatus> model_status

Status for each model, as a map from model name to the status.

StatusRequestStats status_stats

Statistics for Status requests.

ProfileRequestStats profile_stats

Statistics for Profile requests.

HealthRequestStats health_stats

Statistics for Health requests.
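Putting the pieces together, a sketch of the top-level ServerStatus message (proto3; field numbers are illustrative, not taken from the actual .proto file):

    message ServerStatus {
      string id = 1;                     // the server's ID
      string version = 2;                // the server's version
      ServerReadyState ready_state = 3;  // current readiness state
      uint64 uptime_ns = 4;              // uptime in nanoseconds

      // Keyed by model name.
      map<string, ModelStatus> model_status = 5;

      // Per-endpoint request statistics.
      StatusRequestStats status_stats = 6;
      ProfileRequestStats profile_stats = 7;
      HealthRequestStats health_stats = 8;
    }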