server_status.proto

message StatDuration
    Statistic collecting a duration metric.

    uint64 count
        Cumulative number of times this metric occurred.

    uint64 total_time_ns
        Total collected duration of this metric in nanoseconds.

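A StatDuration collapses to an average latency by dividing the total time by the count. A minimal sketch of that computation, assuming the Python classes that protoc generates from this file (the module name server_status_pb2 is the protoc default, not something defined here):

    from server_status_pb2 import StatDuration  # protoc-generated module (assumed name)

    def average_ms(stat: StatDuration) -> float:
        """Average duration in milliseconds for one StatDuration metric."""
        if stat.count == 0:
            return 0.0  # metric never occurred; avoid division by zero
        return stat.total_time_ns / stat.count / 1_000_000  # ns -> ms
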
message StatusRequestStats
    Statistics collected for Status requests.

    StatDuration success
        Total time required to handle successful Status requests, not including HTTP or gRPC endpoint termination time.

message HealthRequestStats
    Statistics collected for Health requests.

    StatDuration success
        Total time required to handle successful Health requests, not including HTTP or gRPC endpoint termination time.

message ModelControlRequestStats
    Statistics collected for ModelControl requests.

    StatDuration success
        Total time required to handle successful ModelControl requests, not including HTTP or gRPC endpoint termination time.

message SharedMemoryControlRequestStats
    Statistics collected for SharedMemoryControl requests.

    StatDuration success
        Total time required to handle successful SharedMemoryControl requests, not including HTTP or gRPC endpoint termination time.

message RepositoryRequestStats
    Statistics collected for Repository requests.

    StatDuration success
        Total time required to handle successful Repository requests, not including HTTP or gRPC endpoint termination time.

message InferRequestStats
    Statistics collected for Infer requests.

    StatDuration success
        Total time required to handle successful Infer requests, not including HTTP or gRPC endpoint handling time.

    StatDuration failed
        Total time required to handle failed Infer requests, not including HTTP or gRPC endpoint handling time.

    StatDuration compute
        Time required to run inferencing for an inference request, including the time to copy input tensors to GPU memory, the time to execute the model, and the time to copy output tensors from GPU memory.

    StatDuration queue
        Time an inference request waits in the scheduling queue for an available model instance.

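Since queue and compute are separate StatDuration metrics, the split between waiting for a model instance and actually executing can be read directly from one InferRequestStats. A sketch under the same generated-module assumption:

    from server_status_pb2 import InferRequestStats

    def queue_vs_compute_ms(stats: InferRequestStats) -> tuple:
        """Return (avg_queue_ms, avg_compute_ms) for the requests seen so far."""
        def avg_ms(d):  # d is a StatDuration
            return d.total_time_ns / d.count / 1e6 if d.count else 0.0
        return avg_ms(stats.queue), avg_ms(stats.compute)
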
enum ModelReadyState
    Readiness status for models.

    ModelReadyState::MODEL_UNKNOWN = 0
        The model is in an unknown state. The model is not available for inferencing.

    ModelReadyState::MODEL_READY = 1
        The model is ready and available for inferencing.

    ModelReadyState::MODEL_UNAVAILABLE = 2
        The model is unavailable, indicating that the model failed to load or has been implicitly or explicitly unloaded. The model is not available for inferencing.

    ModelReadyState::MODEL_LOADING = 3
        The model is being loaded by the inference server. The model is not available for inferencing.

    ModelReadyState::MODEL_UNLOADING = 4
        The model is being unloaded by the inference server. The model is not available for inferencing.

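Of these states only MODEL_READY permits inferencing; LOADING and UNLOADING are transient, while UNKNOWN and UNAVAILABLE are not expected to recover on their own. A sketch of a readiness check against the generated enum:

    from server_status_pb2 import ModelReadyState

    def is_servable(ready_state) -> bool:
        # Every state other than MODEL_READY means the model cannot accept
        # inference requests right now.
        return ready_state == ModelReadyState.MODEL_READY
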
message ModelReadyStateReason
    Detail associated with a model’s readiness status.

    string message
        The message that explains the cause of being in the current readiness state.

message ModelVersionStatus
    Status for a version of a model.

    ModelReadyState ready_state
        Current readiness state for the model.

    ModelReadyStateReason ready_state_reason
        Supplemental information regarding the current readiness state.

    map<uint32, InferRequestStats> infer_stats
        Inference statistics for the model, as a map from batch size to the statistics. A batch size will not occur in the map unless there has been at least one inference request of that batch size.

    uint64 model_execution_count
        Cumulative number of model executions performed for the model. A single model execution performs inferencing for the entire request batch and can perform inferencing for multiple requests if dynamic batching is enabled.

    uint64 model_inference_count
        Cumulative number of model inferences performed for the model. Each inference in a batched request is counted as an individual inference.

    uint64 last_inference_timestamp_milliseconds
        The timestamp of the last inference request made for this model, given as milliseconds since the epoch.

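Because infer_stats is keyed by batch size, totals must be summed over the map, and the two cumulative counters give the average batch size per execution. A sketch:

    from server_status_pb2 import ModelVersionStatus

    def summarize(v: ModelVersionStatus) -> None:
        # Sum successful requests across every batch size in the map.
        successes = sum(s.success.count for s in v.infer_stats.values())
        print("successful requests:", successes)
        if v.model_execution_count:
            # One execution covers a whole (possibly dynamically assembled)
            # batch, so inferences / executions is the average batch size.
            print("avg batch size:",
                  v.model_inference_count / v.model_execution_count)
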
message ModelStatus
    Status for a model.

    ModelConfig config
        The configuration for the model.

    map<int64, ModelVersionStatus> version_status
        Duration statistics for each version of the model, as a map from version to the status. A version will not occur in the map unless there has been at least one inference request of that model version. A version of -1 indicates the status is for requests for which the version could not be determined.

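Reading the per-version map requires handling the special key -1, which aggregates requests whose version could not be determined. A sketch:

    from server_status_pb2 import ModelReadyState, ModelStatus

    def print_versions(model: ModelStatus) -> None:
        for version, vstatus in sorted(model.version_status.items()):
            label = "unknown version" if version == -1 else f"version {version}"
            print(label,
                  ModelReadyState.Name(vstatus.ready_state),
                  vstatus.ready_state_reason.message)
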
enum ServerReadyState
    Readiness status for the inference server.

    ServerReadyState::SERVER_INVALID = 0
        The server is in an invalid state and will likely not respond correctly to any requests.

    ServerReadyState::SERVER_INITIALIZING = 1
        The server is initializing.

    ServerReadyState::SERVER_READY = 2
        The server is ready and accepting requests.

    ServerReadyState::SERVER_EXITING = 3
        The server is exiting and will not respond to requests.

    ServerReadyState::SERVER_FAILED_TO_INITIALIZE = 10
        The server did not initialize correctly. Most requests will fail.

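A client should only send work once the server reports SERVER_READY; SERVER_INITIALIZING may still resolve to ready, while SERVER_EXITING and SERVER_FAILED_TO_INITIALIZE will not. A sketch:

    from server_status_pb2 import ServerReadyState

    def accepting_requests(ready_state) -> bool:
        # SERVER_INITIALIZING may become ready later; poll again.
        # SERVER_EXITING and SERVER_FAILED_TO_INITIALIZE will not recover.
        return ready_state == ServerReadyState.SERVER_READY
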
message SharedMemoryRegion
    The meta-data for the shared memory region registered in the inference server. A region carries:

    - The name for this shared memory region.
    - The type of shared memory identifier: the status of a system shared memory region or the status of a CUDA shared memory region.
    - For system shared memory, the name of the shared memory region that holds the input data (or where the output data should be written), and the offset of the shared memory block from the start of the shared memory region (start = offset, end = offset + byte_size).
    - For CUDA shared memory, the GPU device ID on which the cudaIPC handle was created.
    - Size of the shared memory block, in bytes.

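The offset arithmetic above (start = offset, end = offset + byte_size) determines the byte range a registered block occupies inside its region. A tiny hypothetical helper, using plain integers rather than any particular generated class:

    def block_bounds(offset: int, byte_size: int) -> tuple:
        # start is inclusive, end is exclusive: [offset, offset + byte_size)
        return offset, offset + byte_size
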
message ServerStatus
    Status for the inference server.

    string id
        The server’s ID.

    string version
        The server’s version.

    ServerReadyState ready_state
        Current readiness state for the server.

    uint64 uptime_ns
        Server uptime in nanoseconds.

    map<string, ModelStatus> model_status
        Status for each model, as a map from model name to the status.

    StatusRequestStats status_stats
        Statistics for Status requests.

    HealthRequestStats health_stats
        Statistics for Health requests.

    ModelControlRequestStats model_control_stats
        Statistics for ModelControl requests.

    SharedMemoryControlRequestStats shm_control_stats
        Statistics for SharedMemoryControl requests.

    RepositoryRequestStats repository_stats
        Statistics for Repository requests.

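Pulling the pieces together, a ServerStatus can be summarized by its readiness, uptime, and per-model ready versions. A sketch, with the same assumed generated module:

    from server_status_pb2 import ModelReadyState, ServerReadyState, ServerStatus

    def print_summary(status: ServerStatus) -> None:
        print(f"{status.id} {status.version}: "
              f"{ServerReadyState.Name(status.ready_state)}, "
              f"up {status.uptime_ns / 1e9:.0f}s")
        for name, model in status.model_status.items():
            ready = sum(1 for v in model.version_status.values()
                        if v.ready_state == ModelReadyState.MODEL_READY)
            print(f"  {name}: {ready} ready version(s)")
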
message SharedMemoryStatus
    Shared memory status for the inference server.

    SharedMemoryRegion (repeated)
        The list of active/registered shared memory regions.

message ModelRepositoryIndex
    Index of the model repository monitored by the inference server.

    ModelEntry models (repeated)
        The list of models in the model repository.
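
Finally, the repository index is a flat list. A sketch that extracts model names, assuming each ModelEntry carries a name field (ModelEntry’s own fields are not documented in this section):

    from server_status_pb2 import ModelRepositoryIndex

    def model_names(index: ModelRepositoryIndex) -> list:
        # `name` on ModelEntry is assumed; this section only shows the
        # repeated `models` field of the index.
        return [entry.name for entry in index.models]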